Machine Learning based Classification of Online News Data for Disaster Management

L. Gopal, R. Prabha, Divya Pullarkatt, M. Ramesh
Published in: 2020 IEEE Global Humanitarian Technology Conference (GHTC)
Publication date: 2020-10-29
DOI: 10.1109/GHTC46280.2020.9342921 (https://doi.org/10.1109/GHTC46280.2020.9342921)
Citations: 7

Abstract

The exponential escalation of disaster losses in India has led to the awareness that disaster risks are likely increasing. According to statistics, India has confronted 371 natural hazards over the past few decades, with severe casualties and infrastructural, agricultural, and economic damage recorded [1]. Credible, real-time data such as news content is freely accessible on legitimate websites, and its analysis may help in managing hazard emergencies, preparedness, and relief efficiently. On these grounds, a data scraping approach is proposed: crawler software gathers hazard-relevant news stories from the web, and machine learning approaches extract the insightful information. The developed crawler visits news reporting web pages and extracts news stories related to hazards. News articles are often unstructured, as they include less newsworthy content such as authors' opinions, interview responses, and past studies. Hence, supervised-learning-based text classification is performed to identify newsworthy content in news articles, achieving approximately 70 percent accuracy.
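The two filtering stages described in the abstract (keeping hazard-related stories, then separating newsworthy sentences from opinions, interview responses, and past studies) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the classifier, the features, the hazard vocabulary, or the training data, so the sketch swaps in a keyword filter and a small multinomial Naive Bayes model trained on invented sentences. It is not the authors' implementation and makes no attempt to reproduce the reported accuracy.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword list; the abstract does not give the hazard vocabulary.
HAZARD_TERMS = {"flood", "landslide", "cyclone", "earthquake", "drought"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_hazard_story(text):
    """Return True if the text mentions at least one hazard term."""
    return any(tok in HAZARD_TERMS for tok in tokenize(text))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.class_counts = Counter(labels)      # label -> number of examples
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training sentences, for illustration only.
TRAIN = [
    ("Floods killed 12 people and damaged 300 houses in the district", "newsworthy"),
    ("A landslide blocked the highway and injured five people", "newsworthy"),
    ("The cyclone destroyed crops and displaced 2000 families", "newsworthy"),
    ("I think the government should have planned better", "other"),
    ("In my opinion the response was slow, said a resident", "other"),
    ("A past study suggested that risks may increase", "other"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])

if __name__ == "__main__":
    sentence = "The earthquake damaged houses and injured 40 people"
    if is_hazard_story(sentence):
        print(model.predict(sentence))
```

In a real pipeline the keyword filter would run on whole crawled stories and the classifier on individual sentences or paragraphs, with a labeled corpus far larger than this toy set.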