Disaster-news datasets for multi-label document classification, sentence classification, and abstractive document summarization tasks

Sumanta Banerjee, S. Mukherjee, Sivaji Bandyopadhyay
Published in: 2023 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET), 29 March 2023
DOI: 10.1109/WiSPNET57748.2023.10134469 (https://doi.org/10.1109/WiSPNET57748.2023.10134469)

Abstract

Mining disaster-news articles can provide highly useful information to authorities for critical decision-making during disasters, and it can also help raise public awareness. This paper presents a collection of more than 7,600 news articles on COVID-19, storms, floods, heavy rain, cloudbursts, landslides, earthquakes, and tsunamis for researchers to explore. The collection includes more than 3,000 articles on COVID-19, which is treated as a disaster event, and more than 4,500 articles on natural disasters prevalent in India. A sentence classification dataset has been prepared from the COVID-19 articles. An abstractive summarization dataset has also been prepared for automatically generating a suitable headline for a disaster-news article. In this dataset, each article serves as the source text and its title serves as the reference summary. A further dataset, which combines the COVID-19 and natural-disaster articles, supports three tasks: (1) identifying the events described in an article, (2) identifying sentences that contain disaster-location information, and (3) identifying sentences that contain disaster-impact information. The paper reports precision, recall, F-measure, and accuracy scores for a Random Forest classifier applied to both datasets, and these scores show strong results.
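The abstract reports precision, recall, F-measure, and accuracy for the classification tasks. It does not say how these scores are averaged when an article can carry several event labels. The sketch below assumes micro-averaging, which pools true positives, false positives, and false negatives across all labels. It also assumes exact-match (subset) accuracy, which counts a document as correct only when its whole label set is predicted. The label names and the `micro_prf` helper are hypothetical illustrations. This is not the authors' evaluation code, and the Random Forest itself is not shown.

```python
from typing import Dict, List, Set

# A document's set of event labels, e.g. {"flood", "heavy rain"} (illustrative names).
Labels = Set[str]


def micro_prf(gold: List[Labels], pred: List[Labels]) -> Dict[str, float]:
    """Micro-averaged precision, recall, F-measure, and exact-match accuracy
    for multi-label predictions, given one label set per document."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = exact = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)       # labels predicted and correct
        fp += len(p - g)       # labels predicted but not in the gold set
        fn += len(g - p)       # gold labels the prediction missed
        exact += int(g == p)   # the whole label set must match
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = exact / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    gold = [{"flood", "heavy rain"}, {"COVID-19"}, {"earthquake", "tsunami"}]
    pred = [{"flood"}, {"COVID-19"}, {"earthquake", "landslide"}]
    print(micro_prf(gold, pred))
```

For these three toy documents, the pooled counts are 3 true positives, 1 false positive, and 2 false negatives. This gives a precision of 0.75, a recall of 0.6, an F-measure of about 0.667, and an accuracy of 1/3. Micro-averaging gives frequent labels such as COVID-19 more weight. Macro-averaging, which averages per-label scores instead, would treat rare event types equally.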