Sumanta Banerjee, S. Mukherjee, Sivaji Bandyopadhyay
Disaster-news datasets for multi-label document classification, sentence classification, and abstractive document summarization tasks
2023 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET), published 2023-03-29
DOI: 10.1109/WiSPNET57748.2023.10134469 (https://doi.org/10.1109/WiSPNET57748.2023.10134469)
Abstract
Mining disaster-news articles can give authorities useful information for critical decision making and can help raise public awareness during disasters. This paper presents a collection of more than 7,600 disaster-news documents covering COVID-19, storm, flood, heavy rain, cloudburst, landslide, earthquake, and tsunami events, made available for interested researchers to explore. It includes more than 3,000 articles on COVID-19 alone, which is treated here as a disaster event, and more than 4,500 articles on natural disasters prevalent in India. Three datasets are derived from the collection. The first, built from the COVID-19 articles, targets sentence classification. The second targets abstractive summarization, specifically the automatic generation of a suitable headline for a disaster-news article; each article serves as the source text and its title as the reference summary. The third combines the COVID-19 and natural-disaster articles and supports three tasks: identifying the events reported in an article, identifying the sentences that contain disaster-location information, and identifying the sentences that contain disaster-impact information. The paper reports Precision, Recall, F-measure, and Accuracy scores obtained by applying a Random Forest classifier to both datasets, and the authors describe the results as impressive.
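The abstract names four evaluation scores without defining them. As a minimal sketch of how they are conventionally computed for a binary sentence-level task (for example, "does this sentence contain disaster-location information?"), the snippet below may help. It does not reproduce the authors' evaluation code: the function name `binary_scores` and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def binary_scores(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Return precision, recall, F-measure (F1) and accuracy for 0/1 labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)

    # Guard every division so that empty or degenerate inputs yield 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(gold) if len(gold) else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical toy labels (not taken from the paper's datasets):
# 1 = sentence contains disaster-location information, 0 = it does not.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_scores(gold, pred)
print(scores)
```

For the multi-label event-identification task, one common choice is to compute these scores per label and then average them, but the abstract does not say which averaging scheme the authors used.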