IKDSumm: Incorporating key-phrases into BERT for extractive disaster tweet summarization
Authors: Piyush Kumar Garg, Roshni Chakraborty, Srishti Gupta, Sourav Kumar Dandapat
Journal: Computer Speech and Language, vol. 87, Article 101649 (Impact Factor 3.1; JCR Q2, Computer Science, Artificial Intelligence)
Published: 2024-04-16
DOI: 10.1016/j.csl.2024.101649
URL: https://www.sciencedirect.com/science/article/pii/S0885230824000329
Citation count: 0
Abstract
Online social media platforms, such as Twitter, are among the most valuable sources of information during disaster events. Humanitarian organizations, government agencies, and volunteers rely on a concise compilation of such information for effective disaster management. Existing methods for producing such compilations are mostly generic summarization approaches that do not exploit domain knowledge. In this paper, we propose a disaster-specific tweet summarization framework, IKDSumm, which first identifies the crucial information in each disaster-related tweet through the key-phrases of that tweet. We identify these key-phrases by utilizing domain knowledge of disasters (from an existing ontology) without any human intervention. Further, we utilize these key-phrases to automatically generate a summary of the tweets. Therefore, given tweets related to a disaster, IKDSumm fulfills the key objectives of summarization, such as information coverage, relevance, and diversity in the summary, without any human intervention. We evaluate the performance of IKDSumm against 8 state-of-the-art techniques on 12 disaster datasets. The evaluation results show that IKDSumm outperforms existing techniques by approximately 2–79% in terms of ROUGE-N F1-score.
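For readers unfamiliar with the reported metric, the following is a minimal sketch of how a ROUGE-N F1-score can be computed for one candidate summary against one reference summary. It is an illustration only: the function names `ngrams` and `rouge_n_f1`, the lowercasing, and the whitespace tokenization are assumptions made here, and the abstract does not state which ROUGE implementation or preprocessing the authors used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 between a candidate and a reference summary.

    Assumption: text is lowercased and split on whitespace; real evaluations
    often add stemming or other normalization.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # share of candidate n-grams matched
    recall = overlap / sum(ref.values())      # share of reference n-grams matched
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    summary = "flood hits city"
    gold = "flood hits town center"
    print(round(rouge_n_f1(summary, gold, n=1), 4))
```

Here precision is the fraction of candidate n-grams found in the reference, recall is the fraction of reference n-grams found in the candidate, and F1 is their harmonic mean.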
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.