Utilising crowdsourcing and text mining to enhance information extraction from social media: A case study in handling COVID-19 supply requests in Thailand

IF 1.7 · Zone 4 (Management) · JCR Q3 · COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Information Science Pub Date : 2024-01-06 DOI:10.1177/01655515231220164

Prapaporn Rattanatamrong, Yutthana Boonpalit, Manassanan Boonnavasin

Citations: 0

Abstract

Social media platforms are critical for disaster communication and relief efforts. Rapid and precise analysis of social media posts is required for effective disaster response. This article presents a comprehensive study of a framework that combines crowdsourcing and text mining techniques to enhance data extraction from social media. The research focuses on a case study of medical supply requests during the COVID-19 pandemic, which yields several key findings. First, incorporating domain-specific data when training named entity recognition (NER) models is essential for accurately identifying and retrieving important entities, such as the names of medical supplies and hospitals. Second, implementing a hybrid system improves the extraction of information from social media posts. Finally, crowdsourcing is found to play a significant role in the validation, verification, and filtering of disorganised information within the hybrid system. Our performance analysis demonstrates that hybrid models have the potential to significantly improve the extraction of supply names (by up to 37%) and hospital names (by up to 66%), especially in the absence of a comprehensive vocabulary or specially trained NER models. During the COVID-19 supply shortage in Thailand, volunteers utilised hybrid models to expedite the identification of the necessary information. Experimental results demonstrated significant improvement in the accuracy of extracted data, the ability to acquire relevant information in real time, the capacity to handle a substantial number of posts and the practical benefit of the proposed framework.
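
The abstract does not specify how the hybrid system is built, so the following is only a minimal sketch of one way such a combination could look: candidates from a vocabulary lookup are merged with candidates from an NER model, and crowd votes then filter the merged list. The vocabularies, the `Candidate` type, the function names, the vote thresholds and the sample post are all hypothetical and are not taken from the article; the model output is hard-coded rather than produced by a real NER model.

```python
from dataclasses import dataclass

# Hypothetical vocabularies, for illustration only (not from the article).
SUPPLY_VOCAB = {"n95 mask", "ppe suit", "oxygen concentrator"}
HOSPITAL_VOCAB = {"example general hospital"}


@dataclass(frozen=True)
class Candidate:
    text: str    # surface form of the entity
    label: str   # "SUPPLY" or "HOSPITAL"
    source: str  # "vocab" or "model"


def vocab_candidates(post):
    """Find vocabulary terms that occur as substrings of the post."""
    lowered = post.lower()
    found = []
    for label, vocab in (("SUPPLY", SUPPLY_VOCAB), ("HOSPITAL", HOSPITAL_VOCAB)):
        for term in sorted(vocab):
            if term in lowered:
                found.append(Candidate(term, label, "vocab"))
    return found


def merge_candidates(vocab_hits, model_hits):
    """Union of both sources, deduplicated by (text, label); first seen wins."""
    merged = {}
    for cand in list(vocab_hits) + list(model_hits):
        merged.setdefault((cand.text.lower(), cand.label), cand)
    return list(merged.values())


def crowd_validate(candidates, votes, min_votes=3, threshold=0.5):
    """Keep candidates that enough volunteers saw and a strict majority approved."""
    accepted = []
    for cand in candidates:
        ballots = votes.get((cand.text.lower(), cand.label), [])
        if len(ballots) >= min_votes and sum(ballots) / len(ballots) > threshold:
            accepted.append(cand)
    return accepted


# Usage with made-up data: the model hits stand in for real NER output.
post = "Example General Hospital urgently needs N95 mask and face shield donations"
model_hits = [
    Candidate("face shield", "SUPPLY", "model"),
    Candidate("donations", "SUPPLY", "model"),  # a plausible false positive
]
votes = {
    ("n95 mask", "SUPPLY"): [True, True, True],
    ("example general hospital", "HOSPITAL"): [True, True, True],
    ("face shield", "SUPPLY"): [True, True, False],
    ("donations", "SUPPLY"): [False, False, True],
}
candidates = merge_candidates(vocab_candidates(post), model_hits)
accepted = crowd_validate(candidates, votes)
print([(c.text, c.label, c.source) for c in accepted])
```

In this sketch the model contributes an entity the vocabulary lacks ("face shield"), while the crowd vote discards the spurious "donations" candidate. This mirrors, in a simplified way, the complementary roles the abstract attributes to the automated and crowdsourced components.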

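Source journal

Journal of Information Science (Engineering and Technology, Computer Science: Information Systems)

CiteScore

6.80

Self-citation rate

8.30%

Articles published

121

Review time

4 months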

About the journal: The Journal of Information Science is a peer-reviewed international journal of high repute covering topics of interest to all those researching and working in the sciences of information and knowledge management. The Editors welcome material on any aspect of information science theory, policy, application or practice that will advance thinking in the field.