利用众包和文本挖掘加强社交媒体的信息提取:处理泰国 COVID-19 供应请求的案例研究

IF 1.8 4区 管理学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS
Prapaporn Rattanatamrong, Yutthana Boonpalit, Manassanan Boonnavasin
{"title":"利用众包和文本挖掘加强社交媒体的信息提取:处理泰国 COVID-19 供应请求的案例研究","authors":"Prapaporn Rattanatamrong, Yutthana Boonpalit, Manassanan Boonnavasin","doi":"10.1177/01655515231220164","DOIUrl":null,"url":null,"abstract":"Social media platforms are critical for disaster communication and relief efforts. Rapid and precise social media post analysis is required for effective disaster response. This article presents a comprehensive study of a framework that combines crowdsourcing and text mining techniques to enhance data extraction from social media. The research focuses on a particular case study of COVID-19 pandemic medical supply request, which shows several key findings. First, the incorporation of domain-specific data during the training of named entity recognition (NER) models is essential for accurately identifying and retrieving important entities, such as the names of medical supplies and hospitals. Second, the implementation of a hybrid system leads to improvement in the extraction of information from social media posts. Finally, the involvement of crowdsourcing is found to be significant in the validation, verification, and filtering of disorganised information within the hybrid system. Our performance analysis demonstrates that the use of hybrid models has the potential to significantly improve the extraction of supply names (by up to 37%) and hospital names (by up to 66%), especially in the absence of a comprehensive vocabulary or specially trained NER models. During the COVID-19 supply shortage in Thailand, volunteers utilised hybrid models to expedite the identification of the necessary information. Experiment results demonstrated significant improvement in the accuracy of extracted data, the ability to acquire relevant information in real-time, the capacity to handle a substantial number of posts and the practical benefit of the proposed framework.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":"4 1","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Utilising crowdsourcing and text mining to enhance information extraction from social media: A case study in handling COVID-19 supply requests in Thailand\",\"authors\":\"Prapaporn Rattanatamrong, Yutthana Boonpalit, Manassanan Boonnavasin\",\"doi\":\"10.1177/01655515231220164\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Social media platforms are critical for disaster communication and relief efforts. Rapid and precise social media post analysis is required for effective disaster response. This article presents a comprehensive study of a framework that combines crowdsourcing and text mining techniques to enhance data extraction from social media. The research focuses on a particular case study of COVID-19 pandemic medical supply request, which shows several key findings. First, the incorporation of domain-specific data during the training of named entity recognition (NER) models is essential for accurately identifying and retrieving important entities, such as the names of medical supplies and hospitals. Second, the implementation of a hybrid system leads to improvement in the extraction of information from social media posts. Finally, the involvement of crowdsourcing is found to be significant in the validation, verification, and filtering of disorganised information within the hybrid system. Our performance analysis demonstrates that the use of hybrid models has the potential to significantly improve the extraction of supply names (by up to 37%) and hospital names (by up to 66%), especially in the absence of a comprehensive vocabulary or specially trained NER models. During the COVID-19 supply shortage in Thailand, volunteers utilised hybrid models to expedite the identification of the necessary information. Experiment results demonstrated significant improvement in the accuracy of extracted data, the ability to acquire relevant information in real-time, the capacity to handle a substantial number of posts and the practical benefit of the proposed framework.\",\"PeriodicalId\":54796,\"journal\":{\"name\":\"Journal of Information Science\",\"volume\":\"4 1\",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2024-01-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Information Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1177/01655515231220164\",\"RegionNum\":4,\"RegionCategory\":\"管理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1177/01655515231220164","RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

社交媒体平台对于灾难沟通和救援工作至关重要。要有效地应对灾难,就必须对社交媒体帖子进行快速而精确的分析。本文全面研究了一个结合众包和文本挖掘技术的框架,以加强从社交媒体中提取数据的能力。研究重点关注 COVID-19 大流行病医疗供应请求这一特殊案例,并得出了几项重要发现。首先,在训练命名实体识别(NER)模型时纳入特定领域的数据对于准确识别和检索重要实体(如医疗用品和医院名称)至关重要。其次,混合系统的实施改进了从社交媒体帖子中提取信息的工作。最后,在混合系统中,众包的参与在验证、核实和过滤杂乱信息方面发挥了重要作用。我们的性能分析表明,使用混合模型有可能显著提高供应品名称(最高提高 37%)和医院名称(最高提高 66%)的提取率,尤其是在缺乏综合词汇或经过专门训练的 NER 模型的情况下。在泰国 COVID-19 供应短缺期间,志愿者利用混合模型加快了必要信息的识别。实验结果表明,所提取数据的准确性、实时获取相关信息的能力、处理大量帖子的能力以及所建议框架的实用性都得到了显著提高。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Utilising crowdsourcing and text mining to enhance information extraction from social media: A case study in handling COVID-19 supply requests in Thailand
Social media platforms are critical for disaster communication and relief efforts. Rapid and precise social media post analysis is required for effective disaster response. This article presents a comprehensive study of a framework that combines crowdsourcing and text mining techniques to enhance data extraction from social media. The research focuses on a particular case study of COVID-19 pandemic medical supply request, which shows several key findings. First, the incorporation of domain-specific data during the training of named entity recognition (NER) models is essential for accurately identifying and retrieving important entities, such as the names of medical supplies and hospitals. Second, the implementation of a hybrid system leads to improvement in the extraction of information from social media posts. Finally, the involvement of crowdsourcing is found to be significant in the validation, verification, and filtering of disorganised information within the hybrid system. Our performance analysis demonstrates that the use of hybrid models has the potential to significantly improve the extraction of supply names (by up to 37%) and hospital names (by up to 66%), especially in the absence of a comprehensive vocabulary or specially trained NER models. During the COVID-19 supply shortage in Thailand, volunteers utilised hybrid models to expedite the identification of the necessary information. Experiment results demonstrated significant improvement in the accuracy of extracted data, the ability to acquire relevant information in real-time, the capacity to handle a substantial number of posts and the practical benefit of the proposed framework.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Information Science
Journal of Information Science 工程技术-计算机:信息系统
CiteScore
6.80
自引率
8.30%
发文量
121
审稿时长
4 months
期刊介绍: The Journal of Information Science is a peer-reviewed international journal of high repute covering topics of interest to all those researching and working in the sciences of information and knowledge management. The Editors welcome material on any aspect of information science theory, policy, application or practice that will advance thinking in the field.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信