GELL: Automatic Extraction of Epidemiological Line Lists from Open Sources

Saurav Ghosh, Prithwish Chakraborty, B. Lewis, M. Majumder, E. Cohn, J. Brownstein, M. Marathe, Naren Ramakrishnan
{"title":"GELL: Automatic Extraction of Epidemiological Line Lists from Open Sources","authors":"Saurav Ghosh, Prithwish Chakraborty, B. Lewis, M. Majumder, E. Cohn, J. Brownstein, M. Marathe, Naren Ramakrishnan","doi":"10.1145/3097983.3098073","DOIUrl":null,"url":null,"abstract":"Real-time monitoring and responses to emerging public health threats rely on the availability of timely surveillance data. During the early stages of an epidemic, the ready availability of line lists with detailed tabular information about laboratory-confirmed cases can assist epidemiologists in making reliable inferences and forecasts. Such inferences are crucial to understand the epidemiology of a specific disease early enough to stop or control the outbreak. However, construction of such line lists requires considerable human supervision and therefore, difficult to generate in real-time. In this paper, we motivate Guided Epidemiological Line List (GELL), the first tool for building automated line lists (in near real-time) from open source reports of emerging disease outbreaks. Specifically, we focus on deriving epidemiological characteristics of an emerging disease and the affected population from reports of illness. GELL uses distributed vector representations (ala word2vec) to discover a set of indicators for each line list feature. This discovery of indicators is followed by the use of dependency parsing based techniques for final extraction in tabular form. We evaluate the performance of GELL against a human annotated line list provided by HealthMap corresponding to MERS outbreaks in Saudi Arabia. We demonstrate that GELL extracts line list features with increased accuracy compared to a baseline method. We further show how these automatically extracted line list features can be used for making epidemiological inferences, such as inferring demographics and symptoms-to-hospitalization period of affected individuals.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3097983.3098073","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Real-time monitoring and responses to emerging public health threats rely on the availability of timely surveillance data. During the early stages of an epidemic, the ready availability of line lists with detailed tabular information about laboratory-confirmed cases can assist epidemiologists in making reliable inferences and forecasts. Such inferences are crucial to understand the epidemiology of a specific disease early enough to stop or control the outbreak. However, construction of such line lists requires considerable human supervision and therefore, difficult to generate in real-time. In this paper, we motivate Guided Epidemiological Line List (GELL), the first tool for building automated line lists (in near real-time) from open source reports of emerging disease outbreaks. Specifically, we focus on deriving epidemiological characteristics of an emerging disease and the affected population from reports of illness. GELL uses distributed vector representations (ala word2vec) to discover a set of indicators for each line list feature. This discovery of indicators is followed by the use of dependency parsing based techniques for final extraction in tabular form. We evaluate the performance of GELL against a human annotated line list provided by HealthMap corresponding to MERS outbreaks in Saudi Arabia. We demonstrate that GELL extracts line list features with increased accuracy compared to a baseline method. We further show how these automatically extracted line list features can be used for making epidemiological inferences, such as inferring demographics and symptoms-to-hospitalization period of affected individuals.
GELL:从开放资源中自动提取流行病学线列表
实时监测和应对新出现的公共卫生威胁取决于能否获得及时的监测数据。在流行病的早期阶段,现成的带有实验室确诊病例详细表格信息的清单可帮助流行病学家作出可靠的推断和预测。这样的推断对于及早了解特定疾病的流行病学以阻止或控制疫情至关重要。然而,这种行列表的构建需要大量的人工监督,因此难以实时生成。在本文中,我们激发了引导流行病学线列表(GELL),这是第一个从新出现的疾病暴发的开源报告中构建自动化线列表(近乎实时)的工具。具体而言,我们侧重于从疾病报告中得出新出现疾病和受影响人群的流行病学特征。GELL使用分布式向量表示(类似于word2vec)为每个行列表特征发现一组指标。在发现指示器之后,使用基于依赖项解析的技术以表格形式进行最终提取。我们根据HealthMap提供的与沙特阿拉伯中东呼吸综合征爆发相对应的人类注释线列表评估了GELL的性能。我们证明,与基线方法相比,GELL提取行列表特征的准确性更高。我们进一步展示了如何使用这些自动提取的线列表特征进行流行病学推断,例如推断受影响个体的人口统计学和症状到住院时间。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信