Link Analysis to Discover Insights from Structured and Unstructured Data on COVID-19

Ying Zhao, Charles C. Zhou
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, published 2020-09-21
DOI: 10.1145/3388440.3415990 (https://doi.org/10.1145/3388440.3415990)
Citations: 3

Abstract

SARS-CoV-2, a deadly novel virus, has caused a worldwide pandemic and drastic loss of human life and economic activity. An open data set called the COVID-19 Open Research Dataset (CORD-19) contains a large set of full-text scientific literature on SARS-CoV-2. Next Strain is a database of SARS-CoV-2 viral genomes from 12/3/2019. We applied a unique information-mining method called lexical link analysis (LLA) to answer the call to action and help the scientific community answer high-priority scientific questions related to SARS-CoV-2. We first text-mined CORD-19. We also data-mined the Next Strain database. Finally, we linked the two databases. The linked databases and information can be used to discover insights and help the research community address high-priority questions related to the genetics, tests, and prevention of SARS-CoV-2. For example, we showed clusters of COVID-19 cases that are consistent in terms of clinical symptoms (unstructured text descriptions from CORD-19) and genomics data (structured data from Next Strain). The genomic differences between clades A1 and A2 shown in the Next Strain mapping may be the cause of the differences in clinical symptoms, which the LLA method also groups into two: cases on the West Coast of the United States are similar to those in Asia, while the more contagious and virulent cases on the East Coast of the United States are similar to those in Europe.