Link Analysis to Discover Insights from Structured and Unstructured Data on COVID-19
Ying Zhao, Charles C. Zhou
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2020-09-21
DOI: 10.1145/3388440.3415990
SARS-CoV-2, a deadly novel virus, has caused a worldwide pandemic and a drastic loss of human lives and economic activity. An open data set called the COVID-19 Open Research Dataset (CORD-19) contains a large set of full-text scientific literature on SARS-CoV-2. Next Strain provides a database of SARS-CoV-2 viral genomes dating from 12/3/2019. We applied a unique information-mining method named lexical link analysis (LLA) to answer the call to action and help the science community answer high-priority scientific questions related to SARS-CoV-2. We first text-mined CORD-19. We then data-mined the Next Strain database. Finally, we linked the two databases. The linked databases and information can be used to discover insights and to help the research community address high-priority questions related to SARS-CoV-2's genetics, tests, and prevention. For example, we showed clusters of COVID-19 cases that are consistent in terms of clinical symptoms (unstructured text descriptions from CORD-19) and genomics data (structured data from Next Strain). The genomic differences between clades A1 and A2, as shown in the Next Strain mapping, may be the cause of the differences in clinical symptoms, which the LLA method also grouped into two clusters: cases on the West Coast of the United States are similar to those in Asia, while the more contagious and virulent cases on the East Coast of the United States are similar to those in Europe.
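As a rough illustration of the linking step described in the abstract, the sketch below extracts adjacent word pairs from free-text descriptions and cross-tabulates them against a structured label for the same record. It is not the authors' LLA implementation. The function names, record IDs, toy texts, and the labels X and Y are all assumptions made up for illustration, and none of the values come from CORD-19 or Next Strain.

```python
from collections import Counter, defaultdict


def word_pairs(text):
    """Return adjacent lowercase word pairs (bigrams) from a text."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    words = [w for w in words if w]
    return list(zip(words, words[1:]))


def link_by_shared_pairs(notes, labels):
    """Count, for each word pair, how many records carry each structured label.

    notes:  {record_id: free-text description}  (hypothetical input)
    labels: {record_id: structured label}       (hypothetical input)
    """
    table = defaultdict(Counter)
    for record_id, text in notes.items():
        label = labels.get(record_id)
        if label is None:
            continue  # record has no structured counterpart to link to
        for pair in set(word_pairs(text)):  # count each pair once per record
            table[pair][label] += 1
    return table


# Toy, made-up records purely for illustration.
notes = {
    "r1": "dry cough and mild fever",
    "r2": "mild fever, dry cough",
    "r3": "loss of smell",
}
labels = {"r1": "X", "r2": "X", "r3": "Y"}

table = link_by_shared_pairs(notes, labels)
for pair, counts in sorted(table.items()):
    print(pair, dict(counts))
```

Word pairs that concentrate under a single label hint at text clusters that line up with the structured grouping. A real analysis would need far more data and a proper clustering step.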