{"title":"Application of Text Mining Technologies Based on Indexing Model & Public Databases","authors":"Xiao Fu","doi":"10.1109/ICETCI53161.2021.9563523","journal":{"name":"2021 IEEE International Conference on Electronic Technology, Communication and Information (ICETCI)"},"publicationDate":"2021-08-27","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ICETCI53161.2021.9563523"}
Application of Text Mining Technologies Based on Indexing Model & Public Databases
This study aims to identify clinical tests associated with pancreatic cancer and to determine the most relevant publications. An indexing model is used to retrieve literature on pancreatic cancer risk from PubMed; the experiments cover 6466 publications dated 2002 to 2017. The vocabulary comprises 3880 clinical-test terms and 375 genes. The clinical-test terms are taken from http://www.mayomedicallaboratories.com, which is constantly updated by Mayo Medical Laboratories, and the 375 genes are compiled from four gene-disease databases (OMIM, Orphanet, ClinVar and GWAS Catalog), a set that may be expanded in the future. The study integrates literature, databases, clinical information and interpretation for clinical tests, and statistical methods. We use the indexing model to find clinical-test terms associated with pancreatic cancer risk and rank documents with our Knowledge-based Language Model (KLM), which integrates additional knowledge of genes and the language generation process into the original language model. Among the 6466 publications retrieved, 21 clinical-test terms appear in 186 publications and 106 genes appear in 732 documents. The 15 documents in which both genes and clinical-test terms appear (e.g., PubMed ID 25481712 (KRAS, Secretin), 25058882 (KRAS, Cholecystokinin), 26764183 (Whole-Exome Sequencing, APC, MLH1, MSH6, POLE, TP53, KRAS)) are ranked according to the KLM.
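The abstract does not spell out the indexing model, so the following is only a minimal sketch of one plausible reading: an inverted index from vocabulary terms to the documents that mention them, followed by an intersection to find documents containing both a gene and a clinical-test term. The function names `build_index` and `docs_with_any`, the document IDs, and the toy texts are all hypothetical; only the gene and test names are taken from the abstract. The KLM ranking step is not sketched here because its details are not given.

```python
import re
from collections import defaultdict


def build_index(docs, vocabulary):
    """Map each vocabulary term to the set of document IDs whose text mentions it.

    Matching is case-insensitive and on whole words (an assumption; the
    paper's actual matching rules are not described in the abstract).
    """
    patterns = {
        term: re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for term in vocabulary
    }
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term, pattern in patterns.items():
            if pattern.search(text):
                index[term].add(doc_id)
    return index


def docs_with_any(index):
    """Return every document ID that mentions at least one indexed term."""
    return set().union(*index.values())


# Toy corpus: invented texts and IDs, standing in for PubMed abstracts.
docs = {
    "doc1": "KRAS mutation status and secretin-stimulated pancreatic juice.",
    "doc2": "TP53 and KRAS alterations in pancreatic tumours.",
    "doc3": "Cholecystokinin receptor expression in pancreatic cancer.",
    "doc4": "Dietary risk factors for pancreatic cancer.",
}
genes = ["KRAS", "TP53"]                  # stand-in for the 375-gene list
tests = ["Secretin", "Cholecystokinin"]   # stand-in for the 3880 test terms

gene_index = build_index(docs, genes)
test_index = build_index(docs, tests)

# Documents in which both a gene and a clinical-test term appear.
both = docs_with_any(gene_index) & docs_with_any(test_index)
print(sorted(both))
```

On the real corpus, the same intersection would correspond to the 15 documents the abstract reports as containing both genes and clinical-test terms; those would then be passed to the ranking model.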