Development of fecal microbial diagnostic marker sets of colorectal cancer using natural language processing method
Authors: Houcong Liu, Changpu Song, Jidong Wang, Zhufang Chen, Xiaohong Zhang, Hekai Zhou, Linhong Yao, Dan Chen, Wenhao Gu, Rui-Kun Huang, Bing-Kun Huang, Bo-Wei Han, Jihui Du
Journal: International Journal of Biological Markers, pages 31-39
Published: 2024-03-01 (Epub 2023-12-21)
DOI: 10.1177/03936155231210881 (https://doi.org/10.1177/03936155231210881)
Citation count: 0
Abstract
Background: Cancer screening and early detection greatly increase the chances of successful treatment. However, most cancer types lack effective early screening biomarkers. In recent years, natural language processing (NLP)-based text-mining methods have proven effective in searching the scientific literature and identifying promising associations between potential biomarkers and disease, but unfortunately few are widely used.
Methods: In this study, we used an NLP-enabled text-mining system, MarkerGenie, to identify potential stool bacterial markers for early detection and screening of colorectal cancer. After filtering markers based on text-mining results, we validated bacterial markers using multiplex digital droplet polymerase chain reaction (ddPCR). Classifiers were built based on ddPCR results, and sensitivity, specificity, and area under the curve (AUC) were used to evaluate the performance.
Results: A total of 7 of the 14 bacterial markers showed significantly increased abundance in the stools of colorectal cancer patients. A five-bacteria classifier for colorectal cancer diagnosis was built, and achieved an AUC of 0.852, with a sensitivity of 0.692 and specificity of 0.935. When combined with the fecal immunochemical test (FIT), our classifier achieved an AUC of 0.959 and increased the sensitivity of FIT (0.929 vs. 0.872) at a specificity of 0.900.
Conclusions: Our study provides a valuable case example of the use of NLP-based marker mining for biomarker identification.
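The evaluation metrics named in the Methods (sensitivity, specificity, and AUC) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the threshold, and the toy scores below are hypothetical and chosen only to show how the three quantities are computed from classifier scores.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity), calling scores >= threshold positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs where the case scores higher.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up scores for illustration only; NOT data from the study.
# Label 1 = cancer case, 0 = control.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.5, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.45)
toy_auc = auc(toy_labels, toy_scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={toy_auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports sensitivity at a stated specificity.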
Journal description:
IJBM is an international, online-only, peer-reviewed journal that publishes original research and critical reviews primarily focused on cancer biomarkers. IJBM targets advanced topics regarding the application of biomarkers in oncology and is dedicated to solid tumors in adult subjects. The clinical scenarios of interest are screening and early diagnosis of cancer, prognostic assessment, and prediction of the response to and monitoring of treatment.