Comparative study of machine- and deep-learning based classification algorithms for biomedical Raman spectroscopy (RS): case study of RS based pathogenic microbe identification
Sisi Guo, Ruoyu Zhang, Tao Wang, Jianfeng Wang
Analytical Sciences, vol. 40, no. 12, pp. 2101-2109. Published 2024-08-29. DOI: 10.1007/s44211-024-00645-0
https://link.springer.com/article/10.1007/s44211-024-00645-0
Abstract
One key driver pushing the frontiers of biomedical RS is the development of dedicated machine-learning (ML) and deep-learning (DL) algorithms. Yet no systematic comparison of ML and DL algorithms has been conducted for biomedical RS, largely because large, open-source Raman spectral datasets are scarce. We therefore compared a typical ML algorithm, partial least squares discriminant analysis (PLS-DA), with a typical DL algorithm, the one-dimensional convolutional neural network (1D-CNN), for pathogenic microbe identification on 12,000 Raman spectra from six microbial classes: Klebsiella aerogenes (K. aerogenes), Candida albicans (C. albicans), Candida glabrata (C. glabrata), Group A Streptococcus (Group A Strep.), Escherichia coli1 (E. coli1), and Escherichia coli2 (E. coli2). Both algorithms were evaluated with 100%, 75%, 50%, and 25% of the 12,000 spectra retained. The total Raman dataset was split into 80% for training and 20% for testing. With 100% of the spectra retained, 1D-CNN achieved a test accuracy of 95.25% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.997, higher than the 89.42% and 0.979 achieved by PLS-DA. However, PLS-DA outperformed 1D-CNN on the test sets when 75%, 50%, and 25% of the spectra were retained. These accuracies and AUCs show that the performance of both PLS-DA and 1D-CNN depends on the number of Raman spectra available. In addition, both the PLS-DA loadings on the latent variables and the 1D-CNN saliency maps largely captured Raman peaks arising from DNA and proteins, giving the two methods comparable interpretability. These results indicate that both ML and DL algorithms should be explored for each Raman spectral identification application, and whichever achieves the higher accuracy and AUC should be selected.
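The abstract does not specify how PLS-DA was implemented or how the retained subsets were drawn, so the following is only a minimal NumPy sketch of this kind of evaluation protocol under stated assumptions: spectra are randomly subsampled per class to a retained fraction, split 80%/20% per class (stratification is an assumption), classified with PLS-DA implemented as PLS2 regression (NIPALS) on one-hot class labels followed by an argmax, and scored with accuracy and a macro-averaged one-vs-rest ROC AUC. The helper names (`make_toy_spectra`, `subsample_and_split`, `pls_da_fit`, `pls_da_predict`, `macro_ovr_auc`), the number of latent variables, and the synthetic Gaussian-peak spectra are hypothetical stand-ins, not the authors' data or code; the 1D-CNN arm is not shown.

```python
import numpy as np


def make_toy_spectra(n_per_class=100, n_classes=6, n_points=200, seed=1):
    """Hypothetical synthetic 'spectra': one Gaussian peak per class plus noise."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(0.0, 1.0, n_points)
    X, y = [], []
    for k in range(n_classes):
        center = (k + 1) / (n_classes + 1)
        peak = np.exp(-((axis - center) ** 2) / (2 * 0.02**2))
        X.append(peak + 0.3 * rng.standard_normal((n_per_class, n_points)))
        y.append(np.full(n_per_class, k))
    return np.vstack(X), np.concatenate(y)


def subsample_and_split(X, y, retain, test_frac=0.2, seed=0):
    """Keep a `retain` fraction of each class, then split each class into train/test."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for k in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == k))
        idx = idx[: int(round(retain * len(idx)))]
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    train_idx, test_idx = np.array(train_idx), np.array(test_idx)
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]


def pls_da_fit(X, y, n_components, max_iter=500, tol=1e-8):
    """PLS-DA as PLS2 regression (NIPALS) on mean-centred one-hot labels."""
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk, Yk = X - x_mean, Y - y_mean
    W, P, C = [], [], []
    for _ in range(n_components):
        u = Yk[:, np.argmax(Yk.var(axis=0))]  # start from the most variable Y column
        for _ in range(max_iter):
            w = Xk.T @ u
            w /= np.linalg.norm(w)            # X weights
            t = Xk @ w                        # X scores (latent variable)
            c = Yk.T @ t / (t @ t)            # Y weights
            u_new = Yk @ c / (c @ c)          # Y scores
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = Xk.T @ t / (t @ t)                # X loadings (inspected for interpretation)
        Xk = Xk - np.outer(t, p)              # deflate X and Y
        Yk = Yk - np.outer(t, c)
        W.append(w)
        P.append(p)
        C.append(c)
    W, P, C = np.array(W).T, np.array(P).T, np.array(C).T
    B = W @ np.linalg.solve(P.T @ W, C.T)     # coefficients, shape (features, classes)
    return {"classes": classes, "x_mean": x_mean, "y_mean": y_mean, "B": B, "P": P}


def pls_da_predict(model, X):
    """Return continuous class scores and the argmax class label per spectrum."""
    scores = (X - model["x_mean"]) @ model["B"] + model["y_mean"]
    return scores, model["classes"][np.argmax(scores, axis=1)]


def macro_ovr_auc(scores, y, classes):
    """Macro one-vs-rest ROC AUC via pairwise (Mann-Whitney) comparisons."""
    aucs = []
    for j, k in enumerate(classes):
        pos, neg = scores[y == k, j], scores[y != k, j]
        diff = pos[:, None] - neg[None, :]
        aucs.append(np.mean((diff > 0) + 0.5 * (diff == 0)))
    return float(np.mean(aucs))


if __name__ == "__main__":
    X, y = make_toy_spectra()
    for retain in (1.0, 0.75, 0.5, 0.25):
        X_tr, y_tr, X_te, y_te = subsample_and_split(X, y, retain)
        model = pls_da_fit(X_tr, y_tr, n_components=5)
        scores, pred = pls_da_predict(model, X_te)
        acc = np.mean(pred == y_te)
        auc = macro_ovr_auc(scores, y_te, model["classes"])
        print(f"retained {retain:.0%}: accuracy {acc:.3f}, macro AUC {auc:.3f}")
```

Mean-centring without scaling and five latent variables are arbitrary choices in this sketch; in practice the number of latent variables would normally be chosen by cross-validation on the training set only. The X loadings stored in `P` are the quantities one would plot against the Raman shift axis to look for DNA or protein peaks.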
About the journal:
Analytical Sciences is an international journal published monthly by The Japan Society for Analytical Chemistry. The journal publishes papers on all aspects of the theory and practice of analytical sciences, including fundamental and applied, inorganic and organic, wet chemical and instrumental methods.
This publication is supported in part by the Grant-in-Aid for Publication of Scientific Research Result of the Japanese Ministry of Education, Culture, Sports, Science and Technology.