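Comparative study of machine- and deep-learning based classification algorithms for biomedical Raman spectroscopy (RS): case study of RS based pathogenic microbe identification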

IF 1.8, Zone 4 (Chemistry), JCR Q3: CHEMISTRY, ANALYTICAL

Analytical Sciences Pub Date : 2024-08-29 DOI:10.1007/s44211-024-00645-0

Sisi Guo, Ruoyu Zhang, Tao Wang, Jianfeng Wang


Citations: 0

Abstract


Comparative study of machine-and deep-learning based classification algorithms for biomedical Raman spectroscopy (RS): case study of RS based pathogenic microbe identification

Dedicated machine-learning (ML) and deep-learning (DL) algorithms are a key factor pushing the frontiers of biomedical RS. Yet no systematic comparison of ML and DL algorithms has been conducted for biomedical RS, largely because large open-source Raman spectral datasets are scarce. We therefore compared a typical ML method, partial least squares-discriminant analysis (PLS-DA), with a typical DL method, a one-dimensional convolutional neural network (1D-CNN), for pathogenic microbe identification on 12,000 Raman spectra from six species of microbe (K. aerogenes (Klebsiella aerogenes), C. albicans (Candida albicans), C. glabrata (Candida glabrata), Group A Strep. (Group A Streptococcus), E. coli1 (Escherichia coli1) and E. coli2 (Escherichia coli2)), with 100%, 75%, 50% and 25% of the 12,000 spectra retained. The dataset was split 80% for training and 20% for testing. With 100% of the spectra retained, the test accuracy and the area under the receiver operating characteristic (ROC) curve (AUC) were 95.25% and 0.997 for 1D-CNN, higher than the 89.42% and 0.979 obtained with PLS-DA. However, PLS-DA outperformed 1D-CNN on the test sets when 75%, 50% and 25% of the spectra were retained. These accuracies and AUCs show that the performance of both PLS-DA and 1D-CNN depends on the number of Raman spectra available. In addition, the loadings on the latent variables of PLS-DA and the saliency maps of 1D-CNN both largely captured Raman peaks arising from DNA and proteins, with comparable interpretability. These results indicate that both ML and DL algorithms should be explored for each Raman spectral identification application, and whichever gives the higher accuracy and AUC should be selected.
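The evaluation protocol described in the abstract (retain a fraction of the spectra, split 80/20, then score accuracy and AUC) can be sketched roughly as follows. This is only an illustration, not the authors' code: the balanced 2,000 spectra per class, the per-class (stratified) sampling, the fixed seed, the one-vs-rest macro averaging of AUC, and all function names are assumptions that the abstract does not confirm. No classifier is trained here; the sketch covers only the data bookkeeping and the metrics.

```python
import random

TOTAL_SPECTRA = 12_000                       # from the abstract
N_CLASSES = 6                                # six microbe classes, from the abstract
RETAIN_FRACTIONS = (1.00, 0.75, 0.50, 0.25)  # from the abstract
TEST_FRACTION = 0.20                         # 80% train / 20% test, from the abstract


def subsample_and_split(labels, retain, test_fraction, seed=0):
    """Keep a `retain` fraction of each class, then split each class train/test.

    Stratified sampling and the seed are assumptions of this sketch.
    Returns (train_indices, test_indices).
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        kept = indices[: round(len(indices) * retain)]
        n_test = round(len(kept) * test_fraction)
        test.extend(kept[:n_test])
        train.extend(kept[n_test:])
    return train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(is_positive, scores):
    """ROC AUC as the chance a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, class_scores):
    """Mean one-vs-rest AUC; class_scores[i][c] is the score of class c for sample i."""
    classes = sorted(set(y_true))
    aucs = [
        binary_auc([t == c for t in y_true], [row[c] for row in class_scores])
        for c in classes
    ]
    return sum(aucs) / len(aucs)


# Hypothetical balanced labels: 2,000 spectra per class (an assumption).
labels = [c for c in range(N_CLASSES) for _ in range(TOTAL_SPECTRA // N_CLASSES)]

split_sizes = {}
for retain in RETAIN_FRACTIONS:
    train_idx, test_idx = subsample_and_split(labels, retain, TEST_FRACTION)
    split_sizes[retain] = (len(train_idx), len(test_idx))
    print(f"retain {retain:.0%}: train={len(train_idx)}, test={len(test_idx)}")
```

Under this reading, the 100% case has 2,400 test spectra, so the reported 95.25% accuracy would correspond to 2,286 correct predictions. Predictions and class scores from any classifier (PLS-DA, 1D-CNN, or otherwise) could be passed to `accuracy` and `macro_ovr_auc` to fill in a comparison table across the four retained fractions.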


Source journal: Analytical Sciences (Chemistry, Analytical Chemistry)
CiteScore: 2.90
Self-citation rate: 18.80%
Articles published: 232
Review time: 1 month

About the journal: Analytical Sciences is an international journal published monthly by The Japan Society for Analytical Chemistry. The journal publishes papers on all aspects of the theory and practice of analytical sciences, including fundamental and applied, inorganic and organic, wet chemical and instrumental methods. This publication is supported in part by the Grant-in-Aid for Publication of Scientific Research Result of the Japanese Ministry of Education, Culture, Sports, Science and Technology.