Comparative study of machine-and deep-learning based classification algorithms for biomedical Raman spectroscopy (RS): case study of RS based pathogenic microbe identification

IF 1.8, Zone 4 (Chemistry), JCR Q3 (CHEMISTRY, ANALYTICAL)
Sisi Guo, Ruoyu Zhang, Tao Wang, Jianfeng Wang
Journal: Analytical Sciences, vol. 40, no. 12, pp. 2101-2109
Published: 2024-08-29 (Journal Article)
DOI: 10.1007/s44211-024-00645-0
URL: https://link.springer.com/article/10.1007/s44211-024-00645-0
Citations: 0

Abstract

Dedicated machine-learning (ML) and deep-learning (DL) algorithms are a key driver of progress in biomedical Raman spectroscopy (RS). Yet no systematic comparison of ML and DL algorithms has been conducted for biomedical RS, largely because large open-source Raman spectral datasets are scarce. We therefore compared a typical ML method, partial least squares-discriminant analysis (PLS-DA), with a typical DL method, a one-dimensional convolutional neural network (1D-CNN), for pathogenic microbe identification on 12,000 Raman spectra from six species of microbe (i.e., K. aerogenes (Klebsiella aerogenes), C. albicans (Candida albicans), C. glabrata (Candida glabrata), Group A Strep. (Group A Streptococcus), E. coli1 (Escherichia coli1), E. coli2 (Escherichia coli2)), with 100%, 75%, 50% and 25% of the 12,000 spectra retained. The dataset was split 80% for training and 20% for testing. With 100% of the spectra retained, the testing accuracy and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve were 95.25% and 0.997 for 1D-CNN, higher than the 89.42% and 0.979 obtained with PLS-DA. However, PLS-DA outperformed 1D-CNN on the testing datasets with 75%, 50% and 25% of the spectra retained. These accuracies and AUCs show that the performance of both PLS-DA and 1D-CNN depends on the number of Raman spectra. In addition, both the loadings on the latent variables of PLS-DA and the saliency maps of 1D-CNN largely captured Raman peaks arising from DNA and proteins, with comparable interpretability. These results indicate that both ML and DL algorithms should be evaluated for each Raman spectral identification application, and whichever gives the higher accuracy and AUC should be selected.
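The evaluation protocol described in the abstract can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' code: it assumes the retained fraction is drawn at random before the 80/20 split (the abstract does not say how spectra were selected), and the function names `retain_and_split`, `accuracy` and `binary_auc` are hypothetical. The AUC helper scores a single class against the rest; how the paper aggregates AUC over the six classes is not stated in the abstract.

```python
import random

def retain_and_split(n_total, retained, train_frac=0.8, seed=0):
    """Randomly keep a fraction of the spectra indices, then split train/test.

    Assumption: subsampling happens before the split.
    """
    rng = random.Random(seed)
    kept = rng.sample(range(n_total), round(n_total * retained))
    n_train = round(len(kept) * train_frac)
    return kept[:n_train], kept[n_train:]

def accuracy(y_true, y_pred):
    """Fraction of spectra whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_auc(labels, scores):
    """ROC AUC for one class versus the rest.

    Computed as the probability that a positive sample is scored above
    a negative sample, with ties counted as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Dataset sizes for the four retained fractions named in the abstract.
for retained in (1.0, 0.75, 0.5, 0.25):
    train, test = retain_and_split(12_000, retained)
    print(f"{retained:.0%} retained: {len(train)} train, {len(test)} test")
# 100% retained: 9600 train, 2400 test
# 75% retained: 7200 train, 1800 test
# 50% retained: 4800 train, 1200 test
# 25% retained: 2400 train, 600 test
```

Under these assumptions the 100% case leaves 2,400 test spectra, so the smallest accuracy step that can be resolved is 1/2,400, or about 0.04 percentage points; at 25% retained the test set shrinks to 600 spectra and the step grows to about 0.17 percentage points.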


Source journal: Analytical Sciences (Chemistry, Analytical Chemistry)
CiteScore: 2.90
Self-citation rate: 18.80%
Articles published: 232
Review time: 1 month
About the journal: Analytical Sciences is an international journal published monthly by The Japan Society for Analytical Chemistry. The journal publishes papers on all aspects of the theory and practice of analytical sciences, including fundamental and applied, inorganic and organic, wet chemical and instrumental methods. This publication is supported in part by the Grant-in-Aid for Publication of Scientific Research Result of the Japanese Ministry of Education, Culture, Sports, Science and Technology.