Improving Machine Learning–Based Bacterial Discrimination by Learning Single-Cell Raman Data From Multiple Growth Phases

Impact factor 2.4 · Zone 3 (Chemistry) · JCR Q2 (Spectroscopy)
Nodoka Oda, Nanako Kanno, Shingo Kato, Moriya Ohkuma, Shinsuke Shigeto
Journal of Raman Spectroscopy, vol. 56, no. 6, pp. 481-490. Published 2025-03-22. DOI: 10.1002/jrs.6804. https://onlinelibrary.wiley.com/doi/10.1002/jrs.6804

Abstract

Bacterial discrimination using single-cell Raman spectroscopy and machine/deep learning techniques has been widely explored for promising applications in medical, environmental, and food sciences. To construct a machine-learning model that can achieve highly accurate and robust discrimination of bacteria in real-world samples, data consisting of Raman spectra of bacterial cells acquired under various physiological conditions are essential. Despite much effort to study the effects of growth phase on bacterial discrimination, it has not yet been fully elucidated which growth phase(s) need to be included in the training data to efficiently improve discrimination accuracy, or what growth phase-dependent changes in cellular components underlie accurate discrimination. Here, we used random forest (RF), an ensemble machine learning method, to discriminate six model bacterial species, including both Gram-positive and Gram-negative bacteria, at five different growth phases ranging from the lag phase to the late stationary phase. We compared four RF classification models that were trained on Raman data from one (either midexponential or late stationary), two (midexponential and late stationary), and all five growth phases. The species discrimination accuracy of the model built on training data consisting of the two distinctly different growth phases exceeded 80%, a marked increase of 24% and 32.5% relative to the models trained on data from a single growth phase. This increase was greater than the one we found in going from training data with two growth phases to training data with all five growth phases (13%). We also revealed that both Raman bands that are relatively invariant across growth phases (e.g., those of proteins) and Raman bands that are specific to the growth phase (e.g., those of DNA/RNA and intracellular storage materials) are important for attaining accurate bacterial discrimination. The present study provides a simple yet effective way to construct training data for good discrimination performance, which could be extended to discriminate bacterial cells under other physiological conditions such as nutrient availability, temperature, and pH.
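The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the spectra are synthetic, the band count, noise level, cell counts, and phase indices (0 = lag, 1 = midexponential, 4 = late stationary) are assumptions made for illustration, and evaluating on held-out cells from all five phases is likewise an assumption. It uses scikit-learn's `RandomForestClassifier` to train four models on one, two, or five growth phases and then inspects the per-band importances of the five-phase model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# All sizes below are hypothetical, chosen only to keep the sketch fast.
N_SPECIES, N_PHASES, N_CELLS, N_BANDS = 6, 5, 30, 40

# Synthetic "spectra": every band carries a species signature; bands 20-39
# additionally drift with growth phase, bands 0-19 stay phase-invariant.
species_sig = rng.normal(0.0, 1.0, (N_SPECIES, N_BANDS))
phase_drift = rng.normal(0.0, 1.0, (N_SPECIES, N_BANDS))
phase_drift[:, :20] = 0.0

def simulate(species, phase):
    """Return N_CELLS noisy single-cell spectra for one species and phase."""
    mean = species_sig[species] + 2.0 * phase / (N_PHASES - 1) * phase_drift[species]
    return mean + rng.normal(0.0, 1.0, (N_CELLS, N_BANDS))

X_parts, y_parts, phase_parts = [], [], []
for s in range(N_SPECIES):
    for p in range(N_PHASES):
        X_parts.append(simulate(s, p))
        y_parts.append(np.full(N_CELLS, s))
        phase_parts.append(np.full(N_CELLS, p))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)
phase = np.concatenate(phase_parts)

# Hold out roughly half of the cells (from every phase) for testing.
is_test = rng.random(len(y)) < 0.5

def fit_and_score(train_phases):
    """Train on non-test cells from the given phases; test on all phases."""
    train = ~is_test & np.isin(phase, train_phases)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train], y[train])
    return clf, clf.score(X[is_test], y[is_test])

# Assumed phase indices: 1 = midexponential, 4 = late stationary.
training_sets = {
    "midexponential only": [1],
    "late stationary only": [4],
    "midexponential + late stationary": [1, 4],
    "all five phases": [0, 1, 2, 3, 4],
}

results = {}
models = {}
for name, phases in training_sets.items():
    models[name], results[name] = fit_and_score(phases)
    print(f"{name}: accuracy = {results[name]:.3f}")

# Band importances of the five-phase model (normalized to sum to 1).
importances = models["all five phases"].feature_importances_
top_bands = np.argsort(importances)[::-1][:5]
print("most important band indices:", top_bands.tolist())
```

Because the data are synthetic, the printed accuracies say nothing about the reported 80%, 24%, 32.5%, or 13% figures; the sketch only shows how the four training sets differ and how band importances could be read off a fitted forest.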

Source journal: Journal of Raman Spectroscopy
CiteScore: 5.40
Self-citation rate: 8.00%
Articles published: 185
Review time: 3.0 months
About the journal: The Journal of Raman Spectroscopy is an international journal dedicated to the publication of original research at the cutting edge of all areas of science and technology related to Raman spectroscopy. The journal seeks to be the central forum for documenting the evolution of the broadly defined field of Raman spectroscopy, which includes an increasing number of rapidly developing techniques and an ever-widening array of interdisciplinary applications. Topics include time-resolved, coherent, and nonlinear Raman spectroscopies; nanostructure-based surface-enhanced and tip-enhanced Raman spectroscopies of molecules; resonance Raman to investigate the structure-function relationships and dynamics of biological molecules; linear and nonlinear Raman imaging and microscopy; biomedical applications of Raman; theoretical formalism and advances in quantum computational methodology of all forms of Raman scattering; Raman spectroscopy in archaeology and art; advances in remote Raman sensing and industrial applications; and Raman optical activity of all classes of chiral molecules.