Improving Machine Learning–Based Bacterial Discrimination by Learning Single-Cell Raman Data From Multiple Growth Phases
Authors: Nodoka Oda, Nanako Kanno, Shingo Kato, Moriya Ohkuma, Shinsuke Shigeto
Journal of Raman Spectroscopy, vol. 56, no. 6, pp. 481-490. Published 2025-03-22.
DOI: 10.1002/jrs.6804 (https://onlinelibrary.wiley.com/doi/10.1002/jrs.6804)
Citation count: 0
Abstract
Bacterial discrimination using single-cell Raman spectroscopy and machine/deep learning techniques has been widely explored for promising applications in medical, environmental, and food sciences. To construct a machine-learning model that can achieve highly accurate and robust discrimination of bacteria in real-world samples, data consisting of Raman spectra of bacterial cells acquired under various physiological conditions are essential. Despite much effort to study the effects of growth phase on bacterial discrimination, it remains unclear which growth phase(s) must be included in the training data to improve discrimination accuracy efficiently, and which growth phase-dependent changes in cellular components underlie accurate discrimination. Here, we used random forest (RF), an ensemble machine learning method, to discriminate six model bacterial species, including both Gram-positive and Gram-negative bacteria, at five different growth phases ranging from the lag phase to the late stationary phase. We compared four RF classification models that were trained on Raman data from one (either midexponential or late stationary), two (midexponential and late stationary), and all five growth phases. The species discrimination accuracy of the model trained on the two distinctly different growth phases exceeded 80%, a marked increase of 24% and 32.5% relative to the models trained on data from a single growth phase. This increase was greater than the one we found in going from training data with two growth phases to training data with all five growth phases (13%). We also revealed that both Raman bands that are relatively invariant across growth phases (e.g., those of proteins) and Raman bands that are specific to the growth phase (e.g., those of DNA/RNA and intracellular storage materials) are important for attaining accurate bacterial discrimination. The present study provides a simple yet effective way to construct training data for good discrimination performance, which could be extended to discriminate bacterial cells under other physiological conditions, such as different nutrient, temperature, and pH conditions.
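The four-model comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the spectra are synthetic Gaussian stand-ins, and the phase indices (2 for midexponential, 4 for late stationary), the bin count, the 50/50 split, and the function names `make_spectra` and `compare_training_sets` are all assumptions made for illustration. It uses scikit-learn's `RandomForestClassifier`, trains one model per training-set configuration, and scores each on held-out cells from all five growth phases.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_SPECIES, N_PHASES, N_BINS = 6, 5, 40  # N_BINS is an arbitrary, assumed spectrum length


def make_spectra(rng, cells_per_group):
    """Return synthetic spectra X, species labels y, and growth-phase indices."""
    species_sig = rng.normal(0.0, 1.0, (N_SPECIES, N_BINS))  # phase-invariant part
    phase_sig = rng.normal(0.0, 1.0, (N_SPECIES, N_BINS))    # phase-dependent part
    X, y, phase = [], [], []
    for s in range(N_SPECIES):
        for ph in range(N_PHASES):
            # The phase-dependent contribution grows from lag (0) to late stationary (4).
            mean = species_sig[s] + (ph / (N_PHASES - 1)) * phase_sig[s]
            X.append(mean + rng.normal(0.0, 1.0, (cells_per_group, N_BINS)))
            y += [s] * cells_per_group
            phase += [ph] * cells_per_group
    return np.vstack(X), np.array(y), np.array(phase)


def compare_training_sets(seed=0, cells_per_group=30):
    """Train one RF per training-set configuration; test on all five phases."""
    rng = np.random.default_rng(seed)
    X, y, phase = make_spectra(rng, cells_per_group)
    is_test = rng.random(len(y)) < 0.5  # random held-out half of the cells
    configs = {  # assumed phase indices, see the note above
        "midexponential only": [2],
        "late stationary only": [4],
        "midexponential + late stationary": [2, 4],
        "all five phases": [0, 1, 2, 3, 4],
    }
    results = {}
    for name, phases in configs.items():
        train = ~is_test & np.isin(phase, phases)
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X[train], y[train])
        results[name] = model.score(X[is_test], y[is_test])  # species accuracy
    return results


if __name__ == "__main__":
    for name, acc in compare_training_sets().items():
        print(f"{name}: {acc:.3f}")
```

Because the data are synthetic, the printed accuracies say nothing about the reported results; the sketch only shows how the training sets differ while the test set stays fixed across all growth phases.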
About the journal:
The Journal of Raman Spectroscopy is an international journal dedicated to the publication of original research at the cutting edge of all areas of science and technology related to Raman spectroscopy. The journal seeks to be the central forum for documenting the evolution of the broadly-defined field of Raman spectroscopy that includes an increasing number of rapidly developing techniques and an ever-widening array of interdisciplinary applications.
Such topics include time-resolved, coherent and non-linear Raman spectroscopies, nanostructure-based surface-enhanced and tip-enhanced Raman spectroscopies of molecules, resonance Raman to investigate the structure-function relationships and dynamics of biological molecules, linear and nonlinear Raman imaging and microscopy, biomedical applications of Raman, theoretical formalism and advances in quantum computational methodology of all forms of Raman scattering, Raman spectroscopy in archaeology and art, advances in remote Raman sensing and industrial applications, and Raman optical activity of all classes of chiral molecules.