Jizhou Zhong , Hany M. Elsheikha , Ka Lung Andrew Chan
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, Volume 344, Article 126639. Published 2025-07-01.
DOI: 10.1016/j.saa.2025.126639
https://www.sciencedirect.com/science/article/pii/S1386142525009461
Optimizing super-feature selection for machine learning-enhanced spectroscopic analysis in biomedical research
Purpose
Machine-learning-powered label-free infrared spectroscopic methods offer significant potential for diagnostic and biomedical applications. However, their applications have been limited by spectral noise, where critical features are often obscured by overlapping bands and data redundancy. Although various feature selection methods have been proposed, many suffer from limitations in consistency and interpretability. To address these challenges, we introduce a novel multi-model machine learning approach that integrates five distinct algorithms to identify a set of “super-features”—spectral features consistently deemed significant across all models.
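The consensus idea behind "super-features" can be illustrated with a minimal sketch. The abstract does not name the five algorithms or say how each one decides that a feature is significant, so the model names, the importance scores, and the top-k cutoff below are all assumptions made for illustration. They are not the authors' method.

```python
from typing import Dict, List, Set


def top_k_features(scores: List[float], k: int) -> Set[int]:
    """Return the indices of the k largest importance scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def super_features(importances: Dict[str, List[float]], k: int) -> List[int]:
    """Return feature indices that every model ranks within its top k."""
    selected = [top_k_features(scores, k) for scores in importances.values()]
    return sorted(set.intersection(*selected))


# Hypothetical importance scores for six spectral features from five
# placeholder models (names and values are illustrative assumptions).
importances = {
    "model_a": [0.90, 0.10, 0.80, 0.05, 0.70, 0.20],
    "model_b": [0.85, 0.30, 0.75, 0.10, 0.20, 0.60],
    "model_c": [0.70, 0.05, 0.95, 0.60, 0.10, 0.20],
    "model_d": [0.60, 0.65, 0.90, 0.10, 0.05, 0.20],
    "model_e": [0.80, 0.10, 0.70, 0.20, 0.75, 0.05],
}

consensus = super_features(importances, k=3)
print(consensus)  # [0, 2]
```

Only features 0 and 2 appear in every model's top three, so they form the consensus set. Taking the intersection is the strictest possible rule. A looser rule, such as majority voting, would keep more features at the cost of consistency across models.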
Principal results
This novel workflow outperforms traditional algorithms, achieving superior classification accuracy (>99%) in distinguishing infected from healthy cells, despite using fewer spectral features. To ensure robustness and generalizability, we developed a comprehensive validation strategy that includes independent classifier evaluations, label randomization, and unsupervised analyses. Importantly, the identified super-features accurately differentiated infection states across multiple time points and enhanced the biological interpretability of infection-associated biochemical changes.
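Label randomization, one of the validation steps listed above, can be sketched as a permutation test: the classifier is re-evaluated on shuffled labels, and the real score should clearly exceed the shuffled scores if the features carry genuine signal. The toy one-dimensional data, the nearest-centroid classifier, and the leave-one-out scoring below are assumptions chosen to keep the example self-contained. The abstract does not specify the classifiers or the protocol the authors used.

```python
import random
from typing import List, Sequence, Tuple


def loo_accuracy(X: Sequence[float], y: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-D nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for j, (x, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            sums[label] = sums.get(label, 0.0) + x
            counts[label] = counts.get(label, 0) + 1
        centroids = {c: sums[c] / counts[c] for c in sums}
        predicted = min(centroids, key=lambda c: abs(X[i] - centroids[c]))
        correct += predicted == y[i]
    return correct / len(X)


def permutation_p_value(
    X: Sequence[float], y: Sequence[int], n_permutations: int = 200, seed: int = 0
) -> Tuple[float, float]:
    """Return the observed accuracy and its permutation-test p-value."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    shuffled: List[int] = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between spectra and labels
        if loo_accuracy(X, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_permutations + 1)


# Toy values standing in for one hypothetical super-feature
# (0 = healthy, 1 = infected); not data from the study.
X = [0.10, 0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.09,
     0.90, 0.88, 0.93, 0.91, 0.87, 0.92, 0.89, 0.94]
y = [0] * 8 + [1] * 8

observed, p_value = permutation_p_value(X, y)
print(f"observed accuracy = {observed:.2f}, p-value = {p_value:.4f}")
```

A small p-value indicates that accuracy collapses once the labels are randomized, which is the behaviour expected when the selected features reflect real differences rather than overfitting.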
Conclusions
These findings highlight the potential of advanced multi-model feature selection techniques to enhance the diagnostic power of spectroscopic data in biomedical research, offering high accuracy and valuable biological insights into infection progression.
About the journal:
Spectrochimica Acta, Part A: Molecular and Biomolecular Spectroscopy (SAA) is an interdisciplinary journal that spans basic to applied aspects of optical spectroscopy in chemistry, medicine, biology, and materials science.
The journal publishes original scientific papers that feature high-quality spectroscopic data and analysis. From the broad range of optical spectroscopies, the emphasis is on electronic, vibrational or rotational spectra of molecules, rather than on spectroscopy based on magnetic moments.
Criteria for publication in SAA are novelty, uniqueness, and outstanding quality. Routine applications of spectroscopic techniques and computational methods are not appropriate.
Topics of particular interest to Spectrochimica Acta Part A include, but are not limited to:
Spectroscopy and dynamics of bioanalytical, biomedical, environmental, and atmospheric sciences,
Novel experimental techniques or instrumentation for molecular spectroscopy,
Novel theoretical and computational methods,
Novel applications in photochemistry and photobiology,
Novel interpretational approaches as well as advances in data analysis based on electronic or vibrational spectroscopy.