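The Identification of Breast Cancer Subtypes by Raman Spectroscopy Integrated With Machine Learning Algorithms: Analyzing the Influence of Baseline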

IF 2.4, Zone 3 (Chemistry), JCR Q2 (Spectroscopy)
Chao Yang, Kaisaier Aizezi, Juan Li, Xiaoting Wang, Fengling Li, Wen Lei, Jingjing Xia, Ayitila Maimaitijiang
Journal of Raman Spectroscopy, vol. 56, no. 7, pp. 556-566. Published 2025-03-24. DOI: 10.1002/jrs.6799
https://onlinelibrary.wiley.com/doi/10.1002/jrs.6799
Citations: 0

Abstract


The Identification of Breast Cancer Subtypes by Raman Spectroscopy Integrated With Machine Learning Algorithms: Analyzing the Influence of Baseline

How the baseline of a Raman spectrum affects data models has remained unexplored. In this study, we used three spectral datasets (raw, preprocessed, and baseline data) to build identification models for breast cancer molecular subtypes with four machine learning algorithms, and analyzed how baseline data influence the performance of these models. In the subtype identification models, whether they involved normal or breast cancer cells, preprocessed data consistently gave the best model performance, followed by raw data and then baseline data. Although baseline data gave the worst classification performance, when coupled with an artificial neural network they consistently reached a recognition accuracy of approximately 92.50 ± 5.30% in binary classification and 90.60 ± 1.52% in five-class classification. These results suggest that baseline data contribute notably to model performance. Looking ahead, the idea behind food by-product processing could be borrowed to make fuller use of baseline data. Furthermore, when combined with feature visualization strategies, the UVE-SPA and ICO approaches, using only 30 and 258 variables, respectively, yielded model results comparable to those of the preprocessed data (858 variables), attaining an accuracy of 96.00 ± 1.87%. This underscores the importance of the selected Raman spectral regions in distinguishing breast cancer molecular subtypes. Beyond the standard protein, lipid, and nucleic acid regions, the selected features included cysteine, phenylalanine, and carotenoid, all of which, according to established research, play important roles in the development and progression of cancer. This work examined the impact of the Raman baseline on model outcomes, providing data to improve future Raman spectroscopy modeling and prompting discussion of the untapped potential of baseline data.
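The abstract does not say which baseline-estimation or preprocessing method the authors used. As a rough illustration only, the sketch below shows one way a single spectrum could be split into the three variants the study compares (raw, preprocessed, baseline). The iterative moving-average clipping, the function names, the window size, and the synthetic signal are all assumptions made for this example, not the paper's procedure.

```python
import math


def moving_average(values, half_window):
    """Centered moving average whose window shrinks symmetrically at the edges."""
    n = len(values)
    smoothed = []
    for i in range(n):
        # Keep the window symmetric so a straight-line background is left unchanged.
        k = min(half_window, i, n - 1 - i)
        window = values[i - k:i + k + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def estimate_baseline(spectrum, half_window=25, iterations=50):
    """Crude baseline (assumed method): repeatedly clip the curve to its moving average."""
    baseline = list(spectrum)
    for _ in range(iterations):
        smoothed = moving_average(baseline, half_window)
        # Peaks sit above their local average, so they are shaved off step by step.
        baseline = [min(b, s) for b, s in zip(baseline, smoothed)]
    return baseline


def split_spectrum(spectrum, half_window=25, iterations=50):
    """Return three variants of one spectrum: raw, preprocessed, baseline."""
    baseline = estimate_baseline(spectrum, half_window, iterations)
    preprocessed = [y - b for y, b in zip(spectrum, baseline)]
    return {"raw": list(spectrum), "preprocessed": preprocessed, "baseline": baseline}


def synthetic_spectrum(n_points=300):
    """Made-up test signal: sloped background plus one Gaussian band (not real data)."""
    return [
        100.0 + 0.2 * i + 50.0 * math.exp(-((i - 150) ** 2) / (2 * 5.0 ** 2))
        for i in range(n_points)
    ]


variants = split_spectrum(synthetic_spectrum())
```

In a comparison like the one described, each of the three variants would be fed to the same classifier so that only the input representation changes. This clipping scheme tends to undercut broad, hump-shaped backgrounds, so it should be read as a placeholder for whatever baseline algorithm a real pipeline uses.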

Source journal
CiteScore: 5.40
Self-citation rate: 8.00%
Articles published: 185
Review time: 3.0 months
About the journal: The Journal of Raman Spectroscopy is an international journal dedicated to the publication of original research at the cutting edge of all areas of science and technology related to Raman spectroscopy. The journal seeks to be the central forum for documenting the evolution of the broadly-defined field of Raman spectroscopy that includes an increasing number of rapidly developing techniques and an ever-widening array of interdisciplinary applications. Such topics include time-resolved, coherent and non-linear Raman spectroscopies, nanostructure-based surface-enhanced and tip-enhanced Raman spectroscopies of molecules, resonance Raman to investigate the structure-function relationships and dynamics of biological molecules, linear and nonlinear Raman imaging and microscopy, biomedical applications of Raman, theoretical formalism and advances in quantum computational methodology of all forms of Raman scattering, Raman spectroscopy in archaeology and art, advances in remote Raman sensing and industrial applications, and Raman optical activity of all classes of chiral molecules.