随机剪接辅助深度学习的乳腺癌细胞系拉曼光谱分类。

IF 4.1 2区 生物学 Q2 BIOCHEMISTRY & MOLECULAR BIOLOGY
Computational and structural biotechnology journal Pub Date : 2025-05-30 eCollection Date: 2025-01-01 DOI:10.1016/j.csbj.2025.05.051
Yiheng Liu, Junfeng Liu, Jiayi Wan, Hongke Hao, Guangxing Liu, Xia Huang
{"title":"随机剪接辅助深度学习的乳腺癌细胞系拉曼光谱分类。","authors":"Yiheng Liu, Junfeng Liu, Jiayi Wan, Hongke Hao, Guangxing Liu, Xia Huang","doi":"10.1016/j.csbj.2025.05.051","DOIUrl":null,"url":null,"abstract":"<p><p>Raman spectroscopy extracts rich biochemical information on a single cell, demonstrating significant potential for precise cancer identification. While machine learning enhances spectral analysis efficiency, conventional models remain constrained by data volume. Here, we developed Random Splicing-Convolutional Neural Network (RS-CNN), a deep learning framework that addresses data scarcity through spectral concatenation. By randomly splicing Raman spectra from the same cell line, RS-CNN enhances distinctive spectral features while simultaneously expanding dataset size and improving signal quality. Validation across six breast cancer cell lines demonstrated RS-CNN's superiority over five benchmark models (SVM, LDA, PCA-SVM, PCA-LDA, CNN). With 450 spectra per cell line, RS-CNN achieved 98.63 % classification accuracy compared to conventional models' accuracies of around 85 %. Under data-limited conditions (100 spectra/line), RS-CNN maintained 91.47 % accuracy, outperforming CNN's 70.83 %. The RS-CNN's generalizability was further validated by an independently acquired dataset, achieving at least 94 % classification accuracy. SHAP analysis suggested the spectral region around 980 cm⁻¹ was significant for cancer diagnosis, while the 1158-1160 cm⁻¹and 1603-1607 cm⁻¹ regions were particularly valuable for distinguishing between cancer subtypes. These findings establish RS-CNN as a robust analytical model for clinical Raman diagnostics, particularly valuable in applications requiring high accuracy with limited samples.</p>","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"2288-2297"},"PeriodicalIF":4.1000,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12162052/pdf/","citationCount":"0","resultStr":"{\"title\":\"Random splicing assisted deep learning for breast cancer cell line classification via Raman spectroscopy.\",\"authors\":\"Yiheng Liu, Junfeng Liu, Jiayi Wan, Hongke Hao, Guangxing Liu, Xia Huang\",\"doi\":\"10.1016/j.csbj.2025.05.051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Raman spectroscopy extracts rich biochemical information on a single cell, demonstrating significant potential for precise cancer identification. While machine learning enhances spectral analysis efficiency, conventional models remain constrained by data volume. Here, we developed Random Splicing-Convolutional Neural Network (RS-CNN), a deep learning framework that addresses data scarcity through spectral concatenation. By randomly splicing Raman spectra from the same cell line, RS-CNN enhances distinctive spectral features while simultaneously expanding dataset size and improving signal quality. Validation across six breast cancer cell lines demonstrated RS-CNN's superiority over five benchmark models (SVM, LDA, PCA-SVM, PCA-LDA, CNN). With 450 spectra per cell line, RS-CNN achieved 98.63 % classification accuracy compared to conventional models' accuracies of around 85 %. Under data-limited conditions (100 spectra/line), RS-CNN maintained 91.47 % accuracy, outperforming CNN's 70.83 %. The RS-CNN's generalizability was further validated by an independently acquired dataset, achieving at least 94 % classification accuracy. SHAP analysis suggested the spectral region around 980 cm⁻¹ was significant for cancer diagnosis, while the 1158-1160 cm⁻¹and 1603-1607 cm⁻¹ regions were particularly valuable for distinguishing between cancer subtypes. These findings establish RS-CNN as a robust analytical model for clinical Raman diagnostics, particularly valuable in applications requiring high accuracy with limited samples.</p>\",\"PeriodicalId\":10715,\"journal\":{\"name\":\"Computational and structural biotechnology journal\",\"volume\":\"27 \",\"pages\":\"2288-2297\"},\"PeriodicalIF\":4.1000,\"publicationDate\":\"2025-05-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12162052/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational and structural biotechnology journal\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1016/j.csbj.2025.05.051\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMISTRY & MOLECULAR BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational and structural biotechnology journal","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1016/j.csbj.2025.05.051","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0

摘要

拉曼光谱在单个细胞上提取丰富的生化信息,显示出精确识别癌症的巨大潜力。虽然机器学习提高了光谱分析的效率,但传统模型仍然受到数据量的限制。在这里,我们开发了随机拼接-卷积神经网络(RS-CNN),这是一种深度学习框架,通过频谱拼接解决数据稀缺问题。RS-CNN通过随机拼接来自同一细胞系的拉曼光谱,增强了光谱特征,同时扩大了数据集规模,提高了信号质量。6种乳腺癌细胞系的验证表明,RS-CNN优于5种基准模型(SVM、LDA、PCA-SVM、PCA-LDA、CNN)。每个细胞系450个光谱,RS-CNN的分类准确率达到98.63 %,而传统模型的准确率约为85 %。在数据有限的条件下(100个光谱/线),RS-CNN保持91.47 %的准确率,优于CNN的70.83 %。通过独立获取的数据集进一步验证了RS-CNN的泛化性,达到了至少94% %的分类准确率。SHAP分析表明,980 cm⁻¹ 附近的光谱区域对癌症诊断有重要意义,而1158-1160 cm⁻¹和1603-1607 cm⁻¹ 区域对区分癌症亚型特别有价值。这些发现确立了RS-CNN作为临床拉曼诊断的稳健分析模型,在需要高精度和有限样本的应用中特别有价值。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Random splicing assisted deep learning for breast cancer cell line classification via Raman spectroscopy.

Raman spectroscopy extracts rich biochemical information on a single cell, demonstrating significant potential for precise cancer identification. While machine learning enhances spectral analysis efficiency, conventional models remain constrained by data volume. Here, we developed Random Splicing-Convolutional Neural Network (RS-CNN), a deep learning framework that addresses data scarcity through spectral concatenation. By randomly splicing Raman spectra from the same cell line, RS-CNN enhances distinctive spectral features while simultaneously expanding dataset size and improving signal quality. Validation across six breast cancer cell lines demonstrated RS-CNN's superiority over five benchmark models (SVM, LDA, PCA-SVM, PCA-LDA, CNN). With 450 spectra per cell line, RS-CNN achieved 98.63 % classification accuracy compared to conventional models' accuracies of around 85 %. Under data-limited conditions (100 spectra/line), RS-CNN maintained 91.47 % accuracy, outperforming CNN's 70.83 %. The RS-CNN's generalizability was further validated by an independently acquired dataset, achieving at least 94 % classification accuracy. SHAP analysis suggested the spectral region around 980 cm⁻¹ was significant for cancer diagnosis, while the 1158-1160 cm⁻¹and 1603-1607 cm⁻¹ regions were particularly valuable for distinguishing between cancer subtypes. These findings establish RS-CNN as a robust analytical model for clinical Raman diagnostics, particularly valuable in applications requiring high accuracy with limited samples.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Computational and structural biotechnology journal
Computational and structural biotechnology journal Biochemistry, Genetics and Molecular Biology-Biophysics
CiteScore
9.30
自引率
3.30%
发文量
540
审稿时长
6 weeks
期刊介绍: Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a pre-requisite for publication in the journal. Specific areas of interest include, but are not limited to: Structure and function of proteins, nucleic acids and other macromolecules Structure and function of multi-component complexes Protein folding, processing and degradation Enzymology Computational and structural studies of plant systems Microbial Informatics Genomics Proteomics Metabolomics Algorithms and Hypothesis in Bioinformatics Mathematical and Theoretical Biology Computational Chemistry and Drug Discovery Microscopy and Molecular Imaging Nanotechnology Systems and Synthetic Biology
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信