Random splicing assisted deep learning for breast cancer cell line classification via Raman spectroscopy.
Authors: Yiheng Liu, Junfeng Liu, Jiayi Wan, Hongke Hao, Guangxing Liu, Xia Huang
Journal: Computational and Structural Biotechnology Journal, volume 27, pages 2288-2297 (Journal Article, eCollection)
Published: 2025-05-30
Impact factor: 4.1 (JCR Q2, Biochemistry & Molecular Biology)
DOI: https://doi.org/10.1016/j.csbj.2025.05.051
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12162052/pdf/
Raman spectroscopy extracts rich biochemical information from a single cell, demonstrating significant potential for precise cancer identification. While machine learning enhances spectral analysis efficiency, conventional models remain constrained by data volume. Here, we developed Random Splicing-Convolutional Neural Network (RS-CNN), a deep learning framework that addresses data scarcity through spectral concatenation. By randomly splicing Raman spectra from the same cell line, RS-CNN enhances distinctive spectral features while simultaneously expanding dataset size and improving signal quality. Validation across six breast cancer cell lines demonstrated RS-CNN's superiority over five benchmark models (SVM, LDA, PCA-SVM, PCA-LDA, CNN). With 450 spectra per cell line, RS-CNN achieved 98.63 % classification accuracy, compared to around 85 % for conventional models. Under data-limited conditions (100 spectra per cell line), RS-CNN maintained 91.47 % accuracy, outperforming CNN's 70.83 %. RS-CNN's generalizability was further validated on an independently acquired dataset, where it achieved at least 94 % classification accuracy. SHAP analysis suggested that the spectral region around 980 cm⁻¹ was significant for cancer diagnosis, while the 1158-1160 cm⁻¹ and 1603-1607 cm⁻¹ regions were particularly valuable for distinguishing between cancer subtypes. These findings establish RS-CNN as a robust analytical model for clinical Raman diagnostics, particularly valuable in applications requiring high accuracy with limited samples.
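The abstract describes random splicing only at a high level: spectra from the same cell line are concatenated at random to form new training samples. A minimal sketch of that idea follows. It is not the authors' implementation. The function name `random_splice`, the number of spectra per splice `k`, the number of spliced samples per class `n_new`, the seed, and the choice to sample without replacement are all assumptions made for illustration.

```python
import numpy as np


def random_splice(spectra, labels, k=3, n_new=100, seed=0):
    """Build spliced samples by concatenating k spectra of the same class.

    spectra: (n_samples, n_points) array of Raman intensities.
    labels:  (n_samples,) array of class labels (e.g. cell line IDs).
    k, n_new, and seed are hypothetical parameters, not values from the paper.
    Returns spliced spectra of shape (n_classes * n_new, k * n_points)
    and the matching label array.
    """
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)

    spliced, spliced_labels = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        if idx.size < k:
            raise ValueError(f"class {cls!r} has fewer than {k} spectra")
        for _ in range(n_new):
            # Assumption: draw k distinct spectra from this class only,
            # then join them end to end into one longer input vector.
            chosen = rng.choice(idx, size=k, replace=False)
            spliced.append(spectra[chosen].ravel())
            spliced_labels.append(cls)

    return np.stack(spliced), np.array(spliced_labels)
```

Called as `random_splice(X, y, k=3, n_new=500)`, this would return 500 spliced samples per class, each three times the length of an original spectrum. Because every spliced sample mixes several cells of one class, splicing should happen after the train/test split, so that no test spectrum ends up inside a training sample.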
About the journal:
Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a pre-requisite for publication in the journal. Specific areas of interest include, but are not limited to:
Structure and function of proteins, nucleic acids and other macromolecules
Structure and function of multi-component complexes
Protein folding, processing and degradation
Enzymology
Computational and structural studies of plant systems
Microbial Informatics
Genomics
Proteomics
Metabolomics
Algorithms and Hypothesis in Bioinformatics
Mathematical and Theoretical Biology
Computational Chemistry and Drug Discovery
Microscopy and Molecular Imaging
Nanotechnology
Systems and Synthetic Biology