Songlin Lu, Yuanfang Huang, Wan Xiang Shen, Yu Lin Cao, Mengna Cai, Yan Chen, Ying Tan, Yu Yang Jiang, Yu Zong Chen
{"title":"利用信号聚合表示的拉曼光谱深度学习,增强细胞表型和特征识别能力","authors":"Songlin Lu, Yuanfang Huang, Wan Xiang Shen, Yu Lin Cao, Mengna Cai, Yan Chen, Ying Tan, Yu Yang Jiang, Yu Zong Chen","doi":"10.1093/pnasnexus/pgae268","DOIUrl":null,"url":null,"abstract":"Feature representation is critical for data learning, particularly in learning spectroscopic data. Machine learning (ML) and deep learning (DL) models learn Raman spectra for rapid, non-destructive, and label-free cell phenotype identification, which facilitate diagnostic, therapeutic, forensic, and microbiological applications. But these are challenged by high-dimensional, unordered and low-sample spectroscopic data. Here we introduced novel 2D image-like dual signal and component aggregated representations by restructuring Raman spectra and principal components, which enables spectroscopic DL for enhanced cell phenotype and signature identification. New ConvNet models DSCARNets significantly outperformed the state-of-the-art (SOTA) ML and DL models on six benchmark datasets, mostly with >2% improvement over the SOTA performance of 85%-97% accuracies. DSCARNets also performed well on four additional datasets against SOTA models of extremely high performances (>98%) and two datasets without a published supervised phenotype classification model. Explainable DSCARNets identified Raman signatures consistent with experimental indications.","PeriodicalId":516525,"journal":{"name":"PNAS Nexus","volume":"77 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Raman spectroscopic deep learning with signal aggregated representations for enhanced cell phenotype and signature identification\",\"authors\":\"Songlin Lu, Yuanfang Huang, Wan Xiang Shen, Yu Lin Cao, Mengna Cai, Yan Chen, Ying Tan, Yu Yang Jiang, Yu Zong Chen\",\"doi\":\"10.1093/pnasnexus/pgae268\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Feature representation is critical for data learning, particularly in learning spectroscopic data. Machine learning (ML) and deep learning (DL) models learn Raman spectra for rapid, non-destructive, and label-free cell phenotype identification, which facilitate diagnostic, therapeutic, forensic, and microbiological applications. But these are challenged by high-dimensional, unordered and low-sample spectroscopic data. Here we introduced novel 2D image-like dual signal and component aggregated representations by restructuring Raman spectra and principal components, which enables spectroscopic DL for enhanced cell phenotype and signature identification. New ConvNet models DSCARNets significantly outperformed the state-of-the-art (SOTA) ML and DL models on six benchmark datasets, mostly with >2% improvement over the SOTA performance of 85%-97% accuracies. DSCARNets also performed well on four additional datasets against SOTA models of extremely high performances (>98%) and two datasets without a published supervised phenotype classification model. Explainable DSCARNets identified Raman signatures consistent with experimental indications.\",\"PeriodicalId\":516525,\"journal\":{\"name\":\"PNAS Nexus\",\"volume\":\"77 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PNAS Nexus\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/pnasnexus/pgae268\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PNAS Nexus","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/pnasnexus/pgae268","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
特征表示对于数据学习,尤其是光谱数据学习至关重要。机器学习(ML)和深度学习(DL)模型学习拉曼光谱,可用于快速、无损和无标记的细胞表型识别,从而促进诊断、治疗、法医和微生物学应用。但这些技术面临着高维、无序和低样本光谱数据的挑战。在此,我们通过重组拉曼光谱和主成分,引入了新颖的二维图像式双信号和成分聚合表示法,从而利用光谱 DL 增强细胞表型和特征识别。新的 ConvNet 模型 DSCARNets 在六个基准数据集上的表现明显优于最先进的(SOTA)ML 和 DL 模型,与 SOTA 85%-97% 的准确率相比,大多提高了>2%。在另外四个数据集上,DSCARNets 的表现也很出色,与 SOTA 模型的极高表现(>98%)相比,DSCARNets 的表现更胜一筹。可解释的 DSCARNets 识别出的拉曼特征与实验指标一致。
Raman spectroscopic deep learning with signal aggregated representations for enhanced cell phenotype and signature identification
Feature representation is critical for data learning, particularly in learning spectroscopic data. Machine learning (ML) and deep learning (DL) models learn Raman spectra for rapid, non-destructive, and label-free cell phenotype identification, which facilitate diagnostic, therapeutic, forensic, and microbiological applications. But these are challenged by high-dimensional, unordered and low-sample spectroscopic data. Here we introduced novel 2D image-like dual signal and component aggregated representations by restructuring Raman spectra and principal components, which enables spectroscopic DL for enhanced cell phenotype and signature identification. New ConvNet models DSCARNets significantly outperformed the state-of-the-art (SOTA) ML and DL models on six benchmark datasets, mostly with >2% improvement over the SOTA performance of 85%-97% accuracies. DSCARNets also performed well on four additional datasets against SOTA models of extremely high performances (>98%) and two datasets without a published supervised phenotype classification model. Explainable DSCARNets identified Raman signatures consistent with experimental indications.