Ziyang Wang, Jeewan C. Ranasinghe, Wenjing Wu, Dennis C. Y. Chan, Ashley Gomm, Rudolph E. Tanzi, Can Zhang, Nanyin Zhang, Genevera I. Allen, Shengxi Huang
ACS Nano (IF 15.8, JCR Q1, Chemistry, Multidisciplinary). Published 2025-04-15. DOI: 10.1021/acsnano.4c16037 (https://doi.org/10.1021/acsnano.4c16037)
Machine Learning Interpretation of Optical Spectroscopy Using Peak-Sensitive Logistic Regression
Optical spectroscopy, a noninvasive molecular sensing technique, offers valuable insights into material characterization, molecule identification, and biosample analysis. Despite the informativeness of high-dimensional optical spectra, their interpretation remains a challenge. Machine learning methods have gained prominence in spectral analyses, efficiently unveiling analyte compositions. However, these methods still face challenges in interpretability, particularly in generating clear feature importance maps that highlight the spectral features specific to each class of data. These limitations arise from feature noise, model complexity, and the lack of optimization for spectroscopy. In this work, we introduce a machine learning algorithm, logistic regression with peak-sensitive elastic-net regularization (PSE-LR), tailored for spectral analysis. PSE-LR enables classification and interpretability by producing a peak-sensitive feature importance map, achieving an F1-score of 0.93 and a feature sensitivity of 1.0. Its performance is compared with other methods, including k-nearest neighbors (KNN), elastic-net logistic regression (E-LR), support vector machine (SVM), principal component analysis followed by linear discriminant analysis (PCA-LDA), XGBoost, and neural network (NN). Applying PSE-LR to Raman and photoluminescence (PL) spectra, we detected the receptor-binding domain (RBD) of SARS-CoV-2 spike protein in ultralow concentrations, identified neuroprotective solution (NPS) in brain samples, recognized WS₂ monolayer and WSe₂/WS₂ heterobilayer, analyzed Alzheimer’s disease (AD) brains, and suggested potential disease biomarkers. Our findings demonstrate PSE-LR’s utility in detecting subtle spectral features and generating interpretable feature importance maps. It is beneficial for the spectral characterization of materials, molecules, and biosamples and applicable to other spectroscopic methods. This work also facilitates the development of nanodevices such as nanosensors and miniaturized spectrometers based on nanomaterials.
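The abstract does not state the form of the peak-sensitive penalty, so the following is only a minimal sketch of the general idea rather than the authors' PSE-LR: logistic regression with an elastic-net penalty whose strength varies per spectral channel, fitted by proximal gradient descent on synthetic two-class spectra. The weighting rule (a smaller penalty where the mean spectrum is more intense), the synthetic data, and all names (`make_spectra`, `fit`, `pen`) are assumptions made for illustration.

```python
import math
import random

def make_spectra(n_per_class, seed, n_features=60):
    """Synthetic spectra: both classes share a peak at channel 15;
    class 1 has an extra, weaker peak at channel 30."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([math.exp(-((j - 15) ** 2) / 20.0)
                      + label * 0.5 * math.exp(-((j - 30) ** 2) / 8.0)
                      + rng.gauss(0.0, 0.05) for j in range(n_features)])
            y.append(label)
    return X, y

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty.
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit(X, y, pen, lam=0.02, l1_ratio=0.5, lr=0.5, n_iter=200):
    """Proximal gradient descent for logistic regression with a
    per-feature weighted elastic-net penalty (weights given in `pen`)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += r
            for j in range(p):
                gw[j] += r * xi[j]
        b -= lr * gb / n  # the intercept is not penalized
        for j in range(p):
            l1 = lam * l1_ratio * pen[j]
            l2 = lam * (1.0 - l1_ratio) * pen[j]
            w[j] = soft_threshold(w[j] - lr * (gw[j] / n + l2 * w[j]), lr * l1)
    return w, b

def f1_score(y_true, y_pred):
    tp = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 1)
    fp = sum(1 for t, q in zip(y_true, y_pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 0)
    return 2.0 * tp / (2 * tp + fp + fn) if tp else 0.0

X_train, y_train = make_spectra(40, seed=0)
X_test, y_test = make_spectra(20, seed=1)
p = len(X_train[0])
mean = [sum(row[j] for row in X_train) / len(X_train) for j in range(p)]

# ASSUMPTION (not from the paper): channels with higher mean intensity,
# i.e. closer to a peak, receive a smaller penalty.
peak = max(mean)
pen = [1.0 / (1.0 + 2.0 * max(m, 0.0) / peak) for m in mean]

def center(X):
    # Subtract the training-set mean spectrum from every spectrum.
    return [[x - m for x, m in zip(row, mean)] for row in X]

w, b = fit(center(X_train), y_train, pen)
y_pred = [1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0
          for row in center(X_test)]
f1 = f1_score(y_test, y_pred)
top_feature = max(range(p), key=lambda j: abs(w[j]))  # largest |coefficient|
print("F1 on held-out spectra:", f1)
print("Channel with largest |weight|:", top_feature)
```

Here the coefficient vector `w` plays the role of a feature importance map: in this toy setup the largest coefficients should cluster around the class-specific peak at channel 30, while the peak shared by both classes at channel 15 carries little weight.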
About the journal:
ACS Nano, published monthly, serves as an international forum for comprehensive articles on nanoscience and nanotechnology research at the intersections of chemistry, biology, materials science, physics, and engineering. The journal fosters communication among scientists in these communities, facilitating collaboration, new research opportunities, and advancements through discoveries. ACS Nano covers synthesis, assembly, characterization, theory, and simulation of nanostructures, nanobiotechnology, nanofabrication, methods and tools for nanoscience and nanotechnology, and self- and directed-assembly. Alongside original research articles, it offers thorough reviews, perspectives on cutting-edge research, and discussions envisioning the future of nanoscience and nanotechnology.