A classification method for fluorescence emission spectra of anionic surfactants with few-shot learning

IF 2.5 | Zone 4, Chemistry | JCR Q4, Biochemistry & Molecular Biology

Journal of Molecular Modeling Pub Date : 2025-07-26 DOI:10.1007/s00894-025-06440-6

Hanyang Ning, Miao Ma, Zhiwei Shi, Liping Ding


Citations: 0

Abstract

Context

The unregulated use of anionic surfactants poses significant environmental risks, necessitating methods for their rapid and accurate identification. While fluorescence spectroscopy is a powerful tool, its application faces a critical challenge: existing analytical strategies either rely on complex and costly sensor arrays to acquire rich data, or they apply traditional machine learning to simpler, single-spectrum data, which often requires pre-processing steps like PCA that risk information loss. Furthermore, standard deep learning approaches are often unsuitable due to the high cost and effort required to acquire the large datasets they need for training. To address this gap, we propose an end-to-end, few-shot learning method (CNN-PN) for the classification of anionic surfactant fluorescence emission spectra. Our approach leverages a one-dimensional convolutional neural network (1D-CNN) to automatically extract features from the full, raw spectrum, thus avoiding lossy pre-processing. It then employs a prototypical network to perform robust, similarity-based classification, a strategy highly effective for limited sample sizes. We validated our method on our FESS dataset (53 surfactant categories) and a public metal oxides dataset. In our experiments, the CNN-PN method consistently outperformed traditional techniques like LDA, SVM, and KNN. It achieved 76.36% accuracy when trained with only a single sample per class, 95.90% in a multi-sample scenario on our FESS dataset, and 84.86% on the public dataset. This work provides a powerful and data-efficient framework for spectral analysis, facilitating the development of more accessible and rapid fluorescence sensing technologies, particularly for applications where data collection is expensive or constrained.
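The feature-extraction idea described above can be illustrated with a minimal sketch, assuming a single convolutional layer followed by ReLU and global max pooling. The abstract does not specify the CNN-PN architecture, so the function names, the toy spectrum, and the hand-picked kernels (standing in for learned weights) are all hypothetical. The sketch uses only the Python standard library rather than a deep learning framework.

```python
def conv1d(signal, kernel, stride=1):
    """Valid (no padding) 1D cross-correlation, the operation used in CNN layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def extract_features(spectrum, kernels):
    """One conv layer + ReLU + global max pooling: one feature per kernel."""
    return [max(relu(conv1d(spectrum, kernel))) for kernel in kernels]


# Toy emission spectrum (intensity per wavelength bin); values are made up.
spectrum = [0.0, 0.1, 0.4, 1.0, 0.4, 0.1, 0.0, 0.0]

# Hypothetical kernels; in a trained 1D-CNN these weights would be learned.
kernels = [
    [1.0, 1.0, 1.0],   # responds to local peak area
    [-1.0, 0.0, 1.0],  # responds to a rising edge
]

features = extract_features(spectrum, kernels)
print(features)
```

Because the convolution slides over the raw intensity values, no dimensionality reduction such as PCA is applied before the features are computed, which is the point the abstract makes about avoiding lossy pre-processing.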

Methods

A few-shot learning classification method based on prototypical networks was employed. A one-dimensional convolutional neural network (1D-CNN) was utilized to extract spectral features from the full fluorescence emission spectra. Classification was then performed within the prototypical network framework using Euclidean distance as the similarity metric between features in the learned latent space. The Python programming language and the PyTorch library were used for all model implementations and data analysis.
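The classification step can be sketched as follows, assuming the standard prototypical-network formulation: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype by Euclidean distance. The embeddings here are made-up 2-D vectors standing in for 1D-CNN outputs, and the class labels and function names are hypothetical. The sketch uses only the Python standard library and is not the authors' implementation.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_prototypes(support_embeddings, support_labels):
    """Prototype of each class = mean of that class's support embeddings."""
    grouped = defaultdict(list)
    for emb, label in zip(support_embeddings, support_labels):
        grouped[label].append(emb)
    return {
        label: [sum(dim) / len(embs) for dim in zip(*embs)]
        for label, embs in grouped.items()
    }


def class_probabilities(query, prototypes):
    """Softmax over negative squared Euclidean distances to each prototype."""
    logits = {c: -euclidean(query, p) ** 2 for c, p in prototypes.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - top) for c, v in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def classify(query, prototypes):
    """Assign the query to the class whose prototype is nearest."""
    return min(prototypes, key=lambda c: euclidean(query, prototypes[c]))


# Hypothetical support set: two samples for class_A, one (one-shot) for class_B.
support_embeddings = [[1.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
support_labels = ["class_A", "class_A", "class_B"]

prototypes = class_prototypes(support_embeddings, support_labels)
query = [1.5, 0.5]
predicted = classify(query, prototypes)
probabilities = class_probabilities(query, prototypes)
print(predicted, probabilities)
```

With a single support sample per class, the prototype is simply that sample's embedding, which is why this scheme still works in the one-shot setting reported in the abstract.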


Source journal

Journal of Molecular Modeling (Chemistry: Multidisciplinary)

CiteScore: 3.50

Self-citation rate: 4.50%

Articles published: 362

Review time: 2.9 months

Journal description: The Journal of Molecular Modeling focuses on "hardcore" modeling, publishing high-quality research and reports. Founded in 1995 as a purely electronic journal, it has adapted its format to include a full-color print edition, and adjusted its aims and scope to fit the fast-changing field of molecular modeling, with a particular focus on three-dimensional modeling. Today, the journal covers all aspects of molecular modeling including life science modeling; materials modeling; new methods; and computational chemistry. Topics include computer-aided molecular design; rational drug design, de novo ligand design, receptor modeling and docking; cheminformatics, data analysis, visualization and mining; computational medicinal chemistry; homology modeling; simulation of peptides, DNA and other biopolymers; quantitative structure-activity relationships (QSAR) and ADME-modeling; modeling of biological reaction mechanisms; and combined experimental and computational studies in which calculations play a major role.