Quantification of Protein Secondary Structures from Discrete Frequency Infrared Images Using Machine Learning.

IF 2.2; Zone 3 (Chemistry); JCR Q2, INSTRUMENTS & INSTRUMENTATION

Applied Spectroscopy. Pub Date: 2025-10-01; Epub Date: 2025-03-31; DOI: 10.1177/00037028251325553

Harrison Edmonds, Sudipta S Mukherjee, Brooke Holcombe, Kevin Yeh, Rohit Bhargava, Ayanjeet Ghosh

Journal Article. Applied Spectroscopy, pp. 1465-1477. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12353105/pdf/

Citations: 0

Abstract

Discrete frequency infrared (IR) imaging is an exciting experimental technique that has shown promise in a range of biomedical applications. It typically involves acquiring IR absorption images at specific frequencies of interest that provide pathologically relevant chemical contrast. However, certain applications require deeper analysis of spectral data, such as tracking spatial variations in the protein secondary structure of tissue specimens, which is necessary for characterizing neurodegenerative diseases. In such cases, the conventional analytical approach is to band fit the hyperspectral data and extract the relative populations of different structures from their fitted areas under the curve (AUC). While Gaussian spectral fitting of a single spectrum is viable, extending it to an image with millions of pixels, as is often the case for tissue specimens, becomes computationally expensive. Alternatives like principal component analysis (PCA) are less structurally interpretable and incompatible with sparsely sampled data. Furthermore, band fitting detracts from the key advantages of discrete frequency imaging because it requires more finely sampled spectral data, which is optimal for curve fitting but results in significantly longer data acquisition times, larger datasets, and additional computational overhead. In this work, we demonstrate that a simple two-step regressive neural network model can mitigate these challenges, allowing discrete frequency imaging to retrieve the results of band fitting without significant loss of fidelity. Our model reduces the data acquisition time nearly six-fold by requiring only seven wavenumbers: it first accurately interpolates the spectral information to a higher resolution and then uses the upscaled spectra to accurately predict the component AUCs, which is more than 3000 times faster than spectral fitting. Our approach thus drastically cuts data acquisition and analysis time and predicts key differences in protein structure, which can be vital for broadening the potential applications of discrete frequency imaging.
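As a rough illustration of the quantity that band fitting extracts, the sketch below computes the area under each Gaussian component and converts those areas into relative populations. It is a minimal sketch, not the authors' implementation: the band names, amplitudes, centers, and widths in `BANDS` are hypothetical values chosen only for illustration, and the helper functions are assumptions introduced here. The fitting step itself (finding those parameters from a measured spectrum) is omitted.

```python
import math


def gaussian(x, amplitude, center, sigma):
    """Value of a single Gaussian band at wavenumber x."""
    return amplitude * math.exp(-0.5 * ((x - center) / sigma) ** 2)


def gaussian_auc(amplitude, sigma):
    """Closed-form area under a Gaussian band: A * sigma * sqrt(2*pi)."""
    return amplitude * sigma * math.sqrt(2.0 * math.pi)


def relative_populations(bands):
    """Fraction of the total fitted area contributed by each band."""
    aucs = {name: gaussian_auc(a, s) for name, (a, _center, s) in bands.items()}
    total = sum(aucs.values())
    return {name: auc / total for name, auc in aucs.items()}


def trapezoid_auc(amplitude, center, sigma, lo, hi, step=0.5):
    """Numerical check of the closed-form area on a finely sampled grid."""
    n = int(round((hi - lo) / step))
    ys = [gaussian(lo + i * step, amplitude, center, sigma) for i in range(n + 1)]
    return sum(0.5 * (ys[i] + ys[i + 1]) * step for i in range(n))


# Hypothetical fitted bands: (amplitude, center in cm^-1, sigma in cm^-1).
# These numbers are illustrative assumptions, not values from the paper.
BANDS = {
    "beta_sheet": (0.30, 1630.0, 8.0),
    "alpha_helix": (0.50, 1655.0, 10.0),
    "turn": (0.20, 1680.0, 8.0),
}

populations = relative_populations(BANDS)
for name, fraction in populations.items():
    print(f"{name}: {fraction:.3f}")
```

Repeating a full fit of this kind at every pixel is what makes the conventional approach expensive for large images; the model described in the abstract instead predicts the component AUCs directly from the sparsely sampled spectrum.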




Source journal

Applied Spectroscopy (Engineering & Technology: Spectroscopy)

CiteScore: 6.60

Self-citation rate: 5.70%

Articles published: 139

Review time: 3.5 months

About the journal: Applied Spectroscopy is one of the world's leading spectroscopy journals, publishing high-quality peer-reviewed articles, both fundamental and applied, covering all aspects of spectroscopy. Established in 1951, the journal is owned by the Society for Applied Spectroscopy and is published monthly. The journal is dedicated to fulfilling the mission of the Society to “…advance and disseminate knowledge and information concerning the art and science of spectroscopy and other allied sciences.”