Simple methods for uncertainty estimation in neural networks applied to spectral data processing: A case study on mango dry matter prediction

IF 3.8, Zone 2 (Chemistry), JCR Q2 (Automation & Control Systems)

Chemometrics and Intelligent Laboratory Systems, Pub Date: 2025-09-16, DOI: 10.1016/j.chemolab.2025.105532

Metz Maxime, Khadija Lamdibih, Jean-Michel Roger, David Esteve, Ryad Bendoula, Florent Abdelghafour

Volume 267, Article 105532. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0169743925002175

Citations: 0

Abstract

The growing complexity of real-world chemometric applications, particularly in spectroscopy, has exposed the limitations of traditional linear models in capturing non-linear patterns in spectral data. Deep learning models offer a powerful alternative but remain underutilised in chemometrics because of concerns about interpretability and trust, particularly in high-risk applications where uncertainty estimation is critical. This study investigates and compares three uncertainty estimation techniques suitable for neural networks: Monte Carlo Dropout (MC Dropout), model averaging, and Stochastic Weight Averaging-Gaussian (SWAG). The methods are evaluated using a spectral deep learning architecture. The analysis focuses on identifying the key hyper-parameters that affect both predictive performance and uncertainty calibration. Results show that MC Dropout offers a good balance between accuracy and uncertainty estimation at low computational cost, whereas model averaging provides robust performance at the expense of greater training time and storage. SWAG emerges as a middle-ground method that requires careful tuning. Importantly, a trade-off between predictive accuracy and uncertainty calibration is observed, underscoring the need to treat uncertainty as an integral part of model evaluation. These findings highlight the relevance of deep learning uncertainty estimation in chemometrics and open new directions for optimising data acquisition, model calibration, and model selection based on both prediction confidence and performance.
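To make the MC Dropout idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy one-hidden-layer ReLU network with hand-set weights in place of the spectral architecture, which the abstract does not describe. All names (`forward`, `mc_dropout_predict`, `interval_coverage`, `PARAMS`) and the input values are hypothetical. Dropout is kept active at prediction time, the spread of repeated stochastic passes is read as the uncertainty, and a simple interval-coverage helper illustrates one possible way to check calibration.

```python
import random
import statistics


def forward(x, w1, b1, w2, b2, rng, p_drop):
    """One stochastic forward pass of a one-hidden-layer ReLU network.

    Dropout stays active at prediction time, which is the core of MC Dropout.
    """
    hidden = []
    for weights, bias in zip(w1, b1):
        activation = max(0.0, sum(w * xi for w, xi in zip(weights, x)) + bias)
        # Inverted dropout: drop the unit with probability p_drop and
        # rescale the survivors so the expected activation is unchanged.
        keep = rng.random() >= p_drop
        hidden.append(activation / (1.0 - p_drop) if keep else 0.0)
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def mc_dropout_predict(x, params, n_passes=200, p_drop=0.2, seed=0):
    """Return the mean and standard deviation over n_passes stochastic passes."""
    rng = random.Random(seed)
    samples = [forward(x, *params, rng, p_drop) for _ in range(n_passes)]
    return statistics.fmean(samples), statistics.stdev(samples)


def interval_coverage(y_true, means, stds, z=1.96):
    """Fraction of targets that fall inside mean +/- z * std."""
    hits = sum(abs(y - m) <= z * s for y, m, s in zip(y_true, means, stds))
    return hits / len(y_true)


# Hypothetical fixed weights standing in for a trained network.
N_IN, N_HIDDEN = 5, 16
W1 = [[0.1 * (j + 1)] * N_IN for j in range(N_HIDDEN)]
B1 = [0.0] * N_HIDDEN
W2 = [0.5 if j % 2 == 0 else -0.5 for j in range(N_HIDDEN)]
B2 = 1.0
PARAMS = (W1, B1, W2, B2)

spectrum = [0.10, 0.40, 0.35, 0.20, 0.05]  # made-up absorbance values
mean, std = mc_dropout_predict(spectrum, PARAMS)
print(f"prediction = {mean:.3f}, uncertainty (1 sigma) = {std:.3f}")
```

Under the same assumptions, model averaging could reuse the mean and standard deviation summary, with the samples coming from several independently trained networks instead of repeated dropout passes. A coverage value well below the nominal level (about 0.95 for `z=1.96` under a Gaussian assumption) would suggest overconfident uncertainty estimates.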


Source journal

Chemometrics and Intelligent Laboratory Systems (Engineering and Technology: Analytical Chemistry)

CiteScore: 7.50

Self-citation rate: 7.70%

Articles published: 169

Review time: 3.4 months

About the journal: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on the development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, Systems Biology, -Omics, etc.).
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well-characterized data sets to test the performance of new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.