Combinatorial Order Pre-processing Search (COPS): A new pre-processing strategy for large-scale interpretable data analysis in process analytical technologies
IF 3.9 | Region 2, Engineering Technology | JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Wilson Cardoso , Jussara V. Roque , Jeroen J. Jansen , Sin Yong Teng , Reinaldo F. Teófilo
Computers & Chemical Engineering, Volume 192, Article 108892. Published 2024-10-11. DOI: 10.1016/j.compchemeng.2024.108892
https://www.sciencedirect.com/science/article/pii/S0098135424003107
Citations: 0
Abstract
Combinatorial Order Pre-processing Search (COPS), a novel approach for optimizing data pre-processing, is proposed in this work. Unlike simultaneous hyperparameter optimization, COPS employs a priori optimization to reduce computational time while refining the search space for pre-processing sequences and combinations. It allows a maximum number of pre-processing methods to be set, while efficiently searching through combinations of methods using chemically relevant knowledge. In this work, 67 calibration datasets were evaluated across various analytical techniques, including fluorescence spectroscopy, gas chromatography (GC), near-infrared spectroscopy (NIR), mid-infrared spectroscopy (MIR), visible-near-infrared spectroscopy (Vis-NIR), Raman spectroscopy, nuclear magnetic resonance (NMR) spectroscopy, and voltammetry. COPS yielded significant improvements over existing methodologies based on design of experiments and compounded pre-processing approaches. COPS outperformed the other methods, reducing the root mean square error of prediction (RMSEP) by 31.7% on average while also reducing the complexity of the model (number of latent variables), which allows for easier interpretation. This underscores the importance of combinatorial order set theory for the search of pre-processing method combinations (without fixing the sequence of pre-processing methods) to enhance model performance and interpretation. The novel COPS approach can be employed in process analytical technology (such as inline, online or at-line chemical sensing analytics) to enhance predictive accuracy and operational efficiency, fundamentally transforming the quality and reliability of chemical process monitoring and control.
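The abstract does not describe the search procedure in detail, so the following is only a minimal sketch of the kind of search space it refers to: ordered sequences of distinct pre-processing methods, capped at a maximum number of steps. The method labels and the function name `ordered_sequences` are hypothetical and chosen for illustration; this is not the authors' implementation, and it omits the a priori optimization and chemical knowledge that COPS uses to narrow the search.

```python
from itertools import permutations
from math import comb

def ordered_sequences(methods, max_steps):
    """Yield every ordered selection of distinct methods using 0..max_steps steps."""
    for k in range(max_steps + 1):
        # permutations(methods, k) treats (a, b) and (b, a) as different sequences
        yield from permutations(methods, k)

# Hypothetical method labels, not taken from the paper
methods = ["baseline", "scatter", "smoothing", "derivative"]
max_steps = 3

sequences = list(ordered_sequences(methods, max_steps))

# For comparison: number of candidates when the order of methods is fixed in advance
fixed_order_count = sum(comb(len(methods), k) for k in range(max_steps + 1))

print(len(sequences))      # candidates when order is free
print(fixed_order_count)   # candidates when order is fixed
```

With four methods and at most three steps, leaving the order free gives 41 candidate sequences, against 15 when the order is fixed. The gap grows quickly with the number of methods, which is one reason a cap on sequence length and some pruning of the search space matter for computational time.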
About the journal:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.