{"title":"Toward more efficient and effective color quality control for the large-scale offset printing process","authors":"Pawel Dziki, Lukasz Pieszczek, Michal Daszykowski","doi":"10.1002/cem.3543","DOIUrl":"10.1002/cem.3543","url":null,"abstract":"<p>This study illustrates the at-line application of hyperspectral imaging in the visible range for quality control of large-scale offset printing. In particular, the measurement stability of a competitive device is assessed and compared to traditional handheld and desktop spectrophotometers. The performance of the commercially available instruments was assessed based on collected spectra and their corresponding L*, a*, and b* values. The printing process was described by hyperspectral images (in the visible range) of selected regions from template color fields acquired at 17 sampling occasions. Spectra constituting hyperspectral images were visualized and evaluated in the space of significant principal components obtained from the principal component analysis. Furthermore, confidence ellipses were constructed for each set of spectra characterizing a specific moment of the printing process. Comparing their mutual locations, shapes, orientations, and sizes enabled effective visualization of process variability and was more comprehensive than the classic approach based on information provided by desktop and handheld spectrometers.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 4","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140154995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of colorectal primer carcinoma from normal colon with mid-infrared spectra","authors":"B. Borkovits, E. Kontsek, A. Pesti, P. Gordon, S. Gergely, I. Csabai, A. Kiss, P. Pollner","doi":"10.1002/cem.3542","DOIUrl":"10.1002/cem.3542","url":null,"abstract":"<p>In this project, we used formalin-fixed paraffin-embedded (FFPE) tissue samples to measure thousands of spectra per tissue core with Fourier transform mid-infrared spectroscopy using an FT-IR imaging system. These cores varied between normal colon (NC) and colorectal primer carcinoma (CRC) tissues. We created a database to manage all the multivariate data obtained from the measurements. Then, we applied classifier algorithms to identify the tissue based on its yielded spectra. For classification, we used the random forest, a support vector machine, XGBoost, and linear discriminant analysis methods, as well as three deep neural networks. We compared two data manipulation techniques using these models and then applied filtering. In the end, we compared model performances via the sum of ranking differences (SRD).</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 7","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3542","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140126719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing multifruit global near-infrared model to predict dry matter based on just-in-time modeling","authors":"Puneet Mishra","doi":"10.1002/cem.3540","DOIUrl":"10.1002/cem.3540","url":null,"abstract":"<p>Modeling near-infrared (NIR) spectral data to predict fresh fruit properties is a challenging task. The difficulty lies in creating generalized models that can work on fruits of different cultivars, seasons, and even multiple commodities of fruit. Due to intrinsic differences in spectral properties, NIR models often fail in testing, resulting in high bias and prediction errors. One current solution for achieving generalized models is to use large calibration sets measured over multiple cultivars and harvest seasons. However, current practice primarily focuses on calibration sets for single fruit commodities, disregarding the rich information available from other fruit commodities. This study aims to demonstrate the potential of locally weighted partial least-squares, an example of just-in-time (JIT) modeling, to develop real-time models based on calibration sets consisting of multiple fruit commodities. The study also explores JIT modeling for leveraging relevant information from other fruit commodities or adapting the model based on new samples. The application demonstrated here predicts the dry matter in fresh fruit using portable NIR spectroscopy. The results show that JIT modeling is particularly effective for multiple fruit commodities in a single calibration set. The JIT models achieved a root mean squared error of prediction (RMSEP) of 0.69% fresh weight (FW), while the traditional partial least squares (PLS) modeling RMSEP was 0.93% FW. 
JIT modeling can be particularly beneficial when the user has multiple fruit datasets and wants to combine them into a single dataset to utilize all the relevant information available.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 4","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3540","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140043948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing air quality predictions: A discrete wavelet transform and long short-term memory approach with wavelet-type selection for hourly PM10 concentrations","authors":"Gökçe Nur Taşağıl Arslan, Serpil Kılıç Depren","doi":"10.1002/cem.3539","DOIUrl":"10.1002/cem.3539","url":null,"abstract":"<p>The rapid advancement of industrialization and urbanization has led to the global problem of air pollution. Air quality can decrease due to pollutants in the air, including types of gases and particles that are carcinogenic, causing adverse health effects. Therefore, estimating the concentration of air pollutants is of great interest as it can provide accurate information about air quality with proper planning of future activities. In this manner, this study considers Istanbul, a province with a high concentration of industry, population, and vehicle traffic. Particulate matter (PM), one of the most basic air pollutants, is stated to contain microscopic solids or liquid droplets that are small enough to be inhaled and cause serious health problems. Thus, it is recommended to apply the discrete wavelet transform (DWT) and the deep learning method long short-term memory (LSTM) as a hybrid model to predict the concentration of PM<sub>10</sub>. Models that predict air pollution using these methods have been developed within the scope of this study. Furthermore, selecting the most appropriate discrete wavelet type for the hybrid approach with LSTM distinguishes this study from the existing literature. The ability of these developed methods to make successful future predictions helps institutions and organizations that can take precautions on the subject to take action at the right time; in addition, the deep learning methods used contribute to the development of sustainable smart environmental systems. 
In today's environment, where air pollution is increasing and threatening human health, any precaution that can be taken would improve the quality of life for all living things, reduce health issues and deaths caused by air pollution, and thus raise the degree of well-being. These findings might offer reliable scientific evidence for Istanbul City's air pollution management, which can serve as an example for other regions.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 4","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140044044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structured discriminative Gaussian graph learning for multimode process monitoring","authors":"Jing Wang, Yi Liu, Dongping Zhang, Lei Xie, Jiusun Zeng","doi":"10.1002/cem.3538","DOIUrl":"10.1002/cem.3538","url":null,"abstract":"<p>Motivated by the fact that, in actual industrial processes, different modes share the same system configuration and control structure, this article proposes a novel structured discriminative Gaussian graph learning method for multimode process monitoring. The proposed method considers not only the sparsity of the graph model but also the measurement of data variation based on a mismatched graph and the common node support between different graphical structures. The objective function involves two sets of regularization terms: the trace terms for mismatched measurements and the ℓ<sub>2,1</sub>-norm imposed on the union of decomposed graph matrices. Due to the introduced mismatched trace terms, the cost of matching the data points and graph models that have inconsistent class labels can be increased, which makes the graph-based mode identification more discriminative. Meanwhile, the common structure extracted by the ℓ<sub>2,1</sub>-norm forces the estimated graph models to have structural similarities, thus alleviating the negative influence caused by graph discrimination. Once a relatively accurate and discriminative reference graph model is obtained, the downstream test graph learning and analysis can be conducted online by employing moving-window techniques. By comparing the matched and mismatched graph-based measurements, the process mode can be identified correctly and stably. 
To capture abnormal process changes, the ℓ<sub>2,1</sub>-norm for row sparsity is again applied to the graph difference matrices, so that sensitive monitoring statistics and fault isolation results can be obtained effectively. All the optimization problems in this paper can be solved using the alternating direction method of multipliers (ADMM) algorithm. The effectiveness of the proposed approach is illustrated by application to a real blast furnace iron-making production process.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 3","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140032884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image-based characterization of flocculation processes through PLS inspired representation learning in convolutional neural networks","authors":"Andreas Baum, Rayisa Moiseyenko, Simon Glanville, Thomas Martini Jørgensen","doi":"10.1002/cem.3534","DOIUrl":"10.1002/cem.3534","url":null,"abstract":"<p>Monitoring of flocculation processes such as those used in downstream processing of a fermentation broth is essential for process control. One approach is to apply microscopic imaging combined with image analysis for characterizing the state of the process. In this work, we investigate and compare the use of supervised feedforward convolutional neural network (CNN) architectures to predict the process states from the image information and compare the results with the traditional alternative of characterizing flocs based on manually engineered image features guided by human expertise. From a well-defined image data set representing six process states, the objective is to establish end-to-end classification models which are accurate but at the same time learn meaningful latent variable space representations. Specifically, we evaluate three different CNN architectures with varying degrees of regularization and compare results with logistic regression models based on inputs from two different traditional feature engineering methods. By applying global average pooling as a structural regularizer to the CNN architecture, we significantly improve the generalization performance in comparison with the classification accuracies of the traditional feature engineered models. 
Furthermore, we show that by imposing a projection to latent structures (PLS)-like regularization framework onto the CNN, it can also learn a latent variable representation that mimics the features selected by human expertise.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 6","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3534","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139952965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel two-phase near-infrared and midinfrared wavelength selection framework for sample classification","authors":"Juliana Fontes, Michel J. Anzanello, João B. G. Brito, Guilherme B. Bucco","doi":"10.1002/cem.3536","DOIUrl":"10.1002/cem.3536","url":null,"abstract":"<p>Spectral data describing product samples are typically composed of a large number of noisy and irrelevant wavelengths that tend to undermine the performance of multivariate predictive techniques. This paper proposes a two-phase framework that integrates a wavelength preselection step guided by wavelength clustering with a wrapper-based strategy. The first phase performs a pruning process in the data that removes the less informative wavelengths relying on spectral clustering, a technique deemed suitable for the Fourier transform infrared (FTIR) spectroscopy and near-infrared (NIR) spectroscopy data at hand. The preselected wavelengths undergo a second phase of selection efforts based on the combination of different wavelength importance indices (i.e., Bhattacharyya distance, Chi-square, ReliefF, and Gini) and classification techniques (i.e., support vector machine, <i>k</i>-nearest neighbors, and random forest). When applied to 11 FTIR datasets from different domains, the recommended combination of importance index and classifier increased the average accuracy by 6.37% (from 0.863 to 0.918), while retaining on average 3.84% of the original spectra. 
The framework also improved the selection process regarding computational time.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 3","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139959463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Affine combination-based over-sampling for imbalanced regression","authors":"Zhen-Zhen Li, Niu Huang, Lun-Zhao Yi, Guang-Hui Fu","doi":"10.1002/cem.3537","DOIUrl":"10.1002/cem.3537","url":null,"abstract":"<p>Imbalanced domain prediction analysis is currently a hot research topic. Many real-world data mining analyses involve using imbalanced data to obtain predictive models. In the context of imbalance, research on classification problems has been extensive, but research on regression problems is scarce. By definition, rare values seldom occur in imbalanced regression problems, yet the focus is on accurately predicting the continuous target variable for these rare instances. One of the challenges in imbalanced regression is finding a suitable strategy to rebalance the original dataset in order to improve the predictive performance of the model in rare instances. In this study, two algorithms are proposed: sigma nearest over-sampling based on convex combination for regression (SNOCCR) and affine combination-based over-sampling (ACOS). ACOS rebalances the original dataset by generating new instances through the affine combinations of the original examples. The region where the new instances are generated can be adjusted based on the distribution of the data, ensuring that the generated cases better mimic the distribution of the original examples. The comparison among ACOS, SNOCCR, and other preprocessing methods was conducted on 15 datasets to validate the predictive performance of models trained on rebalanced datasets for rare instances. 
The experimental results indicate that ACOS outperforms other existing methods.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 3","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139771515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two new methods for the estimation and interpretation of the range of feasible profiles in multivariate curve resolution and their implications to analytical chemistry","authors":"Alejandro C. Olivieri","doi":"10.1002/cem.3535","DOIUrl":"10.1002/cem.3535","url":null,"abstract":"<p>Two new models have been recently introduced for studying the remaining rotational ambiguity in the bilinear decomposition of matrix data. One of the models is N-BANDS, which yields two extreme profiles per sample component, corresponding to maximum or minimum signal contribution function or relative component area under its concentration profile. It is highly useful for computing the relative root mean square error due to rotational ambiguity in estimated analyte concentrations (RMSE<sub>RA</sub>), which numerically quantifies the impact of the phenomenon in terms of prediction uncertainty. Since N-BANDS successfully considers the presence of instrumental noise in the data, it is extremely useful for the analysis of real data sets. The other model is SW-N-BANDS, which is similar to N-BANDS, but is applied in a sensor-wise manner, that is, computing the maximum and minimum intensity value at each sensor. It provides the boundaries of the full set of feasible profiles, and helps to better understand the behavior of a given component under the application of several constraints. 
Both models are described in light of both simulations and experimental data, illustrating their main characteristics of importance to analytical chemistry studies.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 6","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139648433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessment of adulteration of sage (Salvia sp.) with olive leaves using high-performance thin-layer chromatography, image analysis, and multivariate linear modeling","authors":"Nina Tomčić, Milica Jankov, Petar Ristivojević, Jelena Trifković, Filip Andrić","doi":"10.1002/cem.3533","DOIUrl":"10.1002/cem.3533","url":null,"abstract":"<p>According to a study carried out at the University of Bristol, 60% of oregano spices present on the European Union (EU) market are adulterated with olive, myrtle, cistus, and hazelnut leaves. According to the same authors, the sage products are adulterated by similar bulking agents. The aim of this study was to assess possibilities for detection of sage adulteration by olive leaves using high-performance thin-layer chromatography (HPTLC) coupled with digital image analysis and multivariate linear regression/classification (partial least squares and partial least squares discriminant analysis). Twenty-four samples (4 pure sage leaves, 4 pure olive leaves, and 16 mixtures of olive and sage leaves with added olive leaf content of 5%, 10%, 20%, and 50%) have been prepared, extracted, and analyzed under normal-phase conditions. Several derivatization methods were tested, and derivatized HPTLC plates were inspected under visible or ultraviolet light. Digital images of chromatograms were recorded. In order to minimize effects of intraplate and interplate peak shifts, background changes, and baseline drifts, correlation-optimized warping, standard normal variate, and mean centering were applied to acquired signals. Partial least squares and partial least squares discriminant analysis models with moderate complexity (two to four latent variables) based on chromatographic signals obtained after derivatization by FeCl<sub>3</sub>, anisaldehyde–sulfuric acid, and 2,2-diphenyl-1-picrylhydrazyl demonstrated good statistical performance with <i>R</i><sup>2</sup> ranging 0.894–0.998 and relative prediction error of 4–12%. 
Misclassification error <4% was obtained in the case of 2,2-diphenyl-1-picrylhydrazyl and anisaldehyde–sulfuric acid derivatization. Therefore, HPTLC combined with multivariate image analysis, signal processing, and linear modeling proved to be a promising, cost-effective chromatographic tool for assessment of sage adulteration by olive leaves.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 6","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139516211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}