{"title":"Data Augmentation and Fault Diagnosis for Imbalanced Industrial Process Data Based on Residual Wasserstein Generative Adversarial Network With Gradient Penalty","authors":"Ying Tian, Jian Shen, Ao Wang, Zeqiu Li, Xiuhui Huang","doi":"10.1002/cem.3624","DOIUrl":"https://doi.org/10.1002/cem.3624","url":null,"abstract":"<div><p>In practical industrial applications, equipment usually operates normally and failures are relatively rare, resulting in serious imbalances in the collected data. This imbalance leads to issues such as overfitting, instability, and poor robustness, significantly reducing the accuracy and stability of fault diagnosis system. To address these challenges, this research proposes a method for imbalanced data augmentation and industrial process fault diagnosis based on improved Generative Adversarial Network (GAN). The method adopts Wasserstein distance with gradient penalty and integrates residual connections into the architecture of the generator. This innovation not only helps improve gradient transfer in the generator, but also significantly enhances the data generation capabilities of the generative model through improving the stability of training. Limited industrial process data is used by a generative model to produce synthetic samples with high similarity and diversity. These high-quality samples improve fault diagnosis by enriching the imbalanced dataset. Experimental results on two industrial datasets confirm the method's effectiveness in enhancing fault diagnosis performance with limited data.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Past, Present and Future of Research in Analytical Figures of Merit","authors":"Alejandro Olivieri","doi":"10.1002/cem.3616","DOIUrl":"https://doi.org/10.1002/cem.3616","url":null,"abstract":"","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 11","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3616","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterization of Chemical Information and Content Prediction of Dendrobium officinale Based on ATR-FTIR","authors":"Peiyuan Li, Tao Shen, Shaobing Yang, Zhitian Zuo, Yuanzhong Wang, Qiang Hu","doi":"10.1002/cem.3626","DOIUrl":"https://doi.org/10.1002/cem.3626","url":null,"abstract":"<div><p><i>Dendrobium officinale</i> is a medicinal and food plant with high commercial and medicinal value. Yunnan is known as China's “plant kingdom,” and although the climatic conditions are favorable, the large vertical climatic differences have led to a large difference in the quality of dendrobium from different origins. The analysis of quality differences between several origins with large ecological advantages has not been reported yet. Therefore, the aim of this study is to compare these regions in terms of both morphology and chemical composition and to analyze the variation of their chemical composition in spectral information. The PLS-DA, SVM, and PLSR models were developed to qualitatively and quantitatively evaluate <i>Dendrobium</i> from different production areas. The results show that the Menghai production area was superior to other production areas in terms of phenotypic morphology, quality, and yield. Within the appropriate range, the higher the specific absorbance, the higher the quercetin content.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Three-Way Data Reduction Based on Essential Information","authors":"Raffaele Vitale, Azar Azizi, Mahdiyeh Ghaffari, Nematollah Omidikia, Cyril Ruckebusch","doi":"10.1002/cem.3617","DOIUrl":"https://doi.org/10.1002/cem.3617","url":null,"abstract":"<p>In this article, the idea of essential information-based compression is extended to trilinear datasets. This basically boils down to identifying and labelling the essential rows (ERs), columns (ECs) and tubes (ETs) of such three-dimensional datasets that allow by themselves to reconstruct in a linear way the entire space of the original measurements. ERs, ECs and ETs can be determined by exploiting convex geometry computational approaches such as convex hull or convex polytope estimations and can be used to generate a reduced version of the data at hand. These compressed data and their uncompressed counterpart share the same multilinear properties and their factorisation (carried out by means of, for example, parallel factor analysis–alternating least squares [PARAFAC-ALS]) yield, in principle, indistinguishable results. More in detail, an algorithm for the assessment and extraction of the essential information encoded in trilinear data structures is here proposed. Its performance was evaluated in both real-world and simulated scenarios which permitted to highlight the benefits that this novel data reduction strategy can bring in domains like multiway fluorescence spectroscopy and imaging.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3617","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault Detection Strategy of Partial Least Squares Based on Temporal Neighborhood Difference","authors":"Liwei Feng, Shaofeng Guo, Yifei Wu, Yu Xing, Yuan Li","doi":"10.1002/cem.3621","DOIUrl":"https://doi.org/10.1002/cem.3621","url":null,"abstract":"<div><p>Aiming at the difficulty of detecting time-lag faults in dynamic processes, a fault detection strategy based on time neighborhood difference (TND) is proposed, and it is introduced into the partial least squares (PLS) method to suggest the PLS-TND fault detection method. The TND method takes the mean to the multibatch training set to obtain a baseline training set, and it constructs the mean squared Euclidean distance (MSED) statistic by calculating the average distance between the sample's first <i>k</i>-moments neighborhood samples and samples at the same moment in the baseline training set. The TND method can help the PLS method to effectively detect time-lag faults and significantly improve the fault detection capability of PLS by measuring the overall positional difference between the temporal neighborhood sample set of the sample and its temporal neighborhood sample set in the baseline training set. The PLS-TND method is compared with some classical fault detection methods through a numerical simulation process and a Continuous Stirred Tank Reactor (CSTR) system design fault detection experiment. The PLS-TND method gives the best performance of fault detection.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Multiblock Regression for Process Modelling","authors":"Marco Cattaldo, Alberto Ferrer, Ingrid Måge","doi":"10.1002/cem.3618","DOIUrl":"https://doi.org/10.1002/cem.3618","url":null,"abstract":"<p>The study introduces three novel strategies for incorporating capabilities for dynamic modelling into multiblock regression methods by integrating sequentially orthogonalised partial least squares (SO-PLS) with different dynamic modelling techniques. The study evaluates these strategies using synthetic datasets and an industrial example, comparing their performance in predictive ability, identification of process dynamics, and quantification of block contributions. Results suggest that these approaches can effectively model the dynamics with performance comparable to state-of-the-art methods, providing, at the same time, insight into the dynamic order and block contributions. One of the strategies, sequentially orthogonalised dynamic augmented (SODA)–PLS, shows promise by ensuring that redundant information in the time dimension is not included, resulting in simpler and more easily interpretable dynamic models. These multiblock dynamic regression strategies have potential applications for improved process understanding in industrial settings, especially where multiple data sources and inherent time dynamics are present.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3618","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SFMD-X: A New Functional Data Classifier Based on Shrinkage Functional Mahalanobis Distance","authors":"Shunke Bao, Jiakun Guo, Zhouping Li","doi":"10.1002/cem.3615","DOIUrl":"https://doi.org/10.1002/cem.3615","url":null,"abstract":"<div><p>In this article, we propose a novel classification approach for functional data based on a shrinkage estimate of functional Mahalanobis distance. We first introduce a new shrinkage functional Mahalanobis distance (SFMD), by using this new distance, we transform the functional observations into a set of vector-valued pseudo-samples. Furthermore, we adopt some good classification algorithms designed for multivariate data to this pseudo-samples instead of the original functional data. The new approach has advantage of highly flexible and scalable, that is, it can easily combine with any classification algorithm, such as support vector machine, tree-based methods, and neural networks. We demonstrate the performance of the proposed functional classifier through both extensive simulation studies and two real data applications.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Practical Framework Integrating Two-Way Chemometric Methods With Three-Way Ones for the Analysis of Hyphenated Chromatographic Data of Complex Systems","authors":"Zhang-Feng Tang, Wei-Wei Wei, Zhi-Guo Wang, Wen Du, Zeng-Ping Chen","doi":"10.1002/cem.3625","DOIUrl":"https://doi.org/10.1002/cem.3625","url":null,"abstract":"<div><p>Hyphenated chromatographic techniques are widely used to analyze and characterize complex samples. Chemometric methods are generally needed to extract the qualitative and quantitative information of the target analytes from complex hyphenated chromatographic data. However, neither two-way nor three-way chemometric methods are efficient enough in analyzing hyphenated chromatographic data with both severe peak overlapping and retention time shift across samples. To address this issue, a practical framework was proposed herein. It consists of three chemometric algorithms, that is, (1) “fix-sized moving window evolving target spectral projection” for locating the possible peak positions of the target analytes, (2) “target identification based on singular value comparison” for determining whether the identified peaks are indeed the chromatographic peaks of the target analytes, and (3) “fix-sized moving window evolving trilinear decomposition” for obtaining the quantitative results of the target analytes. Experimental results on the GC-MS data sets of mixture samples of 10 compounds verified that the proposed framework could deal with the problems of both severe peak overlapping and retention time shift across samples. The proposed framework has the advantages of simplicity in concept, easy implementation, and good performance and hence is expected to be a competitive alternative to existing methods for the analysis of hyphenated chromatographic data of complex samples.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Properties of PLS for Analyzing Two-Level Factorial Experimental Designs","authors":"Joan Borràs-Ferrís, Abel Folch-Fortuny, Alberto Ferrer","doi":"10.1002/cem.3620","DOIUrl":"https://doi.org/10.1002/cem.3620","url":null,"abstract":"<p>We present here a novel methodology to analyze data from two-level factorial experimental designs, with or without missing runs, with just one method: partial least squares regression with one response variable (PLS1, hereinafter PLS). This property is very attractive for practitioners because, to the best of our knowledge, no other statistical tool has comparable versatility. In the case of a full and fractional factorial design, the one-PLS component model yields the same analytical solution as multiple linear regression (MLR), not only in the estimation of the effects but also in their statistical significance. When having missing runs in the factorial design, PLS is of particular interest as it is a powerful tool when dealing with complex correlation structures, as opposed to MLR. Thus, we challenge the widely held view that PLS is useful only when dealing with nonexperimental design (i.e., correlated observational data). The methodology is illustrated by two illustrative examples and synthesized by an easy-to-follow route map useful for practitioners.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3620","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142862128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Fluorescence Spectroscopy Combined With Chemometrics as a Tool for Control of Imprinted Protein Purification From Template Molecules","authors":"Natalia A. Burmistrova, Polina M. Ilicheva, Kirill Yu. Presnyakov, Pavel S. Pidenko, Douglas N. Rutledge","doi":"10.1002/cem.3622","DOIUrl":"https://doi.org/10.1002/cem.3622","url":null,"abstract":"<div><p>Imprinted proteins (IPs) are promising alternatives to natural recognition systems, such as biological receptors or antibodies. One of the crucial stages during development of IPs is removal of the template molecules from its complex with the protein. In this study, bovine serum albumin was imprinted in the presence of 4-hydroxycoumarin (4-HC); purification of IPs were carried out by dialysis, and fluorescence 3D spectroscopy was used to monitor the IP purification process. Excitation–emission matrix (EEM) was further investigated via several chemometric algorithms (principal component analysis [PCA], parallel factor analysis [PARAFAC], and independent components analysis [ICA]). We found that the models using PARAFAC and ICA worked better than those of PCA. It was shown that PARAFAC and ICA analyses allow not only to recognize IP sample with signal close to nonimprinted protein, but also to provide recommendations on the optimal dialysis time.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142862246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}