Dimensionality Reduction for Clustering of Nonlinear Industrial Data: A Tutorial
Hae Rang Roh, Chae Sun Kim, Yongseok Lee, Jong Min Lee
Korean Journal of Chemical Engineering, volume 42, issue 5, pages 987-1001. Published 2025-02-18.
DOI: 10.1007/s11814-025-00402-7
Article: https://link.springer.com/article/10.1007/s11814-025-00402-7
PDF: https://link.springer.com/content/pdf/10.1007/s11814-025-00402-7.pdf
Impact factor: 2.9. JCR: Q2 (Chemistry, Multidisciplinary).
Citations: 0
Abstract
Dimensionality reduction is essential for industrial process data with many nonlinear variables, since it retains only the features that matter for visualization or subsequent tasks. This study serves as a tutorial demonstrating how various dimensionality reduction techniques perform as the complexity of the process variables in toy examples increases. Some of the variables contain fault signals, so that the examples also demonstrate how a fault detection task is carried out. Evaluated against three criteria, Uniform Manifold Approximation and Projection (UMAP) performed notably well, particularly with sparse and noisy data, while also offering adequate robustness to out-of-sample test data. This tutorial provides guidance on selecting an appropriate dimensionality reduction technique based on data complexity, ultimately enabling more effective execution of subsequent tasks.
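As a rough illustration of the workflow the abstract describes (fit a reduction on training data, project out-of-sample test data, then cluster to separate normal from faulty samples), a minimal NumPy sketch follows. It is not the authors' code: it substitutes plain linear PCA for the techniques compared in the paper, and the toy data generator, the `MIXING` matrix, and all helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical map from 2 latent factors to 10 measured process variables.
MIXING = rng.normal(size=(2, 10))

def make_data(n_per_class):
    """Toy data (an assumption): normal samples and fault samples shifted in latent space."""
    normal = rng.normal(size=(n_per_class, 2))
    fault = rng.normal(size=(n_per_class, 2)) + np.array([8.0, 0.0])
    latent = np.vstack([normal, fault])
    X = latent @ MIXING + 0.1 * rng.normal(size=(2 * n_per_class, 10))
    y = np.repeat([0, 1], n_per_class)  # 0 = normal, 1 = fault
    return X, y

def fit_pca(X, n_components):
    """Return the training mean and the leading principal directions (rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def transform(X, mean, components):
    """Project samples, including out-of-sample ones, onto the fitted directions."""
    return (X - mean) @ components.T

def two_means(Z, n_iter=20):
    """Tiny 2-cluster k-means, initialised at the extremes of the first component."""
    centers = Z[[Z[:, 0].argmin(), Z[:, 0].argmax()]]
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        centers = np.array([Z[assign == k].mean(axis=0) for k in range(2)])
    return centers

X_train, y_train = make_data(200)
X_test, y_test = make_data(50)

mean, components = fit_pca(X_train, 2)          # fit on training data only
Z_train = transform(X_train, mean, components)
Z_test = transform(X_test, mean, components)    # out-of-sample projection

centers = two_means(Z_train)
dist_test = np.linalg.norm(Z_test[:, None, :] - centers[None, :, :], axis=2)
pred = dist_test.argmin(axis=1)

# Cluster ids are arbitrary, so score agreement up to a label swap.
agreement = max(np.mean(pred == y_test), np.mean(pred != y_test))
print(f"out-of-sample cluster/label agreement: {agreement:.2f}")
```

The fault here is a simple shift that a linear projection can separate. For the nonlinear, sparse, or noisy cases the abstract highlights, the `fit_pca` and `transform` pair would be swapped for a nonlinear method with its own out-of-sample transform.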
About the journal:
The Korean Journal of Chemical Engineering provides a global forum for the dissemination of research in chemical engineering. The Journal publishes significant research results obtained in the Asia-Pacific region, and simultaneously introduces recent technical progress made in other areas of the world to this region. Submitted research papers must be of potential industrial significance and specifically concerned with chemical engineering. The editors will give preference to papers having a clearly stated practical scope and applicability in the areas of chemical engineering, and to those where new theoretical concepts are supported by new experimental details. The Journal also regularly publishes featured reviews on emerging and industrially important subjects of chemical engineering as well as selected papers presented at international conferences on the subjects.