L. Briceno-Mena, M. Nnadili, M. G. Benton, J. Romagnoli
{"title":"Data mining and knowledge discovery in chemical processes: Effect of alternative processing techniques","authors":"L. Briceno-Mena, M. Nnadili, M. G. Benton, J. Romagnoli","doi":"10.1017/dce.2022.21","DOIUrl":null,"url":null,"abstract":"Abstract Data mining and knowledge discovery (DMKD) focuses on extracting useful information from data. In the chemical process industry, tasks such as process monitoring, fault detection, process control, optimization, etc., can be achieved using DMKD. However, the selection of the appropriate method for each step in the DMKD process, namely data cleaning, sampling, scaling, dimensionality reduction (DR), clustering, clustering analysis and data visualization to obtain meaningful insights is far from trivial. In this contribution, a computational environment (FastMan) is introduced and used to illustrate how method selection affects DMKD in chemical process data. Two case studies, using data from a simulated natural gas liquid plant and real data from an industrial pyrolysis unit, were conducted to demonstrate the applicability of these methodologies in real-life scenarios. Sampling and normalization methods were found to have a great impact on the quality of the DMKD results. Also, a neighbor graphs method for DR, t-distributed stochastic neighbor embedding, outperformed principal component analysis, a matrix factorization method frequently used in the chemical process industry for identifying both local and global changes.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":" ","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2022-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"DataCentric Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1017/dce.2022.21","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 2
Abstract
Abstract Data mining and knowledge discovery (DMKD) focuses on extracting useful information from data. In the chemical process industry, tasks such as process monitoring, fault detection, process control, optimization, etc., can be achieved using DMKD. However, the selection of the appropriate method for each step in the DMKD process, namely data cleaning, sampling, scaling, dimensionality reduction (DR), clustering, clustering analysis and data visualization to obtain meaningful insights is far from trivial. In this contribution, a computational environment (FastMan) is introduced and used to illustrate how method selection affects DMKD in chemical process data. Two case studies, using data from a simulated natural gas liquid plant and real data from an industrial pyrolysis unit, were conducted to demonstrate the applicability of these methodologies in real-life scenarios. Sampling and normalization methods were found to have a great impact on the quality of the DMKD results. Also, a neighbor graphs method for DR, t-distributed stochastic neighbor embedding, outperformed principal component analysis, a matrix factorization method frequently used in the chemical process industry for identifying both local and global changes.