Antonios Kamariotis, Luca Sardi, I. Papaioannou, E. Chatzi, D. Štraub
{"title":"On off-line and on-line Bayesian filtering for uncertainty quantification of structural deterioration","authors":"Antonios Kamariotis, Luca Sardi, I. Papaioannou, E. Chatzi, D. Štraub","doi":"10.1017/dce.2023.13","DOIUrl":"https://doi.org/10.1017/dce.2023.13","url":null,"abstract":"Data-informed predictive maintenance planning largely relies on stochastic deterioration models. Monitoring information can be utilized to update sequentially the knowledge on model parameters. In this context, on-line (recursive) Bayesian filtering algorithms typically fail to properly quantify the full posterior uncertainty of time-invariant model parameters. Off-line (batch) algorithms are—in principle—better suited for the uncertainty quantification task, yet they are computationally prohibitive in sequential settings. In this work, we adapt and investigate selected Bayesian filters for parameter estimation: an on-line particle filter, an on-line iterated batch importance sampling filter, which performs Markov Chain Monte Carlo (MCMC) move steps, and an off-line MCMC-based sequential Monte Carlo filter. A Gaussian mixture model approximates the posterior distribution within the resampling process in all three filters. Two numerical examples provide the basis for a comparative assessment. The first example considers a low-dimensional, nonlinear, non-Gaussian probabilistic fatigue crack growth model that is updated with sequential monitoring measurements. The second high-dimensional, linear, Gaussian example employs a random field to model corrosion deterioration across a beam, which is updated with sequential sensor measurements. The numerical investigations provide insights into the performance of off-line and on-line filters in terms of the accuracy of posterior estimates and the computational cost, when applied to problems of different nature, increasing dimensionality and varying sensor information amount. Importantly, they show that a tailored implementation of the on-line particle filter proves competitive with the computationally demanding MCMC-based filters. Suggestions on the choice of the appropriate method in function of problem characteristics are provided.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42149087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Briceno-Mena, M. Nnadili, M. G. Benton, J. Romagnoli
{"title":"Data mining and knowledge discovery in chemical processes: Effect of alternative processing techniques","authors":"L. Briceno-Mena, M. Nnadili, M. G. Benton, J. Romagnoli","doi":"10.1017/dce.2022.21","DOIUrl":"https://doi.org/10.1017/dce.2022.21","url":null,"abstract":"Data mining and knowledge discovery (DMKD) focuses on extracting useful information from data. In the chemical process industry, tasks such as process monitoring, fault detection, process control, optimization, etc., can be achieved using DMKD. However, the selection of the appropriate method for each step in the DMKD process, namely data cleaning, sampling, scaling, dimensionality reduction (DR), clustering, clustering analysis and data visualization to obtain meaningful insights is far from trivial. In this contribution, a computational environment (FastMan) is introduced and used to illustrate how method selection affects DMKD in chemical process data. Two case studies, using data from a simulated natural gas liquid plant and real data from an industrial pyrolysis unit, were conducted to demonstrate the applicability of these methodologies in real-life scenarios. Sampling and normalization methods were found to have a great impact on the quality of the DMKD results. Also, a neighbor graphs method for DR, t-distributed stochastic neighbor embedding, outperformed principal component analysis, a matrix factorization method frequently used in the chemical process industry for identifying both local and global changes.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46308246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thermoacoustic stability prediction using classification algorithms","authors":"R. Gaudron, A. Morgans","doi":"10.1017/dce.2022.17","DOIUrl":"https://doi.org/10.1017/dce.2022.17","url":null,"abstract":"Predicting the occurrence of thermoacoustic instabilities is of major interest in a variety of engineering applications such as aircraft propulsion, power generation, and industrial heating. Predictive methodologies based on a physical approach have been developed in the past decades, but have a moderate-to-high computational cost when exploring a large number of designs. In this study, the stability prediction capabilities and computational cost of four well-established classification algorithms—the K-Nearest Neighbors, Decision Tree (DT), Random Forest (RF), and Multilayer Perceptron (MLP) algorithms—are investigated. These algorithms are trained using an in-house physics-based low-order network model tool called OSCILOS. All four algorithms are able to predict which configurations are thermoacoustically unstable with a very high accuracy and a very low runtime. Furthermore, the frequency intervals containing unstable modes for a given configuration are also accurately predicted using multilabel classification. The RF algorithm correctly predicts the overall stability and finds all frequency intervals containing unstable modes for 99.6 and 98.3% of all configurations, respectively, which makes it the most accurate algorithm when a large number of training examples is available. For smaller training sets, the MLP algorithm becomes the most accurate algorithm. The DT algorithm is found to be slightly less accurate, but can be trained extremely quickly and runs about a million times faster than a traditional physics-based low-order network model tool. These findings could be used to devise a new generation of combustor optimization tools that would run much faster than existing codes while retaining a similar accuracy.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45914005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A data-driven method for automated data superposition with applications in soft matter science","authors":"Kyle R. Lennon, G. McKinley, J. Swan","doi":"10.1017/dce.2023.3","DOIUrl":"https://doi.org/10.1017/dce.2023.3","url":null,"abstract":"The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed manually, or recently through the application of one of a few automated algorithms. However, these methods are often heuristic in nature, are prone to user bias via manual data shifting or parameterization, and lack a native framework for handling uncertainty in both the data and the resulting model of the superposed data. In this work, we develop a data-driven, nonparametric method for superposing experimental data with arbitrary coordinate transformations, which employs Gaussian process regression to learn statistical models that describe the data, and then uses maximum a posteriori estimation to optimally superpose the data sets. This statistical framework is robust to experimental noise and automatically produces uncertainty estimates for the learned coordinate transformations. Moreover, it is distinguished from black-box machine learning in its interpretability—specifically, it produces a model that may itself be interrogated to gain insight into the system under study. We demonstrate these salient features of our method through its application to four representative data sets characterizing the mechanics of soft materials. In every case, our method replicates results obtained using other approaches, but with reduced bias and the addition of uncertainty estimates. This method enables a standardized, statistical treatment of self-similar data across many fields, producing interpretable data-driven models that may inform applications such as materials classification, design, and discovery.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46638783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Akroyd, Zachary S. Harper, David Soutar, Feroz Farazi, A. Bhave, S. Mosbach, Markus Kraft
{"title":"Universal Digital Twin: Land Use – ADDENDUM","authors":"J. Akroyd, Zachary S. Harper, David Soutar, Feroz Farazi, A. Bhave, S. Mosbach, Markus Kraft","doi":"10.1017/dce.2022.8","DOIUrl":"https://doi.org/10.1017/dce.2022.8","url":null,"abstract":"","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43044418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent vehicle drive mode which predicts the driver behavior vector to augment the engine performance in real-time","authors":"Srikanth Kolachalama, Hafiz Abid Mahmood Malik","doi":"10.1017/dce.2022.15","DOIUrl":"https://doi.org/10.1017/dce.2022.15","url":null,"abstract":"In this article, a novel drive mode, “intelligent vehicle drive mode” (IVDM), was proposed, which augments the vehicle engine performance in real-time. This drive mode predicts the driver behavior vector (DBV), which optimizes the vehicle engine performance, and the metric of optimal vehicle engine performance was defined using the elements of engine operating point (EOP) and heating ventilation and air conditioning system (HVAC). Deep learning (DL) models were developed by mapping the vehicle level vectors (VLV) with EOP and HVAC parameters, and the trained functions were utilized to predict the future states of DBV reflecting augmented vehicle engine performance. The iterative analysis was performed by empirically estimating the future states of VLV in the allowable range of DBV and was fed into the DL model to predict the performance vectors. The defined vehicle engine performance metric was applied to the predicted vectors, and thus optimal DBV is the instantaneous output of the IVDM. The analytical and validation techniques were developed using field data obtained from General Motors Inc., Warren, Michigan. Finally, the proposed concept was quantified by analyzing the instantaneous engine efficiency (IEE) and smoothness measure of the instantaneous engine map (IEM).","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42830087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Griffiths, Débora C. Corrêa, M. Hodkiewicz, A. Polpo
{"title":"Managing streamed sensor data for mobile equipment prognostics","authors":"T. Griffiths, Débora C. Corrêa, M. Hodkiewicz, A. Polpo","doi":"10.1017/dce.2022.4","DOIUrl":"https://doi.org/10.1017/dce.2022.4","url":null,"abstract":"The ability to wirelessly stream data from sensors on heavy mobile equipment provides opportunities to proactively assess asset condition. However, data analysis methods are challenging to apply due to the size and structure of the data, which contain inconsistent and asynchronous entries, and large periods of missing data. Current methods usually require expertise from site engineers to inform variable selection. In this work, we develop a data preparation method to clean and arrange this streaming data for analysis, including a data-driven variable selection. Data are drawn from a mining industry case study, with sensor data from a primary production excavator over a period of 9 months. Variables include 58 numerical sensors and 40 binary indicators captured in 45-million rows of data describing the conditions and status of different subsystems of the machine. A total of 57% of time stamps contain missing values for at least one sensor. The response variable is drawn from fault codes selected by the operator and stored in the fleet management system. Application to the hydraulic system, for 21 failure events identified by the operator, shows that the data-driven selection contains variables consistent with subject matter expert expectations, as well as some sensors on other systems on the excavator that are less easy to explain from an engineering perspective. Our contribution is to demonstrate a compressed data representation using open-high-low-close and variable selection to visualize data and support identification of potential indicators of failure events from multivariate streamed data.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43392846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Bonney, M. de Angelis, M. Dal Borgo, Luis Andrade, S. Beregi, N. Jamia, D. Wagg
{"title":"Development of a digital twin operational platform using Python Flask—ADDENDUM","authors":"M. Bonney, M. de Angelis, M. Dal Borgo, Luis Andrade, S. Beregi, N. Jamia, D. Wagg","doi":"10.1017/dce.2022.13","DOIUrl":"https://doi.org/10.1017/dce.2022.13","url":null,"abstract":"","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43677106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature extraction and artificial neural networks for the on-the-fly classification of high-dimensional thermochemical spaces in adaptive-chemistry simulations—ADDENDUM","authors":"G. D’Alessio, A. Cuoci, A. Parente","doi":"10.1017/dce.2022.12","DOIUrl":"https://doi.org/10.1017/dce.2022.12","url":null,"abstract":"","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42651786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Svalova, P. Helm, D. Prangle, M. Rouainia, S. Glendinning, D. Wilkinson
{"title":"Emulating computer experiments of transport infrastructure slope stability using Gaussian processes and Bayesian inference—ADDENDUM","authors":"A. Svalova, P. Helm, D. Prangle, M. Rouainia, S. Glendinning, D. Wilkinson","doi":"10.1017/dce.2022.14","DOIUrl":"https://doi.org/10.1017/dce.2022.14","url":null,"abstract":"The editors and publisher of Data-Centric Engineering have awarded the Open Data and Open Materials badges to this article Svalova A, et al. (2021). Open Data Badge—indicates that data necessary to reproduce the reported results are available in an open access repository, under an open licence, with an accompanying description of the data. Open Materials Badge—indicates that any infrastructure, instruments, or equipment related to the reported methodology are available in an open access repository and are described in sufficient detail to allow a researcher to reproduce the procedure. The original article has been updated to include the badges. Please refer to the Data Availability Statement to find the identifier linking to the open data or open materials.","PeriodicalId":34169,"journal":{"name":"DataCentric Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47000068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}