Marcel Dix, Gianluca Manca, Kenneth Chigozie Okafor, Reuben Borrison, Konstantin Kirchheim, Divyasheel Sharma, Kr Chandrika, Deepti Maduskar, F. Ortmeier
{"title":"测量ML模型对工业时间序列数据质量问题的鲁棒性","authors":"Marcel Dix, Gianluca Manca, Kenneth Chigozie Okafor, Reuben Borrison, Konstantin Kirchheim, Divyasheel Sharma, Kr Chandrika, Deepti Maduskar, F. Ortmeier","doi":"10.1109/INDIN51400.2023.10218129","DOIUrl":null,"url":null,"abstract":"The performance of machine learning models can be significantly impacted by variations in data quality. Typically, conventional model testing does not examine how robust the model would be in the face of potential data quality deterioration. In an industrial use case, however, data quality is a pertinent issue, as sensors are susceptible to a variety of technical and external issues that may result in poor data quality over time. In order to develop robust machine learning models, industrial data scientists must understand the sensitivity of their models against data quality issues, through the application of an appropriate and comprehensive testing solution. In this work, we propose a generic framework for systematically analyzing the impact of data quality issues on the performance of machine learning models by intentionally applying gradual perturbations to the original time series data. The evaluation is performed using a benchmark industrial process consisting of multivariate time series from sensors in a complex chemical process.","PeriodicalId":174443,"journal":{"name":"2023 IEEE 21st International Conference on Industrial Informatics (INDIN)","volume":"282 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Measuring the Robustness of ML Models Against Data Quality Issues in Industrial Time Series Data\",\"authors\":\"Marcel Dix, Gianluca Manca, Kenneth Chigozie Okafor, Reuben Borrison, Konstantin Kirchheim, Divyasheel Sharma, Kr Chandrika, Deepti Maduskar, F. 
Ortmeier\",\"doi\":\"10.1109/INDIN51400.2023.10218129\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The performance of machine learning models can be significantly impacted by variations in data quality. Typically, conventional model testing does not examine how robust the model would be in the face of potential data quality deterioration. In an industrial use case, however, data quality is a pertinent issue, as sensors are susceptible to a variety of technical and external issues that may result in poor data quality over time. In order to develop robust machine learning models, industrial data scientists must understand the sensitivity of their models against data quality issues, through the application of an appropriate and comprehensive testing solution. In this work, we propose a generic framework for systematically analyzing the impact of data quality issues on the performance of machine learning models by intentionally applying gradual perturbations to the original time series data. 
The evaluation is performed using a benchmark industrial process consisting of multivariate time series from sensors in a complex chemical process.\",\"PeriodicalId\":174443,\"journal\":{\"name\":\"2023 IEEE 21st International Conference on Industrial Informatics (INDIN)\",\"volume\":\"282 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE 21st International Conference on Industrial Informatics (INDIN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/INDIN51400.2023.10218129\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 21st International Conference on Industrial Informatics (INDIN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INDIN51400.2023.10218129","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Measuring the Robustness of ML Models Against Data Quality Issues in Industrial Time Series Data
The performance of machine learning models can be significantly impacted by variations in data quality. Conventional model testing, however, typically does not examine how robust a model would be in the face of potential data quality deterioration. In industrial use cases, data quality is a pertinent issue, as sensors are susceptible to a variety of technical and external faults that can degrade data quality over time. To develop robust machine learning models, industrial data scientists must understand the sensitivity of their models to data quality issues through an appropriate and comprehensive testing solution. In this work, we propose a generic framework for systematically analyzing the impact of data quality issues on the performance of machine learning models by intentionally applying gradual perturbations to the original time series data. The evaluation is performed on a benchmark industrial process consisting of multivariate time series from sensors in a complex chemical process.
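The core idea of the framework — scoring a model on increasingly perturbed copies of the original time series — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the perturbation functions (Gaussian noise, simulated sensor dropouts), the severity grid, and the `robustness_curve` helper are all hypothetical names chosen for this example.

```python
import numpy as np

def add_gaussian_noise(series, severity, rng):
    """Perturb each channel with zero-mean Gaussian noise scaled by severity."""
    scale = severity * np.std(series, axis=0)
    return series + rng.normal(0.0, scale, size=series.shape)

def drop_values(series, severity, rng):
    """Simulate sensor dropouts by replacing a random fraction of samples with NaN."""
    out = series.astype(float).copy()
    mask = rng.random(series.shape) < severity
    out[mask] = np.nan
    return out

def robustness_curve(model_score, series, labels, perturb, severities, seed=0):
    """Score a model on gradually perturbed data.

    model_score: callable (X, y) -> float, e.g. a wrapper around model.score.
    Returns a list of (severity, score) pairs tracing performance degradation.
    """
    rng = np.random.default_rng(seed)
    return [(s, model_score(perturb(series, s, rng), labels)) for s in severities]

# Example with a dummy scoring function that measures the fraction of usable values.
X = np.arange(20.0).reshape(10, 2)
y = np.zeros(10)
curve = robustness_curve(
    lambda Xp, yp: float(np.isfinite(Xp).mean()),
    X, y, drop_values, severities=[0.0, 0.25, 0.5],
)
```

Plotting such a curve (score versus perturbation severity) makes a model's sensitivity to a given data quality issue directly visible, which is the kind of analysis the paper's testing framework is built around.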