Handling Imbalanced Data in Predictive Maintenance: A Resampling-Based Approach
Authors: Sejma Cicak, Umut Avci
Venue: 2023 5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA)
Published: 2023-06-08
DOI: https://doi.org/10.1109/HORA58378.2023.10156799
Imbalanced data is common in many domains and can significantly degrade the performance and generalizability of machine learning models, because the models fail to learn a good representation of the minority-class examples. This study aims to improve classification performance on predictive maintenance tasks, where the data is generally imbalanced. To this end, we use resampling methods that balance the data. We present various oversampling and undersampling techniques and apply them to both synthetic and real-world datasets. We then run classification experiments on the imbalanced and balanced datasets with different classifiers and compare their performance. More importantly, we evaluate the effectiveness of the resampling techniques to provide insight into their usefulness for handling class imbalance. Our study contributes to the growing body of literature on class imbalance in classification tasks and provides practical guidance for selecting appropriate sampling methods based on the characteristics of the dataset.
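To make the two families of resampling concrete, the following is a minimal sketch of random oversampling and random undersampling in plain Python. The abstract does not say which specific techniques the study uses, so this illustrates the general idea only and is not the authors' method. The function names `random_oversample` and `random_undersample`, the toy data, and the 95/5 class split are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Sample with replacement to fill the gap to the target size.
        extra = [rng.choice(rows) for _ in range(target - n)]
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Sample without replacement down to the target size.
        kept = rng.sample(rows, target)
        X_out.extend(kept)
        y_out.extend([label] * target)
    return X_out, y_out


# Hypothetical toy data: 95 "healthy" readings (label 0)
# and 5 "failure" readings (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print("original:    ", Counter(y))
print("oversampled: ", Counter(y_over))
print("undersampled:", Counter(y_under))
```

On this toy data, oversampling yields 95 rows per class and undersampling yields 5 rows per class. In practice, resampling is usually applied to the training split only, so that the evaluation data keeps the original class distribution.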