Authors: M. M. Noor, S. Jusoh
Published in: 2008 International Symposium on Information Technology, 2008-09-26
DOI: https://doi.org/10.1109/ITSIM.2008.4631657
Handling imbalance visualized pattern dataset for yield prediction
In a non-closed-loop manufacturing process, the yield outcome can be predicted by visualizing the historical data patterns generated by the inspection machine, transforming those patterns, and mapping them into a machine learning algorithm for training. This produces a prediction model automatically, without requiring visual interpretation by a human. However, manufacturing process datasets are highly skewed with respect to bad yield outcomes: the majority class of good yield vastly outnumbers the minority class of bad yield. A comparison of undersampling, oversampling, and SMOTE + VDM sampling indicates that combining SMOTE + VDM with an undersampled dataset produced robust classifier performance that handled different batches of prediction test data better. Furthermore, a suitable distance function for SMOTE is needed to improve class recall and minimize overfitting, while a different approach to sampling the majority class is required to improve class precision, which suffers from the information lost through undersampling.
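The combination of minority oversampling and majority undersampling described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes numeric feature vectors and uses Euclidean distance where the abstract pairs SMOTE with VDM, and the function names (`smote`, `undersample`, `rebalance`) and the `target` class size are hypothetical choices made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """Create synthetic minority samples by interpolating between a sample
    and one of its k nearest minority neighbours.

    Assumption: Euclidean distance on numeric features. The abstract uses
    VDM as the distance function; that is not reproduced here.
    """
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank every other minority sample by its distance to the base sample.
        others = [s for j, s in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def undersample(majority, n_keep, rng=None):
    """Randomly keep at most n_keep majority samples (without replacement)."""
    rng = rng or random.Random(0)
    return rng.sample(majority, min(n_keep, len(majority)))


def rebalance(good, bad, target, k=5, seed=0):
    """Undersample the majority class (good yield) and oversample the
    minority class (bad yield) so that both have roughly `target` samples."""
    rng = random.Random(seed)
    good_kept = undersample(good, target, rng)
    n_new = max(0, target - len(bad))
    bad_all = list(bad) + smote(bad, n_new, k, rng)
    return good_kept, bad_all


# Hypothetical toy data: 100 good-yield samples, 4 bad-yield samples.
good = [(float(i), float(i % 7)) for i in range(100)]
bad = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
good_kept, bad_all = rebalance(good, bad, target=20)
print(len(good_kept), len(bad_all))
```

Because each synthetic sample lies on the line segment between two existing minority samples, the oversampled class never leaves the region spanned by the original bad-yield data. The discarded good-yield samples are simply dropped, which is the information loss the abstract attributes to undersampling.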