Handling imbalance visualized pattern dataset for yield prediction

2008 International Symposium on Information Technology. Publication date: 2008-09-26. DOI: 10.1109/ITSIM.2008.4631657

M. M. Noor, S. Jusoh

Citations: 4

Abstract

The yield outcome of a non-closed-loop manufacturing process can be predicted in three steps. First, visualize the historical data patterns generated by the inspection machine. Then transform those patterns and map them into a machine learning algorithm for training. This produces a prediction model automatically, without requiring a human to interpret the visualizations. However, manufacturing process datasets are highly skewed with respect to bad yield outcomes: the majority class (good yield) vastly outnumbers the minority class (bad yield). A comparison of undersampling, oversampling, and SMOTE + VDM sampling indicates that combining SMOTE + VDM with an undersampled dataset produces a robust classifier that better handles different batches of prediction test data. Furthermore, a suitable distance function for SMOTE is needed to improve class recall and minimize overfitting. Undersampling also causes information loss, so a different approach to sampling the majority class is required to improve class precision.
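The abstract's best-performing setup combines two steps: undersampling the majority (good-yield) class and applying SMOTE to the minority (bad-yield) class. The following is a minimal sketch of that pipeline. It is not the authors' implementation.

The paper pairs SMOTE with the Value Difference Metric (VDM). This sketch substitutes plain Euclidean distance, which only suits numeric features. It exposes the distance as the `distance` parameter, where a VDM implementation could be plugged in. The function names (`random_undersample`, `smote`, `balance`) and the toy data are hypothetical.

```python
import math
import random


def random_undersample(majority, target_size, rng):
    """Randomly keep `target_size` majority-class rows, sampled without replacement."""
    return rng.sample(majority, target_size)


def euclidean(a, b):
    """Euclidean distance; a stand-in for VDM, which the paper uses (assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k, rng, distance=euclidean):
    """SMOTE-style oversampling: each synthetic point is interpolated between a
    random minority sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate neighbours are all other minority rows (index-based exclusion).
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda m: distance(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(good, bad, majority_size, minority_size, k=3, seed=0):
    """Undersample `good` to `majority_size` rows and oversample `bad` with
    SMOTE up to `minority_size` rows; label bad yield as 1, good yield as 0."""
    rng = random.Random(seed)
    good_kept = random_undersample(good, majority_size, rng)
    bad_all = bad + smote(bad, minority_size - len(bad), k, rng)
    X = good_kept + bad_all
    y = [0] * len(good_kept) + [1] * len(bad_all)
    return X, y


if __name__ == "__main__":
    # Hypothetical toy data: 200 "good yield" rows vs. 5 "bad yield" rows.
    good = [[float(i), float(i % 7)] for i in range(200)]
    bad = [[50.0, 1.0], [52.0, 2.0], [55.0, 1.5], [51.0, 3.0], [53.0, 2.5]]
    X, y = balance(good, bad, majority_size=40, minority_size=40, k=3)
    print(len(X), sum(y))  # 80 40
```

Synthetic rows always lie on a line segment between two real bad-yield rows. The choice of distance function therefore directly decides which minority points get interpolated together, which is consistent with the abstract's point that a suitable distance function affects recall and overfitting.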
