Equipment Sensor Data Cleansing Algorithm Design for ML-Based Anomaly Detection
Yun-Che Hsieh, Chieh-Yu Chen, Da-Yin Liao, Peter B. Luh, Shi-Chung Chang
2022 International Symposium on Semiconductor Manufacturing (ISSM), published 2022-12-12
DOI: 10.1109/ISSM55802.2022.10027125
Citation count: 2
Abstract
Anomaly detection (AD) that applies machine learning (ML) to equipment sensor data can contribute significantly to yield improvement. Data cleansing is critical for providing ML-based AD with fixed-length input without distorting the characteristics of the data. We present a novel data cleansing design. Its innovations are: (1) input data length determination based on process step and mode, (2) an importance indicator for each data sample based on relative difference, and (3) a data cleansing priority that exploits the importance indicator and entropy. Experimental results with an unsupervised ML approach demonstrate that our cleansing design is superior to two frequently used methods in preserving data characteristics for effective AD.
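The abstract does not give the formulas behind the importance indicator or the cleansing priority, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes a hypothetical score (the absolute change from the previous sample, divided by the magnitude of the previous sample) and trims a variable-length sensor trace to a fixed length by dropping the lowest-scoring samples. The function names, the `eps` parameter, and the choice to always keep the first and last samples are illustrative assumptions. The entropy-based priority and the step- and mode-based length determination are not modelled here.

```python
def relative_difference(samples, eps=1e-9):
    """Hypothetical importance score: change relative to the previous sample.

    This scoring rule is an assumption for illustration; the paper's actual
    importance indicator is not specified in the abstract.
    """
    scores = [float("inf")]  # assumption: always keep the first sample
    for prev, cur in zip(samples, samples[1:]):
        scores.append(abs(cur - prev) / (abs(prev) + eps))
    scores[-1] = float("inf")  # assumption: always keep the last sample
    return scores


def trim_to_length(samples, target_len):
    """Keep the target_len highest-scoring samples, preserving time order."""
    if target_len < 2:
        raise ValueError("target_len must be at least 2")
    if len(samples) <= target_len:
        # Nothing to drop; padding short traces is out of scope for this sketch.
        return list(samples)
    scores = relative_difference(samples)
    # Rank indices by descending score; ties go to the earlier sample.
    ranked = sorted(range(len(samples)), key=lambda i: (-scores[i], i))
    keep = sorted(ranked[:target_len])
    return [samples[i] for i in keep]


if __name__ == "__main__":
    # A flat trace with one step up and one step down.
    trace = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 1.0, 1.0]
    print(trim_to_length(trace, 4))
```

In this sketch the samples that survive are the two endpoints and the two points where the signal level changes, which is the kind of characteristic a fixed-length reduction would want to preserve. Uniform down-sampling of the same trace could skip one of the transitions.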