Equipment Sensor Data Cleansing Algorithm Design for ML-Based Anomaly Detection
Yun-Che Hsieh, Chieh-Yu Chen, Da-Yin Liao, Peter B. Luh, Shi-Chung Chang
2022 International Symposium on Semiconductor Manufacturing (ISSM), published 2022-12-12. DOI: 10.1109/ISSM55802.2022.10027125
Anomaly detection (AD) that applies machine learning (ML) to equipment sensor data can contribute significantly to yield improvement. Data cleansing is critical for supplying ML-based AD with fixed-length input without distorting the characteristics of the data. We present a novel data cleansing design with three innovations: (1) determination of the input data length based on process step and mode; (2) an importance indicator for sample data based on relative difference; and (3) a data cleansing priority that exploits both the importance indicator and entropy. Experimental results show that, when AD is performed with an unsupervised ML approach, our cleansing design preserves data characteristics, and thus supports effective AD, better than two frequently used methods.
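The abstract names the ingredients of the cleansing design but gives no formulas. As a rough illustration only, the Python sketch below assumes three hypothetical rules: (a) the fixed input length for each (process step, mode) group is the shortest trace in that group; (b) the "relative difference" importance of a sample is its deviation from the midpoint of its two neighbours, divided by the trace range; and (c) cleansing greedily removes the least important sample until the target length is reached. The function names `target_lengths`, `importance`, and `cleanse_to_length` and all three rules are assumptions, not the authors' method, and the entropy-based priority is not modeled.

```python
from collections import defaultdict

import numpy as np


def target_lengths(traces):
    """HYPOTHETICAL length rule: group traces by (process_step, mode) and use the
    shortest trace in each group as that group's fixed input length, so every
    trace can reach it by removing samples only."""
    lengths = defaultdict(list)
    for step, mode, samples in traces:
        lengths[(step, mode)].append(len(samples))
    return {key: min(vals) for key, vals in lengths.items()}


def importance(x, eps=1e-12):
    """HYPOTHETICAL relative-difference indicator: how far each interior sample
    deviates from the midpoint of its neighbours, divided by the trace range.
    Endpoints get +inf so they are never removed."""
    x = np.asarray(x, dtype=float)
    imp = np.full(x.shape, np.inf)
    if x.size >= 3:
        midpoint = (x[:-2] + x[2:]) / 2.0
        span = x.max() - x.min() + eps  # eps avoids division by zero on flat traces
        imp[1:-1] = np.abs(x[1:-1] - midpoint) / span
    return imp


def cleanse_to_length(x, n):
    """Greedily drop the least important sample until exactly n samples remain.
    Importance is recomputed after each removal because neighbours change."""
    if n < 2:
        raise ValueError("target length must be at least 2 (endpoints are kept)")
    kept = np.asarray(x, dtype=float)
    if kept.size < n:
        raise ValueError("trace is shorter than the target length")
    while kept.size > n:
        kept = np.delete(kept, int(np.argmin(importance(kept))))
    return kept


if __name__ == "__main__":
    traces = [
        ("etch", "A", [1.0, 1.0, 1.0, 5.0, 1.0, 1.0]),
        ("etch", "A", [2.0, 2.5, 3.0, 3.5]),
    ]
    n = target_lengths(traces)  # {("etch", "A"): 4}
    for step, mode, samples in traces:
        # first trace becomes [1, 5, 1, 1]: the 5.0 spike and both endpoints are kept
        print(cleanse_to_length(samples, n[(step, mode)]))
```

Under these assumptions, flat or linear stretches have near-zero importance and are removed first, while sharp excursions such as the spike in the example, which an anomaly detector would likely need, are removed last. Uniform subsampling, by contrast, can drop such a spike.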