J. Huo, Changtong Lu, Yongfeng Yang, Hong-Mei Guo, Chenggang Li, Qian Li, Xuebin Zhao, Huaiqi Li
2022 IEEE 13th International Conference on Software Engineering and Service Science (ICSESS), published 2022-10-21. DOI: 10.1109/ICSESS54813.2022.9930311 (https://doi.org/10.1109/ICSESS54813.2022.9930311)
Quality Outlier Detection for Tobacco Based on Robust Sparse PCA: Advantages and Limitations
Quality control is important for the tobacco industry, and tobacco leaf is the raw material for cigarette products. For a given brand's products, when no known standard samples are available to serve as a center, it is difficult to detect outliers from unknown groups with classical PCA. Although classical PCA has been widely used in near-infrared spectroscopy (NIRS) of tobacco, its accuracy cannot satisfy the industrial requirements for correctly classifying products and identifying outliers. Robust sparse PCA (RSPCA) is therefore used here to process tobacco leaf NIR data. It has advantages over both robust PCA (RPCA) and classical PCA (CPCA): it suppresses the effect of outliers through sparse loadings and provides a robust dimension-reducing projection. RSPCA thus yields higher accuracy for tobacco leaf source classification and outlier detection than classical PCA. With Eigenvalue Decomposition Discriminant Analysis (EDDA), a supervised classification method based on Gaussian components, tobacco leaf sources of different quality levels are well classified according to the robust score distance (SD) and orthogonal distance (OD) of RSPCA. Furthermore, classification based on principal components (PCs) and classification based on SD-OD are compared across the three types of PCA; the SD-OD based classification with RSPCA performs best.
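The abstract does not spell out how SD and OD are computed, so the following is only a minimal sketch under stated assumptions. It uses the definitions commonly given in the robust PCA literature: SD is the distance of a sample's scores from the center inside the PC subspace, scaled by the eigenvalues, and OD is the Euclidean distance from the sample to that subspace. The fit below is classical PCA via SVD on synthetic data, standing in for RSPCA; the function names `fit_classical_pca` and `sd_od` and all data are hypothetical.

```python
import numpy as np


def fit_classical_pca(X, k):
    """Classical PCA via SVD (a stand-in here, NOT the paper's RSPCA)."""
    center = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - center, full_matrices=False)
    loadings = Vt[:k].T                          # shape (n_features, k)
    eigenvalues = s[:k] ** 2 / (X.shape[0] - 1)  # variance along each PC
    return center, loadings, eigenvalues


def sd_od(X, center, loadings, eigenvalues):
    """Score distance and orthogonal distance of each row of X.

    Assumed (standard) definitions:
      SD_i = sqrt(sum_j t_ij^2 / lambda_j)
      OD_i = || x_i - center - P t_i ||
    """
    Xc = X - center
    scores = Xc @ loadings
    sd = np.sqrt(np.sum(scores ** 2 / eigenvalues, axis=1))
    od = np.linalg.norm(Xc - scores @ loadings.T, axis=1)
    return sd, od


# Synthetic stand-in for spectra: 50 samples in 10 dimensions that lie
# close to a 2-dimensional subspace.
rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.normal(size=(10, 3)))[0]    # 3 orthonormal columns
basis, off_axis = Q[:, :2], Q[:, 2]
latent = rng.normal(size=(50, 2)) * np.array([3.0, 1.5])
X = latent @ basis.T + 0.05 * rng.normal(size=(50, 10))

center, P, lam = fit_classical_pca(X, k=2)
sd, od = sd_od(X, center, P, lam)

# A hypothetical new sample pushed away from the subspace: its OD should be
# large even though its SD is unremarkable.
new_sample = (center + 5.0 * off_axis)[None, :]
sd_new, od_new = sd_od(new_sample, center, P, lam)
```

In a robust variant, the center, loadings, and eigenvalues would come from the robust fit instead, and the resulting (SD, OD) pairs could then be passed to a classifier as two-dimensional features, which is the role the abstract assigns to EDDA.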