Attention-based vector quantized variational autoencoder for anomaly detection by using orthogonal subspace constraints
Qien Yu, Shengxin Dai, Ran Dong, Soichiro Ikuno
Pattern Recognition, vol. 164, Article 111500 (published 2025-03-03)
DOI: 10.1016/j.patcog.2025.111500
URL: https://www.sciencedirect.com/science/article/pii/S0031320325001608
Abstract
This paper introduces a new framework that uses a vector quantized variational autoencoder (VQVAE) enhanced by orthogonal subspace constraints (OSC) and pyramid criss-cross attention (PCCA). The framework is designed for anomaly detection in industrial product image datasets. Previous approaches that model low-dimensional feature distributions have been unable to effectively distinguish normal features from noisy or abnormal information. This study addresses that limitation with OSC, which separates the features into two complementary subspaces. The vector quantization mechanism is then applied within these two subspaces to obtain normal and abnormal embedding subspaces, with discrete representations for normal and noisy information, respectively. The proposed approach robustly represents the information in normal data as a low-dimensional discrete manifold using a limited number of feature vectors. Additionally, two PCCA modules are proposed to capture feature maps from different layers of the encoder and decoder, benefiting both the low-dimensional mapping and the reconstruction process. Features from different layers are treated as the query (Q), key (K), and value (V), allowing the modules to capture both low-level and high-level features and to incorporate comprehensive contextual information. The effectiveness of the proposed framework for anomaly detection is assessed by comparing its performance with that of state-of-the-art approaches on various publicly available industrial product image datasets.
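The idea of splitting a latent vector into two complementary subspaces and quantizing each part against its own codebook can be illustrated with a toy sketch. This is not the authors' implementation. The fixed orthonormal basis `normal_basis`, the helper names, the toy codebooks, and the squared-inner-product form of the orthogonality penalty are all assumptions chosen for illustration. In a trained model, the subspaces and codebooks would be learned rather than hard-coded.

```python
import math

# Hypothetical helpers for a toy illustration; not taken from the paper.

def dot(a: list[float], b: list[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, used for nearest-codeword lookup."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def project(z: list[float], basis: list[list[float]]) -> list[float]:
    """Project z onto the span of an orthonormal basis: sum_i (z . u_i) u_i."""
    out = [0.0] * len(z)
    for u in basis:
        c = dot(z, u)
        for i in range(len(z)):
            out[i] += c * u[i]
    return out

def split_orthogonal(z: list[float], normal_basis: list[list[float]]):
    """Split z into a 'normal' component and its orthogonal complement."""
    z_n = project(z, normal_basis)
    z_a = [x - y for x, y in zip(z, z_n)]  # residual lies in the complement
    return z_n, z_a

def quantize(z: list[float], codebook: list[list[float]]):
    """Standard VQ step: return (index, codeword) of the nearest codeword."""
    idx = min(range(len(codebook)), key=lambda j: sq_dist(z, codebook[j]))
    return idx, codebook[idx]

def orthogonality_penalty(basis_a, basis_b) -> float:
    """Assumed penalty: sum of squared inner products between the two bases.
    It is zero exactly when every vector of basis_a is orthogonal to every
    vector of basis_b."""
    return sum(dot(a, b) ** 2 for a in basis_a for b in basis_b)

if __name__ == "__main__":
    normal_basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy 2-D "normal" subspace
    z = [0.9, 2.1, 0.8]                                # toy latent vector
    z_n, z_a = split_orthogonal(z, normal_basis)
    normal_codebook = [[1.0, 2.0, 0.0], [-1.0, 0.0, 0.0]]
    abnormal_codebook = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    print("normal part:", z_n, "-> code", quantize(z_n, normal_codebook)[0])
    print("abnormal part:", z_a, "-> code", quantize(z_a, abnormal_codebook)[0])
    print("penalty:", orthogonality_penalty(normal_basis, [[0.0, 0.0, 1.0]]))
```

Because the two components are orthogonal by construction, each codebook only has to cover its own subspace. In this toy setup, the normal codebook never needs to spend codewords on the residual (noisy/abnormal) direction.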
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.