{"title":"A sharper definition of alignment for Panoptic Quality","authors":"Ruben van Heusden, Maarten Marx","doi":"10.1016/j.patrec.2024.07.005","DOIUrl":"10.1016/j.patrec.2024.07.005","url":null,"abstract":"<div><p>The Panoptic Quality metric, developed by Kirillov et al. in 2019, makes object-level precision, recall and F1 measures available for evaluating image segmentation, and more generally any partitioning task, against a gold standard. Panoptic Quality is based on partial isomorphisms between hypothesized and true segmentations. Kirillov et al. desire that functions defining these one-to-one matchings should be simple, interpretable and effectively computable. They show that for <span><math><mi>t</mi></math></span> and <span><math><mi>h</mi></math></span>, true and hypothesized segments, the condition stating that there are more correct than wrongly predicted pixels, formalized as <span><math><mrow><mi>I</mi><mi>o</mi><mi>U</mi><mrow><mo>(</mo><mi>t</mi><mo>,</mo><mi>h</mi><mo>)</mo></mrow><mo>></mo><mo>.</mo><mn>5</mn></mrow></math></span> or equivalently as <span><math><mrow><mo>|</mo><mi>t</mi><mo>∩</mo><mi>h</mi><mo>|</mo><mo>></mo><mo>.</mo><mn>5</mn><mo>|</mo><mi>t</mi><mo>∪</mo><mi>h</mi><mo>|</mo></mrow></math></span> has these properties. We show that a weaker function, requiring that more than half of the pixels in the hypothesized segment are in the true segment and vice-versa, formalized as <span><math><mrow><mo>|</mo><mi>t</mi><mo>∩</mo><mi>h</mi><mo>|</mo><mo>></mo><mo>.</mo><mn>5</mn><mo>|</mo><mi>t</mi><mo>|</mo></mrow></math></span> and <span><math><mrow><mo>|</mo><mi>t</mi><mo>∩</mo><mi>h</mi><mo>|</mo><mo>></mo><mo>.</mo><mn>5</mn><mo>|</mo><mi>h</mi><mo>|</mo></mrow></math></span>, is not only sufficient but also necessary. With a small proviso, every function defining a partial isomorphism satisfies this condition. We theoretically and empirically compare the two conditions.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 87-93"},"PeriodicalIF":3.9,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167865524002083/pdfft?md5=bb6442127be088116923de392456ce0d&pid=1-s2.0-S0167865524002083-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141698070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
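Both matching conditions are easy to state over segments represented as pixel sets. The sketch below is an illustration, not the authors' code; it shows a boundary case where IoU is exactly 0.5, so the strict rule rejects the pair while the weaker two-sided majority rule accepts it:

```python
def iou_condition(t, h):
    """Kirillov et al.'s matching rule: IoU(t, h) > 0.5."""
    inter = len(t & h)
    return inter > 0.5 * len(t | h)

def majority_condition(t, h):
    """Weaker rule: more than half of each segment's pixels are shared."""
    inter = len(t & h)
    return inter > 0.5 * len(t) and inter > 0.5 * len(h)

# Segments as sets of pixel coordinates (toy example).
t = {(0, 0), (0, 1), (0, 2), (0, 3)}
h = {(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)}

# |t ∩ h| = 3 and |t ∪ h| = 6, so IoU = 0.5 exactly: the strict rule
# rejects the match, while 3 > 0.5*4 and 3 > 0.5*5, so the weaker rule
# accepts it.
```

The weaker rule still yields a partial isomorphism: a hypothesized segment sharing more than half its pixels with each of two disjoint true segments would need more pixels than it has.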
{"title":"EXACT: How to train your accuracy","authors":"Ivan Karpukhin , Stanislav Dereka , Sergey Kolesnikov","doi":"10.1016/j.patrec.2024.06.033","DOIUrl":"https://doi.org/10.1016/j.patrec.2024.06.033","url":null,"abstract":"<div><p>Classification tasks are typically evaluated based on accuracy. However, due to the discontinuous nature of accuracy, it cannot be directly optimized using gradient-based methods. The conventional approach involves minimizing surrogate losses such as cross-entropy or hinge loss, which may result in suboptimal performance. In this paper, we introduce a novel optimization technique that incorporates stochasticity into the model’s output and focuses on optimizing the expected accuracy, defined as the accuracy of the stochastic model. Comprehensive experimental evaluations demonstrate that our proposed optimization method significantly enhances performance across various classification tasks, including SVHN, CIFAR-10, CIFAR-100, and ImageNet.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 23-30"},"PeriodicalIF":3.9,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141604803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
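The key point, that accuracy becomes a smooth function of the logits once the model's output is made stochastic, can be seen in closed form for two classes. The snippet below is an illustrative sketch, not the paper's algorithm (which handles the multi-class case): with i.i.d. Gaussian noise on the logits, the expected accuracy is a Gaussian CDF of the logit gap, hence differentiable.

```python
import math

def expected_accuracy_binary(z_true, z_other, sigma=1.0):
    """P(true class wins the noisy argmax) for two classes.

    With independent N(0, sigma^2) noise added to each logit, the logit
    difference is N(z_true - z_other, 2*sigma^2), so the expected accuracy
    is Phi((z_true - z_other) / (sigma * sqrt(2))): smooth in the logits,
    unlike the 0/1 accuracy of the deterministic argmax.
    """
    gap = z_true - z_other
    return 0.5 * (1.0 + math.erf(gap / (2.0 * sigma)))
```

A gradient-based optimizer can then ascend this expectation directly instead of minimizing a surrogate such as cross-entropy.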
{"title":"Deep motion estimation through adversarial learning for gait recognition","authors":"Yuanhao Yue , Laixiang Shi , Zheng Zheng , Long Chen , Zhongyuan Wang , Qin Zou","doi":"10.1016/j.patrec.2024.06.031","DOIUrl":"https://doi.org/10.1016/j.patrec.2024.06.031","url":null,"abstract":"<div><p>Gait recognition is a form of identity verification that can be performed over long distances without requiring the subject’s cooperation, making it particularly valuable for applications such as access control, surveillance, and criminal investigation. The essence of gait lies in the motion dynamics of a walking individual. Accurate gait-motion estimation is crucial for high-performance gait recognition. In this paper, we introduce two main designs for gait motion estimation. Firstly, we propose a fully convolutional neural network named W-Net for silhouette segmentation from video sequences. Secondly, we present an adversarial learning-based algorithm for robust gait motion estimation. Together, these designs contribute to a high-performance system for gait recognition and user authentication. In the experiments, two datasets, i.e., OU-IRIS and our own dataset, are used for performance evaluation. Experimental results show that the W-Net achieves an accuracy of 89.46% in silhouette segmentation, and the proposed user-authentication method achieves over 99.6% and 93.8% accuracy on the two datasets, respectively.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"184 ","pages":"Pages 232-237"},"PeriodicalIF":3.9,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141582471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust clustering algorithm: The use of soft trimming approach","authors":"Sona Taheri , Adil M. Bagirov , Nargiz Sultanova , Burak Ordin","doi":"10.1016/j.patrec.2024.06.032","DOIUrl":"https://doi.org/10.1016/j.patrec.2024.06.032","url":null,"abstract":"<div><p>The presence of noise or outliers in data sets may heavily affect the performance of clustering algorithms and lead to unsatisfactory results. The majority of conventional clustering algorithms are sensitive to noise and outliers. Robust clustering algorithms often overcome difficulties associated with noise and outliers and find true cluster structures. We introduce a soft trimming approach for the hard clustering problem, whose objective is modeled as the sum of the cluster function and a function represented as a composition of algebraic and distance functions. We utilize the composite function to estimate the degree of significance of each data point in clustering. A robust clustering algorithm based on the new model and a procedure for generating starting cluster centers is developed. We demonstrate the performance of the proposed algorithm using synthetic and real-world data sets containing noise and outliers. We also compare its performance with that of some well-known clustering techniques. Results show that the new algorithm is robust to noise and outliers and finds true cluster structures.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 15-22"},"PeriodicalIF":3.9,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167865524002022/pdfft?md5=dbe56ad973c9985e231d856d9eba464a&pid=1-s2.0-S0167865524002022-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141604804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Points2NeRF: Generating Neural Radiance Fields from 3D point cloud","authors":"Dominik Zimny , Joanna Waczyńska , Tomasz Trzciński , Przemysław Spurek","doi":"10.1016/j.patrec.2024.07.002","DOIUrl":"https://doi.org/10.1016/j.patrec.2024.07.002","url":null,"abstract":"<div><p>Neural Radiance Fields (NeRFs) offer state-of-the-art quality in synthesizing novel views of complex 3D scenes from a small subset of base images. For NeRFs to perform optimally, the registration of base images has to follow certain assumptions, including maintaining a constant distance between the camera and the object. We can address this limitation by training NeRFs with 3D point clouds instead of images, yet a straightforward substitution is impossible due to the sparsity of 3D clouds in under-sampled regions, which leads to incomplete reconstruction output by NeRFs. To solve this problem, we propose an auto-encoder-based architecture that leverages a hypernetwork paradigm to transfer 3D points with their associated color values through a lower-dimensional latent space and generate the weights of a NeRF model. This way, we can accommodate the sparsity of 3D point clouds and fully exploit the potential of point cloud data. As a side benefit, our method offers an implicit way of representing 3D scenes and objects that can be employed to condition NeRFs and hence generalize the models beyond objects seen during training. The empirical evaluation confirms the advantages of our method over conventional NeRFs and demonstrates its superiority in practical applications.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 8-14"},"PeriodicalIF":3.9,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141592707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
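The hypernetwork paradigm mentioned above, one network emitting the weights of another, is compact enough to sketch. The toy below is illustrative only; the dimensions, the latent code z (assumed to come from the encoder), and the target MLP are assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions: latent code z of a point cloud -> weights of a
# small target network f(x; W) mapping 3D coords to 4 outputs (e.g. rgb + density).
Z, H, IN, OUT = 16, 32, 3, 4
W1 = rng.normal(scale=0.1, size=(Z, IN * H + H))      # hypernetwork layer 1
W2 = rng.normal(scale=0.1, size=(Z, H * OUT + OUT))   # hypernetwork layer 2

def generate_target_weights(z):
    """Hypernetwork: map latent z to the target MLP's weights and biases."""
    p1, p2 = z @ W1, z @ W2
    w1, b1 = p1[:IN * H].reshape(IN, H), p1[IN * H:]
    w2, b2 = p2[:H * OUT].reshape(H, OUT), p2[H * OUT:]
    return w1, b1, w2, b2

def target_mlp(x, params):
    """Query the generated network at a batch of 3D points."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2

z = rng.normal(size=Z)                              # latent code (assumed)
x = rng.normal(size=(5, 3))                         # 5 query points in 3D
out = target_mlp(x, generate_target_weights(z))     # shape (5, 4)
```

Because the scene is carried entirely by z, the same hypernetwork weights can generate a different radiance field for every input point cloud.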
{"title":"Rescaling large datasets based on validation outcomes of a pre-trained network","authors":"Thanh Tuan Nguyen , Thanh Phuong Nguyen","doi":"10.1016/j.patrec.2024.07.001","DOIUrl":"10.1016/j.patrec.2024.07.001","url":null,"abstract":"<div><p>Several categories in a large dataset are not difficult for recent advanced deep neural networks to recognize. Eliminating them to obtain a smaller, more challenging subset lets early network drafts be verified quickly. To this end, we propose an efficient rescaling method based on the validation outcomes of a pre-trained model. Firstly, we take out the sensitive images of the lowest-accuracy classes in the validation outcomes. Each such image is then examined to identify which label it was confused with. Gathering the lowest-accuracy classes along with the most confused ones produces a smaller subset with a higher challenge for quick validation of an early network draft. Finally, a rescaling application is introduced to rescale two popular large datasets (ImageNet and Places365) into tiny subsets (<span><math><msup><mrow><mi>ReIN</mi></mrow><mrow><mi>Ω</mi></mrow></msup></math></span> and <span><math><msup><mrow><mi>RePL</mi></mrow><mrow><mi>Ω</mi></mrow></msup></math></span> respectively). Experiments on image classification show that neural networks obtaining good performance on the original datasets also achieve good results on their rescaled subsets. For instance, MobileNetV1 and MobileNetV2, with 70.6% and 72% on ImageNet respectively, obtained 46.53% and 47.47% on its small subset <span><math><msup><mrow><mi>ReIN</mi></mrow><mrow><mn>30</mn></mrow></msup></math></span>, which only contains about 39,000 images. The better performance of MobileNetV2 on ImageNet correspondingly leads to a better rate on its rescaled subset. Utilizing these rescaled sets would help researchers save time and computational cost when designing deep neural architectures. All codes related to the rescaling proposal and the resultant subsets are available at <span><span>http://github.com/nttbdrk25/ImageNetPlaces365</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 73-80"},"PeriodicalIF":3.9,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141702955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
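The selection step described in the abstract, keeping the lowest-accuracy classes plus the labels they are most confused with, can be sketched directly from a validation confusion matrix. A minimal illustration; the function name and parameters are hypothetical, not from the released code:

```python
import numpy as np

def select_subset_classes(confusion, n_worst=1, n_confused=1):
    """Pick the lowest-accuracy classes and their top confusion partners.

    confusion[i, j] counts validation samples of class i predicted as j.
    """
    acc = np.diag(confusion) / confusion.sum(axis=1)
    worst = np.argsort(acc)[:n_worst]          # lowest-accuracy classes
    keep = set(worst.tolist())
    off = confusion.astype(float).copy()
    np.fill_diagonal(off, 0)                   # ignore correct predictions
    for c in worst:                            # add most-confused labels
        keep.update(np.argsort(off[c])[::-1][:n_confused].tolist())
    return sorted(keep)

# Class 2 has 20% accuracy and is mostly mistaken for class 3.
confusion = np.array([[9, 1, 0, 0],
                      [0, 10, 0, 0],
                      [0, 0, 2, 8],
                      [1, 0, 0, 9]])
# select_subset_classes(confusion) -> [2, 3]
```

Restricting the dataset to the returned classes yields the smaller, harder subset used for quick trials.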
{"title":"Self-supervised scheme for generalizing GAN image detection","authors":"Yonghyun Jeong , Doyeon Kim , Pyounggeon Kim , Youngmin Ro , Jongwon Choi","doi":"10.1016/j.patrec.2024.06.030","DOIUrl":"https://doi.org/10.1016/j.patrec.2024.06.030","url":null,"abstract":"<div><p>Although recent advances in generative models bring diverse benefits to society, they can also be abused for malicious purposes, such as fraud, defamation, and fake news. To prevent such cases, vigorous research is conducted to distinguish generated images from real images, but it remains challenging to distinguish generated images outside of the training settings. Such limitations occur due to data dependency arising from the model’s overfitting to the specific Generative Adversarial Networks (GANs) and categories of the training data. To overcome this issue, we adopt a self-supervised scheme. Our method is composed of an artificial artifact generator, which reconstructs the high-quality artificial artifacts of GAN images, and a GAN detector, which distinguishes GAN images by learning the reconstructed artificial artifacts. To improve the generalization of the artificial artifact generator, we build multiple autoencoders with different numbers of upconvolution layers. Numerous ablation studies validate the robust generalization of our method, which outperforms previous state-of-the-art algorithms even without utilizing the GAN images of the training dataset.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"184 ","pages":"Pages 219-224"},"PeriodicalIF":3.9,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141582469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bending classification from interference signals of a fiber optic sensor using shallow learning and convolutional neural networks","authors":"Luis M. Valentín-Coronado, Rodolfo Martínez-Manuel, Jonathan Esquivel-Hernández, Maria de los Angeles Martínez-Guerrero, Sophie LaRochelle","doi":"10.1016/j.patrec.2024.06.029","DOIUrl":"https://doi.org/10.1016/j.patrec.2024.06.029","url":null,"abstract":"Bending monitoring is critical in engineering applications, as it helps determine any structural deformation caused by load action or fatigue effect. While strain gauges and accelerometers were previously used to measure bending magnitude, optical fiber sensors have emerged as a reliable alternative. In this work, a machine-learning-based model is proposed to analyze the interference signal of an interferometric fiber sensor system and characterize the bending magnitude and direction. In particular, shallow learning-based and convolutional neural network-based (CNN) models have been implemented to perform this task. Furthermore, given the repeatability of the interference signals, a synthetic dataset was created to train the models, whereas real interferometric signals were used to evaluate the models’ performance. Experiments were conducted on a flexible rod in fixed–free and fixed–fixed ends configurations for bending monitoring. Although both models achieved mean accuracies above 91%, only the CNN-based model reached a mean accuracy above 98%. This confirms that monitoring bending movements through interference signal analysis by means of a CNN-based model is a viable approach.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"56 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141611020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Patch-wise vector quantization for unsupervised medical anomaly detection","authors":"Taejune Kim , Yun-Gyoo Lee , Inho Jeong , Soo-Youn Ham , Simon S. Woo","doi":"10.1016/j.patrec.2024.06.028","DOIUrl":"https://doi.org/10.1016/j.patrec.2024.06.028","url":null,"abstract":"<div><p>Radiography images inherently possess globally consistent structures while exhibiting significant diversity in local anatomical regions, making it challenging to model their normal features through unsupervised anomaly detection. Since unsupervised anomaly detection methods localize anomalies by utilizing discrepancies between learned normal features and input abnormal features, previous studies introduce a memory structure to capture the normal features of radiography images. However, these approaches store extremely localized image segments in their memory, causing the model to represent both normal and pathological features with the stored components. This poses a significant challenge in unsupervised anomaly detection by reducing the disparity between learned features and abnormal features. Furthermore, with the diverse settings in radiography imaging, the above issue is exacerbated: more diversity in the normal images results in stronger representation of pathological features. To resolve these issues, we propose a novel pathology detection method called Patch-wise Vector Quantization (P-VQ). Unlike previous methods, P-VQ learns vector-quantized representations of normal “patches” while preserving their spatial information by incorporating a vector similarity metric. Furthermore, we introduce a novel method for selecting features in the memory to further enhance robustness against diverse imaging settings. P-VQ even mitigates the “index collapse” problem of vector quantization through a proposed top-<span><math><mrow><mi>k</mi><mtext>%</mtext></mrow></math></span> dropout. Our extensive experiments on the BMAD benchmark demonstrate the superior performance of P-VQ against existing state-of-the-art methods.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"184 ","pages":"Pages 205-211"},"PeriodicalIF":3.9,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141543125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
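The core memory-based mechanism, quantizing patch features against a codebook of normal prototypes and scoring anomalies by the quantization residual, can be illustrated in a few lines. This is a generic sketch under assumed dimensions, not the authors' P-VQ (which adds spatial preservation, memory feature selection, and top-k% dropout):

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 32, 8                                  # codebook size, feature dim (assumed)
codebook = rng.normal(size=(K, D))            # prototypes of normal patch features

def quantize(patches):
    """Return nearest codebook entries and L2 quantization residuals."""
    dist = np.linalg.norm(patches[:, None, :] - codebook[None, :, :], axis=-1)
    return codebook[dist.argmin(axis=1)], dist.min(axis=1)

# Normal patches lie close to stored prototypes; an anomalous one does not,
# so its residual (distance to the nearest code) is large.
normal = codebook[rng.integers(0, K, size=10)] + 0.05 * rng.normal(size=(10, D))
anomaly = 10.0 + rng.normal(size=(1, D))
_, scores = quantize(np.vstack([normal, anomaly]))   # last score is the anomaly's
```

Thresholding the residual map over all patches then localizes the pathological regions.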
{"title":"EHIR: Energy-based Hierarchical Iterative Image Registration for Accurate PCB Defect Detection","authors":"Shuixin Deng , Lei Deng , Xiangze Meng , Ting Sun , Baohua Chen , Zhixiang Chen , Hao Hu , Yusen Xie , Hanxi Yin , Shijie Yu","doi":"10.1016/j.patrec.2024.06.027","DOIUrl":"10.1016/j.patrec.2024.06.027","url":null,"abstract":"<div><p>Printed Circuit Board (PCB) surface defect detection is crucial to ensure the quality of electronic products in the manufacturing industry. Detection methods can be divided into non-referential and referential methods. Non-referential methods employ designed rules or a learned data distribution without template images, but struggle to address the uncertainty and subjectivity of defects. In contrast, referential methods use templates to achieve better performance but rely on precise image registration. However, image registration is especially challenging when extracting and matching features for PCB images with defective, repetitive, or sparse features. To address these issues, we propose a novel <strong>E</strong>nergy-based <strong>H</strong>ierarchical <strong>I</strong>terative Image <strong>R</strong>egistration method (EHIR) that formulates image registration as an energy optimization problem based on edge points rather than a finite set of features. Our framework consists of three stages: Edge-guided Energy Transformation (EET), EHIR and Edge-guided Energy-based Defect Detection (EEDD). The novelty is that the consistency of contours aids image alignment, while their differences are highlighted for defect localization. Extensive experiments show that this method has high accuracy and strong robustness, especially in the presence of defect feature interference, where it demonstrates a clear advantage over other methods.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 38-44"},"PeriodicalIF":3.9,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141630040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}