Rethinking unsupervised domain adaptation for semantic segmentation
Zhijie Wang, Masanori Suganuma, Takayuki Okatani
Pattern Recognition Letters, vol. 186, pp. 119-125. DOI: 10.1016/j.patrec.2024.09.022. Published 2024-10-01.
Abstract: Unsupervised domain adaptation (UDA) adapts a model trained on one domain (the source) to a novel domain (the target) using only unlabeled data. Because annotation for semantic segmentation is costly, researchers have developed many UDA methods for this task, all assuming that no labeled sample is available in the target domain. We question the practicality of this assumption for two reasons. First, after training a model with a UDA method, we must somehow verify it before deployment. Second, UDA methods have at least a few hyper-parameters that need to be determined. The surest solution to both is to evaluate the model on validation data, i.e., a certain amount of labeled target-domain samples. This question about the basic assumption of UDA leads us to rethink UDA from a data-centric point of view. Specifically, we assume access to a minimal amount of labeled target data and ask how much is necessary to find good hyper-parameters for existing UDA methods. We then ask what happens if we instead use the same data for supervised training of the same model, e.g., fine-tuning. We conducted experiments to answer these questions on the popular scenarios {GTA5, SYNTHIA}→Cityscapes. We found that (i) choosing good hyper-parameters requires only a few labeled images for some UDA methods, whereas others need many more; and (ii) simple fine-tuning works surprisingly well: it outperforms many UDA methods when only a few dozen labeled images are available.
{"title":"Motion-guided small MAV detection in complex and non-planar scenes","authors":"Hanqing Guo , Canlun Zheng , Shiyu Zhao","doi":"10.1016/j.patrec.2024.09.013","DOIUrl":"10.1016/j.patrec.2024.09.013","url":null,"abstract":"<div><div>In recent years, there has been a growing interest in the visual detection of micro aerial vehicles (MAVs) due to its importance in numerous applications. However, the existing methods based on either appearance or motion features encounter difficulties when the background is complex or the MAV is too small. In this paper, we propose a novel motion-guided MAV detector that can accurately identify small MAVs in complex and non-planar scenes. This detector first exploits a motion feature enhancement module to capture the motion features of small MAVs. Then it uses multi-object tracking and trajectory filtering to eliminate false positives caused by motion parallax. Finally, an appearance-based classifier and an appearance-based detector that operates on the cropped regions are used to achieve precise detection results. Our proposed method can effectively and efficiently detect extremely small MAVs from dynamic and complex backgrounds because it aggregates pixel-level motion features and eliminates false positives based on the motion and appearance features of MAVs. Experiments on the ARD-MAV dataset demonstrate that the proposed method could achieve high performance in small MAV detection under challenging conditions and outperform other state-of-the-art methods across various metrics.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 98-105"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ClarityDiffuseNet: Enhancing fundus image quality under black shadows with diffusion model-based research
Jiadi Dong, Tianwei Qian, Yuxian Jiang, Lei Bi, Jinman Kim, Lisheng Wang, Xun Xu
Pattern Recognition Letters, vol. 186, pp. 279-285. DOI: 10.1016/j.patrec.2024.10.012. Published 2024-10-01.
Abstract: Deep learning models have achieved commendable success in the analysis of fundus images. However, the performance of many models is affected by image quality. A common quality issue in fundus images is severe black shadow artefacts, primarily caused by opacities in the refractive media or by insufficient or uneven illumination. Such low-quality images can compromise model training and lead models to learn incorrect feature representations. The removal of black shadows can be regarded as a preprocessing problem in the enhancement of degraded images. Solutions typically either increase the overall brightness of the image or restore the dark, shadowed areas. Prior work on brightening has often used generative adversarial networks (GANs), while restoration has been approached with autoencoders and variational autoencoders (VAEs). However, brightening approaches often fail to properly address local degradations, and restoration techniques can lose detail or over-smooth the shadow areas.
In this study, we introduce ClarityDiffuseNet, a model for restoring low-quality fundus images with severe black shadows based on diffusion generative models. Our method restores shadowed areas from high-quality regions, producing images that are richer in detail and visually closer to artefact-free images. Compared with GAN-based models and inpainting methods, our approach demonstrates superior performance on four public benchmark datasets, with Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) scores surpassing state-of-the-art models by 7% and 9%, respectively. Our method also yields notable improvements in downstream tasks: disease diagnosis, with a 9% increase in the area under the curve (AUC) on low-quality datasets, and vessel segmentation, with about a 6% improvement in the Dice coefficient under similar conditions. These outcomes underscore the promise of diffusion generative models for fundus image restoration and their effectiveness in enhancing image quality for further analysis.
{"title":"Personalized Federated Learning on long-tailed data via knowledge distillation and generated features","authors":"Fengling Lv , Pinxin Qian , Yang Lu , Hanzi Wang","doi":"10.1016/j.patrec.2024.09.024","DOIUrl":"10.1016/j.patrec.2024.09.024","url":null,"abstract":"<div><div>Personalized Federated Learning (PFL) offers a novel paradigm for distributed learning, which aims to learn a personalized model for each client through collaborative training of all distributed clients in a privacy-preserving manner. However, the performance of personalized models is often compromised by data heterogeneity and the challenges of long-tailed distributions, both of which are common in real-world applications. In this paper, we explore the joint problem of data heterogeneity and long-tailed distribution in PFL and propose a corresponding solution called Personalized Federated Learning with Distillation and generated Features (PFLDF). Specifically, we employ a lightweight generator trained on the server to generate a balanced feature set for each client that can supplement local minority class information with global class information. This augmentation mechanism is a robust countermeasure against the adverse effects of data imbalance. Subsequently, we use knowledge distillation to transfer the knowledge of the global model to personalized models to improve their generalization performance. Extensive experimental results show the superiority of PFLDF compared to other state-of-the-art PFL methods with long-tailed data distribution.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 178-183"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A unified framework to stereotyped behavior detection for screening Autism Spectrum Disorder
Cheol-Hwan Yoo, Jang-Hee Yoo, Moon-Ki Back, Woo-Jin Wang, Yong-Goo Shin
Pattern Recognition Letters, vol. 186, pp. 156-163. DOI: 10.1016/j.patrec.2024.10.001. Published 2024-10-01.
Abstract: We propose a unified pipeline for stereotyped behavior detection for early diagnosis of Autism Spectrum Disorder (ASD). Current methods for analyzing autism-related behaviors of children with ASD primarily focus on action classification over pre-trimmed video segments, which limits their real-world applicability. To overcome this, we develop a two-stage network for detecting stereotyped behaviors: one stage temporally localizes repetitive actions and the other classifies behavioral types. Building on the observation that stereotyped behaviors commonly manifest in various repetitive forms, our method first localizes video segments in which arbitrary repetitive behaviors are observed. We then classify the detailed behavior types within these localized segments, identifying actions such as arm flapping, head banging, and spinning. Extensive experiments on the SSBD and ESBD datasets demonstrate that the proposed pipeline surpasses existing baseline methods, achieving classification accuracies of 88.3% and 88.6%, respectively. The code and dataset will be publicly available at https://github.com/etri/AI4ASD/tree/main/pbr4RRB.
A complex neural network model by Hilbert Transform
Xinzhi Liu, Jun Yu, Toru Kurihara, Congzhong Wu, Haiyan Zhang, Shu Zhan
Pattern Recognition Letters, vol. 186, pp. 113-118. DOI: 10.1016/j.patrec.2024.09.021. Published 2024-10-01.
Abstract: The phase information of an optical wave plays a vital role in processing wave-related signals. In deep learning, complex-valued neural networks are built on the concept of complex amplitudes to make full use of phase information. The common way to build a complex-valued neural network is to apply the Fourier Transform to the observed signal to extract amplitude and phase information. However, for a real-valued signal this wastes part of the spectrum by introducing negative frequencies that have no physical meaning. We therefore use the Hilbert Transform as an alternative, which yields a single-sideband spectrum and prevents negative frequencies from interacting with positive ones. Moreover, the Fourier Transform is a global analysis and thus tells us nothing about the time domain. As our key insight, we further explore the instantaneous frequency calculated by the Hilbert Transform and propose a new method of constructing complex inputs from a time-frequency perspective. Simple pixel-wise classification experiments are carried out on two hyperspectral datasets and the MNIST dataset. The results demonstrate that the Hilbert Transform with instantaneous frequency performs better than the Fourier Transform by a large margin, owing to the additional time information.
{"title":"Uncertainty quantification metrics for deep regression","authors":"Simon Kristoffersson Lind , Ziliang Xiong , Per-Erik Forssén , Volker Krüger","doi":"10.1016/j.patrec.2024.09.011","DOIUrl":"10.1016/j.patrec.2024.09.011","url":null,"abstract":"<div><div>When deploying deep neural networks on robots or other physical systems, the learned model should reliably quantify predictive uncertainty. A reliable uncertainty allows downstream modules to reason about the safety of its actions. In this work, we address metrics for uncertainty quantification. Specifically, we focus on regression tasks, and investigate Area Under Sparsification Error (AUSE), Calibration Error (CE), Spearman’s Rank Correlation, and Negative Log-Likelihood (NLL). Using multiple datasets, we look into how those metrics behave under four typical types of uncertainty, their stability regarding the size of the test set, and reveal their strengths and weaknesses. Our results indicate that Calibration Error is the most stable and interpretable metric, but AUSE and NLL also have their respective use cases. We discourage the usage of Spearman’s Rank Correlation for evaluating uncertainties and recommend replacing it with AUSE.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 91-97"},"PeriodicalIF":3.9,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142319421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CvFormer: Cross-view transFormers with pre-training for fMRI analysis of human brain
Xiangzhu Meng, Wei Wei, Qiang Liu, Yu Wang, Min Li, Liang Wang
Pattern Recognition Letters, vol. 186, pp. 85-90. DOI: 10.1016/j.patrec.2024.09.010. Published 2024-09-17.
Abstract: In recent years, functional magnetic resonance imaging (fMRI) has been widely used to diagnose neurological diseases by exploiting region-of-interest (RoI) nodes and their connectivities in the human brain. However, most existing works rely on either RoIs or connectivities alone, neglecting the potential for complementary information between them. To address this issue, we study how to discover the rich cross-view information in fMRI data of the human brain. This paper presents a novel method for cross-view analysis of fMRI data, called Cross-view transFormers (CvFormer). CvFormer employs RoI and connectivity encoder modules to generate two separate views of the human brain, represented as RoI and sub-connectivity tokens. Basic transformer modules then process the RoI and sub-connectivity tokens, and cross-view modules integrate the complementary information across the two views. Furthermore, CvFormer uses a global token for each branch as a query to exchange information with the other branch in the cross-view modules, which requires only linear rather than quadratic computational and memory complexity. To enhance the robustness of CvFormer, we propose a two-stage training strategy: the RoI and connectivity views are first used as self-supervised information to pre-train CvFormer with contrastive learning, and label information is then used to fine-tune it. Experimental results on the public ABIDE and ADNI datasets show clear improvements from the proposed CvFormer, validating its effectiveness and superiority.
{"title":"DRGNN: Disentangled representation graph neural network for diverse category-level recommendations","authors":"Takuto Sugiyama , Soh Yoshida , Mitsuji Muneyasu","doi":"10.1016/j.patrec.2024.09.008","DOIUrl":"10.1016/j.patrec.2024.09.008","url":null,"abstract":"<div><p>Graph neural networks (GNNs) have significantly advanced recommender systems (RecSys) by enhancing their accuracy in complex collaborative filtering scenarios. However, this progress often comes at the cost of overlooking the diversity of recommendations, a factor in user satisfaction. Addressing this gap, this paper introduces the disentangled representation graph neural network (DRGNN). DRGNN integrates diversification into the candidate generation stage using two specialized modules. The first employs disentangled representation learning to separate item preferences from category preferences, thereby mitigating category bias in recommendations. The second module, focusing on positive sample selection, further reduces category bias. This approach not only maintains the high-order connectivity strengths of GNNs but also substantially improves the diversity of recommendations. Our extensive validation of DRGNN on three comprehensive web service datasets, Taobao, Amazon Beauty and MSD, shows that it not only matches the state-of-the-art methods in accuracy but also excels in achieving a balanced trade-off between accuracy and diversity in recommendations.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 78-84"},"PeriodicalIF":3.9,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142242542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Glocal Attention Pooling for Graph Classification","authors":"Waqar Ali , Sebastiano Vascon , Thilo Stadelmann , Marcello Pelillo","doi":"10.1016/j.patrec.2024.09.009","DOIUrl":"10.1016/j.patrec.2024.09.009","url":null,"abstract":"<div><p>Graph pooling is an essential operation in Graph Neural Networks that reduces the size of an input graph while preserving its core structural properties. Existing pooling methods find a compressed representation considering the Global Topological Structures (e.g., cliques, stars, clusters) or Local information at node level (e.g., top-<span><math><mi>k</mi></math></span> informative nodes). However, an effective graph pooling method does not hierarchically integrate both Global and Local graph properties. To this end, we propose a dual-fold Hierarchical Global Local Attention Pooling (HGLA-Pool) layer that exploits the aforementioned graph properties, generating more robust graph representations. Exhaustive experiments on nine publicly available graph classification benchmarks under standard metrics show that HGLA-Pool significantly outperforms eleven state-of-the-art models on seven datasets while being on par for the remaining two.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 71-77"},"PeriodicalIF":3.9,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142242541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}