{"title":"Saliency information and mosaic based data augmentation method for densely occluded object recognition","authors":"Ying Tong, Xiangfeng Luo, Liyan Ma, Shaorong Xie, Wenbin Yang, Yinsai Guo","doi":"10.1007/s10044-024-01258-z","DOIUrl":"https://doi.org/10.1007/s10044-024-01258-z","url":null,"abstract":"<p>Data augmentation methods are crucial to improve the accuracy of densely occluded object recognition in the scene where the quantity and diversity of training images are insufficient. However, the current methods that use regional dropping and mixing strategies suffer from the problem of missing foreground objects and redundant background features, which can lead to densely occluded object recognition issues in classification or detection tasks. Herein, saliency information and mosaic based data augmentation method for densely occluded object recognition is proposed, which utilizes saliency information as prior knowledge to supervise the mosaic process of training images containing densely occluded objects. And the method uses fogging processing and class label mixing to construct new augmented images, in order to improve the accuracy of image classification and object recognition tasks by augmenting the quantity and diversity of training images. Extensive experiments on different classification datasets with various CNN architectures prove the effectiveness of our method.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"43 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140322598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scene text detection using structured information and an end-to-end trainable generative adversarial networks","authors":"Palanichamy Naveen, Mahmoud Hassaballah","doi":"10.1007/s10044-024-01259-y","DOIUrl":"https://doi.org/10.1007/s10044-024-01259-y","url":null,"abstract":"<p>Scene text detection poses a considerable challenge due to the diverse nature of text appearance, backgrounds, and orientations. Enhancing robustness, accuracy, and efficiency in this context is vital for several applications, such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of generative adversarial network (GAN) and network variational autoencoder (VAE) to create a robust and potent text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. In this framework, the VAE module plays a pivotal role in generating diverse and variable text regions. Subsequently, the GAN module refines and enhances these regions, ensuring heightened realism and accuracy. Then, the text detection module takes charge of identifying text regions in the input image via assigning confidence scores to each region. The comprehensive training of the entire network involves minimizing a joint loss function that encompasses the VAE loss, the GAN loss, and the text detection loss. The VAE loss ensures diversity in generated text regions and the GAN loss guarantees realism and accuracy, while the text detection loss ensures high-precision identification of text regions. The proposed method employs an encoder-decoder structure within the VAE module and a generator-discriminator structure in the GAN module. Rigorous testing on diverse datasets including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KIAST Scene Text demonstrates the superior performance of the proposed method compared to existing approaches.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"1 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140168470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DA-ResNet: dual-stream ResNet with attention mechanism for classroom video summary","authors":"Yuxiang Wu, Xiaoyan Wang, Tianpan Chen, Yan Dou","doi":"10.1007/s10044-024-01256-1","DOIUrl":"https://doi.org/10.1007/s10044-024-01256-1","url":null,"abstract":"<p>It is important to generate both diverse and representative video summary for massive videos. In this paper, a convolution neural network based on dual-stream attention mechanism(DA-ResNet) is designed to obtain candidate summary sequences for classroom scenes. DA-ResNet constructs a dual stream input of image frame sequence and optical flow frame sequence to enhance the expression ability. The network also embeds the attention mechanism into ResNet. On the other hand, the final video summary is obtained by removing redundant frames with the improved hash clustering algorithm. In this process, preprocessing is performed first to reduce computational complexity. And then hash clustering is used to retain the frame with the highest entropy value in each class, removing other similar frames. To verify its effectiveness in classroom scenes, we also created ClassVideo, a real dataset consisting of 45 videos from the normal teaching environment of our school. The results of the experiments show the competitiveness of the proposed method DA-ResNet outperforms the existing methods by about 8% in terms of the F-measure. Besides, the visual results also demonstrate its ability to produce classroom video summaries that are very close to the human preferences.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"20 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140155039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel Venus’ visible image processing neoteric workflow for improved planetary surface feature analysis","authors":"Indranil Misra, Mukesh Kumar Rohil, SManthira Moorthi, Debajyoti Dhar","doi":"10.1007/s10044-024-01253-4","DOIUrl":"https://doi.org/10.1007/s10044-024-01253-4","url":null,"abstract":"<p>The article presents a novel methodology that comprises of end-to-end Venus’ visible image processing neoteric workflow. The visible raw image is denoised using Tri-State median filter with background dark subtraction, and then enhanced using Contrast Limited Adaptive Histogram Equalization. The multi-modal image registration technique is developed using Segmented Affine Scale Invariant Feature Transform and Motion Smoothness Constraint outlier removal for co-registration of Venus’ visible and radar image. A novel image fusion algorithm using guided filter is developed to merge multi-modal Visible-Radar Venus’ image pair for generating the fused image. The Venus’ visible image quality assessment is performed at each processing step, and results are quantified and visualized. In addition, fuzzy color-coded segmentation map is generated for crucial information retrieval about Venus’ surface feature characteristics. It is found that Venus’ fused image clearly demarked planetary morphological features and validated with publicly available Venus’ radar nomenclature map.</p><h3 data-test=\"abstract-sub-heading\">Graphical abstract</h3>\u0000","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"29 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140115429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the limitations of self-supervised learning for tabular anomaly detection","authors":"Kimberly T. Mai, Toby Davies, Lewis D. Griffin","doi":"10.1007/s10044-023-01208-1","DOIUrl":"https://doi.org/10.1007/s10044-023-01208-1","url":null,"abstract":"<p>While self-supervised learning has improved anomaly detection in computer vision and natural language processing, it is unclear whether tabular data can benefit from it. This paper explores the limitations of self-supervision for tabular anomaly detection. We conduct several experiments spanning various pretext tasks on 26 benchmark datasets to understand why this is the case. Our results confirm representations derived from self-supervision do not improve tabular anomaly detection performance compared to using the raw representations of the data. We show this is due to neural networks introducing irrelevant features, which reduces the effectiveness of anomaly detectors. However, we demonstrate that using a subspace of the neural network’s representation can recover performance.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"137 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140115405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wise-SrNet: a novel architecture for enhancing image classification by learning spatial resolution of feature maps","authors":"Mohammad Rahimzadeh, Soroush Parvin, Amirali Askari, Elnaz Safi, Mohammad Reza Mohammadi","doi":"10.1007/s10044-024-01211-0","DOIUrl":"https://doi.org/10.1007/s10044-024-01211-0","url":null,"abstract":"<p>One of the main challenges, since the advancement of convolutional neural networks is how to connect the extracted feature map to the final classification layer. VGG models used two sets of fully connected layers for the classification part of their architectures, which significantly increased the number of models’ weights. ResNet and the next deep convolutional models used the global average pooling layer to compress the feature map and feed it to the classification layer. Although using the GAP layer reduces the computational cost, but also causes losing spatial resolution of the feature map, which results in decreasing learning efficiency. In this paper, we aim to tackle this problem by replacing the GAP layer with a new architecture called Wise-SrNet. It is inspired by the depthwise convolutional idea and is designed for processing spatial resolution while not increasing computational cost. We have evaluated our method using three different datasets they are Intel Image Classification Challenge, MIT Indoors Scenes, and a part of the ImageNet dataset. We investigated the implementation of our architecture on several models of the Inception, ResNet, and DenseNet families. Applying our architecture has revealed a significant effect on increasing convergence speed and accuracy. Our experiments on images with 224224 resolution increased the Top-1 accuracy between 2 to 8% on different datasets and models. Running our models on 512512 resolution images of the MIT Indoors Scenes dataset showed a notable result of improving the Top-1 accuracy within 3 to 26%. We will also demonstrate the GAP layer’s disadvantage when the input images are large and the number of classes is not few. In this circumstance, our proposed architecture can do a great help in enhancing classification results. The code is shared at https://github.com/mr7495/image-classification-spatial.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"4 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140099403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature selection using adaptive manta ray foraging optimization for brain tumor classification","authors":"K. S. Neetha, Dayanand Lal Narayan","doi":"10.1007/s10044-024-01236-5","DOIUrl":"https://doi.org/10.1007/s10044-024-01236-5","url":null,"abstract":"<p>Brain tumor is an anomalous growth of glial and neural cells and is considered as one of the primary causes of death worldwide. Therefore, it is essential to identify the tumor as soon as possible for reducing the mortality rate throughout the world. However, the classification of brain tumor is a challenging task due to presence of irrelevant features that cause misclassification during detection. In this research, the adaptive manta ray foraging optimization (AMRFO) is proposed for performing an effective feature selection to avoid the problem of overfitting while performing the classification. The adaptive control parameter strategy is incorporated in the AMRFO for enhancing the search process while selecting the feature subset. The linear intensity distribution information and regularization parameter-based intuitionistic fuzzy C-means algorithm namely LRIFCM is used to perform the segmentation of tumor regions. Next, LeeNET, gray-level co-occurrence matrix, local ternary pattern, histogram of gradients, and shape features are used to extract essential features from the segmented regions. Further, the attention-based long short-term memory (ALSTM) is used to classify the brain tumor types according to the features selected by AMRFO. The datasets utilized in this research study for the evaluation of AMRFO-ALSTM method are BRATS 2017, BRATS 2018, and Figshare brain datasets. Segmentation and classification are the two different evaluations examined for the AMRFO-ALSTM. The structural similarity index measure, Jaccard, dice, accuracy, and sensitivity are utilized during segmentation evaluation, while accuracy, specificity, sensitivity, precision, and F1-score are used during classification evaluation. The existing researches namely, transformer-enhanced convolutional neural network, Chan Vese (CV)-support vector machine, CV-K-nearest neighbor, deep convolutional neural network (DCNN), and salp water optimization with deep belief network are used to compare with the AMRFO-ALSTM. The accuracy of AMRFO-ALSTM for Figshare brain dataset is 99.80 which is a greater achievement when compared to the DCNN.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"59 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140074218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantifying robustness: 3D tree point cloud skeletonization with smart-tree in noisy domains","authors":"","doi":"10.1007/s10044-024-01238-3","DOIUrl":"https://doi.org/10.1007/s10044-024-01238-3","url":null,"abstract":"<h3>Abstract</h3> <p>Extracting tree skeletons from 3D tree point clouds is challenged by noise and incomplete data. While our prior work (Dobbs et al., in: Iberian conference on pattern recognition and image analysis, Springer, Berlin, pp. 351–362, 2023) introduced a deep learning approach for approximating tree branch medial axes, its robustness against various types of noise has not been thoroughly evaluated. This paper addresses this gap. Specifically, we simulate real-world noise challenges by introducing 3D Perlin noise (to represent subtractive noise) and Gaussian noise (to mimic additive noise). To facilitate this evaluation, we introduce a new synthetic tree point cloud dataset, available at https://github.com/uc-vision/synthetic-trees-II. Our results indicate that our deep learning-based skeletonization method is tolerant to both additive and subtractive noise.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"47 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140044914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LFFNet: lightweight feature-enhanced fusion network for real-time semantic segmentation of road scenes","authors":"Xuegang Hu, Jing Feng, Juelin Gong","doi":"10.1007/s10044-024-01237-4","DOIUrl":"https://doi.org/10.1007/s10044-024-01237-4","url":null,"abstract":"<p>Deep neural networks have significantly improved semantic segmentation, but their great performance frequently comes at the expense of expensive computation and protracted inference times, which fall short of the exacting standards of real-world applications. A lightweight feature-enhanced fusion network (LFFNet) for real-time semantic segmentation is proposed. LFFNet is a particular type of asymmetric encoder–decoder structure. In the encoder, A multi-dilation rate fusion module can guarantee the retention of local information while enlarging the appropriate field in the encoder section, which resolves the issue of insufficient feature extraction caused by the variability of target size. In the decoder, different decoding modules are designed for spatial information and semantic information. The attentional feature enhancement module takes advantage of the attention mechanism to feature-optimize the contextual information of the high-level output, and the lightweight multi-scale feature fusion module fuses the features from various stages to aggregate more spatial detail information and contextual semantic information. The experimental findings demonstrate that LFFNet achieves 72.1% mIoU and 67.0% mIoU on Cityscapes and Camvid datasets at 102 FPS and 244 FPS, respectively, with only 0.63M parameters. Note that there is neither pretraining nor pre-processing. Our model can achieve superior segmentation performance with fewer parameters and less computation compared to existing networks.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"38 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140035365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised fuzzy broad learning system based on mean-teacher model","authors":"Zizhu Fan, Yijing Huang, Chao Xi, Cheng Peng, Shitong Wang","doi":"10.1007/s10044-024-01217-8","DOIUrl":"https://doi.org/10.1007/s10044-024-01217-8","url":null,"abstract":"<p>Fuzzy broad learning system (FBLS) is a newly proposed fuzzy system, which introduces Takagi–Sugeno fuzzy model into broad learning system. It has shown that FBLS has better nonlinear fitting ability and faster calculation speed than the most of fuzzy neural networks proposed earlier. At the same time, compared to other fuzzy neural networks, FBLS has fewer rules and lower cost of training time. However, label errors or missing are prone to appear in large-scale dataset, which will greatly reduce the performance of FBLS. Therefore, how to use limited label information to train a powerful classifier is an important challenge. In order to address this problem, we introduce Mean-Teacher model for the fuzzy broad learning system. We use the Mean-Teacher model to rebuild the weights of the output layer of FBLS, and use the Teacher–Student model to train FBLS. The proposed model is an implementation of semi-supervised learning which integrates fuzzy logic and broad learning system in the Mean-Teacher-based knowledge distillation framework. Finally, we have proved the great performance of Mean-Teacher-based fuzzy broad learning system (MT-FBLS) through a large number of experiments.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"148 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}