A review of adaptable conventional image processing pipelines and deep learning on limited datasets
Friedrich Rieken Münke, Jan Schützke, Felix Berens, Markus Reischl
Machine Vision and Applications, published 2024-01-31. DOI: https://doi.org/10.1007/s00138-023-01501-3

Abstract: This paper studies the impact of limited datasets on deep learning techniques and conventional image processing methods for semantic image segmentation, and conducts a comparative analysis to determine the scenarios in which each approach is preferable. We introduce a synthetic data generator that lets us evaluate the effect of the number of training samples as well as the difficulty and diversity of the dataset. We show that deep learning methods excel when large datasets are available, while conventional image processing approaches perform well when datasets are small and diverse. Since transfer learning is a common workaround for small datasets, we specifically assess its effect and find it to be only marginal. Furthermore, we implement the conventional image processing pipeline so that it can be applied quickly and easily to new problems, making it straightforward to test conventional methods alongside deep learning with minimal overhead.
{"title":"Regional filtering distillation for object detection","authors":"","doi":"10.1007/s00138-023-01503-1","DOIUrl":"https://doi.org/10.1007/s00138-023-01503-1","url":null,"abstract":"<h3>Abstract</h3> <p>Knowledge distillation is a common and effective method in model compression, which trains a compact student model to mimic the capability of a large teacher model to get superior generalization. Previous works on knowledge distillation are underperforming for challenging tasks such as object detection, compared to the general application of unsophisticated classification tasks. In this paper, we propose that the failure of knowledge distillation on object detection is mainly caused by the imbalance between features of informative and invalid background. Not all background noise is redundant, and the valuable background noise after screening contains relations between foreground and background. Therefore, we propose a novel regional filtering distillation (RFD) algorithm to solve this problem through two modules: region selection and attention-guided distillation. Region selection first filters massive invalid backgrounds and retains knowledge-dense regions on near object anchor locations. Attention-guided distillation further improves distillation performance on object detection tasks by extracting the relations between foreground and background to migrate key features. Extensive experiments on both one-stage and two-stage detectors have been conducted to prove the effectiveness of RFD. For example, RFD improves 2.8% and 2.6% mAP for ResNet50-RetinaNet and ResNet50-FPN student networks on the MS COCO dataset, respectively. We also evaluate our method with the Faster R-CNN model on Pascal VOC and KITTI benchmark, which obtain 1.52% and 4.36% mAP promotions for the ResNet18-FPN student network, respectively. Furthermore, our method increases 5.70% of mAP for MobileNetv2-SSD compared to the original model. The proposed RFD technique performs highly on detection tasks through regional filtering distillation. In the future, we plan to extend it to more challenging task scenarios, such as segmentation.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"12 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139648921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
STARNet: spatio-temporal aware recurrent network for efficient video object detection on embedded devices
Mohammad Hajizadeh, Mohammad Sabokrou, Adel Rahmani
Machine Vision and Applications, published 2024-01-31. DOI: https://doi.org/10.1007/s00138-023-01504-0

Abstract: Transferring object detection methods from images to video remains an open challenge. Image-based detectors frequently fail to generalize to video because of blurriness, unusual or unclear object positions, low frame quality, and related issues; the lack of an effective long-term memory in video object detection presents an additional challenge. Our approach exploits the fact that the outputs of successive frames tend to be very similar, and that a series of frames, successive or not, carries more information than any single frame. In this study, we present a novel recurrent cell for feature propagation and identify the optimal placement of its layers to increase the memory interval, achieving higher accuracy than methods proposed in other studies. Because hardware limitations exacerbate this challenge, the paper also aims to implement the methods efficiently on embedded devices. We achieve 68.7% mAP on the ImageNet VID dataset on embedded devices in real time, at a speed of 52 fps.
{"title":"SGBGAN: minority class image generation for class-imbalanced datasets","authors":"Qian Wan, Wenhui Guo, Yanjiang Wang","doi":"10.1007/s00138-023-01506-y","DOIUrl":"https://doi.org/10.1007/s00138-023-01506-y","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Abstract</h3><p>Class imbalance frequently arises in the context of image classification. Conventional generative adversarial networks (GANs) have a tendency to produce samples from the majority class when trained on class-imbalanced datasets. To address this issue, the Balancing GAN with gradient penalty (BAGAN-GP) has been proposed, but the outcomes may still exhibit a bias toward the majority categories when the similarity between images from different categories is substantial. In this study, we introduce a novel approach called the Pre-trained Gated Variational Autoencoder with Self-attention for Balancing Generative Adversarial Network (SGBGAN) as an image augmentation technique for generating high-quality images. The proposed method utilizes a Gated Variational Autoencoder with Self-attention (SA-GVAE) to initialize the GAN and transfers pre-trained SA-GVAE weights to the GAN. Our experimental results on Fashion-MNIST, CIFAR-10, and a highly unbalanced medical image dataset demonstrate that the SGBGAN outperforms other state-of-the-art methods. Results on Fréchet inception distance (FID) and structural similarity measures (SSIM) show that our model overcomes the instability problems that exist in other GANs. Especially on the Cells dataset, the FID of a minority class increases up to 23.09% compared to the latest BAGAN-GP, and the SSIM of a minority class increases up to 10.81%. It is proved that SGBGAN overcomes the class imbalance restriction and generates high-quality minority class images.\u0000</p><h3 data-test=\"abstract-sub-heading\">Graphical abstract</h3><p>The diagram provides an overview of the technical approach employed in this research paper. To address the issue of class imbalance within the dataset, a novel technique called the Gated Variational Autoencoder with Self-attention (SA-GVAE) is proposed. This SA-GVAE is utilized to initialize the Generative Adversarial Network (GAN), with the pre-trained weights from SA-GVAE being transferred to the GAN. Consequently, a Pre-trained Gated Variational Autoencoder with Self-attention for Balancing GAN (SGBGAN) is formed, serving as an image augmentation tool to generate high-quality images. Ultimately, the generation of minority samples is employed to restore class balance within the dataset.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"200 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139649209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tackling confusion among actions for action segmentation with adaptive margin and energy-driven refinement","authors":"Zhichao Ma, Kan Li","doi":"10.1007/s00138-023-01505-z","DOIUrl":"https://doi.org/10.1007/s00138-023-01505-z","url":null,"abstract":"<p>Video action segmentation is a crucial task in evaluating the ability to understand human activities. Previous works on this task mainly focus on capturing complex temporal structures and fail to consider the feature ambiguity among similar actions and the biased training sets, thus they are easy to confuse some actions. In this paper, we propose a novel action segmentation framework, called DeConfuNet, to solve the above issue. First, we design a discriminative enhancement module (DEM) trained by an adaptive margin-guided discriminative feature learning which adjusts the margin adaptively to increase the feature distinguishability among similar actions, and whose multi-stage reasoning and adaptive feature fusion structures provide structural advantages for distinguishing similar actions. Second, we propose an equalizing influence module (EIM) that can overcome the impact of biased training sets by balancing the influence of training samples under a coefficient-adaptive loss function. Third, an energy and context-driven refinement module (ECRM) further alleviates the impact of the unbalanced influence of training samples by fusing and refining the inference of DEM and EIM, which utilizes the phased prediction including context and energy clues to assimilate untrustworthy segments, alleviating over-segmentation hugely. Extensive experiments show the effectiveness of each proposed technique, they verify that the DEM and EIM are complementary in reasoning and cooperate to overcome the confusion issue, and our approach achieves significant improvement and state-of-the-art performance of accuracy, edit score, and F1 score on the challenging 50Salads, GTEA, and Breakfast benchmarks.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"1 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139582210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization model based on attention mechanism for few-shot image classification","authors":"Ruizhi Liao, Junhai Zhai, Feng Zhang","doi":"10.1007/s00138-023-01502-2","DOIUrl":"https://doi.org/10.1007/s00138-023-01502-2","url":null,"abstract":"<p>Deep learning has emerged as the leading approach for pattern recognition, but its reliance on large labeled datasets poses challenges in real-world applications where obtaining annotated samples is difficult. Few-shot learning, inspired by human learning, enables fast adaptation to new concepts with limited examples. Optimization-based meta-learning has gained popularity as a few-shot learning method. However, it struggles with capturing long-range dependencies of gradients and has slow convergence rates, making it challenging to extract features from limited samples. To overcome these issues, we propose MLAL, an optimization model based on attention for few-shot learning. The model comprises two parts: the attention-LSTM meta-learner, which optimizes gradients hierarchically using the self-attention mechanism, and the cross-attention base-learner, which uses the cross-attention mechanism to cross-learn the common category features of support and query sets in a meta-task. Extensive experiments on two benchmark datasets show that MLAL achieves exceptional 1-shot and 5-shot classification accuracy on MiniImagenet and TiredImagenet. The codes for our proposed method are available at https://github.com/wflrz123/MLAL.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"19 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139509594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Obs-tackle: an obstacle detection system to assist navigation of visually impaired using smartphones","authors":"U. Vijetha, V. Geetha","doi":"10.1007/s00138-023-01499-8","DOIUrl":"https://doi.org/10.1007/s00138-023-01499-8","url":null,"abstract":"<p>As the prevalence of vision impairment continues to rise worldwide, there is an increasing need for affordable and accessible solutions that improve the daily experiences of individuals with vision impairment. The Visually Impaired (VI) are often prone to falls and injuries due to their inability to recognize dangers on the path while navigating. It is therefore crucial that they are aware of potential hazards in both known and unknown environments. Obstacle detection plays a key role in navigation assistance solutions for VI users. There has been a surge in experimentation on obstacle detection since the introduction of autonomous navigation in automobiles, robots, and drones. Previously, auditory, laser, and depth sensors dominated obstacle detection; however, advances in computer vision and deep learning have enabled it using simpler tools like smartphone cameras. While previous approaches to obstacle detection using estimated depth data have been effective, they suffer from limitations such as compromised accuracy when adapted for edge devices and the incapability to identify objects in the scene. To address these limitations, we propose an indoor and outdoor obstacle detection and identification technique that combines semantic segmentation with depth estimation data. We hypothesize that this combination of techniques will enhance obstacle detection and identification compared to using depth data alone. To evaluate the effectiveness of our proposed Obstacle detection method, we validated it against ground truth Obstacle data derived from the DIODE and NYU Depth v2 dataset. Our experimental results demonstrate that the proposed method achieves near 85% accuracy in detecting nearby obstacles with lower false positive and false negative rates. The demonstration of the proposed system deployed as an Android app-‘Obs-tackle’ is available at https://youtu.be/PSn-FEc5EQg?si=qPGB13tkYkD1kSOf.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"70 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139509841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-view spectral clustering based on constrained Laplacian rank
Jinmei Song, Baokai Liu, Yao Yu, Kaiwu Zhang, Shiqiang Du
Machine Vision and Applications, published 2024-01-12. DOI: https://doi.org/10.1007/s00138-023-01497-w

Abstract: Graph-based approaches are representative among multi-view clustering algorithms. However, it remains challenging to quickly acquire complementary information in multi-view data and to cluster effectively when the quality of the initially constructed data graph is inadequate. We therefore propose CLRSC, a new graph-based method for multi-view spectral clustering based on constrained Laplacian rank. Our contributions are as follows: (1) self-representation learning and constrained Laplacian rank (CLR) are extended to the multi-view setting and connected in a unified framework that learns a common affinity matrix; (2) to achieve overall optimization, we construct a graph-learning method based on constrained Laplacian rank and combine it with spectral clustering; (3) we design an iterative optimization procedure and show that the algorithm converges. Extensive experiments on five benchmark datasets show that on MSRC-v1 and BBCSport the accuracy (ACC) of the method is 10.95% and 4.61% higher, respectively, than that of the best comparison algorithm.
High-accuracy 3D locators tracking in real time using monocular vision
C. Elmo Kulanesan, P. Vacher, L. Charleux, E. Roux
Machine Vision and Applications, published 2024-01-11. DOI: https://doi.org/10.1007/s00138-023-01498-9

Abstract: In medical applications, precise localization of medical instruments and bone structures is crucial for computer-assisted surgical interventions. In orthopedic surgery, existing devices typically rely on stereoscopic vision to aid the surgeon in screw fixation of prostheses or in bone removal. This article addresses the challenge of localizing, with a single camera, a rigid object carrying randomly arranged planar markers. This approach is especially relevant in medical situations where accurate alignment of an object relative to a camera is needed at distances of 80 cm to 120 cm, and where limiting the locator to a few tens of centimeters ensures it does not obstruct the work area. The rigid locator consists of a solid onto whose surface a set of planar (ArUco) markers is glued, distributed randomly so that at least two markers are visible for any orientation of the locator. Calibrating the locator means finding the relative positions of the individual planar elements, which we do with a bundle-adjustment approach. A main, well-known difficulty with planar markers is pose ambiguity; our method resolves it by formulating an efficient initial solution for the optimization step. After calibration, the positioning uncertainty of the locator is better than two-tenths of a cubic millimeter and one-tenth of a degree, regardless of the locator's orientation in space. To assess the proposed method, the locator was rigidly attached to a stylus about twenty centimeters long. Viewed by a 16.1-megapixel camera at a distance of about 1 m, the tip of this stylus is localized in real time within a cube of less than 1 mm per side. A surface-registration application using the stylus on an artificial scapula is also presented.
{"title":"Local region-learning modules for point cloud classification","authors":"Kaya Turgut, Helin Dutagaci","doi":"10.1007/s00138-023-01495-y","DOIUrl":"https://doi.org/10.1007/s00138-023-01495-y","url":null,"abstract":"<p>Data organization via forming local regions is an integral part of deep learning networks that process 3D point clouds in a hierarchical manner. At each level, the point cloud is sampled to extract representative points and these points are used to be centers of local regions. The organization of local regions is of considerable importance since it determines the location and size of the receptive field at a particular layer of feature aggregation. In this paper, we present two local region-learning modules: Center Shift Module to infer the appropriate shift for each center point, and Radius Update Module to alter the radius of each local region. The parameters of the modules are learned through optimizing the loss associated with the particular task within an end-to-end network. We present alternatives for these modules through various ways of modeling the interactions of the features and locations of 3D points in the point cloud. We integrated both modules independently and together to the PointNet++ and PointCNN object classification architectures, and demonstrated that the modules contributed to a significant increase in classification accuracy for the ScanObjectNN data set consisting of scans of real-world objects. Our further experiments on ShapeNet data set showed that the modules are also effective on 3D CAD models.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"307 5 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138823537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}