Machine Vision and Applications: Latest Articles

Performance analysis of various deep learning models based on Max-Min CNN for lung nodule classification on CT images
IF 3.3 · Q4 · Computer Science
Machine Vision and Applications Pub Date : 2024-06-20 DOI: 10.1007/s00138-024-01569-5
Rekka Mastouri, Nawres Khlifa, Henda Neji, Saoussen Hantous-Zannad
{"title":"Performance analysis of various deep learning models based on Max-Min CNN for lung nodule classification on CT images","authors":"Rekka Mastouri, Nawres Khlifa, Henda Neji, Saoussen Hantous-Zannad","doi":"10.1007/s00138-024-01569-5","DOIUrl":"https://doi.org/10.1007/s00138-024-01569-5","url":null,"abstract":"<p>Lung cancer remains one of the leading causes of cancer-related deaths worldwide, underlining the urgent need for accurate and early detection and classification methods. In this paper, we present a comprehensive study that evaluates and compares different deep learning techniques for accurately distinguishing between nodule and non-nodule in 2D CT images. Our work introduced an innovative deep learning strategy called “Max-Min CNN” to improve lung nodule classification. Three models have been developed based on the Max-Min strategy: (1) a Max-Min CNN model built and trained from scratch, (2) a Bilinear Max-Min CNN composed of two Max-Min CNN streams whose outputs were bilinearly pooled by a Kronecker product, and (3) a hybrid Max-Min ViT combining a ViT model built from scratch and the proposed Max-Min CNN architecture as a backbone. To ensure an objective analysis of our findings, we evaluated each proposed model on 3186 images from the public LUNA16 database. Experimental results demonstrated the outperformance of the proposed hybrid Max-Min ViT over the Bilinear Max-Min CNN and the Max-Min CNN, with an accuracy rate of 98.03% versus 96.89% and 95.82%, respectively. This study clearly demonstrated the contribution of the Max-Min strategy in improving the effectiveness of deep learning models for pulmonary nodule classification on CT images.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"189 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generation of realistic synthetic cable images to train deep learning segmentation models
IF 3.3 · Q4 · Computer Science
Machine Vision and Applications Pub Date : 2024-06-20 DOI: 10.1007/s00138-024-01562-y
Pablo Malvido Fresnillo, Wael M. Mohammed, Saigopal Vasudevan, Jose A. Perez Garcia, Jose L. Martinez Lastra
{"title":"Generation of realistic synthetic cable images to train deep learning segmentation models","authors":"Pablo MalvidoFresnillo, Wael M. Mohammed, Saigopal Vasudevan, Jose A. PerezGarcia, Jose L. MartinezLastra","doi":"10.1007/s00138-024-01562-y","DOIUrl":"https://doi.org/10.1007/s00138-024-01562-y","url":null,"abstract":"<p>Semantic segmentation is one of the most important and studied problems in machine vision, which has been solved with high accuracy by many deep learning models. However, all these models present a significant drawback, they require large and diverse datasets to be trained. Gathering and annotating all these images manually would be extremely time-consuming, hence, numerous researchers have proposed approaches to facilitate or automate the process. Nevertheless, when the objects to be segmented are deformable, such as cables, the automation of this process becomes more challenging, as the dataset needs to represent their high diversity of shapes while keeping a high level of realism, and none of the existing solutions have been able to address it effectively. Therefore, this paper proposes a novel methodology to automatically generate highly realistic synthetic datasets of cables for training deep learning models in image segmentation tasks. This methodology utilizes Blender to create photo-realistic cable scenes and a Python pipeline to introduce random variations and natural deformations. To prove its performance, a dataset composed of 25000 synthetic cable images and their corresponding masks was generated and used to train six popular deep learning segmentation models. These models were then utilized to segment real cable images achieving outstanding results (over 70% IoU and 80% Dice coefficient for all the models). Both the methodology and the generated dataset are publicly available in the project’s repository.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"43 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dual contrast discriminator with sharing attention for video anomaly detection
IF 3.3 · Q4 · Computer Science
Machine Vision and Applications Pub Date : 2024-06-19 DOI: 10.1007/s00138-024-01566-8
Yiwenhao Zeng, Yihua Chen, Songsen Yu, Mingzhang Yang, Rongrong Chen, Fang Xu
{"title":"Dual contrast discriminator with sharing attention for video anomaly detection","authors":"Yiwenhao Zeng, Yihua Chen, Songsen Yu, Mingzhang Yang, Rongrong Chen, Fang Xu","doi":"10.1007/s00138-024-01566-8","DOIUrl":"https://doi.org/10.1007/s00138-024-01566-8","url":null,"abstract":"<p>The detection of video anomalies is a well-known issue in the realm of visual research. The volume of normal and abnormal sample data in this field is unbalanced, hence unsupervised training is generally used in research. Since the development of deep learning, the field of video anomaly has developed from reconstruction-based detection methods to prediction-based detection methods, and then to hybrid detection methods. To identify the presence of anomalies, these methods take advantage of the differences between ground-truth frames and reconstruction or prediction frames. Thus, the evaluation of the results is directly impacted by the quality of the generated frames. Built around the Dual Contrast Discriminator for Video Sequences (DCDVS) and the corresponding loss function, we present a novel hybrid detection method for further explanation. With less false positives and more accuracy, this method improves the discriminator’s guidance on the reconstruction-prediction network’s generation performance. we integrate optical flow processing and attention processes into the Auto-encoder (AE) reconstruction network. The network’s sensitivity to motion information and its ability to concentrate on important areas are improved by this integration. Additionally, DCDVS’s capacity to successfully recognize significant features gets improved by introducing the attention module implemented through parameter sharing. Aiming to reduce the risk of network overfitting, we also invented reverse augmentation, a data augmentation technique designed specifically for temporal data. Our approach achieved outstanding performance with AUC scores of 99.4, 92.9, and 77.3<span>(%)</span> on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, respectively, demonstrates competitiveness with advanced methods and validates its effectiveness.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"230 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-scale information fusion generative adversarial network for real-world noisy image denoising
IF 3.3 · Q4 · Computer Science
Machine Vision and Applications Pub Date : 2024-06-18 DOI: 10.1007/s00138-024-01563-x
Xuegang Hu, Wei Zhao
{"title":"Multi-scale information fusion generative adversarial network for real-world noisy image denoising","authors":"Xuegang Hu, Wei Zhao","doi":"10.1007/s00138-024-01563-x","DOIUrl":"https://doi.org/10.1007/s00138-024-01563-x","url":null,"abstract":"<p>Image denoising is crucial for enhancing image quality, improving visual effects, and boosting the accuracy of image analysis and recognition. Most of the current image denoising methods perform superior on synthetic noise images, but their performance is limited on real-world noisy images since the types and distributions of real noise are often uncertain. To address this challenge, a multi-scale information fusion generative adversarial network method is proposed in this paper. Specifically, In this method, the generator is an end-to-end denoising network that consists of a novel encoder–decoder network branch and an improved residual network branch. The encoder–decoder branch extracts rich detailed and contextual information from images at different scales and utilizes a feature fusion method to aggregate multi-scale information, enhancing the feature representation performance of the network. The residual network further compensates for the compressed and lost information in the encoder stage. Additionally, to effectively aid the generator in accomplishing the denoising task, convolution kernels of various sizes are added to the discriminator to improve its image evaluation ability. Furthermore, the dual denoising loss function is presented to enhance the model’s capability in performing noise removal and image restoration. Experimental results show that the proposed method exhibits superior objective performance and visual quality than some state-of-the-art methods on three real-world datasets.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"34 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Trusted 3D self-supervised representation learning with cross-modal settings
IF 3.3 · Q4 · Computer Science
Machine Vision and Applications Pub Date : 2024-06-02 DOI: 10.1007/s00138-024-01556-w
Xu Han, Haozhe Cheng, Pengcheng Shi, Jihua Zhu
{"title":"Trusted 3D self-supervised representation learning with cross-modal settings","authors":"Xu Han, Haozhe Cheng, Pengcheng Shi, Jihua Zhu","doi":"10.1007/s00138-024-01556-w","DOIUrl":"https://doi.org/10.1007/s00138-024-01556-w","url":null,"abstract":"<p>Cross-modal setting employing 2D images and 3D point clouds in self-supervised representation learning is proven to be an effective way to enhance visual perception capabilities. However, different modalities have different data formats and representations. Directly using features extracted from cross-modal datasets may lead to information conflicting and collapsing. We refer to this problem as uncertainty in network learning. Therefore, reducing uncertainty to obtain trusted descriptions has become the key to improving network performance. Motivated by this, we propose our trusted cross-modal network in self-supervised learning (TCMSS). It can obtain trusted descriptions by a trusted combination module as well as improve network performance with a well-designed loss function. In the trusted combination module, we utilize the Dirichlet distribution and the subjective logic to parameterize the features and acquire probabilistic uncertainty at the same. Then, the Dempster-Shafer Theory (DST) is used to obtain trusted descriptions by weighting uncertainty to the parameterized results. We have also designed our trusted domain loss function, including domain loss and trusted loss. It can effectively improve the prediction accuracy of the network by applying contrastive learning between different feature descriptions. The experimental results show that our model outperforms previous results on linear classification in ScanObjectNN as well as few-shot classification in both ModelNet40 and ScanObjectNN. In addition, part segmentation also reports a superior result to previous methods in ShapeNet. Further, the ablation studies validate the potency of our method for a better point cloud understanding.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"32 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141191332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep multimodal-based finger spelling recognition for Thai sign language: a new benchmark and model composition
IF 3.3 · Q4 · Computer Science
Machine Vision and Applications Pub Date : 2024-05-31 DOI: 10.1007/s00138-024-01557-9
Wuttichai Vijitkunsawat, Teeradaj Racharak, Minh Le Nguyen
{"title":"Deep multimodal-based finger spelling recognition for Thai sign language: a new benchmark and model composition","authors":"Wuttichai Vijitkunsawat, Teeradaj Racharak, Minh Le Nguyen","doi":"10.1007/s00138-024-01557-9","DOIUrl":"https://doi.org/10.1007/s00138-024-01557-9","url":null,"abstract":"<p>Video-based sign language recognition is vital for improving communication for the deaf and hard of hearing. Creating and maintaining quality of Thai sign language video datasets is challenging due to a lack of resources. Tackling this issue, we rigorously investigate a design and development of deep learning-based system for Thai Finger Spelling recognition, assessing various models with a new dataset of 90 standard letters performed by 43 diverse signers. We investigate seven deep learning models with three distinct modalities for our analysis: video-only methods (including RGB-sequencing-based CNN-LSTM and VGG-LSTM), human body joint coordinate sequences (processed by LSTM, BiLSTM, GRU, and Transformer models), and skeleton analysis (using TGCN with graph-structured skeleton representation). A thorough assessment of these models is conducted across seven circumstances, encompassing single-hand postures, single-hand motions with one, two, and three strokes, as well as two-hand postures with both static and dynamic point-on-hand interactions. The research highlights that the TGCN model is the optimal lightweight model in all scenarios. In single-hand pose cases, a combination of the Transformer and TGCN models of two modalities delivers outstanding performance, excelling in four particular conditions: single-hand poses, single-hand poses requiring one, two, and three strokes. In contrast, two-hand poses with static or dynamic point-on-hand interactions present substantial challenges, as the data from joint coordinates is inadequate due to hand obstructions, stemming from insufficient coordinate sequence data and the lack of a detailed skeletal graph structure. The study recommends integrating RGB-sequencing with visual modality to enhance the accuracy of two-handed sign language gestures.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"71 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141191355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Scale-adaptive gesture computing: detection, tracking and recognition in controlled complex environments
IF 3.3 · Q4 · Computer Science
Machine Vision and Applications Pub Date : 2024-05-31 DOI: 10.1007/s00138-024-01555-x
Anish Monsley Kirupakaran, Rabul Hussain Laskar
{"title":"Scale-adaptive gesture computing: detection, tracking and recognition in controlled complex environments","authors":"Anish Monsley Kirupakaran, Rabul Hussain Laskar","doi":"10.1007/s00138-024-01555-x","DOIUrl":"https://doi.org/10.1007/s00138-024-01555-x","url":null,"abstract":"<p>Complexity intensifies when gesticulations span various scales. Traditional scale-invariant object recognition methods often falter when confronted with case-sensitive characters in the English alphabet. The literature underscores a notable gap, the absence of an open-source multi-scale un-instructional gesture database featuring a comprehensive dictionary. In response, we have created the NITS (gesture scale) database, which encompasses isolated mid-air gesticulations of ninety-five alphanumeric characters. In this research, we present a scale-centric framework that addresses three critical aspects: (1) detection of smaller gesture objects: our framework excels at detecting smaller gesture objects, such as a red color marker. (2) Removal of redundant self co-articulated strokes: we propose an effective approach to eliminate redundant self co-articulated strokes often present in gesture trajectories. (3) Scale-variant approach for recognition: to tackle the scale vs. size ambiguity in recognition, we introduce a novel scale-variant methodology. Our experimental results reveal a substantial improvement of approximately 16% compared to existing state-of-the-art recognition models for mid-air gesture recognition. These outcomes demonstrate that our proposed approach successfully emulates the perceptibility found in the human visual system, even when utilizing data from monophthalmic vision. Furthermore, our findings underscore the imperative need for comprehensive studies encompassing scale variations in gesture recognition.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"12 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141192142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Supervised contrastive learning with multi-scale interaction and integrity learning for salient object detection
IF 3.3 · Q4 · Computer Science
Machine Vision and Applications Pub Date : 2024-05-29 DOI: 10.1007/s00138-024-01552-0
Yu Bi, Zhenxue Chen, Chengyun Liu, Tian Liang, Fei Zheng
{"title":"Supervised contrastive learning with multi-scale interaction and integrity learning for salient object detection","authors":"Yu Bi, Zhenxue Chen, Chengyun Liu, Tian Liang, Fei Zheng","doi":"10.1007/s00138-024-01552-0","DOIUrl":"https://doi.org/10.1007/s00138-024-01552-0","url":null,"abstract":"<p>Salient object detection (SOD) is designed to mimic human visual mechanisms to identify and segment the most salient part of an image. Although related works have achieved great progress in SOD, they are limited when it comes to interferences of non-salient objects, finely shaped objects and co-salient objects. To improve the effectiveness and capability of SOD, we propose a supervised contrastive learning network with multi-scale interaction and integrity learning named SCLNet. It adopts contrastive learning (CL), multi-reception field confusion (MRFC) and context enhancement (CE) mechanisms. Using this method, the input image is first divided into two branches after two different data augmentations. Unlike existing models, which focus more on boundary guidance, we add a random position mask on one branch to break the continuous of objects. Through the CL module, we obtain more semantic information than appearance information by learning the invariance of different data augmentations. The MRFC module is then designed to learn the internal connections and common influences of various reception field features layer by layer. Next, the obtained features are learned through the CE module for the integrity and continuity of salient objects. Finally, comprehensive evaluations on five challenging benchmark datasets show that SCLNet achieves superior results. Code is available at https://github.com/YuPangpangpang/SCLNet.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"07 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141191188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Medtransnet: advanced gating transformer network for medical image classification
IF 3.3 · Q4 · Computer Science
Machine Vision and Applications Pub Date : 2024-05-29 DOI: 10.1007/s00138-024-01542-2
Nagur Shareef Shaik, Teja Krishna Cherukuri, N Veeranjaneulu, Jyostna Devi Bodapati
{"title":"Medtransnet: advanced gating transformer network for medical image classification","authors":"Nagur Shareef Shaik, Teja Krishna Cherukuri, N Veeranjaneulu, Jyostna Devi Bodapati","doi":"10.1007/s00138-024-01542-2","DOIUrl":"https://doi.org/10.1007/s00138-024-01542-2","url":null,"abstract":"<p>Accurate medical image classification poses a significant challenge in designing expert computer-aided diagnosis systems. While deep learning approaches have shown remarkable advancements over traditional techniques, addressing inter-class similarity and intra-class dissimilarity across medical imaging modalities remains challenging. This work introduces the advanced gating transformer network (MedTransNet), a deep learning model tailored for precise medical image classification. MedTransNet utilizes channel and multi-gate attention mechanisms, coupled with residual interconnections, to learn category-specific attention representations from diverse medical imaging modalities. Additionally, the use of gradient centralization during training helps in preventing overfitting and improving generalization, which is especially important in medical imaging applications where the availability of labeled data is often limited. Evaluation on benchmark datasets, including APTOS-2019, Figshare, and SARS-CoV-2, demonstrates effectiveness of the proposed MedTransNet across tasks such as diabetic retinopathy severity grading, multi-class brain tumor classification, and COVID-19 detection. Experimental results showcase MedTransNet achieving 85.68% accuracy for retinopathy grading, 98.37% (<span>(pm ,0.44)</span>) for tumor classification, and 99.60% for COVID-19 detection, surpassing recent deep learning models. MedTransNet holds promise for significantly improving medical image classification accuracy.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"55 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141191828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning more discriminative local descriptors with parameter-free weighted attention for few-shot learning
IF 3.3 · Q4 · Computer Science
Machine Vision and Applications Pub Date : 2024-05-28 DOI: 10.1007/s00138-024-01551-1
Qijun Song, Siyun Zhou, Die Chen
{"title":"Learning more discriminative local descriptors with parameter-free weighted attention for few-shot learning","authors":"Qijun Song, Siyun Zhou, Die Chen","doi":"10.1007/s00138-024-01551-1","DOIUrl":"https://doi.org/10.1007/s00138-024-01551-1","url":null,"abstract":"<p>Few-shot learning for image classification comes up as a hot topic in computer vision, which aims at fast learning from a limited number of labeled images and generalize over the new tasks. In this paper, motivated by the idea of Fisher Score, we propose a Discriminative Local Descriptors Attention model that uses the ratio of intra-class and inter-class similarity to adaptively highlight the representative local descriptors without introducing any additional parameters, while most of the existing local descriptors based methods utilize the neural networks that inevitably involve the tedious parameter tuning. Experiments on four benchmark datasets show that our method achieves higher accuracy compared with the state-of-art approaches for few-shot learning. Specifically, our method is optimal on the CUB-200 dataset, and outperforms the second best competitive algorithm by 4.12<span>(%)</span> and 0.49<span>(%)</span> under the 5-way 1-shot and 5-way 5-shot settings, respectively.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"38 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141166523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0