Machine Vision and Applications: Latest Publications

A novel key point based ROI segmentation and image captioning using guidance information
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-09-12 DOI: 10.1007/s00138-024-01597-1
Jothi Lakshmi Selvakani, Bhuvaneshwari Ranganathan, Geetha Palanisamy
{"title":"A novel key point based ROI segmentation and image captioning using guidance information","authors":"Jothi Lakshmi Selvakani, Bhuvaneshwari Ranganathan, Geetha Palanisamy","doi":"10.1007/s00138-024-01597-1","DOIUrl":"https://doi.org/10.1007/s00138-024-01597-1","url":null,"abstract":"<p>Recently, image captioning has become an intriguing task that has attracted many researchers. This paper proposes a novel keypoint-based segmentation algorithm for extracting regions of interest (ROI) and an image captioning model guided by this information to generate more accurate image captions. The Difference of Gaussian (DoG) is used to identify keypoints. A novel ROI segmentation algorithm then utilizes these keypoints to extract the ROI. Features of the ROI are extracted, and the text features of related images are merged into a common semantic space using canonical correlation analysis (CCA) to produce the guiding information. The text features are constructed using a Bag of Words (BoW) model. Based on the guiding information and the entire image features, an LSTM generates a caption for the image. The guiding information helps the LSTM focus on important semantic regions in the image to generate the most significant keywords in the image caption. Experiments on the Flickr8k dataset show that the proposed ROI segmentation algorithm accurately identifies the ROI, and the image captioning model with the guidance information outperforms state-of-the-art methods.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
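For readers who want to experiment with the two building blocks named in the abstract, the sketch below shows Difference-of-Gaussian keypoint detection and a CCA projection of visual and textual features into a shared space. It is a minimal illustration assuming scikit-image and scikit-learn, with the ROI descriptors and Bag-of-Words vectors replaced by random stand-ins; it is not the authors' implementation.

```python
# Hedged sketch: DoG keypoints plus a CCA-based common semantic space, as the
# abstract describes. Feature extractors are stubbed with random data.
import numpy as np
from skimage.data import camera
from skimage.feature import blob_dog
from sklearn.cross_decomposition import CCA

image = camera() / 255.0                         # any grayscale image in [0, 1]
keypoints = blob_dog(image, min_sigma=2, max_sigma=30, threshold=0.1)
print("DoG keypoints (y, x, sigma):", keypoints.shape)

# Stand-ins for ROI visual features and BoW text features of related images.
roi_feats = np.random.rand(200, 128)             # hypothetical ROI descriptors
bow_feats = np.random.rand(200, 64)              # hypothetical Bag-of-Words vectors

cca = CCA(n_components=16)
cca.fit(roi_feats, bow_feats)
guide_img, guide_txt = cca.transform(roi_feats, bow_feats)  # shared semantic space
guidance = (guide_img + guide_txt) / 2           # one plausible "guiding information"
```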
Specular Surface Detection with Deep Static Specular Flow and Highlight
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-09-10 DOI: 10.1007/s00138-024-01603-6
Hirotaka Hachiya, Yuto Yoshimura
{"title":"Specular Surface Detection with Deep Static Specular Flow and Highlight","authors":"Hirotaka Hachiya, Yuto Yoshimura","doi":"10.1007/s00138-024-01603-6","DOIUrl":"https://doi.org/10.1007/s00138-024-01603-6","url":null,"abstract":"<p>To apply robot teaching to a factory with many mirror-polished parts, it is necessary to detect the specular surface accurately. Deep models for mirror detection have been studied by designing mirror-specific features, e.g., contextual contrast and similarity. However, mirror-polished parts such as plastic molds, tend to have complex shapes and ambiguous boundaries, and thus, existing mirror-specific deep features could not work well. To overcome the problem, we propose introducing attention maps based on the concept of static specular flow (SSF), condensed reflections of the surrounding scene, and specular highlight (SH), bright light spots, frequently appearing even in complex-shaped specular surfaces and applying them to deep model-based multi-level features. Then, we adaptively integrate approximated mirror maps generated by multi-level SSF, SH, and existing mirror detectors to detect complex specular surfaces. Through experiments with our original data sets with spherical mirrors and real-world plastic molds, we show the effectiveness of the proposed method.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
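The specular highlight (SH) cue the abstract relies on can be approximated crudely by looking for bright, weakly saturated pixels. The sketch below shows only that intuition in OpenCV, on a synthetic image; the paper's SSF and SH attention maps are learned inside a deep model and are not reproduced here.

```python
# Illustrative sketch only: a crude specular-highlight attention map via thresholding.
import cv2
import numpy as np

# Synthetic stand-in for a photo of a mirror-polished part.
bgr = np.full((240, 320, 3), 120, np.uint8)
cv2.circle(bgr, (160, 120), 12, (255, 255, 255), -1)    # a bright highlight blob

hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)

# Highlights: very bright and weakly saturated regions.
highlight_mask = ((v > 230) & (s < 40)).astype(np.uint8) * 255
highlight_mask = cv2.dilate(highlight_mask, np.ones((5, 5), np.uint8))

# Normalise to [0, 1]; could be used as a soft attention map over deep features.
sh_attention = cv2.GaussianBlur(highlight_mask, (21, 21), 0).astype(np.float32) / 255.0
```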
Removing cloud shadows from ground-based solar imagery
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-09-09 DOI: 10.1007/s00138-024-01607-2
Amal Chaoui, Jay Paul Morgan, Adeline Paiement, Jean Aboudarham
{"title":"Removing cloud shadows from ground-based solar imagery","authors":"Amal Chaoui, Jay Paul Morgan, Adeline Paiement, Jean Aboudarham","doi":"10.1007/s00138-024-01607-2","DOIUrl":"https://doi.org/10.1007/s00138-024-01607-2","url":null,"abstract":"<p>The study and prediction of space weather entails the analysis of solar images showing structures of the Sun’s atmosphere. When imaged from the Earth’s ground, images may be polluted by terrestrial clouds which hinder the detection of solar structures. We propose a new method to remove cloud shadows, based on a U-Net architecture, and compare classical supervision with conditional GAN. We evaluate our method on two different imaging modalities, using both real images and a new dataset of synthetic clouds. Quantitative assessments are obtained through image quality indices (RMSE, PSNR, SSIM, and FID). We demonstrate improved results with regards to the traditional cloud removal technique and a sparse coding baseline, on different cloud types and textures.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
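Three of the quality indices reported in the abstract (RMSE, PSNR, SSIM) can be computed directly with scikit-image, as sketched below on stand-in arrays; FID requires a pretrained feature extractor and is omitted. This is a generic evaluation snippet, not the authors' evaluation code.

```python
# Hedged sketch: image-quality indices on a toy ground-truth / restored pair.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

clean = np.random.rand(256, 256).astype(np.float32)      # stand-in for a clear solar image
restored = np.clip(clean + 0.05 * np.random.randn(256, 256).astype(np.float32), 0, 1)

rmse = float(np.sqrt(np.mean((clean - restored) ** 2)))
psnr = peak_signal_noise_ratio(clean, restored, data_range=1.0)
ssim = structural_similarity(clean, restored, data_range=1.0)
print(f"RMSE={rmse:.4f}  PSNR={psnr:.2f} dB  SSIM={ssim:.4f}")
```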
Underwater image object detection based on multi-scale feature fusion
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-09-02 DOI: 10.1007/s00138-024-01606-3
Chao Yang, Ce Zhang, Longyu Jiang, Xinwen Zhang
{"title":"Underwater image object detection based on multi-scale feature fusion","authors":"Chao Yang, Ce Zhang, Longyu Jiang, Xinwen Zhang","doi":"10.1007/s00138-024-01606-3","DOIUrl":"https://doi.org/10.1007/s00138-024-01606-3","url":null,"abstract":"<p>Underwater object detection and classification technology is one of the most important ways for humans to explore the oceans. However, existing methods are still insufficient in terms of accuracy and speed, and have poor detection performance for small objects such as fish. In this paper, we propose a multi-scale aggregation enhanced (MAE-FPN) object detection method based on the feature pyramid network, including the multi-scale convolutional calibration module (MCCM) and the feature calibration distribution module (FCDM). First, we design the MCCM module, which can adaptively extract feature information from objects at different scales. Then, we built the FCDM structure to make the multi-scale information fusion more appropriate and to alleviate the problem of missing features from small objects. Finally, we construct the Fish Segmentation and Detection (FSD) dataset by fusing multiple data augmentation methods, which enriches the data resources for underwater object detection and solves the problem of limited training resources for deep learning. We conduct experiments on FSD and public datasets, and the results show that the proposed MAE-FPN network significantly improves the detection performance of underwater objects, especially small objects.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
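As background for the multi-scale fusion the abstract builds on, the sketch below shows a generic FPN-style top-down pathway in PyTorch with random feature maps. It illustrates the plain fusion idea only; the paper's MCCM and FCDM modules are more elaborate and are not reproduced here.

```python
# Schematic PyTorch sketch of generic FPN-style multi-scale feature fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):                    # feats: fine -> coarse backbone maps
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):   # top-down pathway
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]

feats = [torch.randn(1, c, s, s) for c, s in [(256, 64), (512, 32), (1024, 16)]]
fused = TinyFPN()(feats)                         # three fused maps, all with 256 channels
```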
Object Recognition Consistency in Regression for Active Detection
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-08-29 DOI: 10.1007/s00138-024-01604-5
Ming Jing, Zhilong Ou, Hongxing Wang, Jiaxin Li, Ziyi Zhao
{"title":"Object Recognition Consistency in Regression for Active Detection","authors":"Ming Jing, Zhilong Ou, Hongxing Wang, Jiaxin Li, Ziyi Zhao","doi":"10.1007/s00138-024-01604-5","DOIUrl":"https://doi.org/10.1007/s00138-024-01604-5","url":null,"abstract":"<p>Active learning has achieved great success in image classification because of selecting the most informative samples for data labeling and model training. However, the potential of active learning has been far from being realised in object detection due to its unique challenge in utilizing localization information. A popular compromise is to simply take active classification learning over detected object candidates. To consider the localization information of object detection, current effort usually falls into the model-dependent fashion, which either works on specific detection frameworks or relies on additionally designed modules. In this paper, we propose model-agnostic Object Recognition Consistency in Regression (ORCR), which can holistically measure the uncertainty information of classification and localization of each detected candidate from object detection. The philosophy behind ORCR is to obtain the detection uncertainty by calculating the classification consistency through localization regression at two successive detection scales. In the light of the proposed ORCR, we devise an active learning framework that enables an effortless deployment to any object detection architecture. Experimental results on the PASCAL VOC and MS-COCO benchmarks show that our method achieves better performance while simplifying the active detection process.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
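One way to read the ORCR idea from the abstract is as a consistency score between the class distributions a detector assigns to the same candidate at two successive scales. The sketch below uses Jensen-Shannon divergence as that score; the divergence choice and the function name are assumptions for illustration, not the paper's exact formulation.

```python
# Rough sketch of the idea as read from the abstract (not the official ORCR code).
import numpy as np

def consistency_uncertainty(p_scale1, p_scale2, eps=1e-8):
    """Jensen-Shannon divergence between the class distributions predicted for the
    same candidate at two detection scales; higher means less consistent recognition,
    i.e. a more informative sample to send for labeling."""
    p = np.asarray(p_scale1, dtype=float) + eps
    q = np.asarray(p_scale2, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m))

# A candidate whose predicted class flips between scales scores higher.
print(consistency_uncertainty([0.7, 0.2, 0.1], [0.65, 0.25, 0.10]))  # low (consistent)
print(consistency_uncertainty([0.7, 0.2, 0.1], [0.15, 0.75, 0.10]))  # high (inconsistent)
```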
Fast no-reference deep image dehazing
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-08-29 DOI: 10.1007/s00138-024-01601-8
Hongyi Qin, Alexander G. Belyaev
{"title":"Fast no-reference deep image dehazing","authors":"Hongyi Qin, Alexander G. Belyaev","doi":"10.1007/s00138-024-01601-8","DOIUrl":"https://doi.org/10.1007/s00138-024-01601-8","url":null,"abstract":"<p>This paper presents a deep learning method for image dehazing and clarification. The main advantages of the method are high computational speed and using unpaired image data for training. The method adapts the Zero-DCE approach (Li et al. in IEEE Trans Pattern Anal Mach Intell 44(8):4225–4238, 2021) for the image dehazing problem and uses high-order curves to adjust the dynamic range of images and achieve dehazing. Training the proposed dehazing neural network does not require paired hazy and clear datasets but instead utilizes a set of loss functions, assessing the quality of dehazed images to drive the training process. Experiments on a large number of real-world hazy images demonstrate that our proposed network effectively removes haze while preserving details and enhancing brightness. Furthermore, on an affordable GPU-equipped laptop, the processing speed can reach 1000 FPS for images with 2K resolution, making it highly suitable for real-time dehazing applications.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
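The high-order curve adjustment inherited from Zero-DCE follows the iterative rule LE(x) = x + alpha * x * (1 - x). The sketch below applies it with a constant alpha as a stand-in for the per-pixel parameter maps a small network would predict; it is a toy illustration, not the paper's trained model.

```python
# Minimal sketch of the Zero-DCE-style high-order curve the abstract builds on.
import numpy as np

def apply_high_order_curve(img, alphas):
    """img in [0, 1]; alphas is an iterable of per-iteration curve parameters
    (per-pixel maps in the real model, scalars in this toy example)."""
    x = img.astype(np.float32)
    for alpha in alphas:
        x = x + alpha * x * (1.0 - x)            # LE(x) = x + alpha * x * (1 - x)
    return np.clip(x, 0.0, 1.0)

hazy = np.random.rand(256, 256, 3).astype(np.float32)      # stand-in for a hazy photo
adjusted = apply_high_order_curve(hazy, alphas=[0.3] * 8)   # 8 curve iterations
```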
Synergetic proto-pull and reciprocal points for open set recognition
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-08-23 DOI: 10.1007/s00138-024-01596-2
Xin Deng, Luyao Yang, Ao Zhang, Jingwen Wang, Hexu Wang, Tianzhang Xing, Pengfei Xu
{"title":"Synergetic proto-pull and reciprocal points for open set recognition","authors":"Xin Deng, Luyao Yang, Ao Zhang, Jingwen Wang, Hexu Wang, Tianzhang Xing, Pengfei Xu","doi":"10.1007/s00138-024-01596-2","DOIUrl":"https://doi.org/10.1007/s00138-024-01596-2","url":null,"abstract":"<p>Open set recognition (OSR) aims to accept and classify known classes while rejecting unknown classes, which is the key technology for pattern recognition algorithms to be widely applied in practice. The challenges to OSR is to reduce the empirical classification risk of known classes and the open space risk of potential unknown classes. However, the existing OSR methods less consider to optimize the open space risk, and much dark information in unknown space is not taken into account, which results in that many unknown classes are misidentified as known classes. Therefore, we present a self-supervised learningbased OSR method with synergetic proto-pull and reciprocal points, which can remarkably reduce the risks of empirical classification and open space. Especially, we propose a new concept of proto-pull point, which can be synergistically combined with reciprocal points to shrink the feature spaces of known and unknown classes, and increase the feature distance between different classes, so as to form a good feature distribution. In addition, a self-supervised learning task of identifying the directions of rotated images is introduced in OSR model training, which is benefit for the OSR mdoel to capture more distinguishing features, and decreases both empirical classification and open space risks. The final experimental results on benchmark datasets show that our propsoed approach outperforms most existing OSR methods.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
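The rotation-prediction pretext task the abstract adds to OSR training can be set up as sketched below: each image is rotated by a multiple of 90 degrees and the network is asked to predict which rotation was applied. The proto-pull and reciprocal-point losses themselves are not shown; this is only the standard self-supervision ingredient, under the assumption of square inputs.

```python
# Illustrative sketch of the rotation-prediction pretext task (PyTorch).
import torch

def make_rotation_batch(images):
    """images: (B, C, H, W) tensor -> 4B rotated images and rotation labels 0..3."""
    rotated, labels = [], []
    for k in range(4):                                   # rotate by k * 90 degrees
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

x = torch.randn(8, 3, 32, 32)
x_rot, y_rot = make_rotation_batch(x)    # feed to the model with a 4-way rotation head
```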
Enhanced keypoint information and pose-weighted re-ID features for multi-person pose estimation and tracking
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-08-22 DOI: 10.1007/s00138-024-01602-7
Xiangyang Wang, Tao Pei, Rui Wang
{"title":"Enhanced keypoint information and pose-weighted re-ID features for multi-person pose estimation and tracking","authors":"Xiangyang Wang, Tao Pei, Rui Wang","doi":"10.1007/s00138-024-01602-7","DOIUrl":"https://doi.org/10.1007/s00138-024-01602-7","url":null,"abstract":"<p>Multi-person pose estimation and tracking are crucial research directions in the field of artificial intelligence, with widespread applications in virtual reality, action recognition, and human-computer interaction. While existing pose tracking algorithms predominantly follow the top-down paradigm, they face challenges, such as pose occlusion and motion blur in complex scenes, leading to tracking inaccuracies. To address these challenges, we leverage enhanced keypoint information and pose-weighted re-identification (re-ID) features to improve the performance of multi-person pose estimation and tracking. Specifically, our proposed Decouple Heatmap Network decouples heatmaps into keypoint confidence and position. The refined keypoint information are utilized to reconstruct occluded poses. For the pose tracking task, we introduce a more efficient pipeline founded on pose-weighted re-ID features. This pipeline integrates a Pose Embedding Network to allocate weights to re-ID features and achieves the final pose tracking through a novel tracking matching algorithm. Extensive experiments indicate that our approach performs well in both multi-person pose estimation and tracking and achieves state-of-the-art results on the PoseTrack 2017 and 2018 datasets. Our source code is available at: https://github.com/TaoTaoPei/posetracking.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
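As context for the decoupling of heatmaps into confidence and position, the sketch below shows the classic post-processing view: each keypoint heatmap yields a confidence (its peak value) and a position (the peak's coordinates). The paper's Decouple Heatmap Network learns this decomposition; the function here is only the conventional baseline.

```python
# Hedged sketch: decoding keypoint positions and confidences from heatmaps (PyTorch).
import torch

def decode_heatmaps(heatmaps):
    """heatmaps: (K, H, W) -> (K, 2) integer (x, y) positions and (K,) confidences."""
    k, h, w = heatmaps.shape
    flat = heatmaps.view(k, -1)
    conf, idx = flat.max(dim=1)                          # peak value = confidence
    xy = torch.stack((idx % w, idx // w), dim=1)         # peak location = (x, y)
    return xy, conf

hm = torch.rand(17, 64, 48)              # 17 COCO keypoints, toy heatmaps
positions, confidences = decode_heatmaps(hm)
```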
Camera-based mapping in search-and-rescue via flying and ground robot teams
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-08-20 DOI: 10.1007/s00138-024-01594-4
Bernardo Esteves Henriques, Mirko Baglioni, Anahita Jamshidnejad
{"title":"Camera-based mapping in search-and-rescue via flying and ground robot teams","authors":"Bernardo Esteves Henriques, Mirko Baglioni, Anahita Jamshidnejad","doi":"10.1007/s00138-024-01594-4","DOIUrl":"https://doi.org/10.1007/s00138-024-01594-4","url":null,"abstract":"<p>Search and rescue (SaR) is challenging, due to the unknown environmental situation after disasters occur. Robotics has become indispensable for precise mapping of the environment and for locating the victims. Combining flying and ground robots more effectively serves this purpose, due to their complementary features in terms of viewpoint and maneuvering. To this end, a novel, cost-effective framework for mapping unknown environments is introduced that leverages You Only Look Once and video streams transmitted by a ground and a flying robot. The integrated mapping approach is for performing three crucial SaR tasks: localizing the victims, i.e., determining their position in the environment and their body pose, tracking the moving victims, and providing a map of the ground elevation that assists both the ground robot and the SaR crew in navigating the SaR environment. In real-life experiments at the CyberZoo of the Delft University of Technology, the framework proved very effective and precise for all these tasks, particularly in occluded and complex environments.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
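The detection building block the framework starts from is an off-the-shelf YOLO model. A hedged per-frame sketch with the ultralytics package is shown below; the checkpoint name, video path, and the downstream fusion of ground and aerial views are placeholders, not the authors' setup.

```python
# Hedged sketch: per-frame person detection with a pretrained YOLO model.
import cv2
from ultralytics import YOLO   # assumes the ultralytics package is installed

model = YOLO("yolov8n.pt")                           # any pretrained YOLO checkpoint
cap = cv2.VideoCapture("ground_robot_stream.mp4")    # hypothetical video source

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, classes=[0], verbose=False)   # class 0 = "person" in COCO
    for box in results[0].boxes.xyxy.tolist():
        x1, y1, x2, y2 = map(int, box)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    # Display, saving, or mapping of the annotated frame is omitted in this sketch.
cap.release()
```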
Transformer with multi-level grid features and depth pooling for image captioning
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-08-20 DOI: 10.1007/s00138-024-01599-z
Doanh C. Bui, Tam V. Nguyen, Khang Nguyen
{"title":"Transformer with multi-level grid features and depth pooling for image captioning","authors":"Doanh C. Bui, Tam V. Nguyen, Khang Nguyen","doi":"10.1007/s00138-024-01599-z","DOIUrl":"https://doi.org/10.1007/s00138-024-01599-z","url":null,"abstract":"<p>Image captioning is an exciting yet challenging problem in both computer vision and natural language processing research. In recent years, this problem has been addressed by Transformer-based models optimized with Cross-Entropy loss and boosted performance via Self-Critical Sequence Training. Two types of representations are embedded into captioning models: grid features and region features, and there have been attempts to include 2D geometry information in the self-attention computation. However, the 3D order of object appearances is not considered, leading to confusion for the model in cases of complex scenes with overlapped objects. In addition, recent studies using only feature maps from the last layer or block of a pretrained CNN-based model may lack spatial information. In this paper, we present the Transformer-based captioning model dubbed TMDNet. Our model includes one module to aggregate multi-level grid features (MGFA) to enrich the representation ability using prior knowledge, and another module to effectively embed the image’s depth-grid aggregation (DGA) into the model space for better performance. The proposed model demonstrates its effectiveness via evaluation on the MS-COCO “Karpathy” test split across five standard metrics.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
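The two ingredients named in the title, multi-level grid features and depth pooling, can be illustrated with plain adaptive pooling, as sketched below: several backbone feature maps are pooled to a common grid and concatenated, and a depth map is averaged over the same grid. The real MGFA and DGA modules are learned; this only shows the pooling idea, and the function names are illustrative.

```python
# Loose sketch of grid-feature aggregation and depth-grid pooling (PyTorch).
import torch
import torch.nn.functional as F

def multi_level_grid_features(feature_maps, grid=7):
    """feature_maps: list of (C_i, H_i, W_i) tensors -> (grid*grid, sum(C_i)) tokens."""
    pooled = [F.adaptive_avg_pool2d(f.unsqueeze(0), grid).squeeze(0) for f in feature_maps]
    return torch.cat(pooled, dim=0).flatten(1).transpose(0, 1)   # tokens x channels

def depth_grid_aggregation(depth_map, grid=7):
    """depth_map: (H, W) -> (grid*grid, 1) mean depth per grid cell."""
    d = depth_map.unsqueeze(0).unsqueeze(0)
    return F.adaptive_avg_pool2d(d, grid).flatten().unsqueeze(1)

feats = [torch.randn(256, 56, 56), torch.randn(512, 28, 28), torch.randn(1024, 14, 14)]
tokens = multi_level_grid_features(feats)                      # (49, 1792) grid tokens
depth_tokens = depth_grid_aggregation(torch.rand(224, 224))    # (49, 1) depth cue
```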