Machine Vision and Applications: Latest Publications

A novel key point based ROI segmentation and image captioning using guidance information
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-09-12 DOI: 10.1007/s00138-024-01597-1
Jothi Lakshmi Selvakani, Bhuvaneshwari Ranganathan, Geetha Palanisamy
{"title":"A novel key point based ROI segmentation and image captioning using guidance information","authors":"Jothi Lakshmi Selvakani, Bhuvaneshwari Ranganathan, Geetha Palanisamy","doi":"10.1007/s00138-024-01597-1","DOIUrl":"https://doi.org/10.1007/s00138-024-01597-1","url":null,"abstract":"<p>Recently, image captioning has become an intriguing task that has attracted many researchers. This paper proposes a novel keypoint-based segmentation algorithm for extracting regions of interest (ROI) and an image captioning model guided by this information to generate more accurate image captions. The Difference of Gaussian (DoG) is used to identify keypoints. A novel ROI segmentation algorithm then utilizes these keypoints to extract the ROI. Features of the ROI are extracted, and the text features of related images are merged into a common semantic space using canonical correlation analysis (CCA) to produce the guiding information. The text features are constructed using a Bag of Words (BoW) model. Based on the guiding information and the entire image features, an LSTM generates a caption for the image. The guiding information helps the LSTM focus on important semantic regions in the image to generate the most significant keywords in the image caption. Experiments on the Flickr8k dataset show that the proposed ROI segmentation algorithm accurately identifies the ROI, and the image captioning model with the guidance information outperforms state-of-the-art methods.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
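For readers who want to experiment with the two building blocks named in the abstract, the sketch below shows Difference-of-Gaussian keypoint detection and a CCA projection of visual and textual features into a shared space. It is a minimal illustration assuming scikit-image and scikit-learn, with the ROI descriptors and Bag-of-Words vectors replaced by random stand-ins; it is not the authors' implementation.

```python
# Hedged sketch: DoG keypoints plus a CCA-based common semantic space, as the
# abstract describes. Feature extractors are stubbed with random data.
import numpy as np
from skimage.data import camera
from skimage.feature import blob_dog
from sklearn.cross_decomposition import CCA

image = camera() / 255.0                         # any grayscale image in [0, 1]
keypoints = blob_dog(image, min_sigma=2, max_sigma=30, threshold=0.1)
print("DoG keypoints (y, x, sigma):", keypoints.shape)

# Stand-ins for ROI visual features and BoW text features of related images.
roi_feats = np.random.rand(200, 128)             # hypothetical ROI descriptors
bow_feats = np.random.rand(200, 64)              # hypothetical Bag-of-Words vectors

cca = CCA(n_components=16)
cca.fit(roi_feats, bow_feats)
guide_img, guide_txt = cca.transform(roi_feats, bow_feats)  # shared semantic space
guidance = (guide_img + guide_txt) / 2           # one plausible "guiding information"
```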
Specular Surface Detection with Deep Static Specular Flow and Highlight
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-09-10 DOI: 10.1007/s00138-024-01603-6
Hirotaka Hachiya, Yuto Yoshimura
{"title":"Specular Surface Detection with Deep Static Specular Flow and Highlight","authors":"Hirotaka Hachiya, Yuto Yoshimura","doi":"10.1007/s00138-024-01603-6","DOIUrl":"https://doi.org/10.1007/s00138-024-01603-6","url":null,"abstract":"<p>To apply robot teaching to a factory with many mirror-polished parts, it is necessary to detect the specular surface accurately. Deep models for mirror detection have been studied by designing mirror-specific features, e.g., contextual contrast and similarity. However, mirror-polished parts such as plastic molds, tend to have complex shapes and ambiguous boundaries, and thus, existing mirror-specific deep features could not work well. To overcome the problem, we propose introducing attention maps based on the concept of static specular flow (SSF), condensed reflections of the surrounding scene, and specular highlight (SH), bright light spots, frequently appearing even in complex-shaped specular surfaces and applying them to deep model-based multi-level features. Then, we adaptively integrate approximated mirror maps generated by multi-level SSF, SH, and existing mirror detectors to detect complex specular surfaces. Through experiments with our original data sets with spherical mirrors and real-world plastic molds, we show the effectiveness of the proposed method.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
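The specular highlight (SH) cue the abstract relies on can be approximated crudely by looking for bright, weakly saturated pixels. The sketch below shows only that intuition in OpenCV, on a synthetic image; the paper's SSF and SH attention maps are learned inside a deep model and are not reproduced here.

```python
# Illustrative sketch only: a crude specular-highlight attention map via thresholding.
import cv2
import numpy as np

# Synthetic stand-in for a photo of a mirror-polished part.
bgr = np.full((240, 320, 3), 120, np.uint8)
cv2.circle(bgr, (160, 120), 12, (255, 255, 255), -1)    # a bright highlight blob

hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)

# Highlights: very bright and weakly saturated regions.
highlight_mask = ((v > 230) & (s < 40)).astype(np.uint8) * 255
highlight_mask = cv2.dilate(highlight_mask, np.ones((5, 5), np.uint8))

# Normalise to [0, 1]; could be used as a soft attention map over deep features.
sh_attention = cv2.GaussianBlur(highlight_mask, (21, 21), 0).astype(np.float32) / 255.0
```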
Removing cloud shadows from ground-based solar imagery
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-09-09 DOI: 10.1007/s00138-024-01607-2
Amal Chaoui, Jay Paul Morgan, Adeline Paiement, Jean Aboudarham
{"title":"Removing cloud shadows from ground-based solar imagery","authors":"Amal Chaoui, Jay Paul Morgan, Adeline Paiement, Jean Aboudarham","doi":"10.1007/s00138-024-01607-2","DOIUrl":"https://doi.org/10.1007/s00138-024-01607-2","url":null,"abstract":"<p>The study and prediction of space weather entails the analysis of solar images showing structures of the Sun’s atmosphere. When imaged from the Earth’s ground, images may be polluted by terrestrial clouds which hinder the detection of solar structures. We propose a new method to remove cloud shadows, based on a U-Net architecture, and compare classical supervision with conditional GAN. We evaluate our method on two different imaging modalities, using both real images and a new dataset of synthetic clouds. Quantitative assessments are obtained through image quality indices (RMSE, PSNR, SSIM, and FID). We demonstrate improved results with regards to the traditional cloud removal technique and a sparse coding baseline, on different cloud types and textures.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
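Three of the quality indices reported in the abstract (RMSE, PSNR, SSIM) can be computed directly with scikit-image, as sketched below on stand-in arrays; FID requires a pretrained feature extractor and is omitted. This is a generic evaluation snippet, not the authors' evaluation code.

```python
# Hedged sketch: image-quality indices on a toy ground-truth / restored pair.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

clean = np.random.rand(256, 256).astype(np.float32)      # stand-in for a clear solar image
restored = np.clip(clean + 0.05 * np.random.randn(256, 256).astype(np.float32), 0, 1)

rmse = float(np.sqrt(np.mean((clean - restored) ** 2)))
psnr = peak_signal_noise_ratio(clean, restored, data_range=1.0)
ssim = structural_similarity(clean, restored, data_range=1.0)
print(f"RMSE={rmse:.4f}  PSNR={psnr:.2f} dB  SSIM={ssim:.4f}")
```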
Underwater image object detection based on multi-scale feature fusion
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-09-02 DOI: 10.1007/s00138-024-01606-3
Chao Yang, Ce Zhang, Longyu Jiang, Xinwen Zhang
{"title":"Underwater image object detection based on multi-scale feature fusion","authors":"Chao Yang, Ce Zhang, Longyu Jiang, Xinwen Zhang","doi":"10.1007/s00138-024-01606-3","DOIUrl":"https://doi.org/10.1007/s00138-024-01606-3","url":null,"abstract":"<p>Underwater object detection and classification technology is one of the most important ways for humans to explore the oceans. However, existing methods are still insufficient in terms of accuracy and speed, and have poor detection performance for small objects such as fish. In this paper, we propose a multi-scale aggregation enhanced (MAE-FPN) object detection method based on the feature pyramid network, including the multi-scale convolutional calibration module (MCCM) and the feature calibration distribution module (FCDM). First, we design the MCCM module, which can adaptively extract feature information from objects at different scales. Then, we built the FCDM structure to make the multi-scale information fusion more appropriate and to alleviate the problem of missing features from small objects. Finally, we construct the Fish Segmentation and Detection (FSD) dataset by fusing multiple data augmentation methods, which enriches the data resources for underwater object detection and solves the problem of limited training resources for deep learning. We conduct experiments on FSD and public datasets, and the results show that the proposed MAE-FPN network significantly improves the detection performance of underwater objects, especially small objects.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
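As background for the multi-scale fusion the abstract builds on, the sketch below shows a generic FPN-style top-down pathway in PyTorch with random feature maps. It illustrates the plain fusion idea only; the paper's MCCM and FCDM modules are more elaborate and are not reproduced here.

```python
# Schematic PyTorch sketch of generic FPN-style multi-scale feature fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):                    # feats: fine -> coarse backbone maps
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):   # top-down pathway
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]

feats = [torch.randn(1, c, s, s) for c, s in [(256, 64), (512, 32), (1024, 16)]]
fused = TinyFPN()(feats)                         # three fused maps, all with 256 channels
```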
Object Recognition Consistency in Regression for Active Detection
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-08-29 DOI: 10.1007/s00138-024-01604-5
Ming Jing, Zhilong Ou, Hongxing Wang, Jiaxin Li, Ziyi Zhao
{"title":"Object Recognition Consistency in Regression for Active Detection","authors":"Ming Jing, Zhilong Ou, Hongxing Wang, Jiaxin Li, Ziyi Zhao","doi":"10.1007/s00138-024-01604-5","DOIUrl":"https://doi.org/10.1007/s00138-024-01604-5","url":null,"abstract":"<p>Active learning has achieved great success in image classification because of selecting the most informative samples for data labeling and model training. However, the potential of active learning has been far from being realised in object detection due to its unique challenge in utilizing localization information. A popular compromise is to simply take active classification learning over detected object candidates. To consider the localization information of object detection, current effort usually falls into the model-dependent fashion, which either works on specific detection frameworks or relies on additionally designed modules. In this paper, we propose model-agnostic Object Recognition Consistency in Regression (ORCR), which can holistically measure the uncertainty information of classification and localization of each detected candidate from object detection. The philosophy behind ORCR is to obtain the detection uncertainty by calculating the classification consistency through localization regression at two successive detection scales. In the light of the proposed ORCR, we devise an active learning framework that enables an effortless deployment to any object detection architecture. Experimental results on the PASCAL VOC and MS-COCO benchmarks show that our method achieves better performance while simplifying the active detection process.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
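One way to read the ORCR idea from the abstract is as a consistency score between the class distributions a detector assigns to the same candidate at two successive scales. The sketch below uses Jensen-Shannon divergence as that score; the divergence choice and the function name are assumptions for illustration, not the paper's exact formulation.

```python
# Rough sketch of the idea as read from the abstract (not the official ORCR code).
import numpy as np

def consistency_uncertainty(p_scale1, p_scale2, eps=1e-8):
    """Jensen-Shannon divergence between the class distributions predicted for the
    same candidate at two detection scales; higher means less consistent recognition,
    i.e. a more informative sample to send for labeling."""
    p = np.asarray(p_scale1, dtype=float) + eps
    q = np.asarray(p_scale2, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m))

# A candidate whose predicted class flips between scales scores higher.
print(consistency_uncertainty([0.7, 0.2, 0.1], [0.65, 0.25, 0.10]))  # low (consistent)
print(consistency_uncertainty([0.7, 0.2, 0.1], [0.15, 0.75, 0.10]))  # high (inconsistent)
```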
Fast no-reference deep image dehazing
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-08-29 DOI: 10.1007/s00138-024-01601-8
Hongyi Qin, Alexander G. Belyaev
{"title":"Fast no-reference deep image dehazing","authors":"Hongyi Qin, Alexander G. Belyaev","doi":"10.1007/s00138-024-01601-8","DOIUrl":"https://doi.org/10.1007/s00138-024-01601-8","url":null,"abstract":"<p>This paper presents a deep learning method for image dehazing and clarification. The main advantages of the method are high computational speed and using unpaired image data for training. The method adapts the Zero-DCE approach (Li et al. in IEEE Trans Pattern Anal Mach Intell 44(8):4225–4238, 2021) for the image dehazing problem and uses high-order curves to adjust the dynamic range of images and achieve dehazing. Training the proposed dehazing neural network does not require paired hazy and clear datasets but instead utilizes a set of loss functions, assessing the quality of dehazed images to drive the training process. Experiments on a large number of real-world hazy images demonstrate that our proposed network effectively removes haze while preserving details and enhancing brightness. Furthermore, on an affordable GPU-equipped laptop, the processing speed can reach 1000 FPS for images with 2K resolution, making it highly suitable for real-time dehazing applications.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
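The high-order curve adjustment inherited from Zero-DCE follows the iterative rule LE(x) = x + alpha * x * (1 - x). The sketch below applies it with a constant alpha as a stand-in for the per-pixel parameter maps a small network would predict; it is a toy illustration, not the paper's trained model.

```python
# Minimal sketch of the Zero-DCE-style high-order curve the abstract builds on.
import numpy as np

def apply_high_order_curve(img, alphas):
    """img in [0, 1]; alphas is an iterable of per-iteration curve parameters
    (per-pixel maps in the real model, scalars in this toy example)."""
    x = img.astype(np.float32)
    for alpha in alphas:
        x = x + alpha * x * (1.0 - x)            # LE(x) = x + alpha * x * (1 - x)
    return np.clip(x, 0.0, 1.0)

hazy = np.random.rand(256, 256, 3).astype(np.float32)      # stand-in for a hazy photo
adjusted = apply_high_order_curve(hazy, alphas=[0.3] * 8)   # 8 curve iterations
```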
Synergetic proto-pull and reciprocal points for open set recognition
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-08-23 DOI: 10.1007/s00138-024-01596-2
Xin Deng, Luyao Yang, Ao Zhang, Jingwen Wang, Hexu Wang, Tianzhang Xing, Pengfei Xu
{"title":"Synergetic proto-pull and reciprocal points for open set recognition","authors":"Xin Deng, Luyao Yang, Ao Zhang, Jingwen Wang, Hexu Wang, Tianzhang Xing, Pengfei Xu","doi":"10.1007/s00138-024-01596-2","DOIUrl":"https://doi.org/10.1007/s00138-024-01596-2","url":null,"abstract":"<p>Open set recognition (OSR) aims to accept and classify known classes while rejecting unknown classes, which is the key technology for pattern recognition algorithms to be widely applied in practice. The challenges to OSR is to reduce the empirical classification risk of known classes and the open space risk of potential unknown classes. However, the existing OSR methods less consider to optimize the open space risk, and much dark information in unknown space is not taken into account, which results in that many unknown classes are misidentified as known classes. Therefore, we present a self-supervised learningbased OSR method with synergetic proto-pull and reciprocal points, which can remarkably reduce the risks of empirical classification and open space. Especially, we propose a new concept of proto-pull point, which can be synergistically combined with reciprocal points to shrink the feature spaces of known and unknown classes, and increase the feature distance between different classes, so as to form a good feature distribution. In addition, a self-supervised learning task of identifying the directions of rotated images is introduced in OSR model training, which is benefit for the OSR mdoel to capture more distinguishing features, and decreases both empirical classification and open space risks. The final experimental results on benchmark datasets show that our propsoed approach outperforms most existing OSR methods.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
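The rotation-prediction pretext task the abstract adds to OSR training can be set up as sketched below: each image is rotated by a multiple of 90 degrees and the network is asked to predict which rotation was applied. The proto-pull and reciprocal-point losses themselves are not shown; this is only the standard self-supervision ingredient, under the assumption of square inputs.

```python
# Illustrative sketch of the rotation-prediction pretext task (PyTorch).
import torch

def make_rotation_batch(images):
    """images: (B, C, H, W) tensor -> 4B rotated images and rotation labels 0..3."""
    rotated, labels = [], []
    for k in range(4):                                   # rotate by k * 90 degrees
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

x = torch.randn(8, 3, 32, 32)
x_rot, y_rot = make_rotation_batch(x)    # feed to the model with a 4-way rotation head
```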
Enhanced keypoint information and pose-weighted re-ID features for multi-person pose estimation and tracking
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-08-22 DOI: 10.1007/s00138-024-01602-7
Xiangyang Wang, Tao Pei, Rui Wang
{"title":"Enhanced keypoint information and pose-weighted re-ID features for multi-person pose estimation and tracking","authors":"Xiangyang Wang, Tao Pei, Rui Wang","doi":"10.1007/s00138-024-01602-7","DOIUrl":"https://doi.org/10.1007/s00138-024-01602-7","url":null,"abstract":"<p>Multi-person pose estimation and tracking are crucial research directions in the field of artificial intelligence, with widespread applications in virtual reality, action recognition, and human-computer interaction. While existing pose tracking algorithms predominantly follow the top-down paradigm, they face challenges, such as pose occlusion and motion blur in complex scenes, leading to tracking inaccuracies. To address these challenges, we leverage enhanced keypoint information and pose-weighted re-identification (re-ID) features to improve the performance of multi-person pose estimation and tracking. Specifically, our proposed Decouple Heatmap Network decouples heatmaps into keypoint confidence and position. The refined keypoint information are utilized to reconstruct occluded poses. For the pose tracking task, we introduce a more efficient pipeline founded on pose-weighted re-ID features. This pipeline integrates a Pose Embedding Network to allocate weights to re-ID features and achieves the final pose tracking through a novel tracking matching algorithm. Extensive experiments indicate that our approach performs well in both multi-person pose estimation and tracking and achieves state-of-the-art results on the PoseTrack 2017 and 2018 datasets. Our source code is available at: https://github.com/TaoTaoPei/posetracking.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
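As context for the decoupling of heatmaps into confidence and position, the sketch below shows the classic post-processing view: each keypoint heatmap yields a confidence (its peak value) and a position (the peak's coordinates). The paper's Decouple Heatmap Network learns this decomposition; the function here is only the conventional baseline.

```python
# Hedged sketch: decoding keypoint positions and confidences from heatmaps (PyTorch).
import torch

def decode_heatmaps(heatmaps):
    """heatmaps: (K, H, W) -> (K, 2) integer (x, y) positions and (K,) confidences."""
    k, h, w = heatmaps.shape
    flat = heatmaps.view(k, -1)
    conf, idx = flat.max(dim=1)                          # peak value = confidence
    xy = torch.stack((idx % w, idx // w), dim=1)         # peak location = (x, y)
    return xy, conf

hm = torch.rand(17, 64, 48)              # 17 COCO keypoints, toy heatmaps
positions, confidences = decode_heatmaps(hm)
```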
Camera-based mapping in search-and-rescue via flying and ground robot teams
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-08-20 DOI: 10.1007/s00138-024-01594-4
Bernardo Esteves Henriques, Mirko Baglioni, Anahita Jamshidnejad
{"title":"Camera-based mapping in search-and-rescue via flying and ground robot teams","authors":"Bernardo Esteves Henriques, Mirko Baglioni, Anahita Jamshidnejad","doi":"10.1007/s00138-024-01594-4","DOIUrl":"https://doi.org/10.1007/s00138-024-01594-4","url":null,"abstract":"<p>Search and rescue (SaR) is challenging, due to the unknown environmental situation after disasters occur. Robotics has become indispensable for precise mapping of the environment and for locating the victims. Combining flying and ground robots more effectively serves this purpose, due to their complementary features in terms of viewpoint and maneuvering. To this end, a novel, cost-effective framework for mapping unknown environments is introduced that leverages You Only Look Once and video streams transmitted by a ground and a flying robot. The integrated mapping approach is for performing three crucial SaR tasks: localizing the victims, i.e., determining their position in the environment and their body pose, tracking the moving victims, and providing a map of the ground elevation that assists both the ground robot and the SaR crew in navigating the SaR environment. In real-life experiments at the CyberZoo of the Delft University of Technology, the framework proved very effective and precise for all these tasks, particularly in occluded and complex environments.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
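The detection building block the framework starts from is an off-the-shelf YOLO model. A hedged per-frame sketch with the ultralytics package is shown below; the checkpoint name, video path, and the downstream fusion of ground and aerial views are placeholders, not the authors' setup.

```python
# Hedged sketch: per-frame person detection with a pretrained YOLO model.
import cv2
from ultralytics import YOLO   # assumes the ultralytics package is installed

model = YOLO("yolov8n.pt")                           # any pretrained YOLO checkpoint
cap = cv2.VideoCapture("ground_robot_stream.mp4")    # hypothetical video source

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, classes=[0], verbose=False)   # class 0 = "person" in COCO
    for box in results[0].boxes.xyxy.tolist():
        x1, y1, x2, y2 = map(int, box)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    # Display, saving, or mapping of the annotated frame is omitted in this sketch.
cap.release()
```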
Transformer with multi-level grid features and depth pooling for image captioning
IF 3.3 · CAS Q4 · Computer Science
Machine Vision and Applications Pub Date: 2024-08-20 DOI: 10.1007/s00138-024-01599-z
Doanh C. Bui, Tam V. Nguyen, Khang Nguyen
{"title":"Transformer with multi-level grid features and depth pooling for image captioning","authors":"Doanh C. Bui, Tam V. Nguyen, Khang Nguyen","doi":"10.1007/s00138-024-01599-z","DOIUrl":"https://doi.org/10.1007/s00138-024-01599-z","url":null,"abstract":"<p>Image captioning is an exciting yet challenging problem in both computer vision and natural language processing research. In recent years, this problem has been addressed by Transformer-based models optimized with Cross-Entropy loss and boosted performance via Self-Critical Sequence Training. Two types of representations are embedded into captioning models: grid features and region features, and there have been attempts to include 2D geometry information in the self-attention computation. However, the 3D order of object appearances is not considered, leading to confusion for the model in cases of complex scenes with overlapped objects. In addition, recent studies using only feature maps from the last layer or block of a pretrained CNN-based model may lack spatial information. In this paper, we present the Transformer-based captioning model dubbed TMDNet. Our model includes one module to aggregate multi-level grid features (MGFA) to enrich the representation ability using prior knowledge, and another module to effectively embed the image’s depth-grid aggregation (DGA) into the model space for better performance. The proposed model demonstrates its effectiveness via evaluation on the MS-COCO “Karpathy” test split across five standard metrics.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
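The two ingredients named in the title, multi-level grid features and depth pooling, can be illustrated with plain adaptive pooling, as sketched below: several backbone feature maps are pooled to a common grid and concatenated, and a depth map is averaged over the same grid. The real MGFA and DGA modules are learned; this only shows the pooling idea, and the function names are illustrative.

```python
# Loose sketch of grid-feature aggregation and depth-grid pooling (PyTorch).
import torch
import torch.nn.functional as F

def multi_level_grid_features(feature_maps, grid=7):
    """feature_maps: list of (C_i, H_i, W_i) tensors -> (grid*grid, sum(C_i)) tokens."""
    pooled = [F.adaptive_avg_pool2d(f.unsqueeze(0), grid).squeeze(0) for f in feature_maps]
    return torch.cat(pooled, dim=0).flatten(1).transpose(0, 1)   # tokens x channels

def depth_grid_aggregation(depth_map, grid=7):
    """depth_map: (H, W) -> (grid*grid, 1) mean depth per grid cell."""
    d = depth_map.unsqueeze(0).unsqueeze(0)
    return F.adaptive_avg_pool2d(d, grid).flatten().unsqueeze(1)

feats = [torch.randn(256, 56, 56), torch.randn(512, 28, 28), torch.randn(1024, 14, 14)]
tokens = multi_level_grid_features(feats)                      # (49, 1792) grid tokens
depth_tokens = depth_grid_aggregation(torch.rand(224, 224))    # (49, 1) depth cue
```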