Infrared and visible image fusion using quantum computing induced edge preserving filter
Priyadarsan Parida, Manoj Kumar Panda, Deepak Kumar Rout, Saroj Kumar Panda
Image and Vision Computing, Volume 153, Article 105344. DOI: 10.1016/j.imavis.2024.105344. Published 28 November 2024.

Fusing visible and thermal images yields a more comprehensive understanding of a scene than either source image alone. The task spans applications such as navigation, surveillance, remote sensing, and military operations, where significant information must be drawn from diverse modalities, which makes it challenging. The difficulty of integrating the data sources stems from the differing modalities of the imaging sensors and the complementary nature of their information, so infrared (IR) and visible image fusion must integrate information precisely while retaining the useful content of both sources. This article therefore presents an image fusion methodology that enhances the prominent details of both images and preserves textural information while reducing noise from either source. We put forward a quantum computing-induced IR and visible image fusion technique that efficiently preserves the required information and highlights details from the source images. First, the proposed edge-detail-preserving strategy accurately retains the salient details of the source images. Next, the proposed quantum computing-induced weight map generation mechanism preserves complementary details with little redundancy, producing the quantum details. The prominent features of the source images are likewise retained by exploiting their rich information content. Finally, the quantum details and the prominent details are combined to produce the fused image for each source image pair. Both subjective and objective analyses validate the effectiveness of the proposed algorithm, and its efficacy is further confirmed by comparison against twenty-six existing fusion algorithms. Across these experiments, the framework achieves higher accuracy in both visual comparisons and quantitative assessments than state-of-the-art (SOTA) deep-learning and non-deep-learning techniques.
{"title":"Unified Volumetric Avatar: Enabling flexible editing and rendering of neural human representations","authors":"Jinlong Fan, Xudong Lv, Xuepu Zeng, Zhengyi Bao, Zhiwei He, Mingyu Gao","doi":"10.1016/j.imavis.2024.105345","DOIUrl":"10.1016/j.imavis.2024.105345","url":null,"abstract":"<div><div>Neural Radiance Field (NeRF) has emerged as a leading method for reconstructing 3D human avatars with exceptional rendering capabilities, particularly for novel view and pose synthesis. However, current approaches for editing these avatars are limited, typically allowing only global geometry adjustments or texture modifications via neural texture maps. This paper introduces Unified Volumetric Avatar, a novel framework enabling independent and simultaneous global and local editing of both geometry and texture of 3D human avatars and user-friendly manipulation. The proposed approach seamlessly integrates implicit neural fields with an explicit polygonal mesh, leveraging distinct geometry and appearance latent codes attached to the body mesh for precise local edits. These trackable latent codes permeate through the 3D space via barycentric interpolation, mitigating spatial ambiguity with the aid of a local signed height indicator. Furthermore, our method enhances surface illumination representation across different poses by incorporating a pose-dependent shading factor instead of relying on view-dependent radiance color. Experimental results on multiple human avatars demonstrate its efficacy in achieving competitive results for novel view synthesis and novel pose rendering, showcasing its potential for versatile human representation. The source code will be made publicly available.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"153 ","pages":"Article 105345"},"PeriodicalIF":4.2,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142748134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IFE-Net: Integrated feature enhancement network for image manipulation localization","authors":"Lichao Su , Chenwei Dai , Hao Yu , Yun Chen","doi":"10.1016/j.imavis.2024.105342","DOIUrl":"10.1016/j.imavis.2024.105342","url":null,"abstract":"<div><div>Image tampering techniques can lead to distorted or misleading information, which in turn poses a threat in many areas, including social, legal and commercial. Numerous image tampering detection algorithms lose important low-level detail information when extracting deep features, reducing the accuracy and robustness of detection. In order to solve the problems of current methods, this paper proposes a new network called IFE-Net to detect three types of tampered images, namely copy-move, heterologous splicing and removal. Firstly, this paper constructs the noise stream using the attention mechanism CBAM to extract and optimize the noise features. The high-level features are extracted by the backbone network of RGB stream, and the FEASPP module is built for capturing and enhancing the features at different scales. In addition, in this paper, the initial features of RGB stream are additionally supervised so as to limit the detection area and reduce the false alarm. Finally, the final prediction results are obtained by fusing the noise features with the RGB features through the Dual Attention Mechanism (DAM) module. Extensive experimental results on multiple standard datasets show that IFE-Net can accurately locate the tampering region and effectively reduce false alarms, demonstrating superior performance.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"153 ","pages":"Article 105342"},"PeriodicalIF":4.2,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142704488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mobile-friendly and multi-feature aggregation via transformer for human pose estimation","authors":"Biao Li , Shoufeng Tang , Wenyi Li","doi":"10.1016/j.imavis.2024.105343","DOIUrl":"10.1016/j.imavis.2024.105343","url":null,"abstract":"<div><div>Human pose estimation is pivotal for human-centric visual tasks, yet deploying such models on mobile devices remains challenging due to high parameter counts and computational demands. In this paper, we study Mobile-Friendly and Multi-Feature Aggregation architectural designs for human pose estimation and propose a novel model called MobileMultiPose. Specifically, a lightweight aggregation method, incorporating multi-scale and multi-feature, mitigates redundant shallow semantic extraction and local deep semantic constraints. To efficiently aggregate diverse local and global features, a lightweight transformer module, constructed from a self-attention mechanism with linear complexity, is designed, achieving deep fusion of shallow and deep semantics. Furthermore, a multi-scale loss supervision method is incorporated into the training process to enhance model performance, facilitating the effective fusion of edge information across various scales. Extensive experiments show that the smallest variant of MobileMultiPose outperforms lightweight models (MobileNetv2, ShuffleNetv2, and Small HRNet) by 0.7, 5.4, and 10.1 points, respectively, on the COCO validation set, with fewer parameters and FLOPs. In particular, the largest MobileMultiPose variant achieves an impressive AP score of 72.4 on the COCO test-dev set, notably, its parameters and FLOPs are only 16% and 18% of HRNet-W32, and 7% and 9% of DARK, respectively. We aim to offer novel insights into designing lightweight and efficient feature extraction networks, supporting mobile-friendly model deployment.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"153 ","pages":"Article 105343"},"PeriodicalIF":4.2,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142704487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection of fractional difference in inter vertebral disk MRI images for recognition of low back pain","authors":"Manvendra Singh , Md. Sarfaraj Alam Ansari , Mahesh Chandra Govil","doi":"10.1016/j.imavis.2024.105333","DOIUrl":"10.1016/j.imavis.2024.105333","url":null,"abstract":"<div><div>Low Back Pain (LBP) diagnosis through MR images of IVDs is a challenging task due to complex spinal anatomy and varying image quality. These factors make it difficult to analyse and segment IVD images accurately. Further, simple metrics are ineffective in interpreting nuanced features from IVD images for accurate diagnoses. Overcoming these challenges is crucial to improving the precision and reliability of IVD-based LBP diagnosis. Also, the existing systems have a very high false negative rate pushes the system towards less use. This research study proposes a new framework for the detection of LBP symptoms using the Otsu Segmented Structural and Gray-Level Co-occurrence Matrix (GLCM) feature-based ML-model (OSSG-ML model) that eliminates manual intervention for low back pain detection. The proposed framework uses Otsu segmentation’s dynamic thresholding to differentiate spinal and backdrop pixel clusters. The segmented image is then used by the feature extraction using GLCM and Wavelet-Fourier module to extract two types of features. The first feature type analyzes the structural variation between normal and low back pain symptom patients. The second feature type detects LBP using statistical measures in image analysis and texture recognition of the MRI IVD segmented image. Various machine learning models are built for LBP detection, utilizing both features separately. First, the model employs structural and geometric differences, while the second model analyzes statistical measurements. On evaluating the model’s performance, it accurately detects low back pain with a 98 to 100% accuracy rate and a very low false negative rate.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"153 ","pages":"Article 105333"},"PeriodicalIF":4.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142704489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Camouflaged Object Detection via location-awareness and feature fusion
Yanliang Ge, Yuxi Zhong, Junchao Ren, Min He, Hongbo Bi, Qiao Zhang
Image and Vision Computing, Volume 152, Article 105339. DOI: 10.1016/j.imavis.2024.105339. Published 15 November 2024.

Camouflaged object detection aims to completely segment objects immersed in their surroundings from the background. However, existing deep learning methods often suffer from two shortcomings: (1) difficulty in accurately perceiving the target location, and (2) insufficient extraction of multi-scale features. To address these problems, we propose LFNet, a camouflaged object detection network based on location-awareness and feature fusion. Specifically, a status location module (SLM) dynamically captures the structural features of targets across spatial and channel dimensions to achieve accurate segmentation, and a residual feature fusion module (RFFM) addresses insufficient multi-scale feature integration. Experiments on three standard datasets (CAMO, COD10K, and NC4K) demonstrate that LFNet achieves significant improvements over 15 state-of-the-art methods. The code will be available at https://github.com/ZX123445/LFNet.
A lightweight depth completion network with spatial efficient fusion
Zhichao Fu, Anran Wu, Zisong Zhuang, Xingjiao Wu, Jun He
Image and Vision Computing, Volume 153, Article 105335. DOI: 10.1016/j.imavis.2024.105335. Published 14 November 2024.

Depth completion is a low-level task that rebuilds a dense depth map from a sparse set of LiDAR measurements and the corresponding RGB image. Current state-of-the-art depth completion methods rely on complicated network designs whose added computational cost is incompatible with the limited compute available in realistic scenarios. In this paper, we explore a lightweight and efficient depth completion model named Light-SEF. Light-SEF is a two-stage framework that introduces local and global fusion modules to extract and fuse local and global information from the sparse LiDAR data and RGB images. We also propose a unit convolutional structure, the spatial efficient block (SEB), which has a lightweight design and extracts spatial features efficiently. As the unit block of the whole network, SEB is far more cost-efficient than the baseline design. Experimental results on the KITTI benchmark demonstrate that Light-SEF achieves significant reductions in computational cost (about 53% in parameters, 50% in FLOPs and MACs, and 36% in running time) while remaining competitive with state-of-the-art methods.
{"title":"CF-SOLT: Real-time and accurate traffic accident detection using correlation filter-based tracking","authors":"Yingjie Xia , Nan Qian , Lin Guo , Zheming Cai","doi":"10.1016/j.imavis.2024.105336","DOIUrl":"10.1016/j.imavis.2024.105336","url":null,"abstract":"<div><div>Traffic accident detection using video surveillance is valuable research work in intelligent transportation systems. It is useful for responding to traffic accidents promptly that can avoid traffic jam or prevent secondary accident. In traffic accident detection, tracking occluded vehicles in real-time and accurately is one of the major sticking points for practical applications. In order to improve the tracking of occluded vehicles for traffic accident detection, this paper proposes a simple online tracking scheme with correlation filters (CF-SOLT). The CF-SOLT method utilizes a correlation filter-based auxiliary tracker to assist the main tracker. This auxiliary tracker helps prevent target ID switching caused by occlusion, enabling accurate vehicle tracking in occluded scenes. Based on the tracking results, a precise traffic accident detection algorithm is developed by integrating behavior analysis of both vehicles and pedestrians. The improved accident detection algorithm with the correlation filter-based auxiliary tracker can provide shorter response time, enabling quick identification and detection of traffic accidents. The experiments are conducted on the VisDrone2019, MOT-Traffic and Dataset of accident to evaluate the performances metrics of MOTA, IDF1, FPS, precision, response time and others. The results show that CF-SOLT improves MOTA and IDF1 by 5.3% and 6.7%, accident detection precision by 25%, and reduces response time by 56 s.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"152 ","pages":"Article 105336"},"PeriodicalIF":4.2,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142655978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TransWild: Enhancing 3D interacting hands recovery in the wild with IoU-guided Transformer","authors":"Wanru Zhu , Yichen Zhang , Ke Chen , Lihua Guo","doi":"10.1016/j.imavis.2024.105316","DOIUrl":"10.1016/j.imavis.2024.105316","url":null,"abstract":"<div><div>The recovery of 3D interacting hands meshes in the wild (ITW) is crucial for 3D full-body mesh reconstruction, especially when limited 3D annotations are available. The recent ITW interacting hands recovery method brings two hands to a shared 2D scale space and achieves effective learning of ITW datasets. However, they lack the deep exploitation of the intrinsic interaction dynamics of hands. In this work, we propose TransWild, a novel framework for 3D interactive hand mesh recovery that leverages a weight-shared Intersection-of-Union (IoU) guided Transformer for feature interaction. Based on harmonizing ITW and MoCap datasets within a unified 2D scale space, our hand feature interaction mechanism powered by an IoU-guided Transformer enables a more accurate estimation of interacting hands. This innovation stems from the observation that hand detection yields valuable IoU of two hands bounding box, therefore, an IOU-guided Transformer can significantly enrich the Transformer’s ability to decode and integrate these insights into the interactive hand recovery process. To ensure consistent training outcomes, we have developed a strategy for training with augmented ground truth bounding boxes to address inherent variability. Quantitative evaluations across two prominent benchmarks for 3D interacting hands underscore our method’s superior performance. The code will be released after acceptance.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"152 ","pages":"Article 105316"},"PeriodicalIF":4.2,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142655977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning applications in breast cancer prediction using mammography","authors":"G.M. Harshvardhan , Kei Mori , Sarika Verma , Lambros Athanasiou","doi":"10.1016/j.imavis.2024.105338","DOIUrl":"10.1016/j.imavis.2024.105338","url":null,"abstract":"<div><div>Breast cancer is the second leading cause of cancer-related deaths among women. Early detection of lumps and subsequent risk assessment significantly improves prognosis. In screening mammography, radiologist interpretation of mammograms is prone to high error rates and requires extensive manual effort. To this end, several computer-aided diagnosis methods using machine learning have been proposed for automatic detection of breast cancer in mammography. In this paper, we provide a comprehensive review and analysis of these methods and discuss practical issues associated with their reproducibility. We aim to aid the readers in choosing the appropriate method to implement and we guide them towards this purpose. Moreover, an effort is made to re-implement a sample of the presented methods in order to highlight the importance of providing technical details associated with those methods. Advancing the domain of breast cancer pathology classification using machine learning involves the availability of public databases and development of innovative methods. Although there is significant progress in both areas, more transparency in the latter would boost the domain progress.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"152 ","pages":"Article 105338"},"PeriodicalIF":4.2,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142655920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}