Automated anthropometric measurements from 3D point clouds of scanned bodies
Nahuel E. Garcia-D'Urso, Antonio Macia-Lillo, Higinio Mora-Mora, Jorge Azorin-Lopez, Andres Fuster-Guillo
Image and Vision Computing, vol. 152, Article 105306 (24 October 2024). DOI: 10.1016/j.imavis.2024.105306

Anthropometry plays a critical role across numerous sectors, particularly healthcare and fashion, by facilitating the analysis of human body structure. Anthropometric data are crucial for assessing nutritional status in children and adults alike, enabling early detection of malnutrition, overweight, and obesity, and informing tailored dietary interventions. This study introduces a novel automated technique for extracting anthropometric measurements from any body part. The proposed method leverages a parametric model to accurately determine the measurement parameters from either an unstructured point cloud or a mesh. We conducted a comprehensive evaluation of our approach by comparing perimeter measurements from over 400 body scans with expert assessments and existing state-of-the-art methods. The results demonstrate that our approach significantly surpasses current methods for measuring the waist, hip, thigh, chest, and wrist perimeters. These findings indicate the potential of our method to automate anthropometric analysis and to deliver efficient, accurate measurements for applications in the healthcare and fashion industries.

{"title":"Gaussian error loss function for image smoothing","authors":"Wenzheng Dong, Lanling Zeng, Shunli Ji, Yang Yang","doi":"10.1016/j.imavis.2024.105300","DOIUrl":"10.1016/j.imavis.2024.105300","url":null,"abstract":"<div><div>Edge-preserving image smoothing plays an important role in the fields of image processing and computational photography, and is widely used for a variety of applications. The edge-preserving filters based on global optimization models have attracted widespread attention due to their nice smoothing quality. According to existing research, the edge-preserving capability is strongly correlated to the penalty function used for gradient regularization. By analyzing the edge-stopping function of existing penalties, we demonstrate that existing image smoothing models are not adequately edge-preserving. In this paper, based on a Gaussian error function (ERF), we propose a Gaussian error loss function (ERLF), which shows stronger edge-preserving capability. We embed the proposed loss function into a global optimization model for edge-preserving image smoothing. In addition, we propose an efficient solution based on additive half-quadratic minimization and Fourier-domain optimization that is capable of processing 720P color images (over 20 fps) in real-time on an NVIDIA RTX 3070 GPU. We have experimented with the proposed filter on a number of low-level vision tasks. Both quantitative and qualitative experimental results show that the proposed filter outperforms existing filters. Therefore, it can be practical for real applications.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"152 ","pages":"Article 105300"},"PeriodicalIF":4.2,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142552769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-object tracking using score-driven hierarchical association strategy between predicted tracklets and objects
Tianyi Zhao, Guanci Yang, Yang Li, Minglang Lu, Haoran Sun
Image and Vision Computing, vol. 152, Article 105303 (19 October 2024). DOI: 10.1016/j.imavis.2024.105303

Machine vision is a key technology for intelligent robots' human-centered embodied intelligence. Especially in complex dynamic scenes involving multiple people, multi-object tracking (MOT), which accurately identifies and tracks specific targets, significantly influences a robot's behavior perception and monitoring, autonomous decision-making, and personalized humanoid services. To address target loss and identity switches caused by object scale variations and frequent overlaps during tracking, this paper presents a multi-object tracking method using a score-driven hierarchical association strategy between predicted tracklets and objects (ScoreMOT). First, motion prediction of occluded objects based on bounding box variation (MPOBV) is proposed to estimate the positions of occluded objects; MPOBV models an object's motion state using its bounding box and confidence score. Then, a score-driven hierarchical association strategy between predicted tracklets and objects (SHAS) is proposed to associate them correctly in frequently overlapping scenarios; SHAS associates predicted tracklets and detected objects of different confidence levels in different stages. Comparisons with 16 state-of-the-art methods on the Multiple Object Tracking Benchmark 20 (MOT20) and DanceTrack datasets show that ScoreMOT outperforms the compared methods.

{"title":"A review of recent advances in 3D Gaussian Splatting for optimization and reconstruction","authors":"Jie Luo, Tianlun Huang, Weijun Wang, Wei Feng","doi":"10.1016/j.imavis.2024.105304","DOIUrl":"10.1016/j.imavis.2024.105304","url":null,"abstract":"<div><div>3D Gaussian Splatting (3DGS) represents a significant breakthrough in computer graphics and vision, offering an explicit scene representation and novel view synthesis without the reliance on neural networks, unlike Neural Radiance Fields (NeRF). This paper provides a comprehensive survey of recent research on 3DGS optimization and reconstruction, with a particular focus on studies featuring published or forthcoming open-source code. In terms of optimization, the paper examines techniques such as compression, densification, splitting, anti-aliasing, and reflection enhancement. For reconstruction, it explores methods including surface mesh extraction, sparse-view object and scene reconstruction, large-scale scene reconstruction, and dynamic object and scene reconstruction. Through comparative analysis and case studies, the paper highlights the practical advantages of 3DGS and outlines future research directions, offering valuable insights for advancing the field.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105304"},"PeriodicalIF":4.2,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142525914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FHLight: A novel method of indoor scene illumination estimation using improved loss function
Yang Wang, Ao Wang, Shijia Song, Fan Xie, Chang Ma, Jiawei Xu, Lijun Zhao
Image and Vision Computing, vol. 152, Article 105299 (19 October 2024). DOI: 10.1016/j.imavis.2024.105299

In augmented reality tasks, especially in indoor scenes, achieving illumination consistency between virtual objects and real environments is a critical challenge. Current mainstream methods fall into two categories: illumination parameter regression and illumination map generation. Few works in either category effectively recover both high-frequency and low-frequency illumination information in indoor scenes. In this work, we argue that effective restoration of low-frequency illumination information forms the foundation for capturing high-frequency illumination details. Accordingly, we propose a novel illumination estimation method called FHLight. Technically, we use a low-frequency spherical harmonic irradiance map (LFSHIM), restored by a low-frequency illumination regression network (LFIRN), as a prior to guide a high-frequency illumination generator (HFIG) that restores the final illumination map. Furthermore, we introduce an improved loss function to optimize training, ensuring that the model accurately restores both low-frequency and high-frequency illumination information in the scene. We compare FHLight with several competitive methods, and the results demonstrate significant improvements in metrics such as RMSE, si-RMSE, and angular error. Visual experiments further confirm that FHLight generates scene illumination maps with realistic frequency content, effectively resolving the illumination consistency issue between virtual objects and real scenes. The code is available at https://github.com/WA-tyro/FHLight.git.

{"title":"Feature differences reduction and specific features preserving network for RGB-T salient object detection","authors":"Qiqi Xu, Zhenguang Di, Haoyu Dong, Gang Yang","doi":"10.1016/j.imavis.2024.105302","DOIUrl":"10.1016/j.imavis.2024.105302","url":null,"abstract":"<div><div>In RGB-T salient object detection, effective utilization of the different characteristics of RGB and thermal modalities is essential to achieve accurate detection. Most of the previous methods usually only focus on reducing the differences between modalities, which may ignore the specific features that are crucial for salient object detection, leading to suboptimal results. To address the above issue, an RGB-T SOD network that simultaneously considers the reduction of modality differences and the preservation of specific features is proposed. Specifically, we construct a modality differences reduction and specific features preserving module (MDRSFPM) which aims to bridge the gap between modalities and enhance the specific features of each modality. In MDRSFPM, the dynamic vector generated by the interaction of RGB and thermal features is used to reduce modality differences, and then a dual branch is constructed to deal with the RGB and thermal modalities separately, employing a combination of channel-level and spatial-level operations to preserve their respective specific features. In addition, a multi-scale global feature enhancement module (MGFEM) is proposed to enhance global contextual information to provide guidance information for the subsequent decoding stage, so that the model can more easily localize the salient objects. Furthermore, our approach includes a fully fusion and gate module (FFGM) that utilizes dynamically generated importance maps to selectively filter and fuse features during the decoding process. Extensive experiments demonstrate that our proposed model surpasses other state-of-the-art models on three publicly available RGB-T datasets remarkably. Our code will be released at <span><span>https://github.com/JOOOOKII/FRPNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"152 ","pages":"Article 105302"},"PeriodicalIF":4.2,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142561279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pyramid quaternion discrete cosine transform based ConvNet for cancelable face recognition
Zhuhong Shao, Zuowei Zhang, Leding Li, Hailiang Li, Xuanyi Li, Bicao Li, Yuanyuan Shang, Bin Chen
Image and Vision Computing, vol. 151, Article 105301 (13 October 2024). DOI: 10.1016/j.imavis.2024.105301

Face scanning now enables fast and convenient identity authentication, but face images also carry sensitive information. In this context, we introduce a novel cancelable face recognition method using a quaternion-transform-based convolutional network. First, face images in different modalities (e.g., RGB and depth, or near-infrared) are encoded into a full quaternion matrix for synchronous processing. Using the designed multiresolution quaternion singular value decomposition, we obtain a pyramid representation, which is then transformed through random projection to make the process non-invertible: even if a feature template is compromised, a new one can be generated. Subsequently, a three-stream convolutional network is developed to learn features, with predefined filters derived from the quaternion two-dimensional discrete cosine transform basis. Extensive experiments on the TIII-D, NVIE, and CASIA datasets demonstrate that the proposed method achieves competitive performance while satisfying the revocability and irreversibility requirements of cancelable biometrics.

{"title":"A decision support system for acute lymphoblastic leukemia detection based on explainable artificial intelligence","authors":"Angelo Genovese, Vincenzo Piuri, Fabio Scotti","doi":"10.1016/j.imavis.2024.105298","DOIUrl":"10.1016/j.imavis.2024.105298","url":null,"abstract":"<div><div>The detection of acute lymphoblastic leukemia (ALL) via deep learning (DL) has received great interest because of its high accuracy in detecting lymphoblasts without the need for handcrafted feature extraction. However, current DL models, such as convolutional neural networks and vision Transformers, are extremely complex, making them black boxes that perform classification in an obscure way. To compensate for this and increase the explainability of the decisions made by such methods, in this paper, we propose an innovative decision support system for ALL detection that is based on DL and explainable artificial intelligence (XAI). Our approach first introduces causality into the decision with a metric learning approach, enabling a decision to be made by analyzing the most similar images in the database. Second, our method integrates XAI techniques to allow even non-trained personnel to obtain an informed decision by analyzing which regions of the images are most similar and how the samples are organized in the latent space. The results on publicly available ALL databases confirm the validity of our approach in opening the black box while achieving similar or superior accuracy to that of existing approaches.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105298"},"PeriodicalIF":4.2,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142445306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parameter efficient finetuning of text-to-image models with trainable self-attention layer","authors":"Zhuoyuan Li, Yi Sun","doi":"10.1016/j.imavis.2024.105296","DOIUrl":"10.1016/j.imavis.2024.105296","url":null,"abstract":"<div><div>We propose a novel model to efficiently finetune pretrained Text-to-Image models by introducing additional image prompts. The model integrates information from image prompts into the text-to-image (T2I) diffusion process by locking the parameters of the large T2I model and reusing its trainable copy, rather than relying on additional adapters. The trainable copy guides the model by injecting its trainable self-attention features into the original diffusion model, enabling the synthesis of a new specific concept. We also apply Low-Rank Adaptation (LoRA) to restrict the trainable parameters in the self-attention layers. Furthermore, the network is optimized alongside a text embedding that serves as an object identifier to generate contextually relevant visual content. Our model is simple and effective, with a small memory footprint, yet can achieve comparable performance to a fully fine-tuned T2I model in both qualitative and quantitative evaluations.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105296"},"PeriodicalIF":4.2,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142525913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global information regulation network for multimodal sentiment analysis","authors":"Shufan Xie, Qiaohong Chen, Xian Fang, Qi Sun","doi":"10.1016/j.imavis.2024.105297","DOIUrl":"10.1016/j.imavis.2024.105297","url":null,"abstract":"<div><div>Human language is considered multimodal, containing natural language, visual elements, and acoustic signals. Multimodal Sentiment Analysis (MSA) concentrates on the integration of various modalities to capture the sentiment polarity or intensity expressed in human language. Nevertheless, the absence of a comprehensive strategy for processing and integrating multimodal representations results in the inclusion of inaccurate or noisy data from diverse modalities in the ultimate decision-making process, potentially leading to the neglect of crucial information within or across modalities. To address this issue, we propose the Global Information Regulation Network (GIRN), a novel framework designed to regulate information flow and decision-making processes across various stages, ranging from unimodal feature extraction to multimodal outcome prediction. Specifically, before modal fusion stage, we maximize the mutual information between modalities and refine the input signals through random feature erasing, yielding a more robust unimodal representation. In the process of modal fusion, we enhance the traditional Transformer encoder through the gate mechanism and stacked attention to dynamically fuse the target and auxiliary modalities. After modal fusion, cross-hierarchical contrastive learning and decision gate are employed to integrate the valuable information represented in different categories and hierarchies. Extensive experiments conducted on the CMU-MOSI and CMU-MOSEI datasets suggest that our methodology outperforms existing approaches across nearly all criteria.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105297"},"PeriodicalIF":4.2,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142438436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}