{"title":"PackMolds: computational design of packaging molds for thermoforming","authors":"Naoki Kita","doi":"10.1007/s00371-024-03462-8","DOIUrl":"https://doi.org/10.1007/s00371-024-03462-8","url":null,"abstract":"<p>We present a novel technique for designing molds suitable for desktop thermoforming, specifically for creating packaging such as blister packs. Our molds, <i>PackMolds</i>, feature neither undercuts nor negative draft angles, facilitating their easy release from thermoformed plastic sheets. In this study, we optimize the geometry of <i>PackMolds</i> to comply with user-specified draft angle constraints. Instead of simulating the traditional thermoforming process, which necessitates time discretization and specifying detailed parameters for both material properties and machine configuration to achieve an accurate simulation result, we formulate our problem as a constrained geometric optimization problem and solve it using a gradient-based solver. Additionally, in contrast to industrial thermoforming, which benefits from advanced tools, desktop thermoforming lacks such sophisticated resources. Therefore, we introduce a suite of assistive tools to enhance the success of desktop thermoforming. Furthermore, we demonstrate its wide applicability by showcasing its use in not only designing blister packs but also in creating double-sided blister packs and model stands.\u0000</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning kernel parameter lookup tables to implement adaptive bilateral filtering","authors":"Runtao Xi, Jiahao Lyu, Kang Sun, Tian Ma","doi":"10.1007/s00371-024-03553-6","DOIUrl":"https://doi.org/10.1007/s00371-024-03553-6","url":null,"abstract":"<p>Bilateral filtering is a widely used image smoothing filter that preserves image edges while also smoothing texture. In previous research, the focus of improving the bilateral filter has primarily been on constructing an adaptive range kernel. However, recent research has shown that even slight noise perturbations can prevent the bilateral filter from effectively preserving image edges. To address this issue, we employ a neural network to learn the kernel parameters that can effectively counteract noise perturbations. Additionally, to enhance the adaptability of the learned kernel parameters to the local edge features of the image, we utilize the edge-sensitive indexing method to construct kernel parameter lookup tables (LUTs). During testing, we determine the appropriate spatial kernel and range kernel parameters for each pixel using a lookup table and interpolation. This allows us to effectively smooth the image in the presence of noise perturbation. In this paper, we conducted comparative experiments on several datasets to verify that the proposed method outperforms existing bilateral filtering methods in preserving image structure, removing image texture, and resisting slight noise perturbations. The code is available at https://github.com/FightingSrain/AdaBFLUT.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NeRF-FF: a plug-in method to mitigate defocus blur for runtime optimized neural radiance fields","authors":"Tristan Wirth, Arne Rak, Max von Buelow, Volker Knauthe, Arjan Kuijper, Dieter W. Fellner","doi":"10.1007/s00371-024-03507-y","DOIUrl":"https://doi.org/10.1007/s00371-024-03507-y","url":null,"abstract":"<p>Neural radiance fields (NeRFs) have revolutionized novel view synthesis, leading to an unprecedented level of realism in rendered images. However, the reconstruction quality of NeRFs suffers significantly from out-of-focus regions in the input images. We propose NeRF-FF, a plug-in strategy that estimates image masks based on Focus Frustums (FFs), i.e., the visible volume in the scene space that is in-focus. NeRF-FF enables a subsequently trained NeRF model to omit out-of-focus image regions during the training process. Existing methods to mitigate the effects of defocus blurred input images often leverage dynamic ray generation. This makes them incompatible with the static ray assumptions employed by runtime-performance-optimized NeRF variants, such as Instant-NGP, leading to high training times. Our experiments show that NeRF-FF outperforms state-of-the-art approaches regarding training time by two orders of magnitude—reducing it to under 1 min on end-consumer hardware—while maintaining comparable visual quality.\u0000</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DCSG: data complement pseudo-label refinement and self-guided pre-training for unsupervised person re-identification","authors":"Qing Han, Jiongjin Chen, Weidong Min, Jiahao Li, Lixin Zhan, Longfei Li","doi":"10.1007/s00371-024-03542-9","DOIUrl":"https://doi.org/10.1007/s00371-024-03542-9","url":null,"abstract":"<p>Existing unsupervised person re-identification (Re-ID) methods use clustering to generate pseudo-labels that are generally noisy, and initializing the model with ImageNet pre-training weights introduces a large domain gap that severely impacts the model’s performance. To address the aforementioned issues, we propose the data complement pseudo-label refinement and self-guided pre-training framework, referred to as DCSG. Firstly, our method utilizes image information from multiple augmentation views to complement the source image data, resulting in aggregated information. We employ this aggregated information to design a correlation score that serves as a reliability evaluation for the source features and cluster centroids. By optimizing the pseudo-labels for each sample, we enhance their robustness. Secondly, we propose a pre-training strategy that leverages the potential information within the training process. This strategy involves mining classes with high similarity in the training set to guide model training and facilitate smooth pre-training. Consequently, the model acquires preliminary capabilities to distinguish pedestrian-related features at an early stage of training, thereby reducing the impact of domain gaps arising from ImageNet pre-training weights. Our method demonstrates superior performance on multiple person Re-ID datasets, validating the effectiveness of our proposed approach. Notably, it achieves an mAP metric of 84.3% on the Market1501 dataset, representing a 2.8% improvement compared to the state-of-the-art method. The code is available at https://github.com/duolaJohn/DCSG.git.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"22-23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distribution-decouple learning network: an innovative approach for single image dehazing with spatial and frequency decoupling","authors":"Yabo Wu, Wenting Li, Ziyang Chen, Hui Wen, Zhongwei Cui, Yongjun Zhang","doi":"10.1007/s00371-024-03556-3","DOIUrl":"https://doi.org/10.1007/s00371-024-03556-3","url":null,"abstract":"<p>Image dehazing methods face challenges in addressing the high coupling between haze and object feature distributions in the spatial and frequency domains. This coupling often results in oversharpening, color distortion, and blurring of details during the dehazing process. To address these issues, we introduce the distribution-decouple module (DDM) and dual-frequency attention mechanism (DFAM). The DDM works effectively in the spatial domain, decoupling haze and object features through a feature decoupler and then uses a two-stream modulator to further reduce the negative impact of haze on the distribution of object features. Simultaneously, the DFAM focuses on decoupling information in the frequency domain, separating high- and low-frequency information and applying attention to different frequency components for frequency calibration. Finally, we introduce a novel dehazing network, the distribution-decouple learning network for single image dehazing with spatial and frequency decoupling (DDLNet). This network integrates DDM and DFAM, effectively addressing the issue of coupled feature distributions in both spatial and frequency domains, thereby enhancing the clarity and fidelity of the dehazed images. Extensive experiments indicate the outperformance of our DDLNet when compared to the state-of-the-art (SOTA) methods, achieving a 1.50 dB increase in PSNR on the SOTS-indoor dataset. Concomitantly, it indicates a 1.26 dB boost on the SOTS-outdoor dataset. Additionally, our method performs significantly well on the nighttime dehazing dataset NHR, achieving a 0.91 dB improvement. Code and trained models are available at https://github.com/aoe-wyb/DDLNet.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-granularity hypergraph-guided transformer learning framework for visual classification","authors":"Jianjian Jiang, Ziwei Chen, Fangyuan Lei, Long Xu, Jiahao Huang, Xiaochen Yuan","doi":"10.1007/s00371-024-03541-w","DOIUrl":"https://doi.org/10.1007/s00371-024-03541-w","url":null,"abstract":"<p>Fine-grained single-label classification tasks aim to distinguish highly similar categories but often overlook inter-category relationships. Hierarchical multi-granularity visual classification strives to categorize image labels at various hierarchy levels, offering optimize label selection for people. This paper addresses the hierarchical multi-granularity classification problem from two perspectives: (1) effective utilization of labels at different levels and (2) efficient learning to distinguish multi-granularity visual features. To tackle these issues, we propose a novel multi-granularity hypergraph-guided transformer learning framework (MHTL), seamlessly integrating swin transformers and hypergraph neural networks for handling visual classification tasks. Firstly, we employ swin transformer as an image hierarchical feature learning (IHFL) module to capture hierarchical features. Secondly, a feature reassemble (FR) module is applied to rearrange features at different hierarchy levels, creating a spectrum of features from coarse to fine-grained. Thirdly, we propose a feature relationship mining (FRM) module, to unveil the correlation between features at different granularity. Within this module, we introduce a learnable hypergraph modeling method to construct coarse to fine-grained hypergraph structures. Simultaneously, multi-granularity hypergraph neural networks are employed to explore grouping relationships across different granularities, thereby enhancing the learning of semantic feature representations. Finally, we adopt a multi-granularity classifier (MC) to predict hierarchical label probabilities. Experimental results demonstrate that MHTL outperforms other state-of-the-art classification methods across three multi-granularity datasets. The source code and models are released at https://github.com/JJJTF/MHTL.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TransFGVC: transformer-based fine-grained visual classification","authors":"Longfeng Shen, Bin Hou, Yulei Jian, Xisong Tu, Yingjie Zhang, Lingying Shuai, Fangzhen Ge, Debao Chen","doi":"10.1007/s00371-024-03545-6","DOIUrl":"https://doi.org/10.1007/s00371-024-03545-6","url":null,"abstract":"<p>Fine-grained visual classification (FGVC) aims to identify subcategories of objects within the same superclass. This task is challenging owing to high intra-class variance and low inter-class variance. The most recent methods focus on locating discriminative areas and then training the classification network to further capture the subtle differences among them. On the one hand, the detection network often obtains an entire part of the object, and positioning errors occur. On the other hand, these methods ignore the correlations between the extracted regions. We propose a novel highly scalable approach, called TransFGVC, that cleverly combines Swin Transformers with long short-term memory (LSTM) networks to address the above problems. The Swin Transformer is used to obtain remarkable visual tokens through self-attention layer stacking, and LSTM is used to model them globally, which not only accurately locates the discriminative region but also further introduces global information that is important for FGVC. The proposed method achieves competitive performance with accuracy rates of 92.7%, 91.4% and 91.5% using the public CUB-200-2011 and NABirds datasets and our Birds-267-2022 dataset, and the Params and FLOPs of our method are 25% and 27% lower, respectively, than the current SotA method HERBS. To effectively promote the development of FGVC, we developed the Birds-267-2022 dataset, which has 267 categories and 12,233 images.\u0000</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Training a shadow removal network using only 3D primitive occluders","authors":"Neil Patrick Del Gallego, Joel Ilao, Macario II Cordel, Conrado Ruiz","doi":"10.1007/s00371-024-03536-7","DOIUrl":"https://doi.org/10.1007/s00371-024-03536-7","url":null,"abstract":"<p>Removing shadows in images is often a necessary pre-processing task for improving the performance of computer vision applications. Deep learning shadow removal approaches require a large-scale dataset that is challenging to gather. To address the issue of limited shadow data, we present a new and cost-effective method of synthetically generating shadows using 3D virtual primitives as occluders. We simulate the shadow generation process in a virtual environment where foreground objects are composed of mapped textures from the Places-365 dataset. We argue that complex shadow regions can be approximated by mixing primitives, analogous to how 3D models in computer graphics can be represented as triangle meshes. We use the proposed synthetic shadow removal dataset, <i>DLSUSynthPlaces-100K</i>, to train a feature-attention-based shadow removal network without explicit domain adaptation or style transfer strategy. The results of this study show that the trained network achieves competitive results with state-of-the-art shadow removal networks that were trained purely on typical SR datasets such as ISTD or SRD. Using a synthetic shadow dataset of only triangular prisms and spheres as occluders produces the best results. Therefore, the synthetic shadow removal dataset can be a viable alternative for future deep-learning shadow removal methods. The source code and dataset can be accessed at this link: https://neildg.github.io/SynthShadowRemoval/.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"101 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vision transformers (ViT) and deep convolutional neural network (D-CNN)-based models for MRI brain primary tumors images multi-classification supported by explainable artificial intelligence (XAI)","authors":"Hiba Mzoughi, Ines Njeh, Mohamed BenSlima, Nouha Farhat, Chokri Mhiri","doi":"10.1007/s00371-024-03524-x","DOIUrl":"https://doi.org/10.1007/s00371-024-03524-x","url":null,"abstract":"<p>The manual classification of primary brain tumors through Magnetic Resonance Imaging (MRI) is considered as a critical task during the clinical routines that requires highly qualified neuroradiologists. Deep Learning (DL)-based computer-aided diagnosis tools are established to support the neurosurgeons’ opinion during the diagnosis. However, the black-box nature and the lack of transparency and interpretability of such DL-based models make their implementation, especially in critical and sensitive medical applications, very difficult. The explainable artificial intelligence techniques help to gain clinicians’ confidence and to provide explanations about the models' predictions. Typical and existing Convolutional Neural Network (CNN)-based architectures could not capture long-range global information and feature from pathology MRI scans. Recently, Vision Transformer (ViT) networks have been introduced to solve the issue of long-range dependency in CNN-based architecture by introducing a self-attention mechanism to analyze images, allowing the network to capture deep long-range reliance between pixels. The purpose of the proposed study is to provide efficient CAD tool for MRI brain tumor classification. At the same, we aim to enhance the neuroradiologists' confidence when using DL in clinical and medical standards. In this paper, we investigated a deep ViT architecture trained from scratch for the multi-classification task of common primary tumors (gliomas, meningiomas, and pituitary brain tumors), using T1-weighted contrast-enhanced MRI sequences. Several XAI techniques have been adopted: Gradient-weighted Class Activation Mapping (Grad-CAM), Local Interpretable Model-agnostic Explanations (LIME), and SHapley Additive exPlanations (SHAP), to visualize the most significant and distinguishing features related to the model prediction results. A publicly available benchmark dataset has been used for the evaluation task. The comparative study confirms the efficiency of ViT architecture compared to the CNN model using the testing dataset. The test accuracy of 83.37% for the Convolutional Neural Network (CNN) and 91.61% for the Vision Transformer (ViT) indicates that the ViT model outperformed the CNN model in the classification task. Based on the experimental results, we could confirm that the proposed ViT model presents a competitive performance outperforming the multi-classification state-of-the-art models using MRI sequences. Further, the proposed models present an exact and correct interpretation. 
Thus, we could confirm that the proposed CAD could be established during the clinical diagnosis routines.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"93 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
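Of the XAI techniques listed, Grad-CAM is the most mechanical to reproduce; the PyTorch sketch below computes a Grad-CAM heatmap with forward/backward hooks. It assumes a CNN-style target layer with (1, K, h, w) activations; applying it to a ViT requires reshaping tokens into a spatial grid, which is omitted here, and this is not the authors' exact pipeline.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    """Grad-CAM heatmap for one image (1, C, H, W) w.r.t. a chosen conv layer."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))
    try:
        logits = model(image)
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()
        # Channel weights: gradients averaged over the spatial dimensions.
        weights = grads["v"].mean(dim=(2, 3), keepdim=True)
        cam = F.relu((weights * acts["v"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
        return cam[0, 0].detach().cpu().numpy(), class_idx
    finally:
        h1.remove()
        h2.remove()
```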
{"title":"Advancing autism prediction through visual-based AI approaches: integrating advanced eye movement analysis and shape recognition with Kalman filtering","authors":"Suresh Cheekaty, G. Muneeswari","doi":"10.1007/s00371-024-03529-6","DOIUrl":"https://doi.org/10.1007/s00371-024-03529-6","url":null,"abstract":"<p>In the recent past, the global prevalence of autism spectrum disorder (ASD) has witnessed a remarkable surge, underscoring its significance as a widespread neurodevelopmental disorder affecting children, with an incidence rate of 0.62%. Individuals diagnosed with ASD often grapple with challenges in language acquisition and comprehending verbal communication, compounded by difficulties in nonverbal communication aspects such as gestures and eye contact. Eye movement analysis, a multifaceted field spanning industrial engineering to psychology, offers invaluable insights into human attention and behavior patterns. The present study proposes an economical eye movement analysis system that adroitly integrates Neuro Spectrum Net (NSN) techniques with Kalman filtering, enabling precise eye position estimation. The overarching objective is to enhance deep learning models for early autism detection by leveraging eye-tracking data, a critical consideration given the pivotal role of early intervention in mitigating the disorder’s impact. Through the synergistic incorporation of NSN and contrast-limited adaptive histogram equalization for feature extraction, the proposed model exhibits superior scalability and accuracy when compared to existing methodologies, thereby holding promising potential for clinical applications. A comprehensive series of experiments and rigorous evaluations underscore the system’s efficacy in eye movement classification and pupil position identification, outperforming traditional Recurrent Neural Network approaches. The dataset utilized in the aforementioned scholarly article is accessible through the Zenodo repository and can be retrieved via the following link: [https://zenodo.org/records/10935303?preview=1].</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"36 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}