Haichuan Zhao, Xudong Ru, Peng Du, Shaolong Liu, Na Liu, Xingce Wang, Zhongke Wu
{"title":"Achieving view-distance and -angle invariance in motion prediction using a simple network.","authors":"Haichuan Zhao, Xudong Ru, Peng Du, Shaolong Liu, Na Liu, Xingce Wang, Zhongke Wu","doi":"10.1186/s42492-024-00176-5","DOIUrl":"10.1186/s42492-024-00176-5","url":null,"abstract":"<p><p>Recently, human motion prediction has gained significant attention and achieved notable success. However, current methods primarily rely on training and testing with ideal datasets, overlooking the impact of variations in the viewing distance and viewing angle, which are commonly encountered in practical scenarios. In this study, we address the issue of model invariance by ensuring robust performance despite variations in view distances and angles. To achieve this, we employed Riemannian geometry methods to constrain the learning process of neural networks, enabling the prediction of invariances using a simple network. Furthermore, this enhances the application of motion prediction in various scenarios. Our framework uses Riemannian geometry to encode motion into a novel motion space to achieve prediction with an invariant viewing distance and angle using a simple network. Specifically, the specified path transport square-root velocity function is proposed to aid in removing the view-angle equivalence class and encode motion sequences into a flattened space. Motion coding by the geometry method linearizes the optimization problem in a non-flattened space and effectively extracts motion information, allowing the proposed method to achieve competitive performance using a simple network. 
Experimental results on Human 3.6M and CMU MoCap demonstrate that the proposed framework has competitive performance and invariance to the viewing distance and viewing angle.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11519277/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142523255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time volume rendering for three-dimensional fetal ultrasound using volumetric photon mapping.","authors":"Jing Zou, Jing Qin","doi":"10.1186/s42492-024-00177-4","DOIUrl":"https://doi.org/10.1186/s42492-024-00177-4","url":null,"abstract":"<p><p>Three-dimensional (3D) fetal ultrasound has been widely used in prenatal examinations. Realistic and real-time ultrasound volume rendering can enhance the effectiveness of diagnoses and assist obstetricians and pregnant mothers in communicating. However, this remains a challenging task because (1) there is a large amount of speckle noise in ultrasound images and (2) ultrasound images usually have low contrast, making it difficult to distinguish different tissues and organs. As a result, traditional local-illumination-based methods do not achieve satisfactory results, and the real-time requirement makes the task even more challenging. This study presents a novel real-time volume-rendering method equipped with a global illumination model for 3D fetal ultrasound visualization. This method can render direct illumination and indirect illumination separately by calculating single scattering and multiple scattering radiances, respectively. The indirect illumination effect was simulated using volumetric photon mapping. A novel screen-space density estimation is proposed for calculating each photon's brightness, which avoids complicated storage structures and accelerates computation. This study also proposes a high dynamic range approach to address the issue of fetal skin with a dynamic range exceeding that of the display device. 
Experiments show that our technology, compared to conventional methodologies, can generate realistic rendering results with far more depth information.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11511803/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142509383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dennis Hein, Staffan Holmin, Timothy Szczykutowicz, Jonathan S Maltz, Mats Danielsson, Ge Wang, Mats Persson
{"title":"Noise suppression in photon-counting computed tomography using unsupervised Poisson flow generative models.","authors":"Dennis Hein, Staffan Holmin, Timothy Szczykutowicz, Jonathan S Maltz, Mats Danielsson, Ge Wang, Mats Persson","doi":"10.1186/s42492-024-00175-6","DOIUrl":"10.1186/s42492-024-00175-6","url":null,"abstract":"<p><p>Deep learning (DL) has proven to be important for computed tomography (CT) image denoising. However, such models are usually trained under supervision, requiring paired data that may be difficult to obtain in practice. Diffusion models offer unsupervised means of solving a wide range of inverse problems via posterior sampling. In particular, using the estimated unconditional score function of the prior distribution, obtained via unsupervised learning, one can sample from the desired posterior via hijacking and regularization. However, due to the iterative solvers used, the number of function evaluations (NFE) required may be orders of magnitude larger than for single-step samplers. In this paper, we present a novel image denoising technique for photon-counting CT by extending the unsupervised approach to inverse problem solving to the case of Poisson flow generative models (PFGM)++. By hijacking and regularizing the sampling process, we obtain a single-step sampler, that is, NFE = 1. Our proposed method incorporates posterior sampling using diffusion models as a special case. We demonstrate that the added robustness afforded by the PFGM++ framework yields significant performance gains. 
Our results indicate competitive performance compared to popular supervised, including state-of-the-art diffusion-style models with NFE = 1 (consistency models), unsupervised, and non-DL-based image denoising techniques, on clinical low-dose CT data and clinical images from a prototype photon-counting CT system developed by GE HealthCare.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11420411/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142297060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Natalie Hube, Melissa Reinelt, Kresimir Vidackovic, Michael Sedlmair
{"title":"A study on the influence of situations on personal avatar characteristics.","authors":"Natalie Hube, Melissa Reinelt, Kresimir Vidackovic, Michael Sedlmair","doi":"10.1186/s42492-024-00174-7","DOIUrl":"10.1186/s42492-024-00174-7","url":null,"abstract":"<p><p>Avatars play a key role in how persons interact within virtual environments, acting as the digital selves. There are many types of avatars, each serving the purpose of representing users or others in these immersive spaces. However, the optimal approach for these avatars remains unclear. Although consumer applications often use cartoon-like avatars, this trend is not as common in work settings. To gain a better understanding of the kinds of avatars people prefer, three studies were conducted involving both screen-based and virtual reality setups, looking into how social settings might affect the way people choose their avatars. Personalized avatars were created for 91 participants, including 71 employees in the automotive field and 20 participants not affiliated with the company. The research shows that work-type situations influence the chosen avatar. At the same time, a correlation between the type of display medium used to display the avatar or the person's personality and their avatar choice was not found. 
Based on the findings, recommendations are made for future avatar representations in work environments, and implications and research questions are derived that can guide future research.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11420416/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142297059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning approach for the prediction of macrosomia.","authors":"Xiaochen Gu, Ping Huang, Xiaohua Xu, Zhicheng Zheng, Kaiju Luo, Yujie Xu, Yizhen Jia, Yongjin Zhou","doi":"10.1186/s42492-024-00172-9","DOIUrl":"10.1186/s42492-024-00172-9","url":null,"abstract":"<p><p>Fetal macrosomia is associated with maternal and newborn complications due to incorrect fetal weight estimation or inappropriate choice of delivery models. The early screening and evaluation of macrosomia in the third trimester can improve delivery outcomes and reduce complications. However, traditional clinical and ultrasound examinations face difficulties in obtaining accurate fetal measurements during the third trimester of pregnancy. This study aims to develop a comprehensive predictive model for detecting macrosomia using machine learning (ML) algorithms. The accuracy of macrosomia prediction using logistic regression, k-nearest neighbors, support vector machine, random forest (RF), XGBoost, and LightGBM algorithms was explored. Each approach was trained and validated using data from 3244 pregnant women at a hospital in southern China. The information gain method was employed to identify deterministic features associated with the occurrence of macrosomia. The performance of the six ML algorithms was compared using the recall and area under the curve evaluation metrics. To develop an efficient prediction model, two sets of experiments based on ultrasound examination records within 1-7 days and 8-14 days prior to delivery were conducted. The ensemble model, comprising the RF, XGBoost, and LightGBM algorithms, showed encouraging results. For each experimental group, the proposed ensemble model outperformed other ML approaches and the traditional Hadlock formula. 
The experimental results indicate that, with the most risk-relevant features, the ML algorithms presented in this study can predict macrosomia and assist obstetricians in selecting more appropriate delivery models.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11349957/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142074113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qiushi Nie, Xiaoqing Zhang, Yan Hu, Mingdao Gong, Jiang Liu
{"title":"Medical image registration and its application in retinal images: a review.","authors":"Qiushi Nie, Xiaoqing Zhang, Yan Hu, Mingdao Gong, Jiang Liu","doi":"10.1186/s42492-024-00173-8","DOIUrl":"10.1186/s42492-024-00173-8","url":null,"abstract":"<p><p>Medical image registration is vital for disease diagnosis and treatment with its ability to merge diverse information from images, which may be captured at different times, from different angles, or with different modalities. Although several surveys have reviewed the development of medical image registration, they have not systematically summarized the existing medical image registration methods. To this end, a comprehensive review of these methods is provided from traditional and deep-learning-based perspectives, aiming to help audiences quickly understand the development of medical image registration. In particular, we review recent advances in retinal image registration, which has not attracted much attention. In addition, current challenges in retinal image registration are discussed, and insights and prospects for future research are provided.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339199/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142018904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhihao Chen, Bin Hu, Chuang Niu, Tao Chen, Yuxin Li, Hongming Shan, Ge Wang
{"title":"IQAGPT: computed tomography image quality assessment with vision-language and ChatGPT models.","authors":"Zhihao Chen, Bin Hu, Chuang Niu, Tao Chen, Yuxin Li, Hongming Shan, Ge Wang","doi":"10.1186/s42492-024-00171-w","DOIUrl":"10.1186/s42492-024-00171-w","url":null,"abstract":"<p><p>Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in various tasks and attracted increasing interest as a natural language interface across many domains. Recently, large vision-language models (VLMs) that learn rich vision-language correlation from image-text pairs, like BLIP-2 and GPT-4, have been intensively investigated. However, despite these developments, the application of LLMs and VLMs in image quality assessment (IQA), particularly in medical imaging, remains unexplored. This is valuable for objective performance evaluation and potential supplement or even replacement of radiologists' opinions. To this end, this study introduces IQAGPT, an innovative computed tomography (CT) IQA system that integrates image-quality captioning VLM with ChatGPT to generate quality scores and textual reports. First, a CT-IQA dataset comprising 1,000 CT slices with diverse quality levels is professionally annotated and compiled for training and evaluation. To better leverage the capabilities of LLMs, the annotated quality scores are converted into semantically rich text descriptions using a prompt template. Second, the image-quality captioning VLM is fine-tuned on the CT-IQA dataset to generate quality descriptions. The captioning model fuses image and text features through cross-modal attention. Third, based on the quality descriptions, users verbally request ChatGPT to rate image-quality scores or produce radiological quality reports. Results demonstrate the feasibility of assessing image quality using LLMs. 
The proposed IQAGPT outperformed GPT-4 and CLIP-IQA, as well as multitask classification and regression models that solely rely on images.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11300764/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141890259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Muhammad Ramzan, Jinfang Sheng, Muhammad Usman Saeed, Bin Wang, Faisal Z Duraihem
{"title":"Revolutionizing anemia detection: integrative machine learning models and advanced attention mechanisms.","authors":"Muhammad Ramzan, Jinfang Sheng, Muhammad Usman Saeed, Bin Wang, Faisal Z Duraihem","doi":"10.1186/s42492-024-00169-4","DOIUrl":"10.1186/s42492-024-00169-4","url":null,"abstract":"<p><p>This study addresses the critical issue of anemia detection using machine learning (ML) techniques. Although a widespread blood disorder with significant health implications, anemia often remains undetected. This necessitates timely and efficient diagnostic methods, as traditional approaches that rely on manual assessment are time-consuming and subjective. The present study explored the application of ML - particularly classification models, such as logistic regression, decision trees, random forest, support vector machines, Naïve Bayes, and k-nearest neighbors - in conjunction with innovative models incorporating attention modules and spatial attention to detect anemia. The proposed models demonstrated promising results, achieving high accuracy, precision, recall, and F1 scores for both textual and image datasets. In addition, an integrated approach that combines textual and image data was found to outperform the individual modalities. Specifically, the proposed AlexNet Multiple Spatial Attention model achieved an exceptional accuracy of 99.58%, emphasizing its potential to revolutionize automated anemia detection. The results of ablation studies confirm the significance of key components - including the blue-green-red, multiple, and spatial attentions - in enhancing model performance. 
Overall, this study presents a comprehensive and innovative framework for noninvasive anemia detection, contributing valuable insights to the field.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11255163/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141627889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yufei Li, Yufei Xin, Xinni Li, Yinrui Zhang, Cheng Liu, Zhengwen Cao, Shaoyi Du, Lin Wang
{"title":"Omni-dimensional dynamic convolution feature coordinate attention network for pneumonia classification.","authors":"Yufei Li, Yufei Xin, Xinni Li, Yinrui Zhang, Cheng Liu, Zhengwen Cao, Shaoyi Du, Lin Wang","doi":"10.1186/s42492-024-00168-5","DOIUrl":"10.1186/s42492-024-00168-5","url":null,"abstract":"<p><p>Pneumonia is a serious disease that can be fatal, particularly among children and the elderly. The accuracy of pneumonia diagnosis can be improved by combining artificial-intelligence technology with X-ray imaging. This study proposes X-ODFCANet, which addresses the issues of low accuracy and excessive parameters in existing deep-learning-based pneumonia-classification methods. This network incorporates a feature coordination attention module and an omni-dimensional dynamic convolution (ODConv) module, leveraging the residual module for feature extraction from X-ray images. The feature coordination attention module utilizes two one-dimensional feature encoding processes to aggregate feature information from different spatial directions. Additionally, the ODConv module extracts and fuses feature information in four dimensions: the spatial dimension of the convolution kernel, input and output channel quantities, and convolution kernel quantity. The experimental results demonstrate that the proposed method can effectively improve the accuracy of pneumonia classification, which is 3.77% higher than that of ResNet18. The model has 4.45M parameters, a reduction by a factor of approximately 2.5. 
The code is available at https://github.com/limuni/X-ODFCANET .</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11231110/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141555547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}