International Journal of Computer Vision: Latest Articles

Image-Based Virtual Try-On: A Survey
IF 19.5 · CAS Region 2 · Computer Science
International Journal of Computer Vision, Pub Date: 2024-12-10, DOI: 10.1007/s11263-024-02305-2
Dan Song, Xuanpu Zhang, Juan Zhou, Weizhi Nie, Ruofeng Tong, Mohan Kankanhalli, An-An Liu
Image-based virtual try-on aims to synthesize a naturally dressed person image from a clothing image, a task that is revolutionizing online shopping and inspiring related topics in image generation, showing both research significance and commercial potential. However, there is a gap between current research progress and commercial applications, and the field lacks a comprehensive overview to accelerate its development. In this survey, we provide a comprehensive analysis of state-of-the-art techniques and methodologies in terms of pipeline architecture, person representation, and key modules such as try-on indication, clothing warping, and the try-on stage. We additionally apply CLIP to assess the semantic alignment of try-on results, and evaluate representative methods with uniformly implemented evaluation metrics on the same dataset. In addition to quantitative and qualitative evaluation of current open-source methods, unresolved issues are highlighted and future research directions are outlined to identify key trends and inspire further exploration. The uniformly implemented evaluation metrics, dataset, and collected methods will be made publicly available at https://github.com/little-misfit/Survey-Of-Virtual-Try-On.
Citations: 0
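The survey's CLIP-based semantic-alignment check boils down to scoring the cosine similarity between an image embedding of the try-on result and a text embedding of the clothing description. A minimal sketch of that scoring step, with stand-in embedding vectors in place of real CLIP encoder outputs (the vectors below are illustrative, not from the survey):

```python
import math

def cosine_similarity(a, b):
    # CLIP-style alignment score: cosine of the angle between an
    # image embedding and a text embedding.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Stand-in embeddings; in the survey's protocol these would come from
# CLIP's image encoder (try-on result) and text encoder (clothing text).
image_embed = [0.2, 0.9, 0.1]
text_embed = [0.3, 0.8, 0.0]
score = cosine_similarity(image_embed, text_embed)  # close to 1 => well aligned
```

A higher score indicates that the synthesized try-on image semantically matches the clothing description.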
An Evaluation of Zero-Cost Proxies - from Neural Architecture Performance Prediction to Model Robustness
International Journal of Computer Vision, Pub Date: 2024-12-09, DOI: 10.1007/s11263-024-02265-7
Jovita Lukasik, Michael Moeller, Margret Keuper
Zero-cost proxies are nowadays frequently studied and used to search for neural architectures. They show an impressive ability to predict the performance of architectures by making use of their untrained weights, allowing for immense search speed-ups. So far, the joint search for well-performing and robust architectures has received much less attention in the field of NAS. The main focus of zero-cost proxies has therefore been the clean accuracy of architectures, whereas model robustness should play an equally important part. In this paper, we analyze the ability of common zero-cost proxies to serve as performance predictors for robustness in the popular NAS-Bench-201 search space. We are interested in both the single prediction task for robustness and the joint multi-objective of clean and robust accuracy. We further analyze the feature importance of the proxies and show that predicting robustness makes the prediction task of existing zero-cost proxies more challenging. As a result, the joint consideration of several proxies becomes necessary to predict a model's robustness, while the clean accuracy can be regressed from a single such feature. Our code is available at https://github.com/jovitalukasik/zcp_eval.
Citations: 0
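The evaluation protocol the abstract describes amounts to measuring how well a proxy's ranking of architectures matches the ranking induced by clean or robust accuracy. A minimal sketch of that correlation check using Spearman rank correlation (all scores below are made-up illustrations, not NAS-Bench-201 numbers):

```python
def ranks(xs):
    # Rank positions (0 = smallest); assumes no ties for brevity.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman rank correlation: Pearson correlation of the ranks.
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

# Made-up proxy scores and accuracies for five architectures.
proxy = [1.0, 2.0, 3.0, 4.0, 5.0]
clean_acc = [70.1, 72.4, 75.0, 80.2, 85.3]   # monotone with the proxy
robust_acc = [30.0, 45.0, 20.0, 50.0, 25.0]  # much noisier relationship
```

Here `spearman(proxy, clean_acc)` is 1.0 while `spearman(proxy, robust_acc)` is near zero, mirroring the abstract's finding that a single proxy can regress clean accuracy well but not robustness.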
Occlusion-Preserved Surveillance Video Synopsis with Flexible Object Graph
International Journal of Computer Vision, Pub Date: 2024-12-09, DOI: 10.1007/s11263-024-02302-5
Yongwei Nie, Wei Ge, Siming Zeng, Qing Zhang, Guiqing Li, Ping Li, Hongmin Cai
Video synopsis is a technique that condenses a long surveillance video into a short summary. It faces challenges in processing objects that occlude each other in the source video. Previous approaches either treat occluding objects as a single object, which reduces the compression ratio, or separate the occluding objects individually, which destroys the interactions between them and yields visual artifacts. This paper presents a novel data structure called the Flexible Object Graph (FOG) to handle original occlusions. Our FOG-based video synopsis approach can manipulate each object flexibly while preserving the original occlusions between them, achieving a high synopsis ratio while maintaining object interactions. A challenging issue that comes with the introduction of FOG is that it may contain circulations that yield conflicts; we solve this problem by proposing a circulation-conflict-resolving algorithm. Furthermore, video synopsis methods usually minimize a multi-objective energy function. Previous approaches optimize the multiple objectives simultaneously, which requires striking a balance between them. Instead, we propose a stepwise optimization strategy that consumes less running time while producing higher quality. Experiments demonstrate the effectiveness of our method.
Citations: 0
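Resolving the "circulations" the abstract mentions presupposes detecting cycles in the Flexible Object Graph. A minimal stand-in, assuming FOG's ordering constraints can be viewed as a directed graph (the adjacency-dict representation and node names are hypothetical, not the paper's actual data structure):

```python
def find_cycle(graph):
    # graph: dict mapping a node to its successors -- a toy stand-in
    # for ordering constraints between objects in the synopsis.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def dfs(u, path):
        color[u] = GRAY
        path.append(u)
        for v in graph.get(u, []):
            if color.get(v, WHITE) == GRAY:
                # Back edge: a circulation (cycle) was found.
                return path[path.index(v):]
            if color.get(v, WHITE) == WHITE:
                cycle = dfs(v, path)
                if cycle:
                    return cycle
        color[u] = BLACK
        path.pop()
        return None

    for u in list(graph):
        if color.get(u, WHITE) == WHITE:
            cycle = dfs(u, [])
            if cycle:
                return cycle
    return None
```

For example, `find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})` returns the circulation `["a", "b", "c"]`, while an acyclic graph returns `None`; a conflict-resolving step would then break such a cycle.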
Object Pose Estimation Based on Multi-precision Vectors and Seg-Driven PnP
International Journal of Computer Vision, Pub Date: 2024-12-07, DOI: 10.1007/s11263-024-02317-y
Yulin Wang, Hongli Li, Chen Luo
Object pose estimation from a single RGB image has wide application potential but is difficult to achieve. Existing pose estimation involves various inference pipelines. One popular pipeline is to first use Convolutional Neural Networks (CNNs) to predict the 2D projections of 3D keypoints in a single RGB image and then calculate the 6D pose via a Perspective-n-Point (PnP) solver. Due to the gap between synthetic and real data, a model trained on synthetic data has difficulty predicting the 6D pose accurately when applied to real data. To address this acute problem, we propose a two-stage pipeline for object pose estimation based on multi-precision vectors and segmentation-driven (Seg-Driven) PnP. In the keypoint localization stage, we first develop a CNN-based three-branch network to predict multi-precision 2D vectors pointing to 2D keypoints. We then introduce an accurate and fast Keypoint Voting scheme based on Multi-precision vectors (KVM), which computes low-precision 2D keypoints using the low-precision vectors and refines them using the mid- and high-precision vectors. In the pose calculation stage, we propose Seg-Driven PnP to refine the 3D translation of poses and obtain the optimal pose by minimizing the non-overlapping area between segmented and rendered masks. Seg-Driven PnP leverages 2D segmentation trained on real images to improve the accuracy of pose estimation trained on synthetic data, thereby reducing the synthetic-to-real gap. Extensive experiments show our approach materially outperforms state-of-the-art methods on the LM and HB datasets. Importantly, the proposed method works reasonably well for weakly textured and occluded objects in diverse scenes.
Citations: 0
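The coarse-to-fine keypoint voting step (KVM) can be pictured as pixels casting votes by adding their predicted 2D vectors to their own coordinates, with the higher-precision vectors re-averaged near the coarse estimate. A toy sketch under that reading (the voting and refinement rules here are simplified assumptions, not the paper's exact scheme):

```python
def vote_keypoint(pixels, vectors):
    # Each pixel casts one vote: its own location plus its predicted
    # 2D vector pointing toward the keypoint; votes are averaged.
    votes = [(px + vx, py + vy) for (px, py), (vx, vy) in zip(pixels, vectors)]
    n = len(votes)
    return (sum(v[0] for v in votes) / n, sum(v[1] for v in votes) / n)

def refine(coarse, pixels, vectors, radius=2.0):
    # Re-average only the higher-precision votes that land near the
    # coarse estimate (a simplified stand-in for KVM's refinement).
    votes = [(px + vx, py + vy) for (px, py), (vx, vy) in zip(pixels, vectors)]
    near = [v for v in votes
            if (v[0] - coarse[0]) ** 2 + (v[1] - coarse[1]) ** 2 <= radius ** 2]
    if not near:
        return coarse
    n = len(near)
    return (sum(v[0] for v in near) / n, sum(v[1] for v in near) / n)

# Illustrative pixels and vectors; the true keypoint sits near (5, 5).
pixels = [(0.0, 0.0), (10.0, 0.0), (4.0, 8.0)]
low_vectors = [(5.0, 5.0), (-5.0, 5.0), (1.0, -3.0)]
high_vectors = [(5.2, 4.9), (-4.8, 5.1), (0.9, -3.1)]
coarse = vote_keypoint(pixels, low_vectors)   # coarse estimate (5.0, 5.0)
fine = refine(coarse, pixels, high_vectors)   # refined estimate near (5.1, 4.97)
```

The refined 2D keypoints would then feed a PnP solver to recover the 6D pose.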
Modality-missing RGBT Tracking: Invertible Prompt Learning and High-quality Benchmarks
International Journal of Computer Vision, Pub Date: 2024-12-07, DOI: 10.1007/s11263-024-02311-4
Andong Lu, Chenglong Li, Jiacong Zhao, Jin Tang, Bin Luo
Current RGBT tracking research relies on complete multi-modality input, but modality information can be missing due to factors such as thermal sensor self-calibration and data transmission errors, which we call the modality-missing challenge in this work. To address this challenge, we propose a novel invertible prompt learning approach for robust RGBT tracking, which integrates content-preserving prompts into a well-trained tracking model to adapt to various modality-missing scenarios. Given a modality-missing scenario, we propose to utilize the available modality to generate the prompt of the missing modality to adapt the RGBT tracking model. However, the cross-modality gap between the available and missing modalities usually causes semantic distortion and information loss in prompt generation. To handle this issue, we design an invertible prompter by incorporating the full reconstruction of the input available modality from the generated prompt. To provide a comprehensive evaluation platform, we construct several high-quality benchmark datasets in which various modality-missing scenarios are considered to simulate real-world challenges. Extensive experiments on three modality-missing benchmark datasets show that our method achieves significant performance improvements over state-of-the-art methods. We have released the code and simulation datasets at https://github.com/mmic-lcl.
Citations: 0
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering
International Journal of Computer Vision, Pub Date: 2024-12-05, DOI: 10.1007/s11263-024-02289-z
Yuanyuan Jiang, Jianqin Yin
While vision-language pretrained models (VLMs) excel in various multimodal understanding tasks, their potential in fine-grained audio-visual reasoning, particularly audio-visual question answering (AVQA), remains largely unexplored. AVQA presents specific challenges for VLMs due to the requirement of visual understanding at the region level and seamless integration with the audio modality. Previous VLM-based AVQA methods merely used CLIP as a feature encoder but underutilized its knowledge, and, like most AVQA methods, mistreated audio and video as separate entities in a dual-stream framework. This paper proposes a new CLIP-powered target-aware single-stream (TASS) network for AVQA that exploits the pretrained knowledge of the CLIP model through its natural audio-visual matching characteristic. It consists of two key components: a target-aware spatial grounding module (TSG+) and a single-stream joint temporal grounding module (JTG). Specifically, the TSG+ module transfers image-text matching knowledge from CLIP to the required region-text matching process without corresponding ground-truth labels. Moreover, unlike previous dual-stream networks that still required an additional audio-visual fusion module, JTG unifies audio-visual fusion and question-aware temporal grounding in a simplified single-stream architecture. It treats audio and video as a cohesive entity and further extends image-text matching knowledge to audio-text matching by preserving their temporal correlation with our proposed cross-modal synchrony (CMS) loss. In addition, we propose a simple yet effective preprocessing strategy to optimize accuracy-efficiency trade-offs. Extensive experiments conducted on the MUSIC-AVQA benchmark verified the effectiveness of our proposed method over existing state-of-the-art methods. The code is available at https://github.com/Bravo5542/CLIP-TASS.
Citations: 0
Instance-dependent Label Distribution Estimation for Learning with Label Noise
International Journal of Computer Vision, Pub Date: 2024-12-02, DOI: 10.1007/s11263-024-02299-x
Zehui Liao, Shishuai Hu, Yutong Xie, Yong Xia
Noise transition matrix estimation is a promising approach for learning with label noise. It can infer clean posterior probabilities, known as the Label Distribution (LD), from noisy ones and reduce the impact of noisy labels. However, this estimation is challenging, since ground-truth labels are not always available. Most existing methods estimate a global noise transition matrix using either correctly labeled samples (anchor points) or detected reliable samples (pseudo anchor points). These methods rely heavily on the existence of anchor points or the quality of pseudo ones, and a global noise transition matrix can hardly provide accurate label-transition information for each sample, since label noise in real applications is mostly instance-dependent. To address these challenges, we propose an Instance-dependent Label Distribution Estimation (ILDE) method for learning from noisy labels in image classification. The method's workflow has three major steps. First, we estimate each sample's noisy posterior probability, supervised by the noisy labels. Second, since the mislabeling probability closely correlates with inter-class correlation, we compute the inter-class correlation matrix to estimate the noise transition matrix, bypassing the need for (pseudo) anchor points. Moreover, for a precise approximation of the instance-dependent noise transition matrix, we calculate the inter-class correlation matrix using only mini-batch samples rather than the entire training dataset. Third, we transform the noisy posterior probability into an instance-dependent LD by multiplying it with the estimated noise transition matrix, using the resulting LD for enhanced supervision to prevent DCNNs from memorizing noisy labels. The proposed ILDE method has been evaluated against several state-of-the-art methods on two synthetic and three real-world noisy datasets. Our results indicate that ILDE outperforms all competing methods, whether the noise is synthetic or real.
Citations: 0
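The third step of ILDE, as stated, multiplies the noisy posterior by the estimated noise transition matrix to obtain the instance-dependent label distribution. A toy sketch of that transform with a renormalization step (the matrix and posterior values are illustrative, and the exact multiplication convention may differ from the paper's):

```python
def label_distribution(noisy_posterior, transition):
    # Multiply the noisy posterior by the estimated noise transition
    # matrix, then renormalize so the result is a valid distribution.
    k = len(noisy_posterior)
    ld = [sum(noisy_posterior[j] * transition[j][i] for j in range(k))
          for i in range(k)]
    total = sum(ld)
    return [v / total for v in ld]

# Illustrative 3-class example (values are not from the paper).
noisy = [0.7, 0.2, 0.1]
T = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.1, 0.1, 0.8]]
ld = label_distribution(noisy, T)  # sums to 1, mass stays on class 0
```

The resulting LD would then serve as the soft supervision target in place of the raw noisy label.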
ReFusion: Learning Image Fusion from Reconstruction with Learnable Loss Via Meta-Learning
International Journal of Computer Vision, Pub Date: 2024-12-02, DOI: 10.1007/s11263-024-02256-8
Haowen Bai, Zixiang Zhao, Jiangshe Zhang, Yichen Wu, Lilun Deng, Yukun Cui, Baisong Jiang, Shuang Xu
Image fusion aims to combine information from multiple source images into a single image with more comprehensive informational content. Deep learning-based image fusion algorithms face significant challenges, including the lack of a definitive ground truth and a corresponding distance measure. Additionally, current manually defined loss functions limit the model's flexibility and generalizability across fusion tasks. To address these limitations, we propose ReFusion, a unified meta-learning-based image fusion framework that dynamically optimizes the fusion loss for various tasks through source image reconstruction. Compared to existing methods, ReFusion employs a parameterized loss function that allows the training framework to be dynamically adapted to the specific fusion scenario and task. ReFusion consists of three key components: a fusion module, a source reconstruction module, and a loss proposal module. We employ a meta-learning strategy to train the loss proposal module using the reconstruction loss. This strategy forces the fused image to be more conducive to reconstructing the source images, allowing the loss proposal module to generate an adaptive fusion loss that preserves the optimal information from the source images. The update of the fusion module relies on the learnable fusion loss proposed by the loss proposal module. The three modules update alternately, enhancing each other to optimize the fusion loss for different tasks and consistently achieve satisfactory results. Extensive experiments demonstrate that ReFusion is capable of adapting to various tasks, including infrared-visible, medical, multi-focus, and multi-exposure image fusion. The code is available at https://github.com/HaowenBai/ReFusion.
Citations: 0
DiffLLE: Diffusion-based Domain Calibration for Weak Supervised Low-light Image Enhancement
International Journal of Computer Vision, Pub Date: 2024-11-27, DOI: 10.1007/s11263-024-02292-4
Shuzhou Yang, Xuanyu Zhang, Yinhuai Wang, Jiwen Yu, Yuhan Wang, Jian Zhang
Existing weakly supervised low-light image enhancement methods lack sufficient effectiveness and generalization in practical applications. We attribute this to the absence of explicit supervision and the inherent gap between the real-world low-light domain and the training low-light domain. For example, low-light datasets are well designed, but real-world night scenes are plagued with sophisticated interference such as noise, artifacts, and extreme lighting conditions. In this paper, we develop diffusion-based domain calibration to realize more robust and effective weakly supervised Low-Light Enhancement, called DiffLLE. Since the diffusion model exhibits impressive denoising capability and has been trained on massive clean images, we adopt it to bridge the gap between the real low-light domain and the training degradation domain, while providing efficient priors of real-world content for weakly supervised models. Specifically, we adopt a naive weakly supervised enhancement algorithm for preliminary restoration and design two zero-shot plug-and-play modules based on the diffusion model to improve generalization and effectiveness. The Diffusion-guided Degradation Calibration (DDC) module narrows the gap between real-world and training low-light degradation through diffusion-based domain calibration and a lightness enhancement curve, which makes the enhancement model perform robustly even under sophisticated wild degradation. Because the enhancement effect of the weakly supervised model is limited, we further develop the Fine-grained Target domain Distillation (FTD) module to find a more visually friendly solution space. It exploits the priors of the pre-trained diffusion model to generate pseudo-references, which shrink the preliminary restored results from a coarse normal-light domain to a finer, high-quality clean field, addressing the lack of strong explicit supervision for weakly supervised methods. Benefiting from these, our approach even outperforms some supervised methods using only a simple weakly supervised baseline. Extensive experiments demonstrate the superior effectiveness of the proposed DiffLLE, especially in real-world dark scenes.
Citations: 0
Draw Sketch, Draw Flesh: Whole-Body Computed Tomography from Any X-Ray Views
International Journal of Computer Vision, Pub Date: 2024-11-27, DOI: 10.1007/s11263-024-02286-2
Yongsheng Pan, Yiwen Ye, Yanning Zhang, Yong Xia, Dinggang Shen
Stereoscopic observation is a common foundation of medical image analysis and is generally achieved by 3D medical imaging based on fixed scanners, such as CT and MRI, which are not as convenient as X-ray machines in some flexible scenarios. However, X-ray images can only provide perspective 2D observation and lack the third dimension. If 3D information could be deduced from X-ray images, it would broaden the application of X-ray machines. Focusing on this objective, this paper is dedicated to the generation of pseudo 3D CT scans from non-parallel 2D perspective X-ray (PXR) views and proposes the Draw Sketch and Draw Flesh (DSDF) framework, which first roughly predicts the tissue distribution (Sketch) from the PXR views and then renders the tissue details (Flesh) from the tissue distribution and PXR views. Different from previous studies that focus only on partial locations, e.g., chest or neck, this study investigates the feasibility of head-to-leg reconstruction, i.e., general applicability to any body part. Experiments on 559 whole-body samples from 4 cohorts suggest that our DSDF can reconstruct more reasonable pseudo CT images than state-of-the-art methods and achieves promising results in both visualization and various downstream tasks. The source code and well-trained models are available at https://github.com/YongshengPan/WholeBodyXraytoCT.
Citations: 0