IQAGPT: computed tomography image quality assessment with vision-language and ChatGPT models.

IF 3.2; CAS Region 4, Computer Science; JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Visual Computing for Industry Biomedicine and Art, vol. 7, no. 1, article 20. Pub Date: 2024-08-05. DOI: 10.1186/s42492-024-00171-w

Zhihao Chen, Bin Hu, Chuang Niu, Tao Chen, Yuxin Li, Hongming Shan, Ge Wang


Citations: 0

Abstract

Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in various tasks and attracted increasing interest as a natural language interface across many domains. Recently, large vision-language models (VLMs) that learn rich vision-language correlations from image-text pairs, such as BLIP-2 and GPT-4, have been intensively investigated. Despite these developments, the application of LLMs and VLMs to image quality assessment (IQA), particularly in medical imaging, remains unexplored. Such an application would be valuable for objective performance evaluation and could supplement or even replace radiologists' opinions. To this end, this study introduces IQAGPT, a computed tomography (CT) IQA system that integrates an image-quality captioning VLM with ChatGPT to generate quality scores and textual reports. First, a CT-IQA dataset comprising 1,000 CT slices with diverse quality levels is professionally annotated and compiled for training and evaluation. To better leverage the capabilities of LLMs, the annotated quality scores are converted into semantically rich text descriptions using a prompt template. Second, the image-quality captioning VLM is fine-tuned on the CT-IQA dataset to generate quality descriptions; the captioning model fuses image and text features through cross-modal attention. Third, based on the quality descriptions, users ask ChatGPT in natural language to rate image quality or produce radiological quality reports. The results demonstrate the feasibility of assessing image quality using LLMs. The proposed IQAGPT outperformed GPT-4 and CLIP-IQA, as well as multitask classification and regression models that rely solely on images.
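
The abstract does not give the prompt template or the scoring scheme, so the following is only a minimal sketch of how annotated scores might be rendered as text and then wrapped into a rating request for a chat LLM. The four-level scale, the criteria names (`sharpness`, `noise_suppression`, `overall`), the template wording, and both function names are assumptions made for illustration, not the authors' implementation. No model is called; the sketch stops at building the strings.

```python
from dataclasses import dataclass

# Hypothetical 4-level scale; the abstract does not state the actual scoring scheme.
LEVELS = {0: "poor", 1: "fair", 2: "good", 3: "excellent"}


@dataclass(frozen=True)
class QualityAnnotation:
    """Annotated scores for one CT slice (criteria names are illustrative)."""
    sharpness: int
    noise_suppression: int
    overall: int


def scores_to_caption(ann: QualityAnnotation) -> str:
    """Render numeric scores as a text description via a fixed template."""
    return (
        f"In this CT image, the sharpness is {LEVELS[ann.sharpness]}, "
        f"the noise suppression is {LEVELS[ann.noise_suppression]}, "
        f"and the overall quality is {LEVELS[ann.overall]}."
    )


def build_rating_request(caption: str) -> str:
    """Wrap a quality description in a request that a chat LLM could answer."""
    return (
        "You are assisting with CT image quality assessment.\n"
        f"Quality description: {caption}\n"
        "Rate the overall image quality as an integer from 0 (poor) to 3 (excellent)."
    )


if __name__ == "__main__":
    ann = QualityAnnotation(sharpness=3, noise_suppression=2, overall=2)
    caption = scores_to_caption(ann)
    print(caption)
    print(build_rating_request(caption))
```

In the pipeline the abstract describes, captions of this kind would serve as training targets for the captioning VLM, and at inference time the VLM's generated description, rather than the annotated one, would be placed in the request.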

