Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models

Clinical Natural Language Processing Workshop Pub Date : 1900-01-01 DOI:10.18653/v1/2023.clinicalnlp-1.2

Hitomi Yanaka, Yuta Nakamura, Yuki Chida, Tomoya Kurosawa

Abstract

Assessing the numerical understanding capacity of vision-and-language models over images and texts is crucial for real-world vision-and-language applications, such as systems for automated medical image analysis. We provide a visual reasoning dataset focusing on numerical understanding in the medical domain. Experiments using our dataset show that current vision-and-language models fail to perform numerical inference in the medical domain. However, data augmentation with only a small amount of our dataset improves model performance while maintaining performance in the general domain.
