Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models

Clinical Natural Language Processing Workshop. DOI: 10.18653/v1/2023.clinicalnlp-1.2

Hitomi Yanaka, Yuta Nakamura, Yuki Chida, Tomoya Kurosawa


Abstract

Assessing the numerical understanding of vision-and-language models over images and texts is crucial for real-world vision-and-language applications, such as systems for automated medical image analysis. We provide a visual reasoning dataset focusing on numerical understanding in the medical domain. Experiments using our dataset show that current vision-and-language models fail to perform numerical inference in the medical domain. However, data augmentation with only a small amount of our dataset improves model performance while maintaining performance in the general domain.
