Medical Vision-Language Pre-Training for Brain Abnormalities

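Proceedings of the conference. Association for Computational Linguistics. Meeting, 2024 LREC/COLING, pp. 11159-11164. Publication date: 2024-05-01
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11238846/pdf/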

Masoud Monajatipoor, Zi-Yi Dou, Aichi Chien, Nanyun Peng, Kai-Wei Chang

Abstract

Vision-language models have become increasingly powerful for tasks that require understanding both visual and linguistic elements, bridging the gap between these modalities. In multimodal clinical AI, there is a growing need for models with domain-specific knowledge, as existing models often lack the expertise required for medical applications. In this paper, we take brain abnormalities as an example to demonstrate how to automatically collect aligned medical image-text data for pre-training from public resources such as PubMed. In particular, we present a pipeline that streamlines pre-training by first collecting a large brain image-text dataset from case reports and published journals and then building a high-performance vision-language model tailored to specific medical tasks. We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain. We evaluate the resulting model with quantitative and qualitative intrinsic evaluations. The resulting dataset and our code are available at https://github.com/masoud-monajati/MedVL_pretraining_pipeline.
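The abstract singles out mapping subfigures to subcaptions as a distinct challenge. As a rough illustration only, and not the authors' method (their pipeline is in the linked repository), the Python sketch below shows one simple heuristic. It assumes that compound captions mark panels with single letters in parentheses, such as "(A)" or "(b)", and that a separate, unspecified step has already detected each panel's label in the image. The names `split_subcaptions` and `pair_subfigures` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumed panel-label convention: a single letter in parentheses, e.g. "(A)" or "(b)".
PANEL_LABEL = re.compile(r"\(([A-Za-z])\)")


def split_subcaptions(caption: str) -> Dict[str, str]:
    """Hypothetical helper: split a compound caption into per-panel subcaptions.

    Text before the first panel label is treated as a shared preamble and is
    prepended to every subcaption. Keys are upper-cased labels. If no label
    is found, the whole caption is returned under the key "".
    """
    matches = list(PANEL_LABEL.finditer(caption))
    if not matches:
        return {"": caption.strip()}

    preamble = caption[: matches[0].start()].strip()
    result: Dict[str, str] = {}
    for i, m in enumerate(matches):
        # Each subcaption runs until the next label (or the end of the caption).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(caption)
        body = caption[m.end():end].strip()
        label = m.group(1).upper()
        if label in result:
            # A label mentioned again later extends its existing subcaption.
            result[label] = f"{result[label]} {body}".strip()
        else:
            result[label] = f"{preamble} {body}".strip()
    return result


def pair_subfigures(
    panel_paths: Dict[str, str], subcaptions: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair each subfigure image path with its subcaption.

    `panel_paths` maps an upper-case panel label (assumed to come from an
    upstream panel detector) to an image path. Panels with no matching
    subcaption are skipped rather than guessed.
    """
    return [
        (panel_paths[label], subcaptions[label])
        for label in sorted(panel_paths)
        if label in subcaptions
    ]
```

For example, `split_subcaptions("Brain MRI of a 45-year-old patient. (A) Axial T2 image shows a hyperintense lesion. (B) Coronal FLAIR image shows edema.")` yields one subcaption per panel, each prefixed with the shared sentence about the patient. Prepending the preamble keeps context such as modality or patient details attached to every image-text pair; real captions use many more labeling styles (ranges like "(A-C)", "left/right", bold letters), so a regex like this would only be a first pass.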
