{"title":"Multimodal representations of biomedical knowledge from limited training whole slide images and reports using deep learning","authors":"Niccolò Marini , Stefano Marchesin , Marek Wodzinski , Alessandro Caputo , Damian Podareanu , Bryan Cardenas Guevara , Svetla Boytcheva , Simona Vatrano , Filippo Fraggetta , Francesco Ciompi , Gianmaria Silvello , Henning Müller , Manfredo Atzori","doi":"10.1016/j.media.2024.103303","DOIUrl":null,"url":null,"abstract":"<div><p>The increasing availability of biomedical data creates valuable resources for developing new deep learning algorithms to support experts, especially in domains where collecting large volumes of annotated data is not trivial. Biomedical data include several modalities containing complementary information, such as medical images and reports: images are often large and encode low-level information, while reports include a summarized high-level description of the findings identified within data and often only concerning a small part of the image. However, only a few methods allow to effectively link the visual content of images with the textual content of reports, preventing medical specialists from properly benefitting from the recent opportunities offered by deep learning models. This paper introduces a multimodal architecture creating a robust biomedical data representation encoding fine-grained text representations within image embeddings. The architecture aims to tackle data scarcity (combining supervised and self-supervised learning) and to create multimodal biomedical ontologies. The architecture is trained on over 6,000 colon whole slide Images (WSI), paired with the corresponding report, collected from two digital pathology workflows. The evaluation of the multimodal architecture involves three tasks: WSI classification (on data from pathology workflow and from public repositories), multimodal data retrieval, and linking between textual and visual concepts. Noticeably, the latter two tasks are available by architectural design without further training, showing that the multimodal architecture that can be adopted as a backbone to solve peculiar tasks. The multimodal data representation outperforms the unimodal one on the classification of colon WSIs and allows to halve the data needed to reach accurate performance, reducing the computational power required and thus the carbon footprint. The combination of images and reports exploiting self-supervised algorithms allows to mine databases without needing new annotations provided by experts, extracting new information. 
In particular, the multimodal visual ontology, linking semantic concepts to images, may pave the way to advancements in medicine and biomedical analysis domains, not limited to histopathology.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"97 ","pages":"Article 103303"},"PeriodicalIF":10.7000,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1361841524002287/pdfft?md5=73a7966410c3f9ed908cd48c6bfefa5b&pid=1-s2.0-S1361841524002287-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical image analysis","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1361841524002287","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
The increasing availability of biomedical data creates valuable resources for developing new deep learning algorithms to support experts, especially in domains where collecting large volumes of annotated data is not trivial. Biomedical data include several modalities containing complementary information, such as medical images and reports: images are often large and encode low-level information, while reports include a summarized, high-level description of the findings identified within the data, often concerning only a small part of the image. However, only a few methods can effectively link the visual content of images with the textual content of reports, preventing medical specialists from properly benefiting from the recent opportunities offered by deep learning models. This paper introduces a multimodal architecture that creates a robust biomedical data representation by encoding fine-grained text representations within image embeddings. The architecture aims to tackle data scarcity (by combining supervised and self-supervised learning) and to create multimodal biomedical ontologies. The architecture is trained on over 6,000 colon whole slide images (WSIs), each paired with its corresponding report, collected from two digital pathology workflows. The evaluation of the multimodal architecture involves three tasks: WSI classification (on data from the pathology workflows and from public repositories), multimodal data retrieval, and linking between textual and visual concepts. Notably, the latter two tasks are available by architectural design without further training, showing that the multimodal architecture can be adopted as a backbone to solve specific tasks. The multimodal data representation outperforms the unimodal one on the classification of colon WSIs and halves the data needed to reach accurate performance, reducing the computational power required and thus the carbon footprint. Combining images and reports with self-supervised algorithms makes it possible to mine databases and extract new information without requiring new expert annotations. In particular, the multimodal visual ontology, linking semantic concepts to images, may pave the way to advancements in medicine and biomedical analysis domains, not limited to histopathology.
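To make the abstract's central idea concrete, below is a minimal, hypothetical sketch of image-report embedding alignment: two projection heads map visual and textual features into a shared space, a symmetric contrastive loss pulls matched WSI/report pairs together, and cross-modal retrieval then works with no further training, as the abstract notes. This is a generic CLIP-style illustration, not the authors' architecture; all names, dimensions, the fixed temperature, and the loss choice are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultimodalAligner(nn.Module):
    """Projects precomputed image and text features into a shared embedding space.

    Hypothetical sketch: feature dimensions and the fixed temperature are
    assumptions, not values from the paper.
    """

    def __init__(self, img_dim=2048, txt_dim=768, emb_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)  # e.g. WSI-level visual features
        self.txt_proj = nn.Linear(txt_dim, emb_dim)  # e.g. report-encoder features
        self.temperature = 0.07  # fixed softmax temperature (an assumption)

    def forward(self, img_feats, txt_feats):
        z_img = F.normalize(self.img_proj(img_feats), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return z_img, z_txt


def contrastive_loss(z_img, z_txt, temperature=0.07):
    # Symmetric InfoNCE: matched WSI/report pairs sit on the diagonal.
    logits = (z_img @ z_txt.t()) / temperature
    targets = torch.arange(z_img.size(0), device=z_img.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


def retrieve(query_img_emb, report_embs, k=5):
    # Cross-modal retrieval falls out of the shared space with no extra
    # training: rank reports by cosine similarity (embeddings are already
    # L2-normalized by the forward pass).
    sims = report_embs @ query_img_emb
    return sims.topk(k).indices


if __name__ == "__main__":
    model = MultimodalAligner()
    img_feats = torch.randn(8, 2048)  # toy batch of 8 WSI feature vectors
    txt_feats = torch.randn(8, 768)   # the 8 paired report feature vectors
    z_img, z_txt = model(img_feats, txt_feats)
    loss = contrastive_loss(z_img, z_txt, model.temperature)
    print(loss.item(), retrieve(z_img[0], z_txt, k=3))
```

Under this kind of alignment, the same shared space also supports the concept-linking task the abstract mentions: embedding a textual concept and ranking image regions against it uses the identical similarity machinery, with no task-specific retraining.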
Journal Introduction
Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.