Compound Figure Separation of Biomedical Images: Mining Large Datasets for Self-supervised Learning

The journal of machine learning for biomedical imaging | Pub Date: 2022-08-01 | DOI: 10.48550/arXiv.2208.14357

Tianyuan Yao, Changbing Qu, Jun Long, Quan Liu, Ruining Deng, Yuanhan Tian, Jiachen Xu, Aadarsh Jha, Zuhayr Asad, S. Bao, Mengyang Zhao, A. Fogo, Bennett A. Landman, Haichun Yang, Catie Chang, Yuankai Huo



Abstract


With the rapid development of self-supervised learning (e.g., contrastive learning), medical image analysis has widely recognized the value of large-scale image collections, even unannotated ones, for training more generalizable AI models. However, collecting task-specific unannotated data at scale can be challenging for individual labs. Existing online resources, such as digital books, publications, and search engines, offer a new source of large-scale images. However, published images in healthcare (e.g., radiology and pathology) include a considerable number of compound figures with subplots. To extract and separate compound figures into usable individual images for downstream learning, we propose a simple compound figure separation (SimCFS) framework that does not use the traditionally required detection bounding box annotations and that adds a new loss function and a hard case simulation. Our technical contribution is four-fold: (1) we introduce a simulation-based training framework that minimizes the need for resource-intensive bounding box annotations; (2) we propose a new side loss that is optimized for compound figure separation; (3) we propose an intra-class image augmentation method to simulate hard cases; and (4) to the best of our knowledge, this is the first study that evaluates the efficacy of leveraging self-supervised learning with compound image separation. In our experiments, the proposed SimCFS achieved state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. A self-supervised model pretrained with a contrastive learning algorithm on the large-scale mined figures improved the accuracy of downstream image classification tasks. The source code of SimCFS is publicly available at https://github.com/hrlblab/ImageSeperation.
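To make the idea of simulation-based training in contribution (1) concrete, the sketch below tiles individual images into a synthetic compound figure and records each panel's bounding box, so the labels come for free from the layout. It is an illustration only and not the SimCFS implementation: the function name `simulate_compound_figure`, the fixed grid layout, the `gap` and `background` parameters, and the nested-list grayscale image representation are all assumptions made for this example.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values (assumed format)
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max coordinates exclusive


def simulate_compound_figure(
    images: List[Image], rows: int, cols: int, gap: int = 2, background: int = 255
) -> Tuple[Image, List[Box]]:
    """Tile equally sized images into a rows x cols grid (hypothetical helper).

    Returns the synthetic compound figure and one bounding box per panel,
    in the same order as `images`.
    """
    if not images or len(images) != rows * cols:
        raise ValueError("need exactly rows * cols images")
    cell_h, cell_w = len(images[0]), len(images[0][0])
    for image in images:
        if len(image) != cell_h or any(len(row) != cell_w for row in image):
            raise ValueError("all images must share the same height and width")

    # Canvas size: every panel is surrounded by a margin of `gap` pixels.
    height = rows * cell_h + (rows + 1) * gap
    width = cols * cell_w + (cols + 1) * gap
    canvas = [[background] * width for _ in range(height)]

    boxes: List[Box] = []
    for index, image in enumerate(images):
        r, c = divmod(index, cols)      # row-major position in the grid
        y0 = gap + r * (cell_h + gap)   # top edge of this panel
        x0 = gap + c * (cell_w + gap)   # left edge of this panel
        for dy, pixel_row in enumerate(image):
            canvas[y0 + dy][x0:x0 + cell_w] = pixel_row
        boxes.append((x0, y0, x0 + cell_w, y0 + cell_h))
    return canvas, boxes


# Six toy 3x4 panels, each filled with a distinct gray level.
panels = [[[i * 40] * 4 for _ in range(3)] for i in range(6)]
canvas, boxes = simulate_compound_figure(panels, rows=2, cols=3)
```

Each `(canvas, boxes)` pair could then serve as one training sample for a detector. A realistic simulator would presumably also vary panel sizes, spacing, and layout, which this sketch leaves out.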
