Efficient annotation bootstrapping for cell identification in follicular lymphoma

IF 4.9, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer methods and programs in biomedicine. Pub Date: 2025-03-27. DOI: 10.1016/j.cmpb.2025.108728

Adam Krawczyk, Aleksandra Osowska-Kurczab, Sławomir Pakuło, Wojciech Kotłowski, Zaneta Swiderska-Chadaj

Journal Article. Computer methods and programs in biomedicine, volume 265, Article 108728. https://www.sciencedirect.com/science/article/pii/S0169260725001452

Citations: 0

Abstract

Background and Objective:

In the medical field of digital pathology, many tasks rely on visual assessments of tissue patterns or cells, presenting an opportunity to apply computer vision methods. However, acquiring a substantial number of annotations for developing deep learning algorithms remains a bottleneck. The annotation process is inherently biased due to various constraints, including labor shortages, high costs, time inefficiencies, and a strongly imbalanced distribution of labels. This study explores available solutions for reducing the costs of annotation bootstrapping in the challenging task of follicular lymphoma diagnosis.

Methods:

We compare three distinct approaches to annotation bootstrapping: extensive manual annotations, active learning, and weak supervision. We propose a hybrid architecture for centroblast and centrocyte detection from whole slide images, based on a custom cell encoder and contextual encoding derived from foundation models for digital pathology. We collected a dataset of 41 whole slide images scanned with a 20x objective lens at a resolution of 0.24 μm/pixel, from which 12,704 cell annotations were gathered.
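
To illustrate what an active learning step can look like, the sketch below ranks unlabeled cell candidates by prediction uncertainty and sends the most ambiguous ones to the annotator first. This is not the authors' workflow: the abstract does not state the query strategy, so entropy-based uncertainty sampling is assumed here, and the cell ids, the three-class layout (centroblast, centrocyte, other), and the probabilities are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return the ids of the `budget` most uncertain cells, most uncertain first.

    predictions: dict mapping a cell id to its predicted class probabilities.
    """
    ranked = sorted(predictions, key=lambda cid: entropy(predictions[cid]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs per cell candidate:
# [P(centroblast), P(centrocyte), P(other)]
predictions = {
    "cell_a": [0.98, 0.01, 0.01],  # confident prediction, low annotation value
    "cell_b": [0.40, 0.35, 0.25],  # ambiguous
    "cell_c": [0.70, 0.20, 0.10],
    "cell_d": [0.34, 0.33, 0.33],  # nearly uniform, most ambiguous
}

queue = select_for_annotation(predictions, budget=2)
print(queue)  # ['cell_d', 'cell_b']
```

In a full loop, the annotated cells would be added to the training set, the detector retrained, and the ranking repeated. Other query strategies (for example, favoring candidates predicted as the minority class) fit the same interface.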

Results:

Applying our proposed active learning workflow led to an almost twofold increase in the number of samples within the minority class. The best bootstrapping method improved the overall performance of the detection algorithm by 18 percentage points, yielding a macro-averaged F1-score, precision, and recall of 63%.
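
For readers unfamiliar with macro-averaging, the following sketch shows how macro-averaged precision, recall, and F1 can be computed from per-class detection counts. It assumes one common definition (the unweighted mean of per-class scores); the abstract does not say which variant was used, and the counts below are made-up illustration values, not results from the study.

```python
def per_class_metrics(tp, fp, fn):
    """Precision, recall, and F1 for one class from its detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(counts):
    """Unweighted mean of per-class (precision, recall, F1).

    counts: dict mapping a class name to (true positives, false positives,
    false negatives). Every class weighs the same, regardless of its size.
    """
    metrics = [per_class_metrics(*c) for c in counts.values()]
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics) / n for i in range(3))


# Hypothetical counts; centroblasts play the role of the minority class.
counts = {
    "centroblast": (30, 20, 30),
    "centrocyte": (160, 40, 40),
}

macro_p, macro_r, macro_f1 = macro_average(counts)
print(f"precision={macro_p:.3f} recall={macro_r:.3f} f1={macro_f1:.3f}")
```

Because each class contributes equally, weak performance on a rare class lowers the macro average as much as weak performance on a frequent one, which is why the metric suits strongly imbalanced label distributions. An improvement of 18 percentage points is an absolute difference, for example from 45% to 63%.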

Conclusions:

The results of this study may find applications in other digital pathology problems, particularly for tasks involving a lack of homogeneous cell clusters within whole slide images.


Source journal

Computer methods and programs in biomedicine (Engineering and Technology, Engineering: Biomedical)

CiteScore: 12.30

Self-citation rate: 6.60%

Publication count: 601

Review time: 135 days

Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.