Adam Krawczyk, Aleksandra Osowska-Kurczab, Sławomir Pakuło, Wojciech Kotłowski, Zaneta Swiderska-Chadaj
Efficient annotation bootstrapping for cell identification in follicular lymphoma
Computer Methods and Programs in Biomedicine, volume 265, Article 108728. Published 2025-03-27.
DOI: 10.1016/j.cmpb.2025.108728
https://www.sciencedirect.com/science/article/pii/S0169260725001452
Impact factor: 4.9. JCR: Q1 (Computer Science, Interdisciplinary Applications).
Citations: 0
Abstract
Background and Objective:
In the medical field of digital pathology, many tasks rely on visual assessments of tissue patterns or cells, presenting an opportunity to apply computer vision methods. However, acquiring a substantial number of annotations for developing deep learning algorithms remains a bottleneck. The annotation process is inherently biased due to various constraints, including labor shortages, high costs, time inefficiencies, and a strongly imbalanced distribution of labels. This study explores available solutions for reducing the costs of annotation bootstrapping in the challenging task of follicular lymphoma diagnosis.
Methods:
We compare three distinct approaches to annotation bootstrapping: extensive manual annotations, active learning, and weak supervision. We propose a hybrid architecture for centroblast and centrocyte detection from whole slide images, based on a custom cell encoder and contextual encoding derived from foundation models for digital pathology. We collected a dataset of 41 whole slide images scanned with a 20x objective lens at a resolution of 0.24 μm/pixel, from which 12,704 cell annotations were gathered.
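The abstract does not say which query strategy the active learning workflow uses. As a generic illustration only, the sketch below shows entropy-based uncertainty sampling, one common way to choose which unlabeled cells to send to an annotator. The function names, cell identifiers, and probabilities are all hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return the ids of the `budget` most uncertain cells.

    `predictions` maps a hypothetical cell id to the model's predicted
    class probabilities, e.g. [P(centroblast), P(centrocyte)].
    """
    ranked = sorted(predictions.items(),
                    key=lambda item: entropy(item[1]),
                    reverse=True)
    return [cell_id for cell_id, _ in ranked[:budget]]


# Illustrative, made-up model outputs for three unlabeled cells.
predictions = {
    "cell_a": [0.98, 0.02],  # confident prediction
    "cell_b": [0.55, 0.45],  # close to the decision boundary
    "cell_c": [0.80, 0.20],
}
queue = select_for_annotation(predictions, budget=2)
```

With these inputs the two cells nearest the decision boundary are queued first, so annotation effort goes where the model is least certain.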
Results:
Applying our proposed active learning workflow led to an almost twofold increase in the number of samples within the minority class. The best bootstrapping method improved the overall performance of the detection algorithm by 18 percentage points, yielding a macro-averaged F1-score, precision, and recall of 63%.
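For readers unfamiliar with macro-averaging: each metric is computed per class and then averaged with equal weight, so the minority class counts as much as the majority class. The sketch below is a minimal illustration on made-up labels. It uses a plain classification setting rather than the paper's detection matching, and the function name and data are assumptions.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Made-up, imbalanced example: 2 centroblasts, 4 centrocytes.
y_true = ["centroblast", "centroblast",
          "centrocyte", "centrocyte", "centrocyte", "centrocyte"]
y_pred = ["centroblast", "centrocyte",
          "centrocyte", "centrocyte", "centrocyte", "centroblast"]
precision, recall, f1 = macro_scores(y_true, y_pred)
```

Here the per-class scores are 0.5 for centroblasts and 0.75 for centrocytes, so all three macro-averages come out at 0.625 even though two thirds of the cells are centrocytes.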
Conclusions:
The results of this study may find applications in other digital pathology problems, particularly for tasks involving a lack of homogeneous cell clusters within whole slide images.
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.