A robust model training strategy using hard negative mining in a weakly labeled dataset for lymphatic invasion in gastric cancer
Jonghyun Lee, Sangjeong Ahn, Hyun-Soo Kim, Jungsuk An, Jongmin Sim
Journal of Pathology Clinical Research, vol. 10 (1). Published 2023-12-20. DOI: 10.1002/cjp2.355
https://pathsocjournals.onlinelibrary.wiley.com/doi/10.1002/cjp2.355
Abstract
Gastric cancer is a significant public health concern, emphasizing the need for accurate evaluation of lymphatic invasion (LI) for determining prognosis and treatment options. However, this task is time-consuming, labor-intensive, and prone to intra- and interobserver variability. Furthermore, the scarcity of annotated data presents a challenge, particularly in the field of digital pathology. Therefore, there is a demand for an accurate and objective method to detect LI using a small dataset, benefiting pathologists. In this study, we trained convolutional neural networks to classify LI using a four-step training process: (1) weak model training, (2) identification of false positives, (3) hard negative mining in a weakly labeled dataset, and (4) strong model training. To overcome the lack of annotated datasets, we applied a hard negative mining approach in a weakly labeled dataset, which contained only final diagnostic information, resembling the typical data found in hospital databases, and improved classification performance. Ablation studies were performed to simulate the lack of datasets and severely unbalanced datasets, further confirming the effectiveness of our proposed approach. Notably, our results demonstrated that, despite the small number of annotated datasets, efficient training was achievable, with the potential to extend to other image classification approaches used in medicine.
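The four-step process described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the CNN is replaced by a toy one-dimensional threshold classifier, each patch is reduced to a single hypothetical feature value, and the function names (`train_threshold_model`, `mine_hard_negatives`, `four_step_training`) are invented for this example. It also assumes that every patch taken from a case whose final diagnosis is LI-negative can be treated as a negative, so any positive prediction on such a patch counts as a false positive; the abstract does not spell out this detail.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable that returns True for a predicted LI-positive patch.
Model = Callable[[float], bool]


def train_threshold_model(positives: Sequence[float], negatives: Sequence[float]) -> Model:
    """Toy stand-in for CNN training: cut halfway between the two class means."""
    cut = (mean(positives) + mean(negatives)) / 2.0
    return lambda x: x > cut


def mine_hard_negatives(model: Model, weak_negative_pool: Sequence[float]) -> List[float]:
    """Collect patches from diagnosis-negative cases that the model calls positive.

    Assumption: the weak label (final diagnosis only) makes every patch in this
    pool a negative, so each positive prediction is a false positive.
    """
    return [x for x in weak_negative_pool if model(x)]


def four_step_training(
    positives: Sequence[float],
    negatives: Sequence[float],
    weak_negative_pool: Sequence[float],
) -> Tuple[Model, List[float], Model]:
    # Step 1: train a weak model on the small annotated set.
    weak_model = train_threshold_model(positives, negatives)
    # Steps 2-3: find false positives in the weakly labeled pool and keep them
    # as hard negatives.
    hard_negatives = mine_hard_negatives(weak_model, weak_negative_pool)
    # Step 4: retrain with the hard negatives added to the negative class.
    strong_model = train_threshold_model(positives, list(negatives) + hard_negatives)
    return weak_model, hard_negatives, strong_model


# Hypothetical feature values, chosen only to make the effect visible.
annotated_pos = [0.8, 0.9, 1.0]
annotated_neg = [0.0, 0.1, 0.2]
weak_pool = [0.05, 0.3, 0.55, 0.6, 0.65]

weak_model, hard_negatives, strong_model = four_step_training(
    annotated_pos, annotated_neg, weak_pool
)
weak_fp = sum(weak_model(x) for x in weak_pool)
strong_fp = sum(strong_model(x) for x in weak_pool)
print("false positives on the weak pool:", weak_fp, "->", strong_fp)
```

In this toy setting the retrained model raises its decision threshold, so fewer patches from negative cases are flagged while the annotated positives are still detected. In a real pipeline the two training calls would be CNN training runs and the pool would hold image patches rather than scalars.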
About the journal:
The Journal of Pathology: Clinical Research and The Journal of Pathology serve as translational bridges between basic biomedical science and clinical medicine, with particular emphasis on, but not restricted to, tissue-based studies.
The focus of The Journal of Pathology: Clinical Research is the publication of studies that illuminate the clinical relevance of research in the broad area of the study of disease. Appropriately powered and validated studies with novel diagnostic, prognostic and predictive significance, and studies of biomarker discovery and validation, will be welcomed. Studies with a predominantly mechanistic basis will be more appropriate for the companion Journal of Pathology.