Uncertainty-aware automatic TNM staging classification for [¹⁸F] Fluorodeoxyglucose PET-CT reports for lung cancer utilising transformer-based language models and multi-task learning.

Impact factor 3.3 · Zone 3 (Medicine) · JCR Q2 (Medical Informatics)

BMC Medical Informatics and Decision Making · Pub Date: 2024-12-18 · DOI: 10.1186/s12911-024-02814-7

Stephen H Barlow, Sugama Chicklore, Yulan He, Sebastien Ourselin, Thomas Wagner, Anna Barnes, Gary J R Cook

BMC Medical Informatics and Decision Making 24(1):396. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11657742/pdf/

Citations: 0

Abstract

Background: [¹⁸F] Fluorodeoxyglucose (FDG) PET-CT is a clinical imaging modality widely used in diagnosing and staging lung cancer. The clinical findings of PET-CT studies are contained within free-text reports, which currently can be categorised only by experts reading them manually. Pre-trained transformer-based language models (PLMs) have shown success in extracting complex linguistic features from text. Accordingly, we developed a multi-task 'TNMu' classifier to classify the presence or absence of tumour, node, and metastasis ('TNM') findings (as defined by the Eighth Edition of TNM Staging for Lung Cancer). This is combined with an uncertainty classification task ('u') to account for studies with ambiguous TNM status.

Methods: A nuclear medicine physician annotated 2498 reports, which were split into train, validation, and test datasets. For additional evaluation, an external dataset (n = 461 reports) was created and annotated by two nuclear medicine physicians, with agreement reached on all examples. We trained and evaluated eleven publicly available PLMs to determine which is most effective for PET-CT reports, and compared multi-task, single-task, and traditional machine learning approaches.
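The abstract does not specify the classifier architecture or the training loss. As a minimal sketch of what a multi-task setup can look like, assume one shared feature vector per report (a stand-in for the PLM encoder output) feeding four independent binary heads, with the training loss taken as the sum of the four binary cross-entropy terms. The names `predict`, `multitask_loss`, `features`, and `heads`, and the summed loss itself, are illustrative assumptions and not the authors' implementation.

```python
import math
from typing import Dict, List

TASKS = ("T", "N", "M", "u")  # tumour, node, metastasis, uncertainty


def sigmoid(z: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(features: List[float], heads: Dict[str, List[float]]) -> Dict[str, float]:
    """Apply one linear head per task to the same shared feature vector."""
    return {
        task: sigmoid(sum(w * x for w, x in zip(heads[task], features)))
        for task in TASKS
    }


def multitask_loss(probs: Dict[str, float], targets: Dict[str, int]) -> float:
    """Sum of per-task binary cross-entropy losses (assumed, not from the paper)."""
    loss = 0.0
    for task in TASKS:
        p, y = probs[task], targets[task]
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss


# Hypothetical stand-in for the encoder output of a single report.
features = [0.5, -1.0, 2.0]
# Hypothetical untrained head weights; all-zero weights give probability 0.5.
heads = {task: [0.0, 0.0, 0.0] for task in TASKS}
# Made-up labels for illustration only.
targets = {"T": 1, "N": 1, "M": 0, "u": 0}

probs = predict(features, heads)
print(probs)
print(multitask_loss(probs, targets))
```

In this arrangement the shared representation is computed once per report and reused by all four heads, whereas a single-task setup would need a separate model per task.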

Results: A multi-task approach with GatorTron as the PLM achieves the best performance, with an overall accuracy (all four tasks correct) of 84% and a Hamming loss of 0.05 on the internal test dataset, and 79% and 0.07 on the external test dataset. Performance on the individual T, N, and M tasks approached expert performance, with macro-average F1 scores of 0.91, 0.95, and 0.90 respectively on external data. For uncertainty, an F1 of 0.77 is achieved.
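The three metrics reported here can be illustrated with a short sketch. It assumes each report carries four binary labels in the order [T, N, M, u], that "overall accuracy" is the fraction of reports with all four labels correct, and that macro averaging is taken over the two classes (present and absent) of each binary task. The function names and the toy labels below are made up for illustration and are not data from the study.

```python
from typing import List

Labels = List[int]  # one report: [T, N, M, u], each 0 or 1


def exact_match_accuracy(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """Fraction of reports where all four labels are predicted correctly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def hamming_loss(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """Fraction of individual labels that are wrong, over all reports and tasks."""
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    total = sum(len(t) for t in y_true)
    return wrong / total


def macro_f1(y_true: List[Labels], y_pred: List[Labels], task: int) -> float:
    """Unweighted mean of the class-0 and class-1 F1 scores for one task."""
    scores = []
    for cls in (0, 1):
        tp = sum(t[task] == cls and p[task] == cls for t, p in zip(y_true, y_pred))
        fp = sum(t[task] != cls and p[task] == cls for t, p in zip(y_true, y_pred))
        fn = sum(t[task] == cls and p[task] != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for four hypothetical reports (not from the paper).
y_true = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
y_pred = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 0]]

print(exact_match_accuracy(y_true, y_pred))  # 0.5
print(hamming_loss(y_true, y_pred))          # 0.125
print(macro_f1(y_true, y_pred, task=1))      # macro F1 for the N task
```

The toy example shows why the two headline numbers differ in strictness: a single wrong label makes a whole report count as incorrect for overall accuracy, but adds only one error out of four labels to the Hamming loss.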

Conclusions: Our 'TNMu' classifier successfully extracts TNM staging information from internal and external PET-CT reports. We conclude that multi-task approaches give the best performance and better computational efficiency than single-task PLM approaches. We believe these models can improve PET-CT services by assisting in auditing, creating research cohorts, and developing decision support systems. Our approach to handling uncertainty represents a novel first step but has room for further refinement.




Source journal: BMC Medical Informatics and Decision Making (Medicine: Medical Informatics)

CiteScore: 7.20

Self-citation rate: 5.70%

Articles published: 297

Review time: 1 month

About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles on the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.