Radiology report generation from a singular perspective using transformers with Knowledge Distillation

IF 4.9, Zone 2 (Medicine), JCR Q1 (ENGINEERING, BIOMEDICAL)
Asad Mansoor Khan, Mashood Mohammad Mohsan, Muhammad Usman Akram, Taimur Hassan, Sajid Gul Khawaja, Adil Qayyum
Journal: Biomedical Signal Processing and Control, volume 111, Article 108340
Published: 2025-07-16 (Journal Article)
DOI: 10.1016/j.bspc.2025.108340
URL: https://www.sciencedirect.com/science/article/pii/S1746809425008511
Citations: 0

Abstract

Radiology report generation from a singular perspective using transformers with Knowledge Distillation
Nearly two billion chest X-rays (CXRs) are performed annually, making them the most widely used imaging technique in radiology for diagnosing pulmonary disorders. The report that accompanies a chest X-ray and records its findings is a crucial part of the examination. An accurate report enables healthcare professionals to make better decisions about the care being provided. To this end, we propose an end-to-end radiology report generation framework built on transformers trained on text reports together with the visual characteristics of the chest X-ray. The framework generates a reliable report that accurately describes the findings from a single CXR taken in either the Anterior-Posterior or the Posterior-Anterior position. A foundation model is utilised to perform Knowledge Distillation (KD) in conjunction with the Encoder, which is fine-tuned during the training phase. In addition, pre-training the foundation model in an unsupervised manner on a large corpus of radiology reports is shown to improve performance on smaller datasets. This training methodology yields performance comparable to that of architectures with far more parameters. The proposed framework is evaluated on multiple datasets, including the Indiana University dataset, the MIMIC dataset, the MIMIC-PRO dataset, and the BRAX dataset. Incorporating KD increases the BLEU-1 score on the Indiana dataset by 4% and the BERTScore by 7.5%. Similarly, pre-training on larger datasets in combination with KD further increases the BLEU-1 score on the Indiana dataset by 7.2% and the BERTScore by 3%. On the MIMIC dataset, comparable performance is achieved on the Findings and the Impression sections of the report, while the proposed framework outperforms other techniques when the two sections are combined. On the MIMIC-PRO dataset, an s_emb score of 0.4069 and a RadGraph F1 score of 0.1165 are achieved, outperforming other techniques in the literature. Finally, the proposed framework is also evaluated on a locally gathered dataset and a BRAX subset without any re-training or fine-tuning, yielding a BLEU-1 score of 0.3827 and a BERTScore of 0.4392 for the former and a BLEU-1 score of 0.1671 and a BERTScore of 0.2186 for the latter, which indicates its generalisation ability.
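The abstract reports BLEU-1 scores without defining the metric. As a rough illustration only, the sketch below computes sentence-level BLEU-1 (clipped unigram precision multiplied by a brevity penalty) for one generated report against one reference. It assumes lowercasing and whitespace tokenisation; the evaluation tooling and tokenisation actually used in the paper are not stated here, and the function name `bleu1` and the two example sentences are hypothetical.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 against a single reference (illustrative only)."""
    # Assumption: lowercase + whitespace tokenisation, not the paper's pipeline.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    cand_counts = Counter(cand)
    ref_counts = Counter(ref)
    # Clip each candidate word's count by how often it occurs in the reference.
    overlap = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = overlap / len(cand)
    # Brevity penalty: candidates shorter than the reference are penalised.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision


# Hypothetical generated and reference report sentences.
generated = "the lungs are clear no pleural effusion"
reference = "the lungs are clear without pleural effusion or pneumothorax"
score = bleu1(generated, reference)
print(f"BLEU-1: {score:.4f}")
```

Here six of the seven generated words appear in the reference, so the clipped precision is 6/7, and the brevity penalty applies because the generated sentence is shorter than the reference. Corpus-level BLEU, which papers usually report, aggregates counts over all reports before dividing, so it is not simply the mean of sentence scores.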
Source journal
Biomedical Signal Processing and Control
Category: Engineering & Technology, Engineering: Biomedical
CiteScore: 9.80
Self-citation rate: 13.70%
Articles published: 822
Review time: 4 months
Journal description: Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management. Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.