BERT-based natural language processing analysis of French CT reports: Application to the measurement of the positivity rate for pulmonary embolism

Émilien Jupin-Delevaux, Aissam Djahnine, François Talbot, Antoine Richard, Sylvain Gouttard, Adeline Mansuy, Philippe Douek, Salim Si-Mohamed, Loïc Boussel
Research in Diagnostic and Interventional Imaging, volume 6, Article 100027. Published 2023-06-01.
DOI: 10.1016/j.redii.2023.100027
https://www.sciencedirect.com/science/article/pii/S2772652523000066
Citations: 2

Abstract

Rationale and objectives

To develop a natural language processing (NLP) method based on Bidirectional Encoder Representations from Transformers (BERT), adapted to French CT reports, and to evaluate its performance in calculating the diagnostic yield of CT in patients with clinical suspicion of pulmonary embolism (PE).

Materials and methods

All CT reports produced at our institution in 2019 (99,510 reports; training and validation dataset) and 2018 (94,559 reports; testing dataset) were included after anonymization. Two BERT-based NLP sentence classifiers were trained on 27,700 manually labeled sentences from the training dataset. The first classified report sentences into three classes ("Non chest", "Healthy chest", and "Pathological chest"); the second classified sentences of the last class into eleven pathology subclasses, including "pulmonary embolism". The F1-score was reported on the validation dataset. These NLP classifiers were then applied to the reports of CT examinations in the testing dataset that had been requested for pulmonary embolism. Sensitivity, specificity, and accuracy for detecting the presence of a pulmonary embolism were reported against human analysis of the reports.
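The two-stage design described above can be sketched as a simple cascade. This is a minimal illustration, not the authors' implementation: the function `report_is_pe_positive`, the toy keyword classifiers standing in for the two fine-tuned BERT models, the "other" label, and the rule that a report counts as positive when any sentence is sub-classified as pulmonary embolism are all assumptions made here for the example.

```python
from typing import Callable, List

# Stage-1 class names and the PE subclass name are taken from the abstract.
NON_CHEST = "Non chest"
HEALTHY_CHEST = "Healthy chest"
PATHOLOGICAL_CHEST = "Pathological chest"
PE = "pulmonary embolism"

# A sentence classifier maps one sentence to one label.
SentenceClassifier = Callable[[str], str]


def report_is_pe_positive(
    sentences: List[str],
    stage1: SentenceClassifier,
    stage2: SentenceClassifier,
) -> bool:
    """Return True if any "Pathological chest" sentence is sub-classified as PE.

    The "any sentence" aggregation rule is an assumption of this sketch.
    """
    for sentence in sentences:
        if stage1(sentence) != PATHOLOGICAL_CHEST:
            continue  # stage 2 only sees "Pathological chest" sentences
        if stage2(sentence) == PE:
            return True
    return False


# Hypothetical keyword stand-ins for the two BERT classifiers.
def toy_stage1(sentence: str) -> str:
    text = sentence.lower()
    if "embolie" in text and "pas d'" not in text:
        return PATHOLOGICAL_CHEST
    if "pulmonaire" in text or "thorax" in text:
        return HEALTHY_CHEST
    return NON_CHEST


def toy_stage2(sentence: str) -> str:
    return PE if "embolie" in sentence.lower() else "other"


positive_report = ["Embolie pulmonaire lobaire inférieure droite.", "Foie homogène."]
negative_report = ["Pas d'embolie pulmonaire.", "Foie homogène."]

print(report_is_pe_positive(positive_report, toy_stage1, toy_stage2))
print(report_is_pe_positive(negative_report, toy_stage1, toy_stage2))
```

In practice the two callables would wrap the trained models; keeping them as plain functions lets the cascade logic be checked independently of any model.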

Results

The F1-scores of the 3-class and 11-subclass classifiers were 0.984 and 0.985, respectively. Of the examinations in the testing dataset, 4,042 were requested for pulmonary embolism, of which 641 (15.8%) were assessed as positive by radiologists. The sensitivity, specificity, and accuracy of the NLP network for identifying pulmonary embolism in these reports were 98.2%, 99.3%, and 99.1%, respectively.
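For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, accuracy, and the positivity rate follow from a report-level confusion matrix. The `Confusion` class and the counts are illustrative assumptions; the abstract does not give the study's confusion matrix, and the toy numbers are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Report-level counts, with human review of the report as the reference."""
    tp: int  # NLP positive, reference positive
    fp: int  # NLP positive, reference negative
    tn: int  # NLP negative, reference negative
    fn: int  # NLP negative, reference positive

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    @property
    def reference_positivity_rate(self) -> float:
        # Share of examinations that the reference standard calls positive.
        return (self.tp + self.fn) / self.total

    @property
    def predicted_positivity_rate(self) -> float:
        # Share of examinations that the classifier calls positive.
        return (self.tp + self.fp) / self.total


# Toy counts chosen for easy arithmetic (hypothetical, not from the study).
toy = Confusion(tp=90, fp=20, tn=880, fn=10)
print(f"sensitivity={toy.sensitivity:.3f}")
print(f"specificity={toy.specificity:.3f}")
print(f"accuracy={toy.accuracy:.3f}")
print(f"reference positivity={toy.reference_positivity_rate:.3f}")
print(f"predicted positivity={toy.predicted_positivity_rate:.3f}")
```

The gap between the predicted and reference positivity rates is what high sensitivity and specificity keep small when the classifier is used to estimate diagnostic yield over a large archive.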

Conclusion

A BERT-based NLP sentence classifier enables the analysis of large databases of radiological reports to accurately determine the diagnostic yield of CT screening.
