Text Analysis of Radiology Reports with Signs of Intracranial Hemorrhage on Brain CT Scans Using the Decision Tree Algorithm.

IF 1.1 Q4 MEDICINE, RESEARCH & EXPERIMENTAL
A N Khoruzhaya, D V Kozlov, K M Arzamasov, E I Kremneva
Sovremennye Tehnologii v Medicine, 2022. DOI: 10.17691/stm2022.14.6.04. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10171057/pdf/

Abstract

The aim of the study is to create, train, and test an algorithm that analyzes brain CT text reports with a decision tree model, solving a simple binary classification task: presence or absence of signs of intracranial hemorrhage (ICH).

Materials and methods: The initial data set was exported from the Unified Radiological Information Service of the Unified Medical Information and Analytical System (URIS UMIAS) and contained 34,188 non-contrast brain CT studies from 56 inpatient medical facilities. Data analysis and preprocessing were carried out using NLTK (Natural Language Toolkit, version 3.6.5), a library for symbolic and statistical natural language processing, and scikit-learn, a machine learning library containing tools for classification tasks. CT studies were selected automatically using 14 ICH-related key words and 33 stop-phrases in which those key words denote the absence of ICH; the selection was then verified by experts. From a sample of 3980 report texts (protocol descriptions), two classes were formed: reports that describe ICH and reports that do not. The binary classification problem was solved with a decision tree model. To evaluate model performance, the reports were randomly split into training and test sets in a 7:3 ratio: of the 3980 reports, 2786 were assigned to the training set and 1194 to the test set.
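The selection and splitting steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract does not list the 14 key words or the 33 stop-phrases, so `KEYWORDS` and `STOP_PHRASES` are hypothetical English stand-ins, and the helper names `flag_report` and `split_7_to_3` are invented for this example. The sketch uses only the Python standard library; the study itself used NLTK and scikit-learn.

```python
import random

# Hypothetical stand-ins: the study's actual 14 key words and 33 stop-phrases
# are not given in the abstract.
KEYWORDS = ["hemorrhage", "hematoma"]
STOP_PHRASES = ["no signs of hemorrhage", "no hematoma"]


def flag_report(text):
    """Return True if a key word remains after stop-phrases are blanked out."""
    lowered = text.lower()
    for phrase in STOP_PHRASES:
        # Remove negated mentions so their key words do not trigger a match.
        lowered = lowered.replace(phrase, " ")
    return any(word in lowered for word in KEYWORDS)


def split_7_to_3(items, seed=0):
    """Shuffle a copy of items and split it 7:3 into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * 0.3)
    return shuffled[n_test:], shuffled[:n_test]


# Invented example reports, for illustration only.
reports = [
    "Subdural hematoma on the left, 8 mm thick.",
    "No signs of hemorrhage. Ventricles are not dilated.",
    "Subarachnoid hemorrhage in the right Sylvian fissure.",
]
flags = [flag_report(r) for r in reports]

# Splitting 3980 items 7:3 reproduces the set sizes reported in the abstract.
train, test = split_7_to_3(range(3980))
print(flags, len(train), len(test))
```

In the study, the automatically flagged reports were additionally verified by experts before training; a filter like this one only proposes candidates.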

Results: On the test set, the trained algorithm classified CT reports as "with signs of ICH" or "without signs of ICH" with a sensitivity of 0.94, a specificity of 0.88, and an F-score of 0.83.
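For reference, the three reported metrics can be derived from a confusion matrix as in the sketch below. The abstract does not state which F-score variant was used, so the F1 score (harmonic mean of precision and sensitivity) is an assumption here, and the counts passed in are hypothetical rather than the study's actual confusion matrix. The function name `binary_metrics` is likewise invented for this example.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, precision, and F1 from raw counts."""
    sensitivity = tp / (tp + fn)  # share of ICH reports that were detected
    specificity = tn / (tn + fp)  # share of non-ICH reports correctly rejected
    precision = tp / (tp + fp)    # share of flagged reports that truly have ICH
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts chosen only to illustrate the formulas.
metrics = binary_metrics(tp=94, fp=12, tn=88, fn=6)
print(metrics)
```

Because F1 depends on precision, it varies with the class balance of the test set even when sensitivity and specificity are fixed, so all three values are worth reporting together.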

Conclusion: The developed and trained algorithm for the analysis of radiology reports demonstrated high accuracy on brain CT reports with signs of intracranial hemorrhage. It can be used to solve binary classification problems and to build corresponding data sets. However, its use is limited by the need for manual review of CT studies to ensure quality control.
