Semantic analysis methods in the system for automated marking of unstructured radiological chest examination protocols

Social Aspects of Population Health. DOI: 10.21045/2071-5021-2023-69-1-12

L. Ronzhin, P. Astanin, D. Kokina, S. Semenov, K. Arzamasov, S. Rauzina

Abstract

There is currently no unified structured standard for describing radiological chest examinations. Developing such text report templates is complicated by the diversity of imaging methods, the variety of diagnostic objectives, and the specific workflows of individual medical organizations. Tools for marking unstructured radiological chest examination protocols can improve electronic document management in healthcare by automating data formalization, and they also make it possible to build datasets for machine learning. The purpose of this study is to develop a system for automated marking of the text reports of unstructured radiological chest examination protocols using a heuristic approach and machine learning algorithms.

Material and methods. The study used data on radiological chest examinations of patients from medical organizations connected to the Unified Radiological Information Service of the Unified Medical Information and Analysis System of inpatient and outpatient medical organizations of Moscow and the Moscow region. The unstructured text reports were processed with semantic analysis methods, expert rules, and machine learning algorithms.

Results. The study identified language patterns associated with important pathological conditions and with the “norm” class, and regular expressions were developed for these classes. A dictionary of radiological concepts and abbreviations (397 entries) was compiled, and an algorithm for correcting grammatical errors in the protocols was subsequently developed. Together with an expert group, rules for multilabel classification of the radiological examination protocols were created and their performance was tested. Using the expert rules alone, the exact-match ratio (the share of protocols whose predicted label set exactly matched the reference label set) was 84%.
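The rule-based stage described above (dictionary normalization, one regular expression per class, a multilabel output, and exact-match scoring) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: the abbreviation entries, class patterns, and function names (`normalize`, `label_report`, `exact_match_ratio`) are hypothetical English stand-ins, the study's 397-entry dictionary is not reproduced, and negation handling ("no consolidation") is deliberately left out.

```python
import re

# Hypothetical stand-in for the study's dictionary of radiological concepts
# and abbreviations (397 entries in the study; not reproduced here).
ABBREVIATIONS = {
    "rt": "right",
    "lt": "left",
    "consol": "consolidation",
}

# Hypothetical class patterns (English stand-ins, one regex per class).
# Negation handling is deliberately omitted in this sketch.
CLASS_PATTERNS = {
    "infiltration/consolidation": re.compile(r"\b(infiltrat\w*|consolidation)\b"),
    "pleural effusion": re.compile(r"\bpleural effusion\b"),
    "norm": re.compile(r"\b(no pathological changes|lungs are clear)\b"),
}


def normalize(report: str) -> str:
    """Lower-case the report, split off punctuation, and expand abbreviations."""
    tokens = re.findall(r"\w+|[^\w\s]", report.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def label_report(report: str) -> frozenset:
    """Multilabel output: every class whose pattern matches the normalized text."""
    text = normalize(report)
    return frozenset(name for name, pattern in CLASS_PATTERNS.items()
                     if pattern.search(text))


def exact_match_ratio(gold: list, predicted: list) -> float:
    """Share of reports whose predicted label set equals the reference set."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


reports = ["Rt lung: consol. in the lower lobe.",
           "Lungs are clear. No pathological changes."]
gold = [frozenset({"infiltration/consolidation"}), frozenset({"norm"})]
predicted = [label_report(r) for r in reports]
print(exact_match_ratio(gold, predicted))  # 1.0
```

Because Python's `\w` is Unicode-aware for `str` patterns, the same tokenizer works unchanged on Cyrillic text; only the dictionary and patterns would need to be replaced.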
Because the classifiers for conditions such as “infiltration/consolidation” and “blackout focus” were not effective, machine learning models were tuned for these classes.

Conclusion. The best classification results were achieved by a recurrent neural network with a long short-term memory (LSTM) architecture, which reached a sensitivity of 89% for the “infiltration/consolidation” class and 99% for the “blackout focus” class. This raised the overall exact-match ratio to 87%, a statistically significant improvement (p = 0.039).
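One plausible way to combine the two stages and evaluate the result is sketched below. Several assumptions apply: the LSTM itself is not implemented (`model_labels` simply stands in for its per-report predictions); the merge rule, in which the model decides only the classes in `ml_classes` and the expert rules decide the rest, is a guess at how the stages were combined; and the abstract does not name the statistical test behind p = 0.039, so the exact McNemar test shown here is one common choice for comparing two classifiers on the same reports, not necessarily the authors' method. All function names are hypothetical.

```python
from math import comb


def hybrid_labels(rule_labels, model_labels, ml_classes):
    """Keep rule-based labels, but let the model decide the classes in ml_classes."""
    return [frozenset((rules - ml_classes) | (model & ml_classes))
            for rules, model in zip(rule_labels, model_labels)]


def sensitivity(gold, predicted, cls):
    """Per-class sensitivity (recall): TP / (TP + FN); NaN if cls never occurs."""
    tp = sum(cls in g and cls in p for g, p in zip(gold, predicted))
    fn = sum(cls in g and cls not in p for g, p in zip(gold, predicted))
    return tp / (tp + fn) if tp + fn else float("nan")


def mcnemar_exact(gold, pred_a, pred_b):
    """Two-sided exact McNemar p-value on paired per-report exact-match outcomes."""
    # b: A exactly right, B wrong; c: A wrong, B exactly right.
    b = sum(g == pa and g != pb for g, pa, pb in zip(gold, pred_a, pred_b))
    c = sum(g != pa and g == pb for g, pa, pb in zip(gold, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0  # the two systems never disagree on exact correctness
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


INF = "infiltration/consolidation"
gold = [frozenset({INF}), frozenset({"norm"})]
rules = [frozenset(), frozenset({"norm"})]
lstm = [frozenset({INF}), frozenset()]  # stand-in for model predictions
combined = hybrid_labels(rules, lstm, frozenset({INF}))
print(sensitivity(gold, rules, INF))     # 0.0
print(sensitivity(gold, combined, INF))  # 1.0
```

Restricting the model to a few classes keeps the transparent expert rules in charge wherever they already work, which matches the abstract's description of tuning machine learning only for the classes where the rules underperformed.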
