The Pipeline for Standardizing Russian Unstructured Allergy Anamnesis Using FHIR AllergyIntolerance Resource.

Impact factor 1.8; Zone 4 (Medicine); JCR Q3 (Computer Science, Information Systems)

Methods of Information in Medicine Pub Date : 2021-09-01 Epub Date: 2021-08-23 DOI:10.1055/s-0041-1733945

Iuliia D Lenivtceva, Georgy Kopanitsa

Methods of Information in Medicine, 2021, volume 60 (3-04), pages 95-103. Publication type: Journal Article.

Citations: 4

Abstract

Background: Most essential medical knowledge is stored as free text, which is difficult to process. Standardization of medical narratives is an important task for data exchange, integration, and semantic interoperability.

Objectives: The article aims to develop an end-to-end pipeline for structuring Russian free-text allergy anamnesis using international standards.

Methods: The pipeline for free-text data standardization is based on FHIR (Fast Healthcare Interoperability Resources) and SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) to ensure semantic interoperability. The pipeline solves common tasks such as data preprocessing, classification, categorization, entity extraction, and semantic code assignment. Machine learning, rule-based, and dictionary-based approaches were used to compose the pipeline. The pipeline was evaluated on 166 randomly chosen medical records.
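The dictionary-based part of such a pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two mini-dictionaries, the placeholder codes (real SNOMED CT identifiers would come from a terminology service), and the function names are all assumptions made for illustration. The output loosely follows the shape of a FHIR R4 AllergyIntolerance resource, and the matching is exact-string only, so inflected Russian word forms are not handled.

```python
import re

SNOMED_SYSTEM = "http://snomed.info/sct"

# Hypothetical mini-dictionaries. The codes are placeholders, NOT real
# SNOMED CT identifiers; a real system would look them up.
ALLERGENS = {
    "пенициллин": ("medication", "PLACEHOLDER-ALLERGEN-1", "penicillin"),
    "клубника": ("food", "PLACEHOLDER-ALLERGEN-2", "strawberry"),
}
REACTIONS = {
    "крапивница": ("PLACEHOLDER-REACTION-1", "urticaria"),
    "отек квинке": ("PLACEHOLDER-REACTION-2", "angioedema"),
}


def preprocess(text):
    """Lowercase, unify 'ё' with 'е', drop punctuation, collapse whitespace."""
    text = text.lower().replace("ё", "е")
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def find_terms(text, dictionary):
    """Return dictionary terms that occur in the text as whole words."""
    return [t for t in dictionary if re.search(rf"\b{re.escape(t)}\b", text)]


def to_allergy_intolerance(text, patient_ref):
    """Build one AllergyIntolerance-like dict per allergen found in the text."""
    clean = preprocess(text)
    reactions = find_terms(clean, REACTIONS)
    resources = []
    for term in find_terms(clean, ALLERGENS):
        category, code, display = ALLERGENS[term]
        resource = {
            "resourceType": "AllergyIntolerance",
            "category": [category],
            "code": {
                "coding": [{"system": SNOMED_SYSTEM, "code": code, "display": display}],
                "text": term,
            },
            "patient": {"reference": patient_ref},
        }
        if reactions:
            # All reactions found in the note are attached to every allergen;
            # linking each reaction to its own allergen is out of scope here.
            resource["reaction"] = [{
                "manifestation": [
                    {
                        "coding": [{
                            "system": SNOMED_SYSTEM,
                            "code": REACTIONS[r][0],
                            "display": REACTIONS[r][1],
                        }],
                        "text": r,
                    }
                    for r in reactions
                ]
            }]
        resources.append(resource)
    return resources
```

For example, calling to_allergy_intolerance("Аллергия на пенициллин: крапивница.", "Patient/example") yields a single resource with category "medication" and one reaction manifestation, while a note with no dictionary terms yields an empty list.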

Results: The AllergyIntolerance resource was used to represent allergy anamnesis. The data preprocessing module included a dictionary of over 90,000 words, including specific medication terms, and more than 20 regular expressions for error correction. The classification and categorization modules resulted in four dictionaries of allergy terms (2,675 terms in total), which were mapped to SNOMED CT concepts. F-scores for the individual steps are 0.945 for filtering, 0.90 to 0.96 for allergy categorization, and 0.90 and 0.93 for allergen and reaction extraction, respectively. The allergy terminology coverage is more than 95%.
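For readers unfamiliar with the metric, the following short Python sketch shows how an F-score is computed from true positives, false positives, and false negatives. It assumes the balanced F1 variant (beta = 1), which the abstract does not specify, and the counts in the usage note are illustrative, not taken from the paper.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from raw counts; beta=1 gives the balanced F1 score."""
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With 90 true positives, 10 false positives, and 10 false negatives, precision and recall are both 0.9, so f_score(90, 10, 10) is approximately 0.90.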

Conclusion: The proposed pipeline is a step toward ensuring semantic interoperability of Russian free-text medical records and could be effective in standardization systems for further data exchange and integration.


Source journal

Methods of Information in Medicine (Medicine; Computer Science, Information Systems)

CiteScore: 3.70
Self-citation rate: 11.80%
Review time: 6-12 weeks

Journal description: Good medicine and good healthcare demand good information. Since the journal's founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal's issue.