AI-driven Approach for Automatic Synthetic Patient Status Corpus Generation

2020 4th International Conference on Artificial Intelligence and Virtual Reality
Pub Date: 2020-10-23
DOI: 10.1145/3439133.3439141

Boris Velichkov, K. Ivanova, Valeri Hristov, I. Borisov, Alexander Peychev, Ivan Koychev, S. Boytcheva

Citations: 1

Abstract

Patients' medical data is sensitive personal information, so using it in its original form is unacceptable. At the same time, such data is needed for studies and analyses. In many cases, even data anonymized by removing personal identifiers is not suitable for sharing. We therefore decided to create a corpus of synthetic patient statuses of the kind that general practitioners (GPs) record when performing a general examination. Each status consists of several sentences, each describing the condition of an organ, system, or part of the patient's body. We split each status into its constituent sentences and classified each sentence by the organ it refers to. We built a gold standard of sentences manually classified into a list of human body organs and systems, then used it to train a neural network sentence classifier that reaches almost 99% accuracy. Finally, from all the classified sentences we generate synthetic statuses, composed according to statistics from the available real statuses and medical domain constraints. The proposed approach can be easily adapted to other languages.
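The generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which statistics or domain constraints are used, so the sketch assumes one simple statistic (how often each ordered sequence of organ labels occurs in the real statuses) and models no medical constraints. The example sentences, the labels, and the function names `build_pools`, `label_pattern_counts`, and `generate_status` are all hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical, hand-written example data: each real status is a list of
# (organ_label, sentence) pairs, as if produced by the sentence classifier.
REAL_STATUSES = [
    [("heart", "Regular heart rhythm."), ("lungs", "Clear vesicular breathing.")],
    [("heart", "Clear heart sounds, no murmurs."), ("lungs", "No wheezing."),
     ("abdomen", "Soft, non-tender abdomen.")],
    [("heart", "Regular heart rhythm."), ("lungs", "No wheezing.")],
]


def build_pools(statuses):
    """Collect the distinct sentences seen for each organ label."""
    pools = defaultdict(list)
    for status in statuses:
        for label, sentence in status:
            if sentence not in pools[label]:
                pools[label].append(sentence)
    return dict(pools)


def label_pattern_counts(statuses):
    """Count how often each ordered sequence of organ labels occurs."""
    return Counter(tuple(label for label, _ in status) for status in statuses)


def generate_status(pools, pattern_counts, rng):
    """Sample a label pattern by observed frequency, then one sentence per label.

    Assumption: pattern frequency stands in for the paper's unspecified
    statistics; no medical domain constraints are enforced here.
    """
    patterns = list(pattern_counts)
    weights = [pattern_counts[p] for p in patterns]
    pattern = rng.choices(patterns, weights=weights, k=1)[0]
    return [(label, rng.choice(pools[label])) for label in pattern]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    pools = build_pools(REAL_STATUSES)
    counts = label_pattern_counts(REAL_STATUSES)
    for _ in range(3):
        status = generate_status(pools, counts, rng)
        print(" ".join(sentence for _label, sentence in status))
```

Because every synthetic status recombines sentences drawn from different real statuses, no generated record needs to correspond to a single patient, which is the property the abstract is after. A fuller version would add constraint checks (for example, rejecting contradictory sentence combinations) before accepting a sample.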
