Boris Velichkov, K. Ivanova, Valeri Hristov, I. Borisov, Alexander Peychev, Ivan Koychev, S. Boytcheva
Published in: 2020 4th International Conference on Artificial Intelligence and Virtual Reality, 2020-10-23. DOI: 10.1145/3439133.3439141 (https://doi.org/10.1145/3439133.3439141)
AI-driven Approach for Automatic Synthetic Patient Status Corpus Generation
Patient medical data is sensitive personal information, so using it in its original form is unacceptable. On the other hand, such data is needed for various studies and analyses. In many cases even anonymized data, with the personal identifiers removed, is not suitable for sharing. We therefore decided to create a corpus of synthetic patient statuses of the kind that general practitioners (GPs) record when performing a general examination. Each status consists of several sentences, each describing the condition of an organ, system, or part of the patient's body. We split each status into its constituent sentences and classified each sentence by the organ it refers to. We built a gold standard of sentences manually classified into a list of human body organs and systems, and used it to train a neural network sentence classifier that reaches almost 99% accuracy. Finally, from all the classified sentences we generate synthetic statuses, composed according to statistics from the available real statuses and medical domain constraints. The proposed approach can be easily adapted to other languages.
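The generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the English toy sentences, the use of per-label inclusion frequencies as the "statistics", and the two constraints (at most one sentence per organ, fixed organ order) are all assumptions made for illustration. The sentence classifier is assumed to have already produced the (label, sentence) pairs.

```python
import random
from collections import defaultdict


def build_pools(labeled_sentences):
    """Group already-classified sentences by their organ/system label."""
    pools = defaultdict(list)
    for label, sentence in labeled_sentences:
        pools[label].append(sentence)
    return dict(pools)


def label_frequencies(real_statuses):
    """Fraction of real statuses that mention each label at least once.

    Each real status is given as the list of labels of its sentences.
    """
    counts = defaultdict(int)
    for labels in real_statuses:
        for label in set(labels):
            counts[label] += 1
    total = len(real_statuses)
    return {label: n / total for label, n in counts.items()}


def generate_status(pools, frequencies, rng, order):
    """Compose one synthetic status.

    Assumed constraints (hypothetical, not taken from the paper):
    at most one sentence per label, and labels appear in a fixed order.
    """
    sentences = []
    for label in order:
        # Include the label with the probability observed in real statuses.
        if label in pools and rng.random() < frequencies.get(label, 0.0):
            sentences.append(rng.choice(pools[label]))
    return " ".join(sentences)


if __name__ == "__main__":
    # Invented toy data; the real corpus is not reproduced here.
    labeled = [
        ("heart", "Regular heart rhythm."),
        ("heart", "Clear heart sounds."),
        ("lungs", "Clear vesicular breathing."),
    ]
    pools = build_pools(labeled)
    freqs = label_frequencies([["heart", "lungs"], ["heart"]])
    rng = random.Random(42)
    for _ in range(3):
        print(generate_status(pools, freqs, rng, ["heart", "lungs"]))
```

Because each synthetic status is assembled from sentences drawn independently across many patients, no generated status reproduces a single real record by construction, although individual sentences are still reused verbatim in this sketch.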