Daniel Bruneß, Matthias Bay, Christian Schulze, Michael Guckert, Mirjam Minor
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), December 2022. DOI: 10.1109/ICMLA55696.2022.00065
An Ontology-based transfer learning method improving classification of medical documents
Automatic classification of documents is a well-known problem that can be addressed with machine learning methods. However, such approaches require large sets of training data, which are not always available. Moreover, in data-protection-sensitive domains such as electronic health records, machine learning models often cannot be transferred directly to other environments. We present a transfer learning method that uses ontologies to normalise the feature space of text classifiers. With this, we can guarantee that the trained models do not contain any person-related data and can therefore be widely reused without raising General Data Protection Regulation (GDPR) issues. Furthermore, we describe a process for enriching the ontologies so that the classifiers can be reused in different contexts with deviating terminology, without any additional training of the classifiers. Our transfer learning method combines two paradigms: transfer by copy and transfer by enrichment. As a proof of concept, we apply classifiers trained on hospital medical documents, together with appropriately enriched ontologies, to medical texts written in colloquial language. The promising results show the potential of our transfer learning approach, which respects GDPR requirements and can be flexibly adapted to drifting terminology.
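The abstract combines two ideas: classifiers whose features are ontology concepts rather than raw words, and an enrichment step that maps new surface terms onto existing concepts so a trained classifier can be reused without retraining. The following is a minimal Python sketch of that idea under stated assumptions. The toy ontology, its concept IDs, the training sentences, and the helper names `concept_features` and `ConceptNaiveBayes` are all hypothetical. Multinomial naive Bayes is a stand-in chosen for brevity and is not necessarily the classifier the authors used.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical toy ontology mapping surface terms to concept IDs.
# The IDs are invented for illustration; they are not from the paper.
ONTOLOGY: dict[str, str] = {
    "myocardial infarction": "C_HEART_ATTACK",
    "hypertension": "C_HIGH_BP",
    "fracture": "C_FRACTURE",
    "femur": "C_THIGH_BONE",
}


def concept_features(text: str, ontology: dict[str, str]) -> Counter:
    """Map a document to a bag of ontology concept IDs.

    Only terms listed in the ontology become features, so patient names,
    dates and other free-text tokens never enter the feature space.
    """
    lowered = text.lower()
    features: Counter = Counter()
    # Try longer terms first so a multi-word term is not split into its parts.
    for term in sorted(ontology, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = re.findall(pattern, lowered)
        if hits:
            features[ontology[term]] += len(hits)
            lowered = re.sub(pattern, " ", lowered)  # consume matched text
    return features


class ConceptNaiveBayes:
    """Multinomial naive Bayes over concept IDs (hypothetical stand-in)."""

    def fit(self, docs: list[Counter], labels: list[str]) -> ConceptNaiveBayes:
        self.labels = sorted(set(labels))
        self.vocab = {c for doc in docs for c in doc}
        self.log_prior = {y: math.log(labels.count(y) / len(labels)) for y in self.labels}
        self.counts = {y: Counter() for y in self.labels}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        return self

    def predict(self, doc: Counter) -> str:
        def log_score(y: str) -> float:
            # Laplace smoothing over the concept vocabulary.
            denom = sum(self.counts[y].values()) + len(self.vocab)
            score = self.log_prior[y]
            for concept, n in doc.items():
                if concept in self.vocab:  # unseen concepts carry no evidence
                    score += n * math.log((self.counts[y][concept] + 1) / denom)
            return score

        return max(self.labels, key=log_score)


# Train once on clinical-style text (invented examples).
train = [
    ("Mr. Miller admitted with myocardial infarction and hypertension.", "cardiology"),
    ("Hypertension controlled; history of myocardial infarction.", "cardiology"),
    ("X-ray shows fracture of the femur.", "orthopaedics"),
    ("Femur fracture treated surgically.", "orthopaedics"),
]
model = ConceptNaiveBayes().fit(
    [concept_features(t, ONTOLOGY) for t, _ in train], [y for _, y in train]
)

# Enrich the ontology with colloquial synonyms. The concept IDs stay the same,
# so the trained model is reused unchanged, with no retraining.
ENRICHED = {
    **ONTOLOGY,
    "heart attack": "C_HEART_ATTACK",
    "high blood pressure": "C_HIGH_BP",
    "broke": "C_FRACTURE",
    "thigh bone": "C_THIGH_BONE",
}

lay_text = "I broke my thigh bone falling off a ladder."
print(concept_features(lay_text, ONTOLOGY))  # → Counter()
print(model.predict(concept_features(lay_text, ENRICHED)))  # → orthopaedics
```

In this sketch the trained model stores only counts keyed by concept IDs, so a name such as "Miller" never reaches it. Enriching the ontology changes which surface terms map to those IDs without touching the model. This loosely mirrors the combination of transfer by copy (reusing the model as is) and transfer by enrichment (extending the ontology) described above.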