Predicting chronic diseases using clinical notes and fine-tuned transformers
Swati Saigaonkar, Dr. Vaibhav Eknath Narawade
2022 IEEE Bombay Section Signature Conference (IBSSC), published 2022-12-08
DOI: https://doi.org/10.1109/IBSSC56953.2022.10037512
Citation count: 0
Abstract
Electronic health records (EHRs) have recently been used extensively by researchers to gain insights for clinical informatics. EHR data comprise structured data, produced by the information systems already in place, and unstructured data such as clinical notes. The unstructured data offer considerable scope for exploration and can yield meaningful insights. Challenges remain, however, such as the heterogeneous and multimodal nature of such data. This work provides insights into EHR data, the datasets available for research, the tasks that can be performed on them, and the methods that can be applied to them. It then demonstrates how BERT and DistilBERT can be fine-tuned on medical datasets to predict chronic diseases such as asthma, renal diseases, heart diseases, and arthritis, and how DistilBERT can be a preferred option over BERT. Both models, BERT and DistilBERT, are pre-trained and then fine-tuned to predict chronic diseases from clinical notes.