Benchmarking Transformer-Based Models for Identifying Social Determinants of Health in Clinical Notes
Xiaoyu Wang, Dipankar Gupta, Michael Killian, Zhe He
IEEE International Conference on Healthcare Informatics, 2023, pp. 570-574. DOI: 10.1109/ichi57859.2023.00102 (Epub 2023-12-11; PMC10795706)
Abstract
Electronic health records (EHRs) have been widely used to build machine learning models for predicting health outcomes. However, many EHR-based models are inherently biased because they lack risk factors related to social determinants of health (SDoH), which are responsible for up to 40% of preventable deaths. As SDoH information is often captured in clinical notes, recent efforts have extracted such information from notes with natural language processing and appended it to other structured data. In this work, we benchmark seven pre-trained transformer-based models, including BERT, ALBERT, BioBERT, BioClinicalBERT, RoBERTa, ELECTRA, and RoBERTa-MIMIC-Trial, for recognizing SDoH terms using a previously annotated corpus of MIMIC-III clinical notes. Our study shows that the BioClinicalBERT model performs best, with F1 scores of 0.911 and 0.923 under strict and relaxed matching criteria, respectively. This work shows the promise of using transformer-based models for recognizing SDoH information in clinical notes.
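To make the benchmarking setup concrete, the sketch below shows how one of the evaluated checkpoints (here the public BioClinicalBERT model) might be fine-tuned for SDoH term recognition as a token-classification task with the Hugging Face transformers library. This is not the authors' code: the label set, dataset variables, and hyperparameters are illustrative assumptions, and the annotated MIMIC-III corpus itself requires credentialed access.

```python
# Minimal sketch (not the paper's implementation) of fine-tuning BioClinicalBERT
# for SDoH named-entity recognition as token classification.
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          TrainingArguments, Trainer)

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"  # public BioClinicalBERT checkpoint
# Hypothetical BIO tag set; the annotated corpus defines its own SDoH categories.
LABELS = ["O", "B-SDOH", "I-SDOH"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

def tokenize_and_align(example):
    """Tokenize pre-split words and align word-level BIO tag ids to sub-word tokens."""
    enc = tokenizer(example["tokens"], is_split_into_words=True,
                    truncation=True, max_length=512)
    labels, prev = [], None
    for wid in enc.word_ids():
        if wid is None:
            labels.append(-100)               # special tokens are ignored in the loss
        elif wid != prev:
            labels.append(example["tags"][wid])
        else:
            labels.append(-100)               # score only the first sub-token of a word
        prev = wid
    enc["labels"] = labels
    return enc

# `train_ds` and `eval_ds` stand in for the annotated MIMIC-III splits (not shown here).
# trainer = Trainer(
#     model=model,
#     args=TrainingArguments(output_dir="sdoh-ner", num_train_epochs=3,
#                            per_device_train_batch_size=16, learning_rate=2e-5),
#     train_dataset=train_ds.map(tokenize_and_align),
#     eval_dataset=eval_ds.map(tokenize_and_align),
# )
# trainer.train()
```

Strict versus relaxed F1, as reported in the abstract, would then correspond to exact-span versus partial-overlap matching of the predicted SDoH mentions against the gold annotations.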