Extracting social determinants of health from dental clinical notes
Farhana Pethani, Alec Chapman, Mike Conway, Xiang Dai, Demiana Bishay, Victor Jun Xiang Choh, Alexander He, Su-Elle Lim, Huey Ying Ng, Tanya Mahony, Albert Yaacoub, Sarvnaz Karimi, Heiko Spallek, Adam G Dunn
Applied Clinical Informatics, published 2025-05-21. DOI: 10.1055/a-2616-9858
Abstract
Objective: In dentistry, social determinants of health (SDoH) may be recorded in the clinical notes of Electronic Dental Records (EDRs). The objective of this study was to examine the availability of SDoH data in dental clinical notes and to evaluate natural language processing (NLP) methods for extracting SDoH from them.

Methods: A set of 1,000 dental clinical notes was sampled from a dataset of 105,311 patient visits to a dental clinic and manually annotated for information pertaining to sugar, tobacco, alcohol, methamphetamine, housing, and employment. Annotations included temporality, dose, type, duration, and frequency where appropriate. Experiments compared extraction using fine-tuned pre-trained language models (PLMs) with a rule-based approach. Performance was measured by F1-score.

Results: For identifying SDoH, the best-performing PLM method produced F1-scores of 0.75 (sugar), 0.69 (tobacco), 0.67 (alcohol), 0.42 (housing), and 0 (employment). The rule-based method produced F1-scores of 0.70 (sugar), 0.69 (tobacco), 0.53 (alcohol), 0.44 (housing), and 0 (employment). The overall difference in F1-score between the PLM and rule-based methods was 0.04 (95% confidence interval -0.01 to 0.09). SDoH were relatively rare in dental clinical notes, ranging from 9.1% for sugar through tobacco (3.9%), alcohol (1.2%), housing (1.2%), and employment (0.2%) to 0% for methamphetamine use.

Conclusions: The main challenges of extracting SDoH information from dental clinical notes were how infrequently SDoH are recorded and, where they are recorded, the brevity and inconsistency of the record. Improved surveillance likely needs new ways to standardise how SDoH are reported in dental clinical notes.
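To make the rule-based side of the comparison and the F1 metric more concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions, not the study's method: the keyword patterns in `RULES`, the helper names `extract` and `f1_score`, and the example notes and gold labels are all hypothetical, and scoring is done at the note level (does the note mention the category or not) rather than over the annotated spans, temporality, dose, or frequency attributes used in the study.

```python
import re

# Hypothetical keyword rules for illustration only; these are NOT the study's rules.
RULES = {
    "sugar": re.compile(r"\b(sugar\w*|soft drinks?|soda|sweets|lollies)\b", re.IGNORECASE),
    "tobacco": re.compile(r"\b(smok\w*|cigarettes?|tobacco|vap(e|es|ing))\b", re.IGNORECASE),
    "alcohol": re.compile(r"\b(alcohol|beers?|wine|etoh)\b", re.IGNORECASE),
}


def extract(note: str) -> set[str]:
    """Return every SDoH category whose keyword rule matches somewhere in the note."""
    return {category for category, pattern in RULES.items() if pattern.search(note)}


def f1_score(gold: list[set[str]], pred: list[set[str]], category: str) -> float:
    """Note-level F1 for one category; returns 0.0 when there are no true positives."""
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, pred):
        in_gold = category in gold_labels
        in_pred = category in pred_labels
        tp += int(in_gold and in_pred)
        fp += int(in_pred and not in_gold)
        fn += int(in_gold and not in_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example notes with hand-assigned gold labels (not study data).
notes = [
    "Pt smokes 10 cigarettes/day. Advised on cessation.",
    "Drinks 3 cans of soft drink daily.",
    "Has 3 sweetened teas a day.",  # implies sugar, but no keyword rule matches
    "Drinks wine on weekends.",
    "Patient does not vape.",  # negated mention; assumed not a gold tobacco label
]
gold = [{"tobacco"}, {"sugar"}, {"sugar"}, {"alcohol"}, set()]
pred = [extract(note) for note in notes]

for category in RULES:
    print(category, round(f1_score(gold, pred, category), 2))
```

Running it prints 0.67 for sugar and tobacco and 1.0 for alcohol. The toy notes show two typical weaknesses of simple keyword rules in brief, inconsistently worded notes: an implicit mention ("sweetened teas") is missed, and a negated mention ("does not vape") becomes a false positive. Because F1 equals 2TP / (2TP + FP + FN), an F1-score of 0, as reported for employment, means the method produced no true positives for that category.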
About the journal:
ACI is the third Schattauer journal dealing with biomedical and health informatics. It perfectly complements our other journals, Methods of Information in Medicine and the Yearbook of Medical Informatics. With the Yearbook of Medical Informatics as the “Milestone” or state-of-the-art journal and Methods of Information in Medicine as the “Science and Research” journal of IMIA, ACI intends to be the “Practical” journal of IMIA.