Large-scale identification of social and behavioral determinants of health from clinical notes: comparison of Latent Semantic Indexing and Generative Pretrained Transformer (GPT) models.


BMC Medical Informatics and Decision Making. Pub Date: 2024-10-10. DOI: 10.1186/s12911-024-02705-x

Sujoy Roy, Shane Morrell, Lili Zhao, Ramin Homayouni



Abstract

Background: Social and behavioral determinants of health (SBDH) are associated with a variety of health and utilization outcomes, yet these factors are not routinely documented in the structured fields of electronic health records (EHR). The objective of this study was to evaluate different machine learning approaches for detection of SBDH from the unstructured clinical notes in the EHR.

Methods: Latent Semantic Indexing (LSI) was applied to 2,083,180 clinical notes corresponding to 46,146 patients in the MIMIC-III dataset. Using LSI, patients were ranked by conceptual relevance to a set of keywords (lexicons) pertaining to 15 different SBDH categories. For Generative Pretrained Transformer (GPT) models, API requests were made with a Python script that connected to the OpenAI services in Azure, using the gpt-3.5-turbo-1106 and gpt-4-1106-preview models. Prediction of SBDH categories was performed using a logistic regression model that included age, gender, race, and SBDH ICD-9 codes.
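The LSI ranking step described in the Methods can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy notes, the raw term-count weighting, the choice of `k`, and the function name `lsi_rank` are all assumptions made for illustration, and each toy "document" stands in for one patient's notes. The sketch builds a term-document matrix, takes a truncated SVD, folds a keyword lexicon into the same latent space, and ranks documents by cosine similarity.

```python
import numpy as np

def lsi_rank(docs, query_terms, k=2):
    """Rank documents by cosine similarity to a keyword query in a k-dim LSI space.

    Hypothetical helper for illustration; uses raw term counts (no TF-IDF).
    """
    # Vocabulary and term-document count matrix A (terms x documents).
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0

    # Truncated SVD: A ~ U_k diag(s_k) V_k^T.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    Uk, sk = U[:, :k], s[:k]

    # Document coordinates in latent space: rows of V_k diag(s_k) (= U_k^T A).
    doc_vecs = Vt[:k, :].T * sk

    # Fold the keyword lexicon into the same space: U_k^T q.
    q = np.zeros(len(vocab))
    for w in query_terms:
        if w.lower() in index:
            q[index[w.lower()]] += 1.0
    q_vec = q @ Uk

    def cosine(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0

    scores = [cosine(q_vec, dv) for dv in doc_vecs]
    order = sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)
    return [(j, scores[j]) for j in order]

# Toy, made-up notes (not MIMIC-III data).
notes = [
    "patient is homeless and lives in a shelter",
    "patient reports heavy alcohol use daily",
    "homeless shelter resident with no housing",
    "alcohol withdrawal after daily drinking",
]
ranking = lsi_rank(notes, ["homeless", "shelter"], k=2)
```

In this toy example the two housing-related notes rank above the two alcohol-related ones. Because the SVD and the similarity computation involve no sampling, repeated runs on the same input give the same ranking, which is the sense in which the abstract calls LSI deterministic.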

Results: LSI retrieved patients according to 15 SBDH domains, with an overall average PPV $\geq$ 83%. Using manually curated gold standard (GS) sets for nine SBDH categories, the macro-F1 score of LSI (0.74) was better than that of ICD-9 (0.71) and GPT-3.5 (0.54), but lower than that of GPT-4 (0.80). Due to document size limitations, only a subset of the GS cases could be processed by GPT-3.5 (55.8%) and GPT-4 (94.2%), compared to LSI (100%). Using common GS subsets for nine different SBDH categories, the macro-F1 of ICD-9 combined with either LSI (mean 0.88, 95% CI 0.82-0.93), GPT-3.5 (0.86, 0.82-0.91) or GPT-4 (0.88, 0.83-0.94) was not significantly different. After including age, gender, race, and ICD-9 in a logistic regression model, the AUC for prediction of six out of the nine SBDH categories was higher for LSI than for GPT-4.
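The two metrics reported in the Results, PPV and macro-F1, can be made concrete with a short sketch. It assumes macro-F1 means the unweighted mean of per-category F1 scores; the abstract does not spell out the averaging scheme. The category names and labels below are made up for illustration, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn) for binary labels given as 0/1 sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, fp, fn

def ppv(y_true, y_pred):
    """Positive predictive value (precision): tp / (tp + fp)."""
    tp, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def f1(y_true, y_pred):
    """F1 score: harmonic mean of precision and recall."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_category):
    """Unweighted mean of per-category F1 (assumed averaging scheme)."""
    scores = [f1(t, p) for t, p in per_category.values()]
    return sum(scores) / len(scores)

# Made-up gold-standard labels and predictions for two hypothetical categories.
toy = {
    "housing": ([1, 1, 0, 0], [1, 0, 0, 0]),  # precision 1.0, recall 0.5
    "alcohol": ([1, 0, 1, 0], [1, 1, 1, 0]),  # precision 2/3, recall 1.0
}
toy_macro_f1 = macro_f1(toy)
```

Macro averaging gives every category equal weight regardless of how many gold-standard cases it has, so a method that does poorly on a rare category is penalized as much as one that does poorly on a common category.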

Conclusions: These results demonstrate that the LSI approach performs comparably to more recent large language models, such as GPT-3.5 and GPT-4, when using the same set of documents. Importantly, LSI is robust, deterministic, and does not have document-size limitations or cost implications, which makes it more amenable to real-world applications in health systems.

