Discovering social determinants of health from case reports using natural language processing: algorithmic development and validation

Shaina Raza, Elham Dolatabadi, Nancy Ondrusek, Laura Rosella, Brian Schwartz
Journal: BMC Digital Health
Published: 2023-09-11
DOI: 10.1186/s44247-023-00035-y (https://doi.org/10.1186/s44247-023-00035-y)

Abstract

Background: Social determinants of health (SDOH) are non-medical factors that influence health outcomes. A wealth of SDOH information is available in electronic health records, clinical reports, and social media data, usually as free text. Extracting key information from free text is a significant challenge and requires natural language processing (NLP) techniques.

Objective: The objective of this research is to advance the automatic extraction of SDOH from clinical texts.

Setting and data: Case reports of COVID-19 patients from the published literature are curated to create a corpus. A portion of the data is annotated by experts to create ground truth labels, and a semi-supervised learning method is used to re-annotate the corpus.

Methods: An NLP framework is developed and tested to extract SDOH from free text. A two-way evaluation assesses the methods both quantitatively and qualitatively.

Results: The proposed named entity recognition (NER) implementation achieves an F1-score of 92.98% on our test set and generalizes well to benchmark data. A careful analysis of case examples demonstrates the superiority of the proposed approach in correctly classifying the named entities.

Conclusions: NLP can be used to extract key information, such as SDOH factors, from free text. A more accurate understanding of SDOH is needed to further improve healthcare outcomes.
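To make the reported metric concrete, the following is a minimal sketch of how an entity-level F1-score for NER output might be computed. The abstract does not describe the exact scoring protocol, so this assumes BIO-tagged tokens and strict span matching (both boundaries and label must agree). The function names `bio_to_spans` and `entity_f1` and the labels `EMPLOYMENT` and `HOUSING` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

# (start_token, end_token_exclusive, label); a hypothetical span representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one BIO tag sequence into a set of labeled entity spans.

    Stray I- tags that do not continue an open entity are ignored.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 under strict span matching."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)   # spans predicted exactly right
        fp += len(p - g)   # predicted spans with no exact gold match
        fn += len(g - p)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative labels only; not the paper's annotation scheme.
gold = [["O", "B-EMPLOYMENT", "I-EMPLOYMENT", "O", "B-HOUSING"]]
pred = [["O", "B-EMPLOYMENT", "I-EMPLOYMENT", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Strict matching is the more conservative choice: a prediction that captures only part of an entity counts as both a false positive and a false negative. A relaxed (overlap-based) variant would give higher scores on the same output, so the matching rule should be stated alongside any reported F1.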