Identifying family structures from obituaries and matching them to patients in an electronic health record.

Impact factor 4.6; Zone 2 (Medicine); JCR Q1 (Computer Science, Information Systems)
John Mayer, Brooke Delgoffe, Scott Hebbring
Journal of the American Medical Informatics Association. Journal Article. Published 2025-06-27. DOI: 10.1093/jamia/ocaf102. https://doi.org/10.1093/jamia/ocaf102

Abstract

Objectives: Family data are a valuable resource in bioinformatic research because family members often share genetic and environmental exposures. Collecting family data is traditionally very labor intensive, but advances in electronic health record (EHR) data mining have proven useful for identifying pedigrees linked to longitudinal health histories. These are called e-pedigrees. Unfortunately, e-pedigrees tend to miss the oldest patients, who inherently have the longest and richest health histories. Obituaries are a good source of family data from older generations: their formulaic nature makes them a good candidate for natural language processing (NLP) that can extract relationships to the decedent. While there have been several studies on obtaining such data from obituaries, we demonstrate for the first time approaches that tie that information to an EHR.
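To illustrate why formulaic obituary phrasing lends itself to automated extraction, here is a minimal Python sketch that pulls (relationship, name) pairs from a sentence with a regular expression. It is not the authors' pipeline (their code is in the GitHub repository cited under Availability and implementation); the pattern, the function name `extract_relatives`, and the sample sentence are assumptions made up for illustration.

```python
import re

# Hypothetical pattern: a relationship word, a comma, then a capitalized name.
# Real obituaries are far more varied (e.g. "sons, John and Peter"), which
# this simplified pattern does not handle.
RELATION_PATTERN = re.compile(
    r"\b(wife|husband|son|daughter|brother|sister|mother|father),\s+"
    r"([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"
)


def extract_relatives(text):
    """Return (relationship, name) pairs found in an obituary sentence."""
    return [(m.group(1), m.group(2)) for m in RELATION_PATTERN.finditer(text)]


# Invented example sentence, not taken from any real obituary.
sample = (
    "She is survived by her husband, Robert Smith; "
    "her son, David Smith; and her sister, Anne Olson."
)
relatives = extract_relatives(sample)
```

Each extracted pair describes a relationship to the decedent, which is the kind of record that can then be matched against EHR patients.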

Methods: NLP extraction abstracted 8 166 534 family members from 567 279 obituaries published in the state of Wisconsin. After matching decedents and family members to patients in the EHR, we identified 200 033 unique patients, who were placed in 53 640 pedigrees.
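One plausible way to place matched patients into pedigrees is to treat each matched relationship as an edge between two patient identifiers and take the connected components of the resulting graph. The sketch below does this with a union-find structure. The abstract does not say how pedigrees were assembled, so this is an assumption rather than the authors' method; the function name `build_pedigrees` and the patient IDs are hypothetical.

```python
from collections import defaultdict


def build_pedigrees(relationships):
    """Group patient IDs into pedigrees (connected components).

    relationships: iterable of (patient_id_a, patient_id_b) pairs, one per
    matched familial relationship. Assumed input format, for illustration.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in relationships:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    groups = defaultdict(set)
    for member in list(parent):
        groups[find(member)].add(member)

    # Largest pedigree first; ties broken by member IDs for a stable order.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Made-up patient IDs: P1-P2-P3 form one family, P4-P5 another.
pedigrees = build_pedigrees([("P1", "P2"), ("P2", "P3"), ("P4", "P5")])
```

With this representation, the size of the largest pedigree is simply the length of the first list returned.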

Results: The largest pedigree consisted of 21 individuals. Heritability of adult height was quantified (H² = 0.51 ± 0.04, P < 1.00e-07), demonstrating the utility of these data for genetic research. The heritability data, coupled with overlapping data in a biobank, suggested that 80%-90% of familial relationships were accurately defined.
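For readers unfamiliar with how family data yield a heritability estimate, a classic textbook approach is to regress offspring trait values on the mid-parent value; the regression slope estimates narrow-sense heritability. The abstract does not state which estimator produced the reported value, so the sketch below is illustrative only. The function name and the height values are invented, and real analyses would also adjust for covariates such as sex and age.

```python
def midparent_offspring_slope(trios):
    """Ordinary least squares slope of offspring trait on mid-parent trait.

    trios: list of (father_value, mother_value, offspring_value) tuples.
    """
    midparents = [(father + mother) / 2 for father, mother, _ in trios]
    offspring = [child for _, _, child in trios]
    n = len(trios)
    mean_x = sum(midparents) / n
    mean_y = sum(offspring) / n
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(midparents, offspring))
    var_x = sum((x - mean_x) ** 2 for x in midparents)
    return cov_xy / var_x


# Invented heights in cm, constructed so offspring = 85 + 0.5 * midparent.
toy_trios = [
    (180.0, 160.0, 170.0),
    (170.0, 160.0, 167.5),
    (190.0, 170.0, 175.0),
]
slope = midparent_offspring_slope(toy_trios)
```

Because the toy data lie exactly on a line with slope 0.5, the estimate recovers that value; real family data would scatter around the line and carry a standard error.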

Conclusion: Taken together, these findings demonstrate that obituaries, which capture the oldest people in society, can be highly informative for bioinformatic research.

Availability and implementation: Code is available on GitHub at https://github.com/jgmayer672/ObituaryNLP.

Source journal
Journal of the American Medical Informatics Association (Medicine; Computer Science, Interdisciplinary Applications)
CiteScore: 14.50
Self-citation rate: 7.80%
Articles published: 230
Review time: 3-8 weeks
About the journal: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.