Facilitating phenotyping from clinical texts: the medkit library.

Bioinformatics (Oxford, England) Pub Date : 2024-11-15 DOI:10.1093/bioinformatics/btae681

Antoine Neuraz, Ghislain Vaillant, Camila Arias, Olivier Birot, Kim-Tam Huynh, Thibaut Fabacher, Alice Rogier, Nicolas Garcelon, Ivan Lerner, Bastien Rance, Adrien Coulet



Abstract

Summary: Phenotyping consists of applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition, typically from a collection of Electronic Health Records (EHRs). Because much of the clinical information in EHRs resides in free text, phenotyping from text plays an important role in studies that rely on the secondary use of EHRs. However, the heterogeneity and highly specialized nature of both the content and form of clinical texts make this task particularly tedious and are a source of time and cost constraints in observational studies.

Results: To facilitate the development, evaluation and reproducibility of phenotyping pipelines, we developed an open-source Python library named medkit. It enables users to compose data processing pipelines from easy-to-reuse software bricks, called medkit operations. In addition to the core of the library, we share the operations and pipelines we have already developed and invite the phenotyping community to reuse and enrich them.

Availability and implementation: medkit is available at https://github.com/medkit-lib/medkit.

Supplementary information: Documentation, examples and tutorials are available at https://medkit-lib.org/.
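
To make the idea of a pipeline built from reusable operations more concrete, the following is a minimal, self-contained Python sketch. It does not use or reproduce medkit's actual API: the names `Annotation`, `keyword_matcher`, `negation_detector`, `run_pipeline` and `has_phenotype`, as well as the keyword list and negation cues, are hypothetical and chosen only for illustration. The library's real interfaces are described in the documentation linked above.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Annotation:
    """A labeled span of text (hypothetical structure, not medkit's)."""
    label: str
    text: str
    start: int
    end: int
    attrs: dict = field(default_factory=dict)


# In this sketch, an "operation" is any callable that receives the note text
# and the annotations produced so far, and returns an updated list.
Operation = Callable[[str, List[Annotation]], List[Annotation]]


def keyword_matcher(keywords: Dict[str, str]) -> Operation:
    """Build an operation that annotates case-insensitive keyword matches."""
    def run(text: str, annotations: List[Annotation]) -> List[Annotation]:
        found = []
        for term, label in keywords.items():
            for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
                found.append(Annotation(label, m.group(), m.start(), m.end()))
        return annotations + found
    return run


def negation_detector(cues: Sequence[str] = ("no", "denies", "without")) -> Operation:
    """Build an operation that marks an annotation as negated when a cue
    word appears earlier in the same sentence (a deliberately naive rule)."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(c) for c in cues) + r")\b",
        flags=re.IGNORECASE,
    )

    def run(text: str, annotations: List[Annotation]) -> List[Annotation]:
        for ann in annotations:
            # The sentence is assumed to start after the previous period.
            sentence_start = text.rfind(".", 0, ann.start) + 1
            context = text[sentence_start:ann.start]
            ann.attrs["negated"] = pattern.search(context) is not None
        return annotations
    return run


def run_pipeline(text: str, operations: Sequence[Operation]) -> List[Annotation]:
    """Apply each operation in order, passing annotations along the chain."""
    annotations: List[Annotation] = []
    for operation in operations:
        annotations = operation(text, annotations)
    return annotations


def has_phenotype(text: str, label: str, operations: Sequence[Operation]) -> bool:
    """A note supports a phenotype if it has a non-negated mention of it."""
    return any(
        a.label == label and not a.attrs.get("negated", False)
        for a in run_pipeline(text, operations)
    )


if __name__ == "__main__":
    note = "Patient denies chest pain. History of type 2 diabetes."
    operations = [
        keyword_matcher({"diabetes": "DIABETES", "chest pain": "CHEST_PAIN"}),
        negation_detector(),
    ]
    for ann in run_pipeline(note, operations):
        print(ann.label, repr(ann.text), ann.attrs)
    print("diabetes phenotype:", has_phenotype(note, "DIABETES", operations))
```

Because each operation shares the same call signature, steps can be reordered, replaced or reused across pipelines without changing the code that runs them, which is the kind of composability the abstract attributes to medkit operations.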


