LLM-AIx: An open source pipeline for Information Extraction from unstructured medical text based on privacy preserving Large Language Models

medRxiv - Health Informatics Pub Date : 2024-09-03 DOI:10.1101/2024.09.02.24312917

Isabella Catharina Wiest, Fabian Wolf, Marie-Elisabeth Leßmann, Marko van Treeck, Dyke Ferber, Jiefu Zhu, Heiko Boehme, Keno K. Bressem, Hannes Ulrich, Matthias P. Ebert, Jakob Nikolas Kather

{"title":"LLM-AIx: An open source pipeline for Information Extraction from unstructured medical text based on privacy preserving Large Language Models","authors":"Isabella Catharina Wiest, Fabian Wolf, Marie-Elisabeth Leßmann, Marko van Treeck, Dyke Ferber, Jiefu Zhu, Heiko Boehme, Keno K. Bressem, Hannes Ulrich, Matthias P. Ebert, Jakob Nikolas Kather","doi":"10.1101/2024.09.02.24312917","DOIUrl":null,"url":null,"abstract":"In clinical science and practice, text data, such as clinical letters or procedure reports, is stored in an unstructured way. This type of data is not a quantifiable resource for any kind of quantitative investigations and any manual review or structured information retrieval is time-consuming and costly. The capabilities of Large Language Models (LLMs) mark a paradigm shift in natural language processing and offer new possibilities for structured Information Extraction (IE) from medical free text. This protocol describes a workflow for LLM based information extraction (LLM-AIx), enabling extraction of predefined entities from unstructured text using privacy preserving LLMs. By converting unstructured clinical text into structured data, LLM-AIx addresses a critical barrier in clinical research and practice, where the efficient extraction of information is essential for improving clinical decision-making, enhancing patient outcomes, and facilitating large-scale data analysis.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"256 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv - Health Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.09.02.24312917","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

In clinical science and practice, text data, such as clinical letters or procedure reports, is stored in an unstructured way. This type of data is not a quantifiable resource for any kind of quantitative investigations and any manual review or structured information retrieval is time-consuming and costly. The capabilities of Large Language Models (LLMs) mark a paradigm shift in natural language processing and offer new possibilities for structured Information Extraction (IE) from medical free text. This protocol describes a workflow for LLM based information extraction (LLM-AIx), enabling extraction of predefined entities from unstructured text using privacy preserving LLMs. By converting unstructured clinical text into structured data, LLM-AIx addresses a critical barrier in clinical research and practice, where the efficient extraction of information is essential for improving clinical decision-making, enhancing patient outcomes, and facilitating large-scale data analysis.

查看原文本刊更多论文

LLM-AIx：基于隐私保护大语言模型的非结构化医学文本信息提取开源管道

在临床科学和实践中，临床信件或手术报告等文本数据是以非结构化方式存储的。这类数据对于任何类型的定量研究来说都不是可量化的资源，而且任何人工审核或结构化信息检索都非常耗时和昂贵。大语言模型（LLM）的功能标志着自然语言处理的范式转变，为从医学自由文本中进行结构化信息提取（IE）提供了新的可能性。本协议描述了基于 LLM 的信息提取（LLM-AIx）工作流程，利用保护隐私的 LLM 从非结构化文本中提取预定义实体。通过将非结构化临床文本转换为结构化数据，LLM-AIx 解决了临床研究和实践中的一个关键障碍，即有效提取信息对于改善临床决策、提高患者疗效和促进大规模数据分析至关重要。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

medRxiv - Health Informatics

自引率

0.00%

发文量