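Robust comparative evaluation of 15 natural language processing algorithms to positively identify patients with inflammatory bowel disease from secondary care records.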

Impact factor 2.9, JCR Q2, Gastroenterology & Hepatology

BMJ Open Gastroenterology. Publication date: 2025-10-10. DOI: 10.1136/bmjgast-2025-001977

Matt Stammers, Markus Gwiggner, Reza Nouraei, Cheryl Metcalf, James Batchelor

Citations: 0

Abstract

Robust comparative evaluation of 15 natural language processing algorithms to positively identify patients with inflammatory bowel disease from secondary care records.

Objective: Natural language processing (NLP) can identify cohorts of patients with inflammatory bowel disease (IBD) from free text. However, limited sharing of code, models, and data sets continues to hinder progress. The aim of this study was to evaluate multiple open-source NLP models for identifying IBD cohorts, reporting on document-to-patient-level classification, while exploring explainability, generalisability, fairness and cost.
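One way to picture document-to-patient-level classification is as an aggregation step over per-document predictions. The sketch below assumes a simple "any positive document makes the patient positive" rule; the abstract does not state which rule the study used, and the function name and input shape are hypothetical.

```python
from collections import defaultdict


def patient_level_labels(doc_predictions):
    """Roll document-level predictions up to one label per patient.

    doc_predictions: iterable of (patient_id, predicted_positive) pairs.
    Assumption (not from the study): a patient counts as positive if at
    least one of their documents is predicted positive.
    """
    by_patient = defaultdict(bool)  # every patient starts as negative
    for patient_id, predicted_positive in doc_predictions:
        by_patient[patient_id] = by_patient[patient_id] or bool(predicted_positive)
    return dict(by_patient)


if __name__ == "__main__":
    docs = [("p1", False), ("p1", True), ("p2", False)]
    print(patient_level_labels(docs))
```

An "any positive" rule favours sensitivity; a majority vote or a minimum count of positive documents would trade some sensitivity for specificity.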

Methods: Fifteen algorithms were assessed, covering the major types of NLP across more than 50 years of development: rule-based methods (regular expressions, spaCy with negation); vector-based methods (bag-of-words (BoW), term frequency-inverse document frequency (TF-IDF), word-2-vector); and transformers, comprising two sentence-based sBERT models, three bidirectional encoder representations from transformers (BERT) models (distilBERT, BioclinicalBERT, RoBERTa), and five large language models (LLMs): Mistral-Instruct-v0.3-7B, M42-Health/Llama-v3-8B, Deepseek-R1-Distill-Qwen-v2.5-32B, Qwen-v3-32B, and Deepseek-R1-Distill-Llama-v3-70B. Models were comparatively evaluated based on full confusion matrices, time/environmental costs, fairness, and explainability.
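To make the rule-based end of this spectrum concrete, here is a minimal sketch of a regular-expression matcher with a crude negation check. The term list, negation cues, and 40-character look-back window are illustrative assumptions, not the patterns used in the study, and the function name is hypothetical.

```python
import re

# Hypothetical term list; not taken from the study.
IBD_TERMS = re.compile(
    r"\b(inflammatory bowel disease|ibd|crohn'?s(?: disease)?|ulcerative colitis)\b",
    re.IGNORECASE,
)
# Hypothetical negation cues that usually appear before the term.
NEGATION_CUES = re.compile(
    r"\b(no|not|without|negative for|denies)\b",
    re.IGNORECASE,
)


def mentions_ibd(text: str, window: int = 40) -> bool:
    """Return True if an IBD term occurs with no negation cue shortly before it."""
    for match in IBD_TERMS.finditer(text):
        start = max(0, match.start() - window)
        preceding = text[start:match.start()]
        # Keep only the current sentence fragment so that a negation in an
        # earlier sentence does not suppress this mention.
        preceding = re.split(r"[.;\n]", preceding)[-1]
        if not NEGATION_CUES.search(preceding):
            return True
    return False


if __name__ == "__main__":
    print(mentions_ibd("Known Crohn's disease, on azathioprine."))
    print(mentions_ibd("No evidence of ulcerative colitis on biopsy."))
```

A matcher like this is fast and easy to explain, but it misses negations that follow the term (for example "Crohn's ruled out") and any phrasing outside the term list.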

Results: A total of 9311 labelled documents were evaluated. The fine-tuned DistilBERT_IBD model achieved the best overall performance (micro F1: 93.54%), followed by sBERT-Base (micro F1: 93.05%); however, specificity was an issue for both (67.80% and 64.41%, respectively). LLMs performed well given that they had never seen the training data (micro F1: 86.47-92.20%), but were comparatively slow (18-300 hours) and expensive. Bias was a significant issue for all effective model types.
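A high micro F1 alongside a low specificity is what one would expect when positive documents far outnumber negative ones. The sketch below computes both metrics from a binary confusion matrix. It assumes micro-averaging pools counts over both classes, which may not match the study's exact computation, and the counts in the usage example are invented for illustration, not study data.

```python
def micro_f1(tp: int, fp: int, fn: int, tn: int) -> float:
    """Micro-averaged F1 for a binary task, pooling counts over both classes.

    For the negative class, tn acts as its true positives, fn as its false
    positives, and fp as its false negatives. With both classes pooled,
    the result equals overall accuracy.
    """
    pooled_tp = tp + tn
    pooled_fp = fp + fn
    pooled_fn = fn + fp
    precision = pooled_tp / (pooled_tp + pooled_fp)
    recall = pooled_tp / (pooled_tp + pooled_fn)
    return 2 * precision * recall / (precision + recall)


def specificity(tn: int, fp: int) -> float:
    """Share of truly negative documents that were predicted negative."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Invented, imbalanced counts: 900 positive and 100 negative documents.
    tp, fn, fp, tn = 880, 20, 40, 60
    print(f"micro F1:    {micro_f1(tp, fp, fn, tn):.2%}")
    print(f"specificity: {specificity(tn, fp):.2%}")
```

Because the 100 negatives are a small share of the total, misclassifying 40 of them barely moves the pooled score while cutting specificity sharply.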

Conclusion: NLP has undergone significant advancements over the last 50 years. LLMs appear likely to solve the problem of re-identifying patients with IBD from clinical free text sources in the future. Once cost, performance and bias issues are addressed, they and their successors are likely to become the primary method of data retrieval for clinical data warehousing.

Source journal

BMJ Open Gastroenterology (Gastroenterology & Hepatology)

CiteScore: 5.90

Self-citation rate: 3.20%

Review time: 2 weeks

About the journal: BMJ Open Gastroenterology is an online-only, peer-reviewed, open access gastroenterology journal, dedicated to publishing high-quality medical research from all disciplines and therapeutic areas of gastroenterology. It is the open access companion journal of Gut and is co-owned by the British Society of Gastroenterology. The journal publishes all research study types, from study protocols to phase I trials to meta-analyses, including small or specialist studies. Publishing procedures are built around continuous publication, publishing research online as soon as the article is ready.