Development of a Natural Language Processing Model for Extracting Kidney Biopsy Pathology Diagnoses.

Impact factor: 3.4; JCR Q1 (Urology & Nephrology)

Kidney Medicine. Publication date: 2025-06-14. eCollection date: 2025-08-01. DOI: 10.1016/j.xkme.2025.101047

Shane A Bobart, Enshuo Hsu, Thomas Potter, Luan Truong, Amy Waterman, Stephen Jones, Tariq Shafi

Kidney Medicine, volume 7, issue 8, article 101047. Publication type: Journal Article. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12311501/pdf/


Abstract

Rationale & objective: Kidney biopsy reports are in a nonindexed text format, and the diagnosis requires labor-intensive manual abstraction. Natural language processing (NLP) has not been rigorously tested for kidney biopsy diagnosis extraction. Our objective was to develop an accurate model to extract the biopsy diagnosis from free-text reports.

Study design: Text classification using NLP.

Setting & participants: 2,666 patients with 3,042 native kidney biopsy reports in Portable Document Format (PDF), dated June 2016 to December 2023.

Predictor: Kidney biopsy diagnosis.

Outcomes: Performance of the NLP algorithm across all diagnoses and across the 20 most common diagnoses, measured by precision, recall, F1 score, and area under the receiver operating characteristic curve (AUROC).
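The outcome metrics named above can be illustrated with a minimal sketch for a single diagnosis treated as a one-vs-rest binary label. The function names, the 0.5 threshold, and the toy labels and scores are illustrative assumptions; they are not taken from the study, and the abstract does not say how the authors averaged metrics across diagnoses.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for binary labels (1 = diagnosis present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for one diagnosis class.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
area = auroc(y_true, scores)
print(precision, recall, f1, area)
```

AUROC is computed from the ranking of scores, while F1 depends on the thresholded predictions, so the two can diverge noticeably for the same classifier.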

Analytical approach: A domain expert manually abstracted the diagnosis, and a renal pathologist validated a random subset (n = 200). We used a Structured Query Language (SQL) server and Python to convert the reports into machine-readable free text. We developed the NLP algorithm using PubMed Bidirectional Encoder Representations from Transformers (PubMedBERT). We randomly split the reports into training (80%; n = 2,434) and testing (20%; n = 608) sets. We further divided the testing set into 20% validation and 80% fine-tuning sets.
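The 80/20 random split described above can be sketched as follows. This is a minimal illustration with integer stand-ins for report identifiers, a hypothetical helper name, and an arbitrary seed; it is not the authors' code, and it only reproduces the reported set sizes.

```python
import random
from typing import Iterable, List, Tuple


def split_reports(report_ids: Iterable[int], train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle report identifiers and split them into training and testing lists."""
    ids = list(report_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# 3,042 reports, represented here by the integers 0..3041.
train_ids, test_ids = split_reports(range(3042))
print(len(train_ids), len(test_ids))  # prints: 2434 608
```

The sizes match the abstract: 80% of 3,042 is 2,433.6, which rounds to 2,434, leaving 608 reports for testing.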

Results: The median age was 57 years; 50% of participants were female, 29% African American, and 23% Hispanic. The 5 most frequent glomerular diagnoses were diabetic kidney disease (23.7%), focal segmental glomerulosclerosis (15.5%), lupus nephritis (9.7%), immunoglobulin A nephropathy (8.9%), and membranous nephropathy (7.2%). The Cohen kappa coefficient for interrater reliability was 0.76. PubMed Bidirectional Encoder Representations from Transformers, fine-tuned on the training set, achieved an average AUROC of 0.95 across all diagnoses in the testing set, with an F1 score of 0.57. For the 20 most common diagnoses, the AUROC was 0.97 with an F1 score of 0.72.

Limitations: Single-center study; sample size; use limited to research purposes.
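The Cohen kappa coefficient reported in the Results measures agreement between two raters beyond what chance would produce. A minimal sketch of the standard formula follows, using hypothetical toy labels (DKD, FSGS, LN are shorthand invented here for the example); it does not reproduce the study's data or its value of 0.76.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of reports")
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels for six reports from two raters.
abstractor = ["DKD", "DKD", "FSGS", "LN", "FSGS", "DKD"]
pathologist = ["DKD", "FSGS", "FSGS", "LN", "FSGS", "DKD"]
kappa = cohen_kappa(abstractor, pathologist)
print(round(kappa, 3))
```

In this toy case the raters agree on 5 of 6 reports, chance agreement is 13/36, and kappa is 17/23.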

Conclusions: We demonstrate an accurate and scalable NLP system to extract the primary diagnosis from free-text kidney biopsy reports, which can facilitate epidemiologic studies and identify patients for clinical trial recruitment.


