Natural language processing for local, regional, and distant breast cancer relapse identification in pathology reports.

Impact factor: 3.0; journal ranking zone 3 (Medicine); JCR Q2 (Oncology)

Breast Cancer Research and Treatment Pub Date : 2025-11-01 Epub Date: 2025-09-02 DOI:10.1007/s10549-025-07801-8

Jaimie J Lee, William Jettinghoff, Gregory Arbour, Andres Zepeda, Kathryn V Isaac, Raymond T Ng, Alan M Nichol

Journal Article. Breast Cancer Research and Treatment, pp. 149-158. https://doi.org/10.1007/s10549-025-07801-8

Citations: 0

Abstract



Natural language processing for local, regional, and distant breast cancer relapse identification in pathology reports.

Purpose: Cancer registries rarely track breast cancer relapse due to the resource-intensive nature of manual chart review. To address this gap, we developed natural language processing (NLP) models to automate the identification of breast cancer relapse in pathology reports.

Methods: We collected pathology reports from patients diagnosed with breast cancer between January 1, 2005, and December 31, 2014, in British Columbia, Canada, and manually annotated each for the presence or absence of local, regional, distant, and any breast cancer relapses. With these reports, we fine-tuned large language models to classify pathology reports.
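The annotation scheme described in the Methods can be pictured as four binary labels per report. The sketch below is illustrative only: the class name `AnnotatedReport`, its field names, the helper `binary_targets`, and the placeholder report texts are all assumptions, and the abstract does not say how the authors stored their labels or structured their training data.

```python
from dataclasses import dataclass

# The four annotation targets named in the Methods paragraph.
RELAPSE_TASKS = ("local", "regional", "distant", "any")


@dataclass(frozen=True)
class AnnotatedReport:
    """One pathology report with its four manual labels (hypothetical layout)."""
    report_id: str
    text: str
    local: bool
    regional: bool
    distant: bool
    any_relapse: bool  # annotated separately in the study, so it is stored, not derived


def binary_targets(reports, task):
    """Return (text, 0/1) pairs for one task, as a binary classifier would consume them."""
    if task not in RELAPSE_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    field = "any_relapse" if task == "any" else task
    return [(r.text, int(getattr(r, field))) for r in reports]


# Placeholder reports; the texts are stand-ins, not real pathology data.
sample = [
    AnnotatedReport("r1", "<report text 1>", True, False, False, True),
    AnnotatedReport("r2", "<report text 2>", False, False, True, True),
    AnnotatedReport("r3", "<report text 3>", False, False, False, False),
]

local_pairs = binary_targets(sample, "local")
any_pairs = binary_targets(sample, "any")
```

Framing each relapse type as its own binary task matches the abstract's reference to a separate "local-relapse model", but whether the authors trained one model per label or a single multi-label model is not stated.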

Results: The corpus contained 1,888 pathology reports from a cohort of 993 breast cancer patients. Of these reports, 673 (35.6%) described local, 296 (15.7%) regional, and 654 (34.6%) distant relapses. In addition, 1,510 (80.0%) described at least one relapse of any type. The median time from diagnosis to first relapse was 7.3 years (range 0.2-18.2). All models demonstrated excellent performance. The local-relapse model performed particularly well, with >93% accuracy, sensitivity, and specificity, and an area under the receiver operating characteristic curve (AUC) of 0.98.
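As a quick check on the arithmetic, the reported percentages can be recomputed from the reported counts, and the four performance measures named in the Results can be written out explicitly. The sketch below uses only the counts given in the abstract; the toy labels and scores, the 0.5 decision threshold, and the function names are assumptions for illustration and are not the study's data or evaluation code.

```python
# Counts as reported in the Results paragraph.
TOTAL_REPORTS = 1888
REPORTED_COUNTS = {"local": 673, "regional": 296, "distant": 654, "any": 1510}

# Share of reports describing each relapse type, rounded to one decimal place.
prevalence = {
    label: round(100 * count / TOTAL_REPORTS, 1)
    for label, count in REPORTED_COUNTS.items()
}


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels and model scores, not study data).
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

toy_metrics = confusion_metrics(toy_true, toy_pred)
toy_auc = roc_auc(toy_true, toy_scores)
```

The recomputed shares come out to 35.6%, 15.7%, 34.6%, and 80.0%, matching the abstract. The three type-specific counts sum to 1,623, more than the 1,510 reports with any relapse, so some reports must describe more than one relapse type.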

Conclusion: We developed NLP models to detect breast cancer relapses from pathology reports with excellent accuracy, sensitivity, specificity, and AUC. NLP may facilitate more efficient and accurate collection of breast cancer outcomes data from clinical reports.


Source journal

Breast Cancer Research and Treatment (Medicine: Oncology)

CiteScore: 6.80
Self-citation rate: 2.60%
Publication volume: 342
Review time: 1 month

About the journal: Breast Cancer Research and Treatment provides the surgeon, radiotherapist, medical oncologist, endocrinologist, epidemiologist, immunologist or cell biologist investigating problems in breast cancer a single forum for communication. The journal creates a "market place" for breast cancer topics which cuts across all the usual lines of disciplines, providing a site for presenting pertinent investigations, and for discussing critical questions relevant to the entire field. It seeks to develop a new focus and new perspectives for all those concerned with breast cancer.