Jaimie J Lee, William Jettinghoff, Gregory Arbour, Andres Zepeda, Kathryn V Isaac, Raymond T Ng, Alan M Nichol
Breast Cancer Research and Treatment, pp. 149-158. Published 2025-11-01 (Epub 2025-09-02). DOI: 10.1007/s10549-025-07801-8 (https://doi.org/10.1007/s10549-025-07801-8)
Natural language processing for local, regional, and distant breast cancer relapse identification in pathology reports.
Purpose: Cancer registries rarely track breast cancer relapse due to the resource-intensive nature of manual chart review. To address this gap, we developed natural language processing (NLP) models to automate the identification of breast cancer relapse in pathology reports.
Methods: We collected pathology reports from patients diagnosed with breast cancer between January 1, 2005, and December 31, 2014, in British Columbia, Canada, and manually annotated each for the presence or absence of local, regional, distant, and any breast cancer relapses. With these reports, we fine-tuned large language models to classify pathology reports.
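The annotation scheme described in the Methods can be pictured as one record per report with a flag for each relapse type. The sketch below is illustrative only: the class name `ReportLabels`, its field names, and the toy records are hypothetical, and it assumes the "any relapse" label is positive whenever at least one of the three specific labels is positive, which the abstract implies but does not state as a rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportLabels:
    """Hypothetical per-report annotation record (field names are illustrative)."""
    local: bool
    regional: bool
    distant: bool

    @property
    def any_relapse(self) -> bool:
        # Assumption: "any" is positive when at least one specific type is positive.
        return self.local or self.regional or self.distant


def prevalence(reports, field):
    """Fraction of reports that are positive for the given label."""
    positives = sum(1 for r in reports if getattr(r, field))
    return positives / len(reports)


# Toy records, not drawn from the study corpus.
reports = [
    ReportLabels(local=True, regional=False, distant=False),
    ReportLabels(local=False, regional=True, distant=True),
    ReportLabels(local=False, regional=False, distant=False),
    ReportLabels(local=True, regional=True, distant=False),
]

for label in ("local", "regional", "distant", "any_relapse"):
    print(f"{label}: {prevalence(reports, label):.1%}")
```

Because a single report can carry more than one positive label, the per-type counts can sum to more than the "any relapse" count.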
Results: The corpus contained 1,888 pathology reports from a cohort of 993 breast cancer patients. Of these reports, 673 (35.6%) described local, 296 (15.7%) regional, and 654 (34.6%) distant relapses. Overall, 1,510 (80.0%) described at least one relapse of any type. The median time from diagnosis to first relapse was 7.3 years (range 0.2-18.2). All models demonstrated excellent performance. The local-relapse model performed particularly well, with accuracy, sensitivity, and specificity each above 93% and an area under the receiver operating characteristic curve (AUC) of 0.98.
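For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, sensitivity, specificity, and AUC are computed for a binary classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), and specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and model scores, not results from the study.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of figures are reported separately.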
Conclusion: We developed NLP models to detect breast cancer relapses from pathology reports with excellent accuracy, sensitivity, specificity, and AUC. NLP may facilitate more efficient and accurate collection of breast cancer outcomes data from clinical reports.
About the journal:
Breast Cancer Research and Treatment provides the surgeon, radiotherapist, medical oncologist, endocrinologist, epidemiologist, immunologist or cell biologist investigating problems in breast cancer a single forum for communication. The journal creates a "market place" for breast cancer topics which cuts across all the usual lines of disciplines, providing a site for presenting pertinent investigations, and for discussing critical questions relevant to the entire field. It seeks to develop a new focus and new perspectives for all those concerned with breast cancer.