Path-BigBird: An AI-Driven Transformer Approach to Classification of Cancer Pathology Reports.

Impact factor: 3.3 (JCR Q2, Oncology)
Mayanka Chandrashekar, Isaac Lyngaas, Heidi A Hanson, Shang Gao, Xiao-Cheng Wu, John Gounley
Journal: JCO Clinical Cancer Informatics
Publication date: 2024-02-01
Publication type: Journal Article
DOI: 10.1200/CCI.23.00148 (https://doi.org/10.1200/CCI.23.00148)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10904099/pdf/
Citations: 0

Abstract

Purpose: Surgical pathology reports are critical for cancer diagnosis and management. To accurately extract information about tumor characteristics from pathology reports in near real time, we explore the impact of using domain-specific transformer models that understand cancer pathology reports.

Methods: We built a pathology transformer model, Path-BigBird, by using 2.7 million pathology reports from six SEER cancer registries. We then compare different variations of Path-BigBird with two less computationally intensive methods: Hierarchical Self-Attention Network (HiSAN) classification model and an off-the-shelf clinical transformer model (Clinical BigBird). We use five pathology information extraction tasks for evaluation: site, subsite, laterality, histology, and behavior. Model performance is evaluated by using macro and micro F1 scores.
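The two evaluation metrics named above weight classes differently, which helps explain why macro F1 can sit far below micro F1 on tasks with many rare classes. The following is a minimal sketch of how the two averages are computed for single-label classification; the function name `f1_scores` and the toy labels are illustrative assumptions, not the authors' evaluation code or data, and the values are on a 0 to 1 scale rather than the 0 to 100 scale used in the reported results.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy labels (not data from the study): one common class, two rare ones.
y_true = ["breast"] * 8 + ["lung", "colon"]
y_pred = ["breast"] * 8 + ["breast", "colon"]

micro, macro = f1_scores(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
# prints: micro F1 = 0.900, macro F1 = 0.647
```

In this toy case a single error on a rare class barely moves the micro average but drops that class's F1 to zero, which pulls the macro average down sharply.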

Results: We found that Path-BigBird and Clinical BigBird outperformed the HiSAN in all tasks. Clinical BigBird performed better on the site and laterality tasks. Versions of the Path-BigBird model performed best on the two most difficult tasks: subsite (micro F1 score of 72.53, macro F1 score of 35.76) and histology (micro F1 score of 80.96, macro F1 score of 37.94). The largest performance gains over the HiSAN model were for histology, for which a Path-BigBird model increased the micro F1 score by 1.44 points and the macro F1 score by 3.55 points. Overall, the results suggest that a Path-BigBird model with a vocabulary derived from well-curated and deidentified data is the best-performing model.

Conclusion: The Path-BigBird pathology transformer model improves automated information extraction from pathology reports. Although Path-BigBird outperforms Clinical BigBird and HiSAN, these less computationally expensive models still have utility when resources are constrained.

Source journal metrics: CiteScore 6.20; self-citation rate 4.80%; articles published 190