LKAN: LLM-Based Knowledge-Aware Attention Network for Clinical Staging of Liver Cancer.

Impact factor 6.7, Zone 2 (Medicine), JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Ya Li, Xuecong Zheng, Jiaping Li, Qingyun Dai, Chang-Dong Wang, Min Chen
Journal: IEEE Journal of Biomedical and Health Informatics. Publication date: 2024-10-11.
DOI: 10.1109/JBHI.2024.3478809
Citations: 0

Abstract

Clinical staging of liver cancer (CSoLC), an important indicator of how far primary liver cancer cells (PLCCs) have deteriorated, is key to the diagnosis, treatment, and rehabilitation of liver cancer. In China, CSoLC currently follows the China liver cancer (CNLC) staging system, which clinicians usually assess from the patient's radiology reports. Inferring staging information from these unstructured reports can therefore provide auxiliary decision support for clinicians. The key to this challenging task is guiding the model to attend to staging-related words or sentences, which raises the following issues: 1) Imbalanced categories: the symptoms of early- and mid-stage liver cancer are not obvious, so end-stage cases dominate the data. 2) Domain sensitivity of liver cancer data: the dataset contains a large amount of domain knowledge, and conventional methods can aggravate out-of-vocabulary problems, which greatly degrades classification accuracy. 3) Free-text and lengthy reports: liver cancer radiology reports sparsely describe various lesions with domain-specific terms, which makes it difficult to mine the key information related to staging. To tackle these challenges, this article proposes a large language model (LLM)-based Knowledge-aware Attention Network (LKAN) for CSoLC. First, to maintain semantic consistency, an LLM and a rule-based algorithm are combined to generate more diverse and reasonable data. Second, the model is pre-trained on an unlabeled corpus of liver cancer radiology reports to introduce domain knowledge for subsequent representation learning. Third, the attention mechanism is improved by incorporating both global and local features, which guides the classifier to focus on the important, staging-relevant information. Compared with the baseline models, LKAN achieves the best classification results, with 90.3% Accuracy, 90.0% Macro_F1 score, and 90.0% Macro_Recall. The code is available at https://github.com/xczhh/Supplemental-Material.
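The abstract reports Accuracy, Macro_F1, and Macro_Recall, and its first challenge (more end-stage than early- or mid-stage reports) is the usual reason to report macro averages: each staging class counts equally, regardless of how many reports it has. The sketch below shows how these metrics are conventionally computed. It is not code from the LKAN repository; it assumes single-label predictions with one label per staging class, and the function name `classification_metrics` and the toy labels are hypothetical. Macro-F1 is taken here as the unweighted mean of per-class F1 scores (the scikit-learn convention); the paper may define it differently.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro_recall, macro_f1) for single-label classification.

    Hypothetical helper for illustration only. Macro averages compute each
    metric per class and take the unweighted mean, so a rare early-stage class
    weighs as much as a frequent end-stage class.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Classes seen in either the gold labels or the predictions.
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    pred_count = Counter(y_pred)  # how often each class was predicted
    true_count = Counter(y_true)  # how often each class actually occurs

    recalls, f1s = [], []
    for c in labels:
        # Assumption: an undefined ratio (zero denominator) counts as 0.0.
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)


# Toy, imbalanced example: class 2 plays the role of a dominant late stage.
acc, macro_recall, macro_f1 = classification_metrics([0, 1, 2, 2, 2, 2], [0, 2, 2, 2, 2, 1])
print(f"accuracy={acc:.3f} macro_recall={macro_recall:.3f} macro_f1={macro_f1:.3f}")
```

On these toy labels, accuracy is 4/6 (about 0.667), while macro-recall and macro-F1 are both 1.75/3 (about 0.583). Missing the single minority-class example costs little accuracy but pulls the macro scores down, which is why macro metrics are the more informative summary on imbalanced staging data.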