Relation extraction using large language models: a case study on acupuncture point locations.

IF 4.7 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association Pub Date : 2024-11-01 DOI:10.1093/jamia/ocae233

Yiming Li, Xueqing Peng, Jianfu Li, Xu Zuo, Suyuan Peng, Donghong Pei, Cui Tao, Hua Xu, Na Hong

Publication type: Journal Article. Pages: 2622-2631. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11491641/pdf/

Abstract

Objective: In acupuncture therapy, the accurate location of acupoints is essential for its effectiveness. The advanced language understanding capabilities of large language models (LLMs) like Generative Pre-trained Transformers (GPTs) and Llama present a significant opportunity for extracting relations related to acupoint locations from textual knowledge sources. This study aims to explore the performance of LLMs in extracting acupoint-related location relations and assess the impact of fine-tuning on GPT's performance.

Materials and methods: We used the World Health Organization Standard Acupuncture Point Locations in the Western Pacific Region (WHO Standard) as our corpus, which consists of descriptions of 361 acupoints. Five types of relations between acupoints ("direction_of", "distance_of", "part_of", "near_acupoint", and "located_near"; n = 3174) were annotated. Four models were compared: pre-trained GPT-3.5, fine-tuned GPT-3.5, pre-trained GPT-4, and pre-trained Llama 3. Performance metrics were micro-average exact-match precision, recall, and F1 scores.
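As an illustration of the metrics named above, the following is a minimal sketch of micro-averaged exact-match precision, recall, and F1 over relation triples. It is not the authors' evaluation code. It assumes that "exact match" means the whole (head, relation, tail) triple must agree string for string, which the abstract does not spell out. The `Relation` type, the `micro_prf` function, and the toy entity names (`AP1`, `region_a`, and so on) are hypothetical and chosen only for this example.

```python
from typing import Iterable, NamedTuple, Sequence, Tuple


class Relation(NamedTuple):
    """One extracted relation triple (hypothetical representation)."""
    head: str
    relation: str  # e.g. "direction_of", "distance_of", "part_of"
    tail: str


def micro_prf(
    gold_docs: Sequence[Iterable[Relation]],
    pred_docs: Sequence[Iterable[Relation]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 under exact triple match.

    Counts are pooled over all documents before the ratios are taken,
    which is what distinguishes micro- from macro-averaging.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)   # predicted and annotated
        fp += len(pred_set - gold_set)   # predicted but not annotated
        fn += len(gold_set - pred_set)   # annotated but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy data (invented, not taken from the WHO Standard).
gold = [
    [Relation("AP1", "part_of", "region_a"),
     Relation("AP1", "near_acupoint", "AP2")],
    [Relation("AP3", "distance_of", "landmark_b")],
]
pred = [
    [Relation("AP1", "part_of", "region_a"),
     Relation("AP1", "direction_of", "AP2")],   # wrong relation type
    [Relation("AP3", "distance_of", "landmark_b")],
]

p, r, f1 = micro_prf(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In the toy data, two of the three predicted triples match the annotations and one annotated triple is missed, so precision, recall, and F1 all come out to 2/3. A partially correct triple (right entities, wrong relation type) earns no credit under exact match.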

Results: Our results demonstrate that fine-tuned GPT-3.5 consistently outperformed other models in F1 scores across all relation types. Overall, it achieved the highest micro-average F1 score of 0.92.

Discussion: The superior F1 scores of the fine-tuned GPT-3.5 model underscore the importance of domain-specific fine-tuning in enhancing relation extraction for acupuncture-related tasks. The findings offer valuable insights into leveraging LLMs to develop clinical decision support and create educational modules in acupuncture.

Conclusion: This study underscores the effectiveness of LLMs like GPT and Llama in extracting relations related to acupoint locations, with implications for accurately modeling acupuncture knowledge and promoting standard implementation in acupuncture training and practice. The findings also contribute to advancing informatics applications in traditional and complementary medicine, showcasing the potential of LLMs in natural language processing.

Source journal

Journal of the American Medical Informatics Association (Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 14.50

Self-citation rate: 7.80%

Articles published: 230

Review time: 3-8 weeks

About the journal: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.