Implementing large language model and retrieval augmented generation to extract geographic locations of illicit transnational kidney trade.

Impact factor 3 | Zone 2 (Medicine) | JCR Q2, Public, Environmental & Occupational Health
Zifu Wang, Meng-Hao Li, Patrick Baxter, Olzhas Zhorayev, Jiaxin Wei, Valerie Kovacs, Qiuhan Zhao, Chaowei Yang, Naoru Koizumi
Journal: International Journal of Health Geographics, 24(1):10
Published: 2025-04-28 (Journal Article)
DOI: https://doi.org/10.1186/s12942-025-00397-8
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12039186/pdf/
Citations: 0

Abstract

Background: Illicit kidney trade networks operate globally and involve intricate interactions among various players, most notably buyers, sellers, brokers, and surgeons. A comprehensive understanding of these networks is hindered, however, by the lack of systematically collected data for analysis. Further, extracting the geographic locations of buyers, sellers, brokers, transplant surgeons, and medical facilities from all relevant publications often requires extensive, time-consuming, and costly manual labelling. Although current techniques such as Named Entity Recognition (NER) tools can potentially automate the process, they are limited to identifying country names and often fail to associate each country with the role it played (i.e., hosting buyers, sellers, brokers, and/or surgeries).

Methods: This study employed state-of-the-art technologies, including Bidirectional Encoder Representations from Transformers (BERT) and Meta's Llama3.3, a Generative Pre-Trained Transformer (GPT) model, to develop a kidney trade country database. We first extracted news articles reporting illicit kidney trade from the LexisNexis database (2000-2022). BERT and Llama3.3 with chain-of-thought prompt tuning strategies were then applied to the materials to determine the relevance of articles to the illegal kidney trade and to identify the roles that different countries played in kidney trade cases over the past 23 years. The specific country classes recorded in the final kidney trade database included: a) countries of origin of kidney sellers; b) countries of origin of kidney buyers; c) countries performing illegal transplant surgeries; and d) countries of origin of organ trafficking brokers.
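
The role extraction step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the `ROLE: countries` answer format, and the helper names `build_prompt` and `parse_roles` are all assumptions made for illustration, and the call to the language model itself is deliberately left out.

```python
import re

# The four country classes named in the abstract.
ROLES = ("sellers", "buyers", "surgeries", "brokers")


def build_prompt(article: str) -> str:
    """Assemble a chain-of-thought style prompt (hypothetical wording)."""
    answer_lines = "\n".join(
        f"{role.upper()}: <comma-separated countries or NONE>" for role in ROLES
    )
    return (
        "Read the news article below about kidney trade.\n"
        "First reason step by step about where the sellers, buyers, and "
        "brokers came from and where the surgery took place.\n"
        "Then finish with exactly these lines:\n"
        f"{answer_lines}\n\n"
        f"ARTICLE:\n{article}"
    )


def parse_roles(response: str) -> dict[str, list[str]]:
    """Read the final 'ROLE: country, country' lines from a model response."""
    roles: dict[str, list[str]] = {role: [] for role in ROLES}
    for role in ROLES:
        # Match lines such as "SELLERS: Country A, Country B".
        matches = re.findall(
            rf"^{role.upper()}:[ \t]*(.*)$", response, flags=re.MULTILINE
        )
        if not matches:
            continue
        value = matches[-1].strip()  # the last occurrence is the final answer
        if value.upper() != "NONE":
            roles[role] = [c.strip() for c in value.split(",") if c.strip()]
    return roles
```

Taking the last matching line for each role lets the model mention a role label while reasoning without confusing the parser; only the closing answer block is recorded.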

Results: The BERT classification model achieved an accuracy of 88.75%, ensuring that only relevant articles were analyzed. Additionally, the Llama3.3-70B model with chain-of-thought prompt tuning strategies extracted location-based roles with an accuracy of 86.30% for sellers, 88.89% for buyers, 93.33% for brokers, and 95.93% for surgeries, lending support to the patterns observed in the final database. In that database, we observed that kidney trade networks evolve dynamically: the primary role played by each country (as a host of sellers, buyers, or surgeries) changes over time. About half of the top 10 countries in each role are replaced by other countries within a decade. The final database also showed that developing countries were more likely to host kidney sellers, whereas developed countries were more likely to host kidney buyers.
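
The decade-to-decade turnover among the top 10 countries can be made concrete with a short Python sketch. The abstract does not say how the authors computed this figure, so the definition below (the share of one period's top-N countries missing from the next period's top-N) and the function names `top_countries` and `turnover` are assumptions.

```python
from collections import Counter


def top_countries(mentions: list[str], n: int = 10) -> set[str]:
    """Return the n most frequently recorded countries for one role and period."""
    return {country for country, _ in Counter(mentions).most_common(n)}


def turnover(earlier: list[str], later: list[str], n: int = 10) -> float:
    """Share of the earlier period's top-n countries absent from the later top-n."""
    before = top_countries(earlier, n)
    after = top_countries(later, n)
    if not before:
        return 0.0  # nothing recorded in the earlier period
    return len(before - after) / len(before)
```

Each input is a flat list of country names, one entry per recorded case for a given role (for example, seller countries in one decade). Under this definition, a value near 0.5 would correspond to the "about half" turnover reported above.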

Conclusion: The current study developed a geospatial database describing transnational kidney trade country networks over the past two decades. The new approach for geographic location extraction is more precise than conventional NER and machine learning methods.

Source journal
International Journal of Health Geographics (Public, Environmental & Occupational Health)
CiteScore: 10.20
Self-citation rate: 2.00%
Articles published: 17
Review time: 12 weeks
About the journal: A leader in its field, International Journal of Health Geographics is an interdisciplinary, open access journal publishing internationally significant studies of geospatial information systems and science applications in health and healthcare. With an exceptional author satisfaction rate and a quick time to first decision, the journal caters to readers across an array of healthcare disciplines globally. International Journal of Health Geographics welcomes novel studies in the health and healthcare context, ranging from spatial data infrastructure and Web geospatial interoperability research to research into real-time Geographic Information Systems (GIS)-enabled surveillance services, remote sensing applications, spatial epidemiology, spatio-temporal statistics, internet GIS and cyberspace mapping, participatory GIS and citizen sensing, geospatial big data, healthy smart cities and regions, and geospatial Internet of Things and blockchain.