InfectA-Chat, an Arabic Large Language Model for Infectious Diseases: Comparative Analysis.

Impact factor 3.1, Zone 3 (Medicine), JCR Q2 (Medical Informatics)

JMIR Medical Informatics Pub Date : 2025-02-10 DOI:10.2196/63881

Yesim Selcuk, Eunhui Kim, Insung Ahn

JMIR Medical Informatics, volume 13, e63881. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11851044/pdf/


Abstract

Background: Infectious diseases have consistently been a significant public health concern, requiring proactive measures to safeguard societal well-being. Regular monitoring plays a crucial role in mitigating the adverse effects of diseases on society. To monitor disease trends, organizations such as the World Health Organization (WHO) and the European Centre for Disease Prevention and Control (ECDC) collect diverse surveillance data and make them publicly accessible. However, these platforms present surveillance data primarily in English, which creates language barriers that hinder non-English-speaking individuals and global public health efforts from accurately observing disease trends. This challenge is particularly noticeable in regions such as the Middle East, where specific infections, such as those caused by Middle East respiratory syndrome coronavirus (MERS-CoV), have seen a dramatic increase. For such regions, it is essential to develop tools that can overcome language barriers and reach more individuals to alleviate the negative impacts of these diseases.

Objective: To address these issues, we propose InfectA-Chat, a large language model (LLM) designed specifically for Arabic that also supports English for question-and-answer (Q&A) tasks. InfectA-Chat uses its understanding of the language to provide users with information on the latest trends in infectious diseases in response to their queries.

Methods: We instruction-tuned the AceGPT-7B and AceGPT-7B-Chat models on a Q&A task using a dataset of 55,400 domain-specific Arabic and English instruction-following examples. The fine-tuned models were evaluated on 2770 domain-specific Arabic and English instruction-following examples using the GPT-4 evaluation method. A comparative analysis was then performed against Arabic LLMs and state-of-the-art models, including AceGPT-13B-Chat, Jais-13B-Chat, Gemini, GPT-3.5, and GPT-4. Furthermore, so that the model could access the latest information on infectious diseases through regularly updated data rather than additional fine-tuning, we used the retrieval-augmented generation (RAG) method.
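As an illustration of the RAG step, the following is a minimal sketch of top-k retrieval followed by prompt assembly. It is not the authors' pipeline: the abstract does not name the retriever, so a toy bag-of-words cosine similarity stands in for a real embedding model, and the function names (`embed`, `retrieve_top_k`, `build_prompt`) and the placeholder documents are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumed stand-in for a real text encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query, documents, k=2):
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved documents to the question as context."""
    context = "\n".join(f"- {d}" for d in retrieve_top_k(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Placeholder documents for illustration only; not real surveillance data.
docs = [
    "placeholder bulletin about mers-cov cases",
    "placeholder note about influenza vaccination",
    "placeholder bulletin about measles surveillance",
]

prompt = build_prompt("mers-cov cases", docs, k=2)
```

Because the document store is consulted at query time, replacing the contents of `docs` changes what the model sees without any retraining, which is the property the Methods paragraph relies on.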

Results: InfectA-Chat performed well in answering questions about infectious diseases under the GPT-4 evaluation method. Our comparative analysis revealed that it outperformed both the AceGPT-7B-Chat model and the InfectA-Chat model based on AceGPT-7B by a margin of 43.52%. It also surpassed other Arabic LLMs, such as AceGPT-13B-Chat and Jais-13B-Chat, by 48.61%. Among the state-of-the-art models, InfectA-Chat achieved a leading performance of 23.78%, competing closely with the GPT-4 model. Furthermore, the RAG method in InfectA-Chat significantly improved document retrieval accuracy. Notably, RAG retrieved more accurate documents for a query when the top-k parameter value was increased.
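The relationship between top-k and retrieval accuracy can be illustrated with a hit-rate calculation. The abstract does not say which retrieval metric the authors used, so hit rate at k (the share of queries whose relevant document appears among the top k results) is an assumption here, and the rankings below are made-up placeholders rather than results from the study.

```python
def hit_at_k(ranked_ids, relevant_id, k):
    """True if the relevant document is among the first k retrieved."""
    return relevant_id in ranked_ids[:k]


def hit_rate(rankings, relevant, k):
    """Fraction of queries whose relevant document is in the top k."""
    hits = sum(hit_at_k(r, rel, k) for r, rel in zip(rankings, relevant))
    return hits / len(rankings)


# Hypothetical retriever output for three queries (document ids, best first).
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d3", "d1"],
    ["d3", "d1", "d2"],
]
# Hypothetical ground-truth relevant document for each query.
relevant = ["d1", "d3", "d2"]

rates = {k: hit_rate(rankings, relevant, k) for k in (1, 2, 3)}
```

By construction this metric cannot decrease as k grows, since the top k results always contain the top k-1. A larger k also passes more, and possibly less relevant, context to the model, so the gain in hit rate is only one side of the trade-off.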

Conclusions: Our findings highlight the shortcomings of general Arabic LLMs in providing up-to-date information about infectious diseases. With this study, we aim to empower individuals and public health efforts by offering a bilingual Q&A system for infectious disease monitoring.




Source journal: JMIR Medical Informatics (Medicine: Health Informatics)

CiteScore: 7.90

Self-citation rate: 3.10%

Articles published: 173

Review time: 12 weeks

About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry, and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers and citizens, which is the focus of JMIR), publishes even faster, and also allows papers that are more technical or more formative than what would be published in the Journal of Medical Internet Research.