Journal of Nutrition, Volume 155, Issue 3, Pages 729-735. Published 1 March 2025. DOI: 10.1016/j.tjnut.2024.12.018
Authors: Gulnoza Adilmetova, Ruslan Nassyrov, Aizhan Meyerbekova, Aknur Karabay, Huseyin Atakan Varol, Mei-Yen Chan
Evaluating ChatGPT’s Multilingual Performance in Clinical Nutrition Advice Using Synthetic Medical Text: Insights from Central Asia
Background
Although large language models like ChatGPT-4 have demonstrated competency in English, their performance for minority groups speaking underrepresented languages, as well as their ability to adapt to specific sociocultural nuances and regional cuisines, such as those in Central Asia (for example, Kazakhstan), still requires further investigation.
Objectives
To evaluate and compare the effectiveness of the ChatGPT-4 system in providing personalized, evidence-based nutritional recommendations in English, Kazakh, and Russian in Central Asia.
Methods
This study was conducted from 15 May to 31 August, 2023. On the basis of 50 mock patient profiles, ChatGPT-4 generated dietary advice, and responses were evaluated for personalization, consistency, and practicality using a 5-point Likert scale. To identify significant differences between the 3 languages, the Kruskal–Wallis test was conducted. Additional pairwise comparisons for each language were carried out using the post hoc Dunn's test.
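The analysis pipeline described above (per-language Likert ratings, a Kruskal–Wallis omnibus test, then Dunn's post hoc pairwise comparisons with a multiple-comparison correction) can be sketched in plain Python. This is a minimal illustration, not the study's code: the rating vectors below are made-up stand-ins, the chi-square p-value shortcut is exact only for the 3-group (df = 2) case, and a Bonferroni correction is assumed for the pairwise tests.

```python
import math
from collections import Counter

def average_ranks(values):
    """Rank values 1..N, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1 for the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def kruskal_dunn(groups, n_comparisons=3):
    """Tie-corrected Kruskal-Wallis H plus Dunn's pairwise z-tests
    with Bonferroni correction. Pure-stdlib sketch for 3 groups."""
    data = [v for g in groups for v in g]
    labels = [gi for gi, g in enumerate(groups) for _ in g]
    N = len(data)
    r = average_ranks(data)
    ns = [len(g) for g in groups]
    rank_sums = [0.0] * len(groups)
    for gi, ri in zip(labels, r):
        rank_sums[gi] += ri
    mean_r = [s / n for s, n in zip(rank_sums, ns)]
    # Omnibus H statistic, divided by the standard tie-correction factor
    H = 12 / (N * (N + 1)) * sum(n * (mr - (N + 1) / 2) ** 2
                                 for n, mr in zip(ns, mean_r))
    ties = sum(t ** 3 - t for t in Counter(data).values())
    H /= 1 - ties / (N ** 3 - N)
    # Chi-square survival function: exp(-H/2) is exact for df = 2 (3 groups)
    p_omnibus = math.exp(-H / 2)
    # Dunn's pairwise z-tests on mean ranks, with tie-adjusted variance
    var_term = N * (N + 1) / 12 - ties / (12 * (N - 1))
    pairwise = {}
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            z = abs(mean_r[i] - mean_r[j]) / math.sqrt(
                var_term * (1 / ns[i] + 1 / ns[j]))
            p = math.erfc(z / math.sqrt(2))                # two-sided p
            pairwise[(i, j)] = min(1.0, p * n_comparisons)  # Bonferroni
    return H, p_omnibus, pairwise

# Illustrative Likert ratings (NOT the study's data): 0=English, 1=Russian, 2=Kazakh
english = [3, 4, 3, 3, 4, 3, 4, 3, 3, 4]
russian = [3, 3, 4, 3, 3, 4, 3, 3, 3, 4]
kazakh  = [1, 1, 1, 1, 1, 2, 1, 1, 1, 1]
H, p, pairs = kruskal_dunn([english, russian, kazakh])
```

In practice one would use a statistics library (e.g. `scipy.stats.kruskal` for the omnibus test); the manual version above simply makes the rank arithmetic behind both tests explicit.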
Results
ChatGPT-4 showed moderate performance in each category for English and Russian, whereas outputs in Kazakh were unsuitable for evaluation. The scores for English, Russian, and Kazakh were as follows: for personalization, 3.32 ± 0.46, 3.18 ± 0.38, and 1.01 ± 0.06; for consistency, 3.48 ± 0.43, 3.38 ± 0.39, and 1.09 ± 0.18; and for practicality, 3.25 ± 0.41, 3.37 ± 0.38, and 1.07 ± 0.15, respectively. The Kruskal–Wallis test indicated statistically significant differences in ChatGPT-4's performance across the 3 languages (P < 0.001). Subsequent post hoc analysis using Dunn's test showed that performance in both English and Russian differed significantly from that in Kazakh.
Conclusions
Our findings show that, despite identical prompts across 3 distinct languages, ChatGPT-4's capability to produce sensible outputs is limited by the lack of training data in non-English languages. Thus, a customized large language model should be developed to perform better in underrepresented languages and to account for specific local diets and practices.
Journal Introduction
The Journal of Nutrition (JN/J Nutr) publishes peer-reviewed original research papers covering all aspects of experimental nutrition in humans and other animal species; special articles such as reviews and biographies of prominent nutrition scientists; and issues, opinions, and commentaries on controversial issues in nutrition. Supplements are frequently published to provide extended discussion of topics of special interest.