Journal of Nutrition, Volume 155, Issue 3, Pages 729-735. Published 1 March 2025. DOI: 10.1016/j.tjnut.2024.12.018
Authors: Gulnoza Adilmetova, Ruslan Nassyrov, Aizhan Meyerbekova, Aknur Karabay, Huseyin Atakan Varol, Mei-Yen Chan
Evaluating ChatGPT’s Multilingual Performance in Clinical Nutrition Advice Using Synthetic Medical Text: Insights from Central Asia
Background
Although large language models like ChatGPT-4 have demonstrated competency in English, their performance for minority groups speaking underrepresented languages, as well as their ability to adapt to specific sociocultural nuances and regional cuisines, such as those in Central Asia (for example, Kazakhstan), still requires further investigation.
Objectives
To evaluate and compare the effectiveness of the ChatGPT-4 system in providing personalized, evidence-based nutritional recommendations in English, Kazakh, and Russian in Central Asia.
Methods
This study was conducted from 15 May to 31 August, 2023. On the basis of 50 mock patient profiles, ChatGPT-4 generated dietary advice, and responses were evaluated for personalization, consistency, and practicality using a 5-point Likert scale. To identify significant differences between the 3 languages, the Kruskal–Wallis test was conducted. Additional pairwise comparisons for each language were carried out using the post hoc Dunn's test.
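The analysis pipeline described above (per-language Likert ratings, a Kruskal–Wallis omnibus test, then Dunn's post hoc pairwise comparisons with a multiple-comparison correction) can be sketched in plain Python. This is a minimal illustration, not the study's code: the rating vectors below are made-up stand-ins, the chi-square p-value shortcut is exact only for the 3-group (df = 2) case, and a Bonferroni correction is assumed for the pairwise tests.

```python
import math
from collections import Counter

def average_ranks(values):
    """Rank values 1..N, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1 for the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def kruskal_dunn(groups, n_comparisons=3):
    """Tie-corrected Kruskal-Wallis H plus Dunn's pairwise z-tests
    with Bonferroni correction. Pure-stdlib sketch for 3 groups."""
    data = [v for g in groups for v in g]
    labels = [gi for gi, g in enumerate(groups) for _ in g]
    N = len(data)
    r = average_ranks(data)
    ns = [len(g) for g in groups]
    rank_sums = [0.0] * len(groups)
    for gi, ri in zip(labels, r):
        rank_sums[gi] += ri
    mean_r = [s / n for s, n in zip(rank_sums, ns)]
    # Omnibus H statistic, divided by the standard tie-correction factor
    H = 12 / (N * (N + 1)) * sum(n * (mr - (N + 1) / 2) ** 2
                                 for n, mr in zip(ns, mean_r))
    ties = sum(t ** 3 - t for t in Counter(data).values())
    H /= 1 - ties / (N ** 3 - N)
    # Chi-square survival function: exp(-H/2) is exact for df = 2 (3 groups)
    p_omnibus = math.exp(-H / 2)
    # Dunn's pairwise z-tests on mean ranks, with tie-adjusted variance
    var_term = N * (N + 1) / 12 - ties / (12 * (N - 1))
    pairwise = {}
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            z = abs(mean_r[i] - mean_r[j]) / math.sqrt(
                var_term * (1 / ns[i] + 1 / ns[j]))
            p = math.erfc(z / math.sqrt(2))                # two-sided p
            pairwise[(i, j)] = min(1.0, p * n_comparisons)  # Bonferroni
    return H, p_omnibus, pairwise

# Illustrative Likert ratings (NOT the study's data): 0=English, 1=Russian, 2=Kazakh
english = [3, 4, 3, 3, 4, 3, 4, 3, 3, 4]
russian = [3, 3, 4, 3, 3, 4, 3, 3, 3, 4]
kazakh  = [1, 1, 1, 1, 1, 2, 1, 1, 1, 1]
H, p, pairs = kruskal_dunn([english, russian, kazakh])
```

In practice one would use a statistics library (e.g. `scipy.stats.kruskal` for the omnibus test); the manual version above simply makes the rank arithmetic behind both tests explicit.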
Results
ChatGPT-4 showed moderate performance in each category for English and Russian, whereas outputs in Kazakh were unsuitable for evaluation. The scores for English, Russian, and Kazakh were as follows: for personalization, 3.32 ± 0.46, 3.18 ± 0.38, and 1.01 ± 0.06; for consistency, 3.48 ± 0.43, 3.38 ± 0.39, and 1.09 ± 0.18; and for practicality, 3.25 ± 0.41, 3.37 ± 0.38, and 1.07 ± 0.15, respectively. The Kruskal–Wallis test indicated statistically significant differences in ChatGPT-4's performance across the 3 languages (P < 0.001). Subsequent post hoc analysis using Dunn's test showed that performance in both English and Russian differed significantly from that in Kazakh.
Conclusions
Our findings show that, despite identical prompts across 3 distinct languages, ChatGPT-4's capability to produce sensible outputs is limited by the lack of training data in non-English languages. Thus, a customized large language model should be developed to perform better in underrepresented languages and to account for specific local diets and practices.
Journal Introduction
The Journal of Nutrition (JN/J Nutr) publishes peer-reviewed original research papers covering all aspects of experimental nutrition in humans and other animal species; special articles such as reviews and biographies of prominent nutrition scientists; and issues, opinions, and commentaries on controversial issues in nutrition. Supplements are frequently published to provide extended discussion of topics of special interest.