A comparative study of English and Japanese ChatGPT responses to anaesthesia-related medical questions

Kazuo Ando , Masaki Sato , Shin Wakatsuki , Ryotaro Nagai , Kumiko Chino , Hinata Kai , Tomomi Sasaki , Rie Kato , Teresa Phuongtram Nguyen , Nan Guo , Pervez Sultan
{"title":"英语和日语 ChatGPT 对麻醉相关医疗问题回答的比较研究","authors":"Kazuo Ando ,&nbsp;Masaki Sato ,&nbsp;Shin Wakatsuki ,&nbsp;Ryotaro Nagai ,&nbsp;Kumiko Chino ,&nbsp;Hinata Kai ,&nbsp;Tomomi Sasaki ,&nbsp;Rie Kato ,&nbsp;Teresa Phuongtram Nguyen ,&nbsp;Nan Guo ,&nbsp;Pervez Sultan","doi":"10.1016/j.bjao.2024.100296","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><p>The expansion of artificial intelligence (AI) within large language models (LLMs) has the potential to streamline healthcare delivery. Despite the increased use of LLMs, disparities in their performance particularly in different languages, remain underexplored. This study examines the quality of ChatGPT responses in English and Japanese, specifically to questions related to anaesthesiology.</p></div><div><h3>Methods</h3><p>Anaesthesiologists proficient in both languages were recruited as experts in this study. Ten frequently asked questions in anaesthesia were selected and translated for evaluation. Three non-sequential responses from ChatGPT were assessed for content quality (accuracy, comprehensiveness, and safety) and communication quality (understanding, empathy/tone, and ethics) by expert evaluators.</p></div><div><h3>Results</h3><p>Eight anaesthesiologists evaluated English and Japanese LLM responses. The overall quality for all questions combined was higher in English compared with Japanese responses. Content and communication quality were significantly higher in English compared with Japanese LLMs responses (both <em>P</em>&lt;0.001) in all three responses. Comprehensiveness, safety, and understanding were higher scores in English LLM responses. In all three responses, more than half of the evaluators marked overall English responses as better than Japanese responses.</p></div><div><h3>Conclusions</h3><p>English LLM responses to anaesthesia-related frequently asked questions were superior in quality to Japanese responses when assessed by bilingual anaesthesia experts in this report. This study highlights the potential for language-related disparities in healthcare information and the need to improve the quality of AI responses in underrepresented languages. Future studies are needed to explore these disparities in other commonly spoken languages and to compare the performance of different LLMs.</p></div>","PeriodicalId":72418,"journal":{"name":"BJA open","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2772609624000406/pdfft?md5=17018b4a959c51babd6313efa948146d&pid=1-s2.0-S2772609624000406-main.pdf","citationCount":"0","resultStr":"{\"title\":\"A comparative study of English and Japanese ChatGPT responses to anaesthesia-related medical questions\",\"authors\":\"Kazuo Ando ,&nbsp;Masaki Sato ,&nbsp;Shin Wakatsuki ,&nbsp;Ryotaro Nagai ,&nbsp;Kumiko Chino ,&nbsp;Hinata Kai ,&nbsp;Tomomi Sasaki ,&nbsp;Rie Kato ,&nbsp;Teresa Phuongtram Nguyen ,&nbsp;Nan Guo ,&nbsp;Pervez Sultan\",\"doi\":\"10.1016/j.bjao.2024.100296\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><p>The expansion of artificial intelligence (AI) within large language models (LLMs) has the potential to streamline healthcare delivery. Despite the increased use of LLMs, disparities in their performance particularly in different languages, remain underexplored. 
This study examines the quality of ChatGPT responses in English and Japanese, specifically to questions related to anaesthesiology.</p></div><div><h3>Methods</h3><p>Anaesthesiologists proficient in both languages were recruited as experts in this study. Ten frequently asked questions in anaesthesia were selected and translated for evaluation. Three non-sequential responses from ChatGPT were assessed for content quality (accuracy, comprehensiveness, and safety) and communication quality (understanding, empathy/tone, and ethics) by expert evaluators.</p></div><div><h3>Results</h3><p>Eight anaesthesiologists evaluated English and Japanese LLM responses. The overall quality for all questions combined was higher in English compared with Japanese responses. Content and communication quality were significantly higher in English compared with Japanese LLMs responses (both <em>P</em>&lt;0.001) in all three responses. Comprehensiveness, safety, and understanding were higher scores in English LLM responses. In all three responses, more than half of the evaluators marked overall English responses as better than Japanese responses.</p></div><div><h3>Conclusions</h3><p>English LLM responses to anaesthesia-related frequently asked questions were superior in quality to Japanese responses when assessed by bilingual anaesthesia experts in this report. This study highlights the potential for language-related disparities in healthcare information and the need to improve the quality of AI responses in underrepresented languages. Future studies are needed to explore these disparities in other commonly spoken languages and to compare the performance of different LLMs.</p></div>\",\"PeriodicalId\":72418,\"journal\":{\"name\":\"BJA open\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2772609624000406/pdfft?md5=17018b4a959c51babd6313efa948146d&pid=1-s2.0-S2772609624000406-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BJA open\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2772609624000406\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BJA open","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772609624000406","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract


Background

The expansion of artificial intelligence (AI) within large language models (LLMs) has the potential to streamline healthcare delivery. Despite the increased use of LLMs, disparities in their performance, particularly across different languages, remain underexplored. This study examines the quality of ChatGPT responses in English and Japanese, specifically to questions related to anaesthesiology.

Methods

Anaesthesiologists proficient in both languages were recruited as experts in this study. Ten frequently asked questions in anaesthesia were selected and translated for evaluation. Three non-sequential responses from ChatGPT were assessed for content quality (accuracy, comprehensiveness, and safety) and communication quality (understanding, empathy/tone, and ethics) by expert evaluators.
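For illustration only, the sketch below shows one way such non-sequential responses could be collected programmatically. The abstract does not state how ChatGPT was accessed, which model version was queried, or the exact prompts, so the model name, example questions, and use of the OpenAI Python client are assumptions rather than the authors' method; each question is sent as three independent requests with no conversation history to approximate non-sequential responses generated in separate sessions.

```python
# Illustrative sketch only: model name, questions, and API access are assumptions,
# not the study's actual materials or procedure.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-3.5-turbo"  # placeholder; the ChatGPT version used in the study is not given here

questions = {
    "en": ["What are the common side effects of general anaesthesia?"],  # example FAQ
    "ja": ["全身麻酔の一般的な副作用は何ですか？"],                        # its translation
}

responses = {}
for lang, faqs in questions.items():
    for q in faqs:
        # Three independent requests, each without conversation history,
        # to mirror "non-sequential" responses from separate sessions.
        responses[(lang, q)] = [
            client.chat.completions.create(
                model=MODEL,
                messages=[{"role": "user", "content": q}],
            ).choices[0].message.content
            for _ in range(3)
        ]
```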

Results

Eight anaesthesiologists evaluated English and Japanese LLM responses. The overall quality for all questions combined was higher for English than for Japanese responses. Content and communication quality were significantly higher for English than for Japanese LLM responses (both P<0.001) in all three responses. Comprehensiveness, safety, and understanding scores were higher in English LLM responses. For all three responses, more than half of the evaluators rated the overall English response as better than the Japanese response.
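The abstract reports P<0.001 for the English-versus-Japanese comparisons but does not name the statistical test used. As a hedged illustration only, a paired non-parametric comparison such as the Wilcoxon signed-rank test on per-evaluator scores might look like the sketch below; the score values are invented for the example and are not study data.

```python
# Minimal sketch, not the authors' analysis: the test choice and the scores below
# are assumptions made purely for illustration.
from scipy.stats import wilcoxon

# Hypothetical mean content-quality scores from 8 bilingual evaluators.
english_scores = [4.5, 4.2, 4.8, 4.0, 4.6, 4.3, 4.7, 4.4]
japanese_scores = [3.8, 3.5, 4.0, 3.2, 3.9, 3.6, 4.1, 3.4]

stat, p_value = wilcoxon(english_scores, japanese_scores)
print(f"Wilcoxon signed-rank statistic={stat:.2f}, P={p_value:.4f}")
```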

Conclusions

English LLM responses to anaesthesia-related frequently asked questions were superior in quality to Japanese responses when assessed by bilingual anaesthesia experts in this report. This study highlights the potential for language-related disparities in healthcare information and the need to improve the quality of AI responses in underrepresented languages. Future studies are needed to explore these disparities in other commonly spoken languages and to compare the performance of different LLMs.

Source journal
BJA Open (Anesthesiology and Pain Medicine)
CiteScore: 0.60
Self-citation rate: 0.00%
Articles published: 0
Review time: 83 days