Takashi Hisamatsu, Mari Fukuda, Minako Kinuta, Hideyuki Kanda
{"title":"ChatGPT 对《日本动脉粥样硬化学会 2022 年动脉粥样硬化性心血管疾病预防指南》中临床问题的答复。","authors":"Takashi Hisamatsu, Mari Fukuda, Minako Kinuta, Hideyuki Kanda","doi":"10.5551/jat.65240","DOIUrl":null,"url":null,"abstract":"<p><strong>Aims: </strong>Artificial intelligence is increasingly used in the medical field. We assessed the accuracy and reproducibility of responses by ChatGPT to clinical questions (CQs) in the Japan Atherosclerosis Society Guidelines for Prevention Atherosclerotic Cardiovascular Diseases 2022 (JAS Guidelines 2022).</p><p><strong>Methods: </strong>In June 2024, we assessed responses by ChatGPT (version 3.5) to CQs, including background questions (BQs) and foreground questions (FQs). Accuracy was assessed independently by three researchers using six-point Likert scales ranging from 1 (\"completely incorrect\") to 6 (\"completely correct\") by evaluating responses to CQs in Japanese or translated into English. For reproducibility assessment, responses to each CQ asked five times separately in a new chat were scored using six-point Likert scales, and Fleiss kappa coefficients were calculated.</p><p><strong>Results: </strong>The median (25th-75th percentile) score for ChatGPT's responses to BQs and FQs was 4 (3-5) and 5 (5-6) for Japanese CQs and 5 (3-6) and 6 (5-6) for English CQs, respectively. Response scores were higher for FQs than those for BQs (P values <0.001 for Japanese and English). Similar response accuracy levels were observed between Japanese and English CQs (P value 0.139 for BQs and 0.586 for FQs). Kappa coefficients for reproducibility were 0.76 for BQs and 0.90 for FQs.</p><p><strong>Conclusions: </strong>ChatGPT showed high accuracy and reproducibility in responding to JAS Guidelines 2022 CQs, especially FQs. 
While ChatGPT primarily reflects existing guidelines, its strength could lie in rapidly organizing and presenting relevant information, thus supporting instant and more efficient guideline interpretation and aiding in medical decision-making.</p>","PeriodicalId":15128,"journal":{"name":"Journal of atherosclerosis and thrombosis","volume":" ","pages":""},"PeriodicalIF":3.0000,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ChatGPT Responses to Clinical Questions in the Japan Atherosclerosis Society Guidelines for Prevention of Atherosclerotic Cardiovascular Disease 2022.\",\"authors\":\"Takashi Hisamatsu, Mari Fukuda, Minako Kinuta, Hideyuki Kanda\",\"doi\":\"10.5551/jat.65240\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Aims: </strong>Artificial intelligence is increasingly used in the medical field. We assessed the accuracy and reproducibility of responses by ChatGPT to clinical questions (CQs) in the Japan Atherosclerosis Society Guidelines for Prevention Atherosclerotic Cardiovascular Diseases 2022 (JAS Guidelines 2022).</p><p><strong>Methods: </strong>In June 2024, we assessed responses by ChatGPT (version 3.5) to CQs, including background questions (BQs) and foreground questions (FQs). Accuracy was assessed independently by three researchers using six-point Likert scales ranging from 1 (\\\"completely incorrect\\\") to 6 (\\\"completely correct\\\") by evaluating responses to CQs in Japanese or translated into English. For reproducibility assessment, responses to each CQ asked five times separately in a new chat were scored using six-point Likert scales, and Fleiss kappa coefficients were calculated.</p><p><strong>Results: </strong>The median (25th-75th percentile) score for ChatGPT's responses to BQs and FQs was 4 (3-5) and 5 (5-6) for Japanese CQs and 5 (3-6) and 6 (5-6) for English CQs, respectively. 
Response scores were higher for FQs than those for BQs (P values <0.001 for Japanese and English). Similar response accuracy levels were observed between Japanese and English CQs (P value 0.139 for BQs and 0.586 for FQs). Kappa coefficients for reproducibility were 0.76 for BQs and 0.90 for FQs.</p><p><strong>Conclusions: </strong>ChatGPT showed high accuracy and reproducibility in responding to JAS Guidelines 2022 CQs, especially FQs. While ChatGPT primarily reflects existing guidelines, its strength could lie in rapidly organizing and presenting relevant information, thus supporting instant and more efficient guideline interpretation and aiding in medical decision-making.</p>\",\"PeriodicalId\":15128,\"journal\":{\"name\":\"Journal of atherosclerosis and thrombosis\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of atherosclerosis and thrombosis\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.5551/jat.65240\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PERIPHERAL VASCULAR DISEASE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of atherosclerosis and thrombosis","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.5551/jat.65240","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PERIPHERAL VASCULAR DISEASE","Score":null,"Total":0}
ChatGPT Responses to Clinical Questions in the Japan Atherosclerosis Society Guidelines for Prevention of Atherosclerotic Cardiovascular Disease 2022.
Aims: Artificial intelligence is increasingly used in the medical field. We assessed the accuracy and reproducibility of responses by ChatGPT to clinical questions (CQs) in the Japan Atherosclerosis Society Guidelines for Prevention of Atherosclerotic Cardiovascular Disease 2022 (JAS Guidelines 2022).
Methods: In June 2024, we assessed responses by ChatGPT (version 3.5) to CQs, including background questions (BQs) and foreground questions (FQs). Accuracy was assessed independently by three researchers, who evaluated responses to CQs posed in Japanese or translated into English on a six-point Likert scale ranging from 1 ("completely incorrect") to 6 ("completely correct"). For the reproducibility assessment, each CQ was asked five times, each in a new chat; the responses were scored on the same six-point Likert scale, and Fleiss' kappa coefficients were calculated.
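The reproducibility metric described above, Fleiss' kappa, measures agreement among a fixed number of ratings per item beyond what chance would produce. A minimal sketch of the standard computation follows; the rating matrix used in the example is hypothetical, since the paper does not publish its per-question score data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories count matrix.

    counts[i][j] = number of ratings placing subject i in category j;
    every row must sum to the same number of ratings n
    (here: 5 repeated askings of each CQ, rated on a 6-point scale).
    """
    N = len(counts)        # number of subjects (e.g., clinical questions)
    n = sum(counts[0])     # ratings per subject (e.g., 5 repeats)
    k = len(counts[0])     # number of rating categories (e.g., Likert 1-6)

    # Mean per-subject agreement P_bar:
    # P_i = (sum_j n_ij^2 - n) / (n * (n - 1))
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N

    # Chance agreement P_e from the marginal category proportions
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)


# Hypothetical example: two questions, five repeated ratings each,
# two collapsed score categories. Identical ratings within each
# question yield perfect agreement (kappa = 1).
print(fleiss_kappa([[5, 0], [0, 5]]))  # → 1.0
```

On this scale, the reported coefficients (0.76 for BQs, 0.90 for FQs) indicate substantial to almost-perfect agreement across the five repeated responses.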
Results: The median (25th-75th percentile) scores for ChatGPT's responses to BQs and FQs were 4 (3-5) and 5 (5-6) for Japanese CQs and 5 (3-6) and 6 (5-6) for English CQs, respectively. Response scores were higher for FQs than for BQs (P<0.001 for both Japanese and English). Response accuracy was similar between Japanese and English CQs (P=0.139 for BQs and P=0.586 for FQs). Fleiss' kappa coefficients for reproducibility were 0.76 for BQs and 0.90 for FQs.
Conclusions: ChatGPT showed high accuracy and reproducibility in responding to JAS Guidelines 2022 CQs, especially FQs. While ChatGPT primarily reflects existing guidelines, its strength could lie in rapidly organizing and presenting relevant information, thereby supporting faster, more efficient guideline interpretation and aiding medical decision-making.