Artificial intelligence in sleep medicine: assessing the diagnostic precision of ChatGPT-4.
Anshum Patel, Joseph Cheung. Journal of Clinical Sleep Medicine, published 2025-09-01. DOI: 10.5664/jcsm.11732
Study objectives: Large language models such as ChatGPT-4 are emerging in medicine, including sleep medicine, where artificial intelligence is used to analyze sleep data and predict treatment outcomes. The effectiveness of large language models in accurately diagnosing sleep disorders from clinical history has not yet been studied. This study evaluates ChatGPT-4's diagnostic performance using clinical vignettes.
Methods: Nineteen clinical vignettes containing patient history, physical examination findings, and diagnostic tests from the Case Book of Sleep Medicine (third edition, 2019, American Academy of Sleep Medicine) were presented to ChatGPT-4. Its differential and final diagnoses were compared to reference diagnoses, with accuracy assessed by (1) the percentage of correct differentials and (2) a 3-tier scoring system (no match, partial match, full match) for final diagnoses.
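The two accuracy measures described in the Methods can be sketched in Python. This is a minimal illustration, not the authors' analysis code. The `CaseResult` structure and the helper names are hypothetical. The 0/1/2 point values for no, partial, and full matches are inferred from the reported 38-point maximum for 19 vignettes. The denominator of the differential percentage is assumed to be the length of the reference differential list. Whether a diagnosis "matches" remains a clinical judgment made outside the code.

```python
from dataclasses import dataclass
from statistics import mean

# Assumed point values for the 3-tier final-diagnosis score; a maximum of
# 38 points over 19 vignettes implies a 0/1/2 scale.
TIER_POINTS = {"no match": 0, "partial match": 1, "full match": 2}


@dataclass
class CaseResult:
    """One vignette's graded outcome (hypothetical structure for illustration)."""
    matched_differentials: int    # model differentials judged to match the reference list
    reference_differentials: int  # size of the reference differential list (assumed denominator)
    final_tier: str               # one of the TIER_POINTS keys


def differential_accuracy(case: CaseResult) -> float:
    """Percentage of reference differentials recovered by the model for one case."""
    return 100.0 * case.matched_differentials / case.reference_differentials


def summarize(cases: list[CaseResult]) -> dict[str, float]:
    """Aggregate per-case grades into the summary measures named in the abstract."""
    scores = [TIER_POINTS[c.final_tier] for c in cases]
    total = sum(scores)
    max_total = 2 * len(cases)  # 2 points is the best possible score per case
    return {
        "mean_differential_accuracy": mean(differential_accuracy(c) for c in cases),
        "final_total": total,
        "final_max": max_total,
        "final_accuracy": 100.0 * total / max_total,
        "mean_final_score": mean(scores),
        "full_match_rate": 100.0 * scores.count(2) / len(cases),
    }
```

For example, `summarize([CaseResult(3, 4, "full match"), CaseResult(1, 2, "partial match")])` gives a mean differential accuracy of 62.5% and a final-diagnosis accuracy of 75.0% (3 of 4 possible points). Keeping the matching judgment as an input, rather than attempting string comparison of diagnosis names, mirrors the fact that the study's grading was done against reference diagnoses by reviewers.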
Results: The mean accuracy for differential diagnoses was 63.27% ± 15.61% (standard deviation), ranging from 33.33% to 100%. The mean number of artificial intelligence-generated differential diagnoses matching the American Academy of Sleep Medicine case differential diagnoses was 2.79 ± 0.71 (standard deviation). For final diagnoses, ChatGPT-4 scored a total of 30 out of a possible 38, resulting in an overall accuracy of 78.95%. The model achieved a mean score of 1.58 ± 0.61 (standard deviation) out of 2, with 68.42% of cases achieving a full match. Performance was higher in cases with fewer differential diagnoses, whereas accuracy decreased in complex cases.
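The final-diagnosis figures follow from simple arithmetic, assuming the 0/1/2 point scale implied by the 38-point maximum (2 points × 19 vignettes). A 68.42% full-match rate corresponds to 13 of 19 cases (26 points); the remaining 4 points can then only come from 4 partial matches, leaving 2 cases with no match. This breakdown is derived from the reported totals, not stated in the abstract.

$$
\begin{aligned}
\text{final-diagnosis accuracy} &= \frac{30}{2 \times 19} = \frac{30}{38} \approx 78.95\% \\
\text{mean score} &= \frac{30}{19} \approx 1.58 \quad \text{(out of 2)} \\
\text{full-match rate} &= \frac{13}{19} \approx 68.42\% \\
30 &= 13 \times 2 + 4 \times 1 + 2 \times 0
\end{aligned}
$$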
Conclusions: ChatGPT-4 demonstrates promising diagnostic potential in sleep medicine, with moderate to high accuracy in identifying differential and final diagnoses, although its variability in more complex cases calls for refinement and clinical validation.
Citation: Patel A, Cheung J. Artificial intelligence in sleep medicine: assessing the diagnostic precision of ChatGPT-4. J Clin Sleep Med. 2025;21(9):1511-1517.
About the journal:
Journal of Clinical Sleep Medicine focuses on clinical sleep medicine, with an emphasis on papers that are directly applicable or relevant to the clinical practice of sleep medicine. This includes clinical trials, clinical reviews, clinical commentary and debate, medical economics and practice perspectives, case series, and novel or interesting case reports. The journal also publishes proceedings from conferences, workshops, and symposia sponsored by the American Academy of Sleep Medicine or other organizations working to improve the practice of sleep medicine.