{"title":"提高GPT-4中复杂提示的眼科疾病的诊断准确性:全球和中低收入国家(LMIC)特异性病理的比较分析","authors":"Shona Alex Tapiwa M'gadzah, Andrew O'Malley","doi":"10.2196/64986","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The global incidence of blindness has continued to increase, despite the enactment of a Global Eye Health Action Plan by the World Health Assembly. This can be attributed, in part, to an aging population, but also to the limited diagnostic resources within low- and middle-income countries (LMICs). The advent of generative artificial intelligence (AI) within health care could pose a novel solution to combating the prevalence of blindness globally.</p><p><strong>Objective: </strong>The objectives of this study are to quantify the effect the addition of a complex prompt has on the diagnostic accuracy of a commercially available LLM, and to assess whether such LLMs are better or worse at diagnosing conditions that are more prevalent in LMICs.</p><p><strong>Methods: </strong>Ten clinical vignettes representing globally and LMIC-prevalent ophthalmological conditions were presented to GPT-4-0125-preview using simple and complex prompts. Diagnostic performance metrics, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were calculated. Statistical comparison between prompts was conducted using a chi-square test of independence.</p><p><strong>Results: </strong>The complex prompt achieved a higher diagnostic accuracy (90.1%) compared to the simple prompt (60.4%), with a statistically significant difference (χ2=428.86; P<.001). Sensitivity, specificity, PPV, and NPV were consistently improved for most conditions with the complex prompt. The simple prompt struggled with LMIC-prevalent conditions, diagnosing only 1 of 5 accurately, while the complex prompt successfully diagnosed 4 of 5.</p><p><strong>Conclusions: </strong>The study established that overall, the inclusion of a complex prompt positively affected the diagnostic accuracy of GPT-4-0125-preview, particularly for LMIC-prevalent conditions. This highlights the potential for LLMs, when appropriately tailored, to support clinicians in diverse health care settings. Future research should explore the generalizability of these findings across other models and specialties.</p>","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"9 ","pages":"e64986"},"PeriodicalIF":2.0000,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12261798/pdf/","citationCount":"0","resultStr":"{\"title\":\"Enhancing Diagnostic Accuracy of Ophthalmological Conditions With Complex Prompts in GPT-4: Comparative Analysis of Global and Low- and Middle-Income Country (LMIC)-Specific Pathologies.\",\"authors\":\"Shona Alex Tapiwa M'gadzah, Andrew O'Malley\",\"doi\":\"10.2196/64986\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>The global incidence of blindness has continued to increase, despite the enactment of a Global Eye Health Action Plan by the World Health Assembly. This can be attributed, in part, to an aging population, but also to the limited diagnostic resources within low- and middle-income countries (LMICs). The advent of generative artificial intelligence (AI) within health care could pose a novel solution to combating the prevalence of blindness globally.</p><p><strong>Objective: </strong>The objectives of this study are to quantify the effect the addition of a complex prompt has on the diagnostic accuracy of a commercially available LLM, and to assess whether such LLMs are better or worse at diagnosing conditions that are more prevalent in LMICs.</p><p><strong>Methods: </strong>Ten clinical vignettes representing globally and LMIC-prevalent ophthalmological conditions were presented to GPT-4-0125-preview using simple and complex prompts. Diagnostic performance metrics, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were calculated. Statistical comparison between prompts was conducted using a chi-square test of independence.</p><p><strong>Results: </strong>The complex prompt achieved a higher diagnostic accuracy (90.1%) compared to the simple prompt (60.4%), with a statistically significant difference (χ2=428.86; P<.001). Sensitivity, specificity, PPV, and NPV were consistently improved for most conditions with the complex prompt. The simple prompt struggled with LMIC-prevalent conditions, diagnosing only 1 of 5 accurately, while the complex prompt successfully diagnosed 4 of 5.</p><p><strong>Conclusions: </strong>The study established that overall, the inclusion of a complex prompt positively affected the diagnostic accuracy of GPT-4-0125-preview, particularly for LMIC-prevalent conditions. This highlights the potential for LLMs, when appropriately tailored, to support clinicians in diverse health care settings. Future research should explore the generalizability of these findings across other models and specialties.</p>\",\"PeriodicalId\":14841,\"journal\":{\"name\":\"JMIR Formative Research\",\"volume\":\"9 \",\"pages\":\"e64986\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12261798/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JMIR Formative Research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2196/64986\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Formative Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/64986","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Enhancing Diagnostic Accuracy of Ophthalmological Conditions With Complex Prompts in GPT-4: Comparative Analysis of Global and Low- and Middle-Income Country (LMIC)-Specific Pathologies.
Background: The global incidence of blindness has continued to increase, despite the enactment of a Global Eye Health Action Plan by the World Health Assembly. This can be attributed, in part, to an aging population, but also to the limited diagnostic resources within low- and middle-income countries (LMICs). The advent of generative artificial intelligence (AI) within health care could pose a novel solution to combating the prevalence of blindness globally.
Objective: The objectives of this study are to quantify the effect the addition of a complex prompt has on the diagnostic accuracy of a commercially available LLM, and to assess whether such LLMs are better or worse at diagnosing conditions that are more prevalent in LMICs.
Methods: Ten clinical vignettes representing globally and LMIC-prevalent ophthalmological conditions were presented to GPT-4-0125-preview using simple and complex prompts. Diagnostic performance metrics, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were calculated. Statistical comparison between prompts was conducted using a chi-square test of independence.
Results: The complex prompt achieved a higher diagnostic accuracy (90.1%) compared to the simple prompt (60.4%), with a statistically significant difference (χ2=428.86; P<.001). Sensitivity, specificity, PPV, and NPV were consistently improved for most conditions with the complex prompt. The simple prompt struggled with LMIC-prevalent conditions, diagnosing only 1 of 5 accurately, while the complex prompt successfully diagnosed 4 of 5.
Conclusions: The study established that overall, the inclusion of a complex prompt positively affected the diagnostic accuracy of GPT-4-0125-preview, particularly for LMIC-prevalent conditions. This highlights the potential for LLMs, when appropriately tailored, to support clinicians in diverse health care settings. Future research should explore the generalizability of these findings across other models and specialties.