Banu Arslan, Mehmet Necmeddin Sutasir, Ertugrul Altinbilek
Western Journal of Emergency Medicine, volume 26, issue 4, pages 1030-1039. Published 2025-07-13. DOI: https://doi.org/10.5811/westjem.24995. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12342421/pdf/
Citations: 0
Abstract
Performance of Microsoft Copilot in the Diagnostic Process of Pulmonary Embolism.
Introduction: Patients with pulmonary embolism (PE) often present with non-specific signs and symptoms mimicking other conditions and complicating diagnosis. In this study we aimed to evaluate the performance of an artificial-intelligence tool, Microsoft Copilot, in the diagnostic process of PE, using clinical data including demographics, complaints, and vital signs.
Methods: We conducted this study using 140 clinical vignettes, including 70 patients with and 70 patients without PE. The vignettes were derived from case reports published within the last 10 years. We used Copilot, chosen for its free GPT-4 integration, to analyze the clinical data and answer two questions after each vignette. We assessed Copilot's ability to include PE within its top 10 differential diagnoses, and we compared its prediction of PE risk with Wells scores calculated by two independent investigators.
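For readers unfamiliar with the comparator, the following is a minimal sketch of the Wells score for PE, assuming the commonly published seven-item weights and the three-tier cut-offs (below 2 low, 2 to 6 intermediate, above 6 high). The abstract does not state which variant or cut-offs the investigators applied, so the field names, the `WellsInputs` type, and the thresholds here are illustrative assumptions rather than the study's protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WellsInputs:
    """Hypothetical container for the seven Wells criteria (names are illustrative)."""
    clinical_signs_of_dvt: bool = False
    pe_most_likely_diagnosis: bool = False
    heart_rate_over_100: bool = False
    immobilization_or_recent_surgery: bool = False
    previous_dvt_or_pe: bool = False
    hemoptysis: bool = False
    malignancy: bool = False


# Commonly published point weights for the original Wells PE score.
WEIGHTS = {
    "clinical_signs_of_dvt": 3.0,
    "pe_most_likely_diagnosis": 3.0,
    "heart_rate_over_100": 1.5,
    "immobilization_or_recent_surgery": 1.5,
    "previous_dvt_or_pe": 1.5,
    "hemoptysis": 1.0,
    "malignancy": 1.0,
}


def wells_score(inputs: WellsInputs) -> float:
    """Sum the weights of every criterion that is present."""
    return sum(w for name, w in WEIGHTS.items() if getattr(inputs, name))


def wells_category(score: float) -> str:
    """Three-tier interpretation (assumed here; the abstract does not give cut-offs)."""
    if score < 2:
        return "low"
    if score <= 6:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    patient = WellsInputs(heart_rate_over_100=True, hemoptysis=True)
    score = wells_score(patient)
    print(score, wells_category(score))
```

Because one criterion ("PE is the most likely diagnosis") is a subjective judgment, two raters can reach different totals for the same vignette, which is presumably why the study used two independent investigators.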
Results: Copilot correctly included PE in the differential diagnosis in 94.3% of cases by listing it within the top 10 conditions. Risk assessment by Copilot yielded significantly higher risk levels in patients with PE (P<.05). No statistically significant difference was found in the Wells scores between patients with and without PE (P>.05). Copilot demonstrated better discriminatory power than the Wells score in risk assessment of PE (area under the curve 0.713 vs 0.583; P<.001 vs P=.091, respectively). When the low- and intermediate-risk categories were combined and compared against the high-risk category, sensitivity, specificity, positive predictive value, and negative predictive value were 34%, 97.1%, 92.3%, and 59.6%, respectively.
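The four accuracy figures reported in the Results follow from a 2x2 table. The sketch below shows the arithmetic in Python. The cell counts (24, 46, 2, 68) are not given in the abstract; they are back-calculated here as an assumption, being the integer counts for 70 PE and 70 non-PE vignettes that reproduce the reported percentages when "high risk" is treated as a positive test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # PE vignettes rated high risk
    fn: int  # PE vignettes rated low or intermediate risk
    fp: int  # non-PE vignettes rated high risk
    tn: int  # non-PE vignettes rated low or intermediate risk

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Assumed counts, inferred from the reported percentages (not stated in the abstract).
cm = ConfusionMatrix(tp=24, fn=46, fp=2, tn=68)

if __name__ == "__main__":
    for name in ("sensitivity", "specificity", "ppv", "npv"):
        value = getattr(cm, name)()
        print(f"{name}: {value * 100:.1f}%")
```

Under these assumed counts, the high specificity and low sensitivity mean that a "high risk" rating was rarely wrong, but most PE vignettes were not rated high risk.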
Conclusion: This study explores the potential of Copilot as a tool in clinical decision-making, demonstrating a high rate of correctly identifying PE and improved performance over the Wells score. However, further validation in larger populations and real-world settings is crucial to fully realize its potential.
About the journal:
WestJEM focuses on how the systems and delivery of emergency care affect health, health disparities, and health outcomes in communities and populations worldwide, including the impact of social conditions on the composition of patients seeking care in emergency departments.