{"title":"ChatGPT in Iranian medical licensing examination: evaluating the diagnostic accuracy and decision-making capabilities of an AI-based model","authors":"Manoochehr Ebrahimian, Behdad Behnam, Negin Ghayebi, Elham Sobhrakhshankhah","doi":"10.1136/bmjhci-2023-100815","DOIUrl":"https://doi.org/10.1136/bmjhci-2023-100815","url":null,"abstract":"Introduction Large language models such as ChatGPT have gained popularity for their ability to generate comprehensive responses to human queries. In the field of medicine, ChatGPT has shown promise in applications ranging from diagnostics to decision-making. However, its performance in medical examinations and its comparison to random guessing have not been extensively studied. Methods This study aimed to evaluate the performance of ChatGPT in the preinternship examination, a comprehensive medical assessment for students in Iran. The examination consisted of 200 multiple-choice questions categorised into basic science evaluation, diagnosis and decision-making. GPT-4 was used, and the questions were translated to English. A statistical analysis was conducted to assess the performance of ChatGPT and also compare it with a random test group. Results The results showed that ChatGPT performed exceptionally well, with 68.5% of the questions answered correctly, significantly surpassing the pass mark of 45%. It exhibited superior performance in decision-making and successfully passed all specialties. Comparing ChatGPT to the random test group, ChatGPT’s performance was significantly higher, demonstrating its ability to provide more accurate responses and reasoning. Conclusion This study highlights the potential of ChatGPT in medical licensing examinations and its advantage over random guessing. However, it is important to note that ChatGPT still falls short of human physicians in terms of diagnostic accuracy and decision-making capabilities. 
Caution should be exercised when using ChatGPT, and its results should be verified by human experts to ensure patient safety and avoid potential errors in the medical field. Data are available on reasonable request.","PeriodicalId":9050,"journal":{"name":"BMJ Health & Care Informatics","volume":"102 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138576006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the reliability of inpatient EMR algorithms for diabetes identification","authors":"Seungwon Lee, Elliot A Martin, Jie Pan, Cathy A Eastwood, Danielle A Southern, David J T Campbell, Abdel Aziz Shaheen, Hude Quan, Sonia Butalia","doi":"10.1136/bmjhci-2023-100894","DOIUrl":"https://doi.org/10.1136/bmjhci-2023-100894","url":null,"abstract":"Introduction Accurate identification of medical conditions within a real-time inpatient setting is crucial for health systems. Current inpatient comorbidity algorithms rely on integrating various sources of administrative data, but at times, there is a considerable lag in obtaining and linking these data. Our study objective was to develop electronic medical records (EMR) data-based inpatient diabetes phenotyping algorithms. Materials and methods A chart review on 3040 individuals was completed, and 583 had diabetes. We linked EMR data on these individuals to the International Classification of Disease (ICD) administrative databases. The following EMR-data-based diabetes algorithms were developed: (1) laboratory data, (2) medication data, (3) laboratory and medications data, (4) diabetes concept keywords and (5) diabetes free-text algorithm. Combined algorithms used OR statements between the above algorithms. Algorithm performances were measured using chart review as a gold standard. We determined the best-performing algorithm as the one that showed high sensitivity (SN) and positive predictive value (PPV). Results The algorithms tested generally performed well: ICD-coded data, SN 0.84, specificity (SP) 0.98, PPV 0.93 and negative predictive value (NPV) 0.96; medication and laboratory algorithm, SN 0.90, SP 0.95, PPV 0.80 and NPV 0.97; all document types algorithm, SN 0.95, SP 0.98, PPV 0.94 and NPV 0.99. Discussion A free-text data-based diabetes algorithm can yield comparable or superior performance to a commonly used ICD-coded algorithm and could supplement existing methods. 
These types of inpatient EMR-based algorithms for case identification may become a key method for timely resource planning and care delivery. Data may be obtained from a third party and are not publicly available. Restrictions apply to the availability of these data. Data were obtained from Alberta Health Services and are available with the permission of Alberta Health Services.","PeriodicalId":9050,"journal":{"name":"BMJ Health & Care Informatics","volume":"6 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138823348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Electronic health record intervention to increase use of NSAIDs as analgesia for hospitalised patients: a cluster randomised controlled study","authors":"Tasce Bongiovanni, Mark J Pletcher, Andrew Robinson, Elizabeth Lancaster, Li Zhang, Matthias Behrends, Elizabeth Wick, Andrew Auerbach","doi":"10.1136/bmjhci-2023-100842","DOIUrl":"https://doi.org/10.1136/bmjhci-2023-100842","url":null,"abstract":"Background Prescribing non-opioid pain medications, such as non-steroidal anti-inflammatory (NSAIDs) medications, has been shown to reduce pain and decrease opioid use, but it is unclear how to effectively encourage multimodal pain medication prescribing for hospitalised patients. Therefore, the aim of this study is to evaluate the effect of prechecking non-opioid pain medication orders on clinician prescribing of NSAIDs among hospitalised adults. Methods This was a cluster randomised controlled trial of adult (≥18 years) hospitalised patients admitted to three hospital sites under one quaternary hospital system in the USA from 2 March 2022 to 3 March 2023. A multimodal pain order panel was embedded in the admission order set, with NSAIDs prechecked in the intervention group. The intervention group could uncheck the NSAID order. The control group had access to the same NSAID order. The primary outcome was an increase in NSAID ordering. Secondary outcomes include NSAID administration, inpatient pain scores and opioid use and prescribing and relevant clinical harms including acute kidney injury, new gastrointestinal bleed and in-hospital death. Results Overall, 1049 clinicians were randomised. The study included 6239 patients for a total of 9595 encounters. Both NSAID ordering (36 vs 43%, p<0.001) and administering (30 vs 34%, p=0.001) by the end of the first full hospital day were higher in the intervention (prechecked) group. There was no statistically significant difference in opioid outcomes during the hospitalisation and at discharge. 
There was a statistically but perhaps not clinically significant difference in pain scores during both the first and last full hospital day. Conclusions This cluster randomised controlled trial showed that prechecking an order for NSAIDs to promote multimodal pain management in the admission order set increased NSAID ordering and administration, although there were no changes to pain scores or opioid use. While prechecking orders is an important way to increase adoption, safety checks should be in place. Data are available in a public, open access repository. Data is publicly available from the Centers of Medicare and Medicaid Services from the US Government.","PeriodicalId":9050,"journal":{"name":"BMJ Health & Care Informatics","volume":"17 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139067675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Electronic health card: a technological solution to promote the Chinese integrated healthcare system in the digital age","authors":"Wenjuan Tao, Tao Gu, Yujue Li, Weimin Li","doi":"10.1136/bmjhci-2023-100911","DOIUrl":"https://doi.org/10.1136/bmjhci-2023-100911","url":null,"abstract":"People-centred integrated care, with an emphasis on ensuring healthcare services are well coordinated around people’s needs,[1][1] is regarded as a global strategy towards universal health coverage.[2][2] Underutilisation of information technology and lack of interoperability are identified as the","PeriodicalId":9050,"journal":{"name":"BMJ Health & Care Informatics","volume":"6 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138714904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Role of evaluation throughout the life cycle of biomedical and health AI applications","authors":"Edward H Shortliffe","doi":"10.1136/bmjhci-2023-100925","DOIUrl":"https://doi.org/10.1136/bmjhci-2023-100925","url":null,"abstract":"In the development and evaluation of medical artificial intelligence (AI) programmes, there is a tendency to focus the work on the system’s decision-making performance. This is natural, since the typical goal is to develop software that can assist physicians or other clinicians with decision tasks","PeriodicalId":9050,"journal":{"name":"BMJ Health & Care Informatics","volume":"10 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138576284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cognitive science in the evaluation of medical AI systems","authors":"Vimla Lodhia Patel","doi":"10.1136/bmjhci-2023-100929","DOIUrl":"https://doi.org/10.1136/bmjhci-2023-100929","url":null,"abstract":"Clinical cognition is central to a clinician’s daily tasks, such as making diagnostic and therapeutic decisions. For example, doctors rely on their memory to recall relevant facts, concepts and experiences that can help them diagnose and treat their patients. Memory is needed for clinicians to","PeriodicalId":9050,"journal":{"name":"BMJ Health & Care Informatics","volume":"17 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138684572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantifying digital health inequality across a national healthcare system.","authors":"Joe Zhang, Jack Gallifant, Robin L Pierce, Aoife Fordham, James Teo, Leo Celi, Hutan Ashrafian","doi":"10.1136/bmjhci-2023-100809","DOIUrl":"10.1136/bmjhci-2023-100809","url":null,"abstract":"<p><strong>Objectives: </strong>Digital health inequality, observed as differential utilisation of digital tools between population groups, has not previously been quantified in the National Health Service (NHS). Deployment of universal digital health interventions, including a national smartphone app and online primary care services, allows measurement of digital inequality across a nation. We aimed to measure population factors associated with digital utilisation across 6356 primary care providers serving the population of England.</p><p><strong>Methods: </strong>We used multivariable regression to test association of population and provider characteristics (including patient demographics, socioeconomic deprivation, disease burden, prescribing burden, geography and healthcare provider resource) with activation of two independent digital services during 2021/2022.</p><p><strong>Results: </strong>We find a significant adjusted association between increased population deprivation and reduced digital utilisation across both interventions. Multivariable regression coefficients for most deprived quintiles correspond to 4.27 million patients across England where deprivation is associated with non-activation of the NHS App.</p><p><strong>Conclusion: </strong>Results are concerning for technologically driven widening of healthcare inequalities. 
Targeted incentive to digital is necessary to prevent digital disparity from becoming health outcomes disparity.</p>","PeriodicalId":9050,"journal":{"name":"BMJ Health & Care Informatics","volume":"30 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10680008/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138440311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proof-of-concept solution to create an interoperable timeline of healthcare data.","authors":"Sapna Trivedi, Stephen Hall, Fiona Inglis, Afzal Chaudhry","doi":"10.1136/bmjhci-2023-100754","DOIUrl":"10.1136/bmjhci-2023-100754","url":null,"abstract":"<p><strong>Objectives: </strong>To overcome the barriers of interoperability by sharing simulated patient data from different electronic health records systems and presenting them in an intuitive timeline of events.</p><p><strong>Methods: </strong>The 'Patient Story' software comprising database and blockchain, PS Timeline Windows interface, PS Timeline Web interface and network relays on Azure cloud was customised for Epic and Lorenzo electronic patient record (EPR) systems used at different hospitals, using site-specific adapters.</p><p><strong>Results: </strong>Each site could view their own clinical documents and view each other's site-specific, fully coded test sets of (Care Connect) medications, conditions and allergies, in an aggregated single view.</p><p><strong>Discussion: </strong>This work has shown that clinical data from different EPR systems can be successfully integrated and visualised on a single timeline, accessible by clinicians and patients.</p><p><strong>Conclusion: </strong>The Patient Story system combined the timeline visualisation with successful interoperability across healthcare settings, as well as giving patients the ability to directly interact with their timeline.</p>","PeriodicalId":9050,"journal":{"name":"BMJ Health & Care Informatics","volume":"30 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10693683/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71520373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative study of ChatGPT and human evaluators on the assessment of medical literature according to recognised reporting standards.","authors":"Richard Hr Roberts, Stephen R Ali, Hayley A Hutchings, Thomas D Dobbs, Iain S Whitaker","doi":"10.1136/bmjhci-2023-100830","DOIUrl":"10.1136/bmjhci-2023-100830","url":null,"abstract":"Introduction Amid clinicians’ challenges in staying updated with medical research, artificial intelligence (AI) tools like the large language model (LLM) ChatGPT could automate appraisal of research quality, saving time and reducing bias. This study compares the proficiency of ChatGPT3 against human evaluation in scoring abstracts to determine its potential as a tool for evidence synthesis. Methods We compared ChatGPT’s scoring of implant dentistry abstracts with human evaluators using the Consolidated Standards of Reporting Trials for Abstracts reporting standards checklist, yielding an overall compliance score (OCS). Bland-Altman analysis assessed agreement between human and AI-generated OCS percentages. Additional error analysis included mean difference of OCS subscores, Welch’s t-test and Pearson’s correlation coefficient. Results Bland-Altman analysis showed a mean difference of 4.92% (95% CI 0.62%, 0.37%) in OCS between human evaluation and ChatGPT. Error analysis displayed small mean differences in most domains, with the highest in ‘conclusion’ (0.764 (95% CI 0.186, 0.280)) and the lowest in ‘blinding’ (0.034 (95% CI 0.818, 0.895)). The strongest correlations were in ‘harms’ (r=0.32, p<0.001) and ‘trial registration’ (r=0.34, p=0.002), whereas the weakest were in ‘intervention’ (r=0.02, p<0.001) and ‘objective’ (r=0.06, p<0.001). Conclusion LLMs like ChatGPT can help automate appraisal of medical literature, aiding in the identification of accurately reported research. Possible applications of ChatGPT include integration within medical databases for abstract evaluation. 
Current limitations include the token limit, restricting its usage to abstracts. As AI technology advances, future versions like GPT4 could offer more reliable, comprehensive evaluations, enhancing the identification of high-quality research and potentially improving patient outcomes.","PeriodicalId":9050,"journal":{"name":"BMJ Health & Care Informatics","volume":"30 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10583079/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41190771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time to treat the climate and nature crisis as one indivisible global health emergency.","authors":"Chris Zielinski","doi":"10.1136/bmjhci-2023-100938","DOIUrl":"10.1136/bmjhci-2023-100938","url":null,"abstract":"","PeriodicalId":9050,"journal":{"name":"BMJ Health & Care Informatics","volume":"30 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603532/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50160603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}