JMIR AI Latest Articles

Supervised Natural Language Processing Classification of Violent Death Narratives: Development and Assessment of a Compact Large Language Model.
JMIR AI Pub Date: 2025-06-19 DOI: 10.2196/68212
Susan T Parker
Background: The recent availability of law enforcement and coroner or medical examiner reports for nearly every violent death in the United States expands the potential for natural language processing (NLP) research into violence.
Objective: The objective of this work is to assess applications of supervised NLP to unstructured data in the National Violent Death Reporting System to predict circumstances and types of violent death.
Methods: This analysis applied distilBERT, a compact large language model (LLM) with fewer parameters than full-scale LLMs, to unstructured narrative data to simulate the impact of preprocessing and of the volume and composition of training data on model performance, evaluated by F1-scores, precision, recall, and the false negative rate. Model performance was evaluated for bias by race, ethnicity, and sex by comparing F1-scores across subgroups.
Results: A minimum training set of 1500 cases was necessary to achieve an F1-score of 0.6 and a false negative rate of 0.01-0.05 with a compact LLM. Replacing domain-specific jargon improved model performance, while oversampling positive-class cases to address class imbalance did not substantially improve F1-scores. Between racial and ethnic groups, F1-score disparities ranged from 0.2 to 0.25, and between male and female decedents, differences ranged from 0.12 to 0.2.
Conclusions: Compact LLMs with sufficient training data can be applied to supervised NLP tasks with a class imbalance in the National Violent Death Reporting System. Simulations across the model-fitting process can inform the preprocessing and training of compact LLM-based NLP applications for unstructured death narrative data.
JMIR AI. 2025;4:e68212. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12223685/pdf/
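As a side note on the evaluation metrics named in this abstract, the sketch below (illustrative only, not the study's code) shows how precision, recall, F1-score, and the false negative rate are computed from binary labels and predictions:

```python
# Minimal sketch: the four evaluation metrics named in the abstract,
# computed from binary ground-truth labels and model predictions.

def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # False negative rate: the share of true positives the model missed.
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fnr": fnr}

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical labels
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # hypothetical predictions
print(binary_metrics(y_true, y_pred))
```

In a class-imbalanced task like this one, the false negative rate complements F1 by directly quantifying missed positive cases.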
Citations: 0
AI-Powered Drug Classification and Indication Mapping for Pharmacoepidemiologic Studies: Prompt Development and Validation.
JMIR AI Pub Date: 2025-06-12 DOI: 10.2196/65481
Benjamin Ogorek, Thomas Rhoads, Eric Finkelman, Isaac R Rodriguez-Chavez
Background: Pharmacoepidemiologic studies, which promote rational drug use and improve health outcomes, often require Anatomical Therapeutic Chemical Classification System (ATC) drug classification within real-world data (RWD) sources. Existing classification tools are expensive, brittle, or have restrictive terms of service, and they lack context that may inform classification itself.
Objective: This study sought to establish large language models (LLMs) as an assisting technology in the drug classification task. This included developing artificial intelligence prompts that reason about drugs using RWD and showing that the resulting accuracy, efficiency, and effectiveness compare favorably to alternative methods.
Methods: A prompt was constructed to classify aspirin as either an analgesic or an antithrombotic and evaluated on 12,294 anonymized daily dose strings from a polychronic population residing in the United States and Canada. The patients used a smart medication dispenser called "spencer" and consented to the use of their data for research. The LLM prompt requested the best and next-best second-level ATC code, and grading was performed on a 3-point scale. After success in a pilot sample of 20, an inference sample of 200 was taken without replacement. Finite population inference was carried out on the proportion of outputs receiving 1 of the top 2 grades. As a benchmark, Google's Programmable Search Engine was used to query the drug name plus "ATC code," followed by regex-based extraction of ATC codes. All imperfect results were reviewed.
Results: The population consisted of 12,294 daily dose strings from patients residing in Canada (2908/3371, 86.26%) and the United States (463/3371, 13.73%). A prompt using chain-of-thought reasoning was able to distinguish between aspirin's analgesic and antithrombotic therapeutic uses and performed well in the pilot sample. In the inferential sample, 87.5% (175/200) of outputs were graded as perfect, 5% (10/200) had a minor issue, and 7.5% (15/200) had a major issue. The estimated proportion of at least mostly correct classifications was 92.5% (185/200; 80% CI 90.1%-94.9%). For the search-based algorithm, 82.5% (165/200) were deemed acceptable. Chain-of-thought reasoning was most helpful with supplements (eg, folic acid) when high doses indicated antianemic preparations. The problem formulation of daily dose inputs and multiple ATC outputs was sometimes incompatible with the drug (eg, pregabalin, calcitriol, and methotrexate).
Conclusions: GPT-4o offers cost-effective drug classification from RWD without violating any terms of service. Using a chain-of-thought prompting technique, GPT-4o can reason about drug dosages that affect the class. The wide accessibility of LLMs gives every research team the ability to classify drugs at scale, a key prerequisite of pharmacoepidemiologic studies.
JMIR AI. 2025;4:e65481. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12203024/pdf/
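The finite population inference reported above can be sketched as follows (not the authors' code; the z value and the finite population correction formula are standard assumptions). With 185/200 acceptable outputs sampled from 12,294 dose strings, this reproduces the reported 80% CI of roughly 90.1%-94.9%:

```python
import math

# Sketch: 80% confidence interval for a proportion with a finite
# population correction, applied to a sample of n drawn from a
# finite population of N without replacement.

def finite_pop_ci(successes, n, population, z=1.2816):  # z = normal 90th pctile, for an 80% CI
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))  # finite population correction
    margin = z * se * fpc
    return p - margin, p + margin

lo, hi = finite_pop_ci(185, 200, 12294)
print(f"{lo:.3f} - {hi:.3f}")
```

The correction factor shrinks the interval slightly because the 200-string sample is drawn without replacement from a finite pool.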
Citations: 0
High-Throughput Phenotyping of the Symptoms of Alzheimer Disease and Related Dementias Using Large Language Models: Cross-Sectional Study.
JMIR AI Pub Date: 2025-06-03 DOI: 10.2196/66926
You Cheng, Mrunal Malekar, Yingnan He, Apoorva Bommareddy, Colin Magdamo, Arjun Singh, Brandon Westover, Shibani S Mukerji, John Dickson, Sudeshna Das
Background: Alzheimer disease and related dementias (ADRD) are complex disorders with overlapping symptoms and pathologies. Comprehensive records of symptoms in electronic health records (EHRs) are critical not only for reaching an accurate diagnosis but also for supporting ongoing research studies and clinical trials. However, these symptoms are frequently obscured within unstructured clinical notes, making manual extraction both time-consuming and labor-intensive.
Objective: We aimed to automate symptom extraction from the clinical notes of patients with ADRD using fine-tuned large language models (LLMs), compare its performance to regular expression-based symptom recognition, and validate the results using brain magnetic resonance imaging (MRI) data.
Methods: We fine-tuned LLMs to extract ADRD symptoms across 7 domains: memory, executive function, motor, language, visuospatial, neuropsychiatric, and sleep. We assessed the algorithm's performance by calculating the area under the receiver operating characteristic curve (AUROC) for each domain. The extracted symptoms were then validated in two analyses: (1) predicting ADRD diagnosis using the counts of extracted symptoms and (2) examining the association between ADRD symptoms and MRI-derived brain volumes.
Results: Symptom extraction across the 7 domains achieved high accuracy, with AUROCs ranging from 0.97 to 0.99. Using the counts of extracted symptoms to predict ADRD diagnosis yielded an AUROC of 0.83 (95% CI 0.77-0.89). Symptom associations with brain volumes revealed that smaller hippocampal volume was linked to memory impairments (odds ratio 0.62, 95% CI 0.46-0.84; P=.006) and that reduced pallidum size was associated with motor impairments (odds ratio 0.73, 95% CI 0.58-0.90; P=.04).
Conclusions: These results highlight the accuracy and reliability of our high-throughput ADRD phenotyping algorithm. By enabling automated symptom extraction, our approach has the potential to assist with differential diagnosis and to facilitate clinical trials and research studies of dementia.
JMIR AI. 2025;4:e66926. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12174885/pdf/
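The AUROC used throughout this abstract has a simple probabilistic reading: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition (illustrative data, not the study's):

```python
# Sketch: AUROC computed directly from its probabilistic definition --
# the fraction of positive/negative pairs ranked correctly, ties counting half.

def auroc(y_true, scores):
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 1, 0, 0]               # hypothetical domain labels
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # hypothetical model scores
print(auroc(y_true, scores))
```

An AUROC of 0.97-0.99, as reported per domain, means nearly every positive case outranks every negative one.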
Citations: 0
Algorithmic Classification of Psychiatric Disorder-Related Spontaneous Communication Using Large Language Model Embeddings: Algorithm Development and Validation.
JMIR AI Pub Date: 2025-05-30 DOI: 10.2196/67369
Ryan Allen Shewcraft, John Schwarz, Mariann Micsinai Balan
Background: Language, a crucial element of human communication, is influenced by the complex interplay between thoughts, emotions, and experiences. Psychiatric disorders affect cognitive and emotional processes, which in turn affect the content and the way individuals with these disorders communicate. The recent rapid advancements in large language models (LLMs) suggest that leveraging them for quantitative analysis of language usage has the potential to become a useful method for providing objective measures in diagnosing and monitoring psychiatric conditions.
Objective: This study aims to explore the use of LLMs in analyzing spontaneous communication to differentiate between various psychiatric disorders. We seek to show that the latent LLM embedding space identifies distinct linguistic markers that can be used to classify spontaneous communication from 7 different psychiatric disorders.
Methods: We used embeddings from the 7 billion parameter Generative Representational Instruction Tuning Language Model to analyze more than 37,000 posts from subreddits dedicated to 7 common conditions: schizophrenia, borderline personality disorder (BPD), depression, attention-deficit/hyperactivity disorder (ADHD), anxiety, posttraumatic stress disorder (PTSD), and bipolar disorder. A cross-validated multiclass Extreme Gradient Boosting classifier was trained on these embeddings to predict the origin subreddit of each post. Performance was evaluated using precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC). In addition, we used Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction to visualize linguistic relationships between these psychiatric disorders.
Results: The 10-fold cross-validated Extreme Gradient Boosting classifier achieved a support-weighted average precision, recall, F1-score, and accuracy of 0.73 each. In one-versus-rest tasks, individual category AUCs ranged from 0.89 to 0.97, with a microaverage AUC of 0.95. ADHD posts were classified with the highest AUC of 0.97, indicating distinct linguistic features, while BPD posts had the lowest AUC of 0.89, suggesting greater linguistic overlap with other conditions. Consistent with the classifier results, ADHD posts form a more visually distinct cluster in the UMAP projections, while BPD overlaps with depression, anxiety, and schizophrenia. Comparisons with other state-of-the-art embedding methods, such as OpenAI's text-embedding-3-small (AUC=0.94) and sentence-bidirectional encoder representations from transformers (AUC=0.86), demonstrated superior performance of the Generative Representational Instruction Tuning Language Model-7B.
Conclusions: This study introduces an innov…
JMIR AI. 2025;4:e67369. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12223684/pdf/
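The "support-weighted average" reported in the results weights each class's score by its number of true examples. A minimal sketch of that averaging (the per-class scores and supports below are hypothetical, not the study's):

```python
# Sketch: support-weighted averaging of per-class scores, as used to
# summarize multiclass precision/recall/F1 -- each class contributes
# in proportion to its number of true examples (its support).

def support_weighted(per_class_scores, supports):
    total = sum(supports)
    return sum(s * w for s, w in zip(per_class_scores, supports)) / total

# Hypothetical per-class F1-scores for 3 classes with unequal supports.
f1_scores = [0.80, 0.70, 0.60]
supports = [100, 50, 50]
print(support_weighted(f1_scores, supports))
```

With imbalanced subreddit sizes, this weighting keeps large classes from being drowned out by (or drowning out) small ones in the headline number.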
Citations: 0
Performance of 3 Conversational Generative Artificial Intelligence Models for Computing Maximum Safe Doses of Local Anesthetics: Comparative Analysis.
JMIR AI Pub Date: 2025-05-13 DOI: 10.2196/66796
Mélanie Suppan, Pietro Elias Fubini, Alexandra Stefani, Mia Gisselbaek, Caroline Flora Samer, Georges Louis Savoldelli
Background: Generative artificial intelligence (AI) is showing great promise as a tool to optimize decision-making across various fields, including medicine. In anesthesiology, accurately calculating maximum safe doses of local anesthetics (LAs) is crucial to prevent complications such as local anesthetic systemic toxicity (LAST). Current methods for determining LA dosage are largely based on empirical guidelines and clinician experience, which can result in significant variability and dosing errors. AI models may offer a solution by processing multiple parameters simultaneously to suggest adequate LA doses.
Objective: This study aimed to evaluate the efficacy and safety of 3 generative AI models, ChatGPT (OpenAI), Copilot (Microsoft Corporation), and Gemini (Google LLC), in calculating maximum safe LA doses, with the goal of determining their potential use in clinical practice.
Methods: A comparative analysis was conducted using a 51-item questionnaire designed to assess LA dose calculation across 10 simulated clinical vignettes. The responses generated by ChatGPT, Copilot, and Gemini were compared with reference doses calculated using a scientifically validated set of rules. Quantitative evaluations compared AI-generated doses to these reference doses, while qualitative assessments were conducted by independent reviewers using a 5-point Likert scale.
Results: All 3 AI models completed the questionnaire and generated responses aligned with LA dose calculation principles, but their performance in providing safe doses varied significantly. Gemini frequently avoided proposing any specific dose, instead recommending consultation with a specialist. When it did provide dose ranges, they often exceeded safe limits by 140% (SD 103%) in cases involving mixtures. ChatGPT provided unsafe doses in 90% (9/10) of cases, exceeding safe limits by 198% (SD 196%). Copilot's recommendations were unsafe in 67% (6/9) of cases, exceeding limits by 217% (SD 239%). Qualitative assessments rated Gemini as "fair" and both ChatGPT and Copilot as "poor."
Conclusions: Generative AI models like Gemini, ChatGPT, and Copilot currently lack the accuracy and reliability needed for safe LA dose calculation. Their poor performance suggests that they should not be used as decision-making tools for this purpose. Until more reliable AI-driven solutions are developed and validated, clinicians should rely on their expertise, experience, and a careful assessment of individual patient factors to guide LA dosing and ensure patient safety.
JMIR AI. 2025;4:e66796. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12223683/pdf/
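The reference doses here come from a validated rule set not reproduced in the abstract. Purely to illustrate the shape of such a rule (this is NOT clinical guidance and NOT the study's rule set; the mg/kg figure and absolute cap below are commonly cited textbook values used only as example inputs), a weight-based ceiling looks like:

```python
# Illustrative sketch only -- not clinical guidance, not the study's
# validated rules. A maximum safe dose is typically the lesser of a
# weight-based limit (mg/kg) and an absolute cap (mg).

def max_safe_dose_mg(weight_kg, mg_per_kg_limit, absolute_cap_mg):
    return min(weight_kg * mg_per_kg_limit, absolute_cap_mg)

# Example inputs: plain lidocaine is often quoted at 4.5 mg/kg
# with a 300 mg cap (textbook values, assumed here for illustration).
print(max_safe_dose_mg(70, 4.5, 300))
```

The simplicity of this arithmetic is what makes the AI models' large overshoots (140%-217% above the limits) notable.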
Citations: 0
The Elastic Electronic Health Record: A Five-Tiered Framework for Applying Artificial Intelligence to Electronic Health Record Maintenance, Configuration, and Use.
JMIR AI Pub Date: 2025-05-09 DOI: 10.2196/66741
Colby Uptegraft, Kameron Collin Black, Jonathan Gale, Andrew Marshall, Shuhan He
Properly configuring modern electronic health records (EHRs) has become increasingly challenging for human operators, failing to fully meet the efficiency and cost-saving potential seen with the digitization of other sectors. The integration of artificial intelligence (AI) offers a promising solution, particularly through a comprehensive governance approach that moves beyond front-end enhancements such as user- and patient-facing copilots. These copilots, although useful, are limited by the underlying EHR configuration, leading to inefficiencies and high maintenance costs. To address this, we propose the concept of an "Elastic EHR," which proactively suggests and validates optimal content and configuration changes, significantly reducing governance costs, enhancing user experience, and reducing common frustrations including documentation burden, alert fatigue, poor system responsiveness, outdated content, and unintuitive design.

Our five-tiered model details a structured approach to AI integration within EHRs. Tier I focuses on autonomous database reconfiguration, akin to Oracle Autonomous Database functionality, to ensure continuous system improvements without direct edits to the production environment. Tier II empowers EHR clients to shape system performance according to predefined strategies and standards, ensuring coordinated and efficient EHR solution builds. Tier III optimizes EHR choice architecture by analyzing user behaviors and suggesting content and configuration changes that minimize clicks and keystrokes, thereby enhancing workflow efficiency. Tier IV maintains the currency of EHR clinical content and decision support by linking content and configuration to updated guidelines and literature, ensuring the EHR remains evidence-based and compliant with evolving standards. Finally, Tier V incorporates context-dependent AI copilots to enhance care efficiency, quality, and user experience.

Despite the potential benefits, major limitations exist. The market dominance of a few major EHR vendors (Epic Systems, Oracle Health, and MEDITECH) poses a challenge, as any enhancements require their cooperation and financial motivation. Furthermore, the diverse and complex nature of health care environments demands a flexible yet robust AI system that can adapt to various institutional needs, and such a system has not yet been developed, researched, or tested. By overcoming these limitations through vendor-led, collaborative efforts, AI-enabled EHRs could improve the efficiency, quality, and user experience of health care delivery, fully delivering on the promise of digitization in health care.
JMIR AI. 2025;4:e66741. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12223678/pdf/
Citations: 0
Comparative Performance of Medical Students, ChatGPT-3.5 and ChatGPT-4.0 in Answering Questions From a Brazilian National Medical Exam: Cross-Sectional Questionnaire Study.
JMIR AI Pub Date: 2025-05-08 DOI: 10.2196/66552
Mateus Rodrigues Alessi, Heitor Augusto Gomes, Gabriel Oliveira, Matheus Lopes de Castro, Fabiano Grenteski, Leticia Miyashiro, Camila do Valle, Leticia Tozzini Tavares da Silva, Cristina Okamoto
Background: Artificial intelligence has advanced significantly in various fields, including medicine, where tools like ChatGPT (GPT) have demonstrated remarkable capabilities in interpreting and synthesizing complex medical data. Since its launch in 2019, GPT has evolved, with version 4.0 offering enhanced processing power, image interpretation, and more accurate responses. In medicine, GPT has been used for diagnosis, research, and education, achieving significant milestones such as passing the United States Medical Licensing Examination. Recent studies show that GPT-4.0 outperforms earlier versions and even medical students on medical exams.
Objective: This study aimed to evaluate and compare the performance of GPT versions 3.5 and 4.0 on Brazilian Progress Tests (PT) from 2021 to 2023, analyzing their accuracy compared to medical students.
Methods: A cross-sectional observational study was conducted using 333 multiple-choice questions from the PT, excluding questions with images and those nullified or repeated. All questions were presented sequentially without modification to their structure. The performance of the GPT versions was compared using statistical methods, and medical students' scores were included for context.
Results: There was a statistically significant difference in total performance scores across the 2021, 2022, and 2023 exams between GPT-3.5 and GPT-4.0 (P=.03); however, this significance did not remain after Bonferroni correction. On average, GPT-3.5 scored 68.4%, whereas GPT-4.0 achieved 87.2%, an absolute improvement of 18.8 percentage points and a relative increase of 27.4% in accuracy. Broken down by subject, the average scores for GPT-3.5 versus GPT-4.0 were as follows: surgery (73.5% vs 88.0%, P=.03), basic sciences (77.5% vs 96.2%, P=.004), internal medicine (61.5% vs 75.1%, P=.14), gynecology and obstetrics (64.5% vs 94.8%, P=.002), pediatrics (58.5% vs 80.0%, P=.02), and public health (77.8% vs 89.6%, P=.02). After Bonferroni correction, only basic sciences and gynecology and obstetrics retained statistically significant differences.
Conclusions: GPT-4.0 demonstrates superior accuracy compared to its predecessor in answering medical questions on the PT. These results are similar to those of other studies, indicating that we are approaching a new revolution in medicine.
JMIR AI. 2025;4:e66552. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12223693/pdf/
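The Bonferroni correction applied above divides the significance threshold by the number of comparisons, which is why some raw P values lose significance. A minimal sketch (using the subject-level P values reported in the abstract; the function itself is illustrative, not the study's code):

```python
# Sketch: Bonferroni correction -- each P value is compared against
# alpha divided by the number of comparisons made.

def bonferroni_significant(p_values, alpha=0.05):
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}

# Subject-level P values as reported in the abstract.
p_values = {
    "surgery": 0.03, "basic sciences": 0.004, "internal medicine": 0.14,
    "gynecology and obstetrics": 0.002, "pediatrics": 0.02, "public health": 0.02,
}
print(bonferroni_significant(p_values))
```

With 6 comparisons the corrected threshold is 0.05/6 ≈ 0.0083, so only basic sciences (P=.004) and gynecology and obstetrics (P=.002) survive, matching the abstract.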
Citations: 0
Correction: Striking a Balance: Innovation, Equity, and Consistency in AI Health Technologies.
JMIR AI Pub Date: 2025-05-07 DOI: 10.2196/76234
Eric Perakslis, Kimberly Nolen, Ethan Fricklas, Tracy Tubb
JMIR AI. 2025;4:e76234. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12223679/pdf/
Citations: 0
Exploring Patient Participation in AI-Supported Health Care: Qualitative Study.
JMIR AI Pub Date: 2025-05-05 DOI: 10.2196/50781
Laura Arbelaez Ossa, Michael Rost, Nathalie Bont, Giorgia Lorenzini, David Shaw, Bernice Simone Elger
Background: The introduction of artificial intelligence (AI) into health care has sparked discussions about its potential impact. Patients, as key stakeholders, will be at the forefront of interacting with and being impacted by AI. Given the ethical importance of patient-centered health care, patients must navigate how they engage with AI. However, integrating AI into clinical practice brings potential challenges, particularly in shared decision-making and in ensuring patients remain active participants in their care. Whether AI-supported interventions empower or undermine patient participation depends largely on how these technologies are envisioned and integrated into practice.
Objective: This study explores how patients and medical AI professionals perceive the patient's role and the factors shaping participation in AI-supported care.
Methods: We conducted qualitative semistructured interviews with 21 patients and 21 medical AI professionals from different disciplinary backgrounds. Data were analyzed using reflexive thematic analysis. We identified 3 themes describing the factors that patients and professionals see as shaping participation in AI-supported care.
Results: The first theme explored the vision of AI as an unavoidable and potentially harmful force of change in health care. The second theme highlights how patients perceive limitations in their capabilities that may prevent them from meaningfully participating in AI-supported care. The third theme describes patients' adaptive responses, such as relying on experts or making value judgments that lead to acceptance or rejection of AI-supported care.
Conclusions: Both external and internal preconceptions influence how patients and medical AI professionals perceive patient participation. Patients often internalize AI's complexity and inevitability as an obstacle to their active participation, leading them to feel they have little influence over its development. While some patients rely on doctors or see AI as something to accept or reject, these strategies risk placing them in a disempowering role as passive recipients of care. Without adequate education on their rights and possibilities, these responses may not be enough to position patients at the center of their care.
JMIR AI. 2025;4:e50781. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089863/pdf/
Citations: 0
Clinical Laboratory Parameter-Driven Machine Learning for Participant Selection in Bioequivalence Studies Among Patients With Gastric Cancer: Framework Development and Validation Study.
JMIR AI Pub Date: 2025-05-05 DOI: 10.2196/64845
Byungeun Shon, Sook Jin Seong, Eun Jung Choi, Mi-Ri Gwon, Hae Won Lee, Jaechan Park, Ho-Young Chung, Sungmoon Jeong, Young-Ran Yoon
Background: Insufficient participant enrollment is a major factor responsible for clinical trial failure.
Objective: We formulated a machine learning (ML)-based framework using clinical laboratory parameters to identify participants eligible for enrollment in a bioequivalence study.
Methods: We acquired records of 11,592 patients with gastric cancer from the electronic medical records of Kyungpook National University Hospital in Korea. The ML model was developed using 8 clinical laboratory parameters, including complete blood count and liver and kidney function tests, along with the dates of acquisition. Two datasets were collected: (1) a training dataset to design the ML-based candidate selection method and (2) a test dataset to evaluate the performance of the proposed method. The generalization performance of the ML-based method was confirmed using the F1-score and the area under the curve (AUC). The proposed model was compared with a random selection method to evaluate its efficacy in recruiting participants.
Results: The weighted ensemble model achieved strong performance, with an F1-score above 0.8 and an AUC exceeding 0.8, demonstrating its ability to accurately identify valid clinical trial candidates while minimizing misclassification. Its high sensitivity further enhanced the model's efficiency in prioritizing patients for screening. In a case study, the proposed ML model reduced the workload by 57%, efficiently identifying 150 valid patients from a pool of 209, compared with the 485 patients required by random selection.
Conclusions: The proposed ML-based framework using clinical laboratory parameters can be used to identify patients eligible for a clinical trial, enabling faster participant enrollment.
JMIR AI. 2025;4:e64845. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12223687/pdf/
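The reported 57% workload reduction follows directly from the screening counts in the case study. A one-line sketch of that arithmetic (numbers from the abstract; the function name is mine):

```python
# Sketch: screening-workload reduction -- the ML model needed 209 screens
# to find 150 eligible patients, versus an estimated 485 screens under
# random selection (figures as reported in the abstract).

def workload_reduction(screens_model, screens_random):
    return 1 - screens_model / screens_random

print(f"{workload_reduction(209, 485):.0%}")
```

1 - 209/485 ≈ 0.569, which rounds to the 57% reduction reported.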
Citations: 0