JMIR AI: Latest Articles

Real-World Evidence Synthesis of Digital Scribes Using Ambient Listening and Generative Artificial Intelligence for Clinician Documentation Workflows: Rapid Review.
IF 2
JMIR AI Pub Date : 2025-10-10 DOI: 10.2196/76743
Naga Sasidhar Kanaparthy, Yenny Villuendas-Rey, Tolulope Bakare, Zihan Diao, Mark Iscoe, Andrew Loza, Donald Wright, Conrad Safranek, Isaac V Faustino, Alexandria Brackett, Edward R Melnick, R Andrew Taylor
Abstract
Background: Physicians spend up to twice as much time on electronic health record tasks as on direct patient care. Digital scribes have emerged as a promising way to restore patient-clinician communication and reduce documentation burden, making it essential to study their real-world impact on clinical workflows, efficiency, and satisfaction.
Objective: This study aimed to synthesize evidence on clinician efficiency, user satisfaction, quality, and practical barriers associated with the use of digital scribes using ambient listening and generative artificial intelligence (AI) in real-world clinical settings.
Methods: A rapid review was conducted to evaluate the real-world evidence of digital scribes using ambient listening and generative AI in clinical practice from 2014 to 2024. Data were collected from Ovid MEDLINE, Embase, Web of Science-Core Collection, Cochrane CENTRAL and Reviews, and PubMed Central. Predefined eligibility criteria focused on studies addressing clinical implementation, excluding those centered solely on technical development or model validation. The findings of each study were synthesized and analyzed through the QUEST human evaluation framework for quality and safety and the Systems Engineering Initiative for Patient Safety (SEIPS) 3.0 model to assess integration into clinicians' workflows and experience.
Results: Of the 1450 studies identified, 6 met the inclusion criteria. These studies included an observational study, a case report, a peer-matched cohort study, and survey-based assessments conducted across academic health systems, community settings, and outpatient practices. The major themes noted were as follows: (1) digital scribes decreased self-reported documentation times, with an associated increase in note length; (2) physician burnout measured using standardized scales was unaffected, but physician engagement improved; (3) physician productivity, assessed via billing metrics, was unchanged; and (4) the studies fell short when compared to standardized frameworks.
Conclusions: Digital scribes show promise in reducing documentation burden and enhancing clinician satisfaction, thereby supporting workflow efficiency. However, the currently available evidence is sparse. Future real-world, multifaceted studies are needed before AI scribes can be recommended unequivocally.
Volume 4, e76743. https://doi.org/10.2196/76743
Citations: 0
Reinforcement Learning to Prevent Acute Care Events Among Medicaid Populations: Mixed Methods Study.
IF 2
JMIR AI Pub Date : 2025-10-08 DOI: 10.2196/74264
Sanjay Basu, Bhairavi Muralidharan, Parth Sheth, Dan Wanek, John Morgan, Sadiq Patel
Abstract
Background: Multidisciplinary care management teams must rapidly prioritize interventions for patients with complex medical and social needs. Current approaches rely on individual training, judgment, and experience, missing opportunities to learn from longitudinal trajectories and prevent adverse outcomes through recommender systems.
Objective: This study aims to evaluate whether a reinforcement learning approach could outperform standard care management practices in recommending optimal interventions for patients with complex needs.
Methods: Using data from 3175 Medicaid beneficiaries in care management programs across 2 states from 2023 to 2024, we compared alternative approaches for recommending "next best step" interventions: the standard experience-based approach (status quo) and a state-action-reward-state-action (SARSA) reinforcement learning model. We evaluated performance using clinical impact metrics, conducted counterfactual causal inference analyses to estimate reductions in acute care events, assessed fairness across demographic subgroups, and performed qualitative chart reviews where the models differed.
Results: In counterfactual analyses, SARSA-guided care management reduced acute care events by 12 percentage points (95% CI 2.2-21.8 percentage points, a 20.7% relative reduction; P=.02) compared to the status quo approach, with a number needed to treat of 8.3 (95% CI 4.6-45.2) to prevent 1 acute event. The approach showed improved fairness across demographic groups, including gender (3.8% vs 5.3% disparity in acute event rates, reduction 1.5%, 95% CI 0.3%-2.7%) and race and ethnicity (5.6% vs 8.9% disparity, reduction 3.3%, 95% CI 1.1%-5.5%). In qualitative reviews, the SARSA model detected and recommended interventions for specific medical-social interactions, such as respiratory issues associated with poor housing quality or food insecurity in individuals with diabetes.
Conclusions: SARSA-guided care management shows potential to reduce acute care use compared to standard practice. The approach demonstrates how reinforcement learning can improve complex decision-making in situations where patients face concurrent clinical and social factors while maintaining safety and fairness across demographic subgroups.
Volume 4, e74264. https://doi.org/10.2196/74264
Citations: 0
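The number needed to treat (NNT) reported for the SARSA study above can be related to its absolute risk reduction with a few lines of arithmetic. The sketch below is not the authors' analysis: it only applies the standard identity NNT = 1 / ARR to the rounded figures quoted in the abstract, and the function name is a hypothetical helper introduced here.

```python
# Worked arithmetic on the rounded figures quoted in the abstract.
# Assumption: NNT is taken as the reciprocal of the absolute risk
# reduction (ARR) expressed as a proportion.

def number_needed_to_treat(arr: float) -> float:
    """Return 1 / ARR; ARR must be a positive proportion."""
    if arr <= 0:
        raise ValueError("ARR must be positive for a finite NNT")
    return 1.0 / arr

# Reported: 12 percentage points (95% CI 2.2-21.8), 20.7% relative reduction.
arr, arr_lo, arr_hi = 0.12, 0.022, 0.218
relative_reduction = 0.207

nnt = number_needed_to_treat(arr)
nnt_lo = number_needed_to_treat(arr_hi)  # the larger ARR bound gives the smaller NNT
nnt_hi = number_needed_to_treat(arr_lo)

# ARR = baseline rate x relative reduction, so the implied baseline is:
implied_baseline = arr / relative_reduction

print(f"NNT = {nnt:.1f} (bounds {nnt_lo:.1f} to {nnt_hi:.1f})")
print(f"Implied status quo acute event rate = {implied_baseline:.1%}")
```

With these rounded inputs the point estimate and lower bound match the reported 8.3 and 4.6. The upper bound comes out slightly above the reported 45.2, which is presumably a rounding effect in the published ARR interval.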
Assessing the Capability of Large Language Models for Navigation of the Australian Health Care System: Comparative Study.
IF 2
JMIR AI Pub Date : 2025-10-07 DOI: 10.2196/76203
Joshua Simmich, Megan Heather Ross, Trevor Glen Russell
Abstract
Background: Australians can face significant challenges in navigating the health care system, especially in rural and regional areas. Generative search tools, powered by large language models (LLMs), show promise in improving health information retrieval by generating direct answers. However, concerns remain regarding their accuracy and reliability when compared to traditional search engines in a health care context.
Objective: This study aimed to compare the effectiveness of a generative artificial intelligence (AI) search (ie, Microsoft Copilot) versus a conventional search engine (Google Web Search) for navigating health care information.
Methods: A total of 97 adults in Queensland, Australia, participated in a web-based survey, answering scenario-based health care navigation questions using either Microsoft Copilot or Google Web Search. Accuracy was assessed using binary correct or incorrect ratings, graded correctness (incorrect, partially correct, or correct), and numerical scores (0-2 for service identification and 0-6 for criteria). Participants also completed a Technology Rating Questionnaire (TRQ) to evaluate their experience with their assigned tool.
Results: Participants assigned to Microsoft Copilot outperformed the Google Web Search group on 2 health care navigation tasks (identifying aged care application services and listing mobility allowance eligibility criteria), with no clear evidence of a difference in the remaining 6 tasks. On the TRQ, participants rated Google Web Search higher in willingness to adopt and perceived impact on quality of life, and lower in effort needed to learn. Both tools received similar ratings in perceived value, confidence, help required to use, and concerns about privacy.
Conclusions: Generative AI tools can achieve comparable accuracy to traditional search engines for health care navigation tasks, though this did not translate into an improved user experience. Further evaluation is needed as AI technology improves and users become more familiar with its use.
Volume 4, e76203. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12508777/pdf/
Citations: 0
Developing a Tool for Identifying Clinical Risk From Free-Text Clinical Records: Natural Language Processing Study.
IF 2
JMIR AI Pub Date : 2025-09-22 DOI: 10.2196/64898
Natasha Biscoe, Daniel Leightley, Dominic Murphy
Abstract
Background: Electronic patient records are a valuable yet underused data source; they have been explored in research using natural language processing, but not yet within a third-sector organization.
Objective: This study aimed to apply natural language processing to develop a risk identification tool capable of discerning high and low suicide risk among veterans, using electronic patient records from a United Kingdom-based veteran mental health charity.
Methods: A total of 20,342 notes were extracted for this purpose. To develop the risk tool, 70% of the records formed the training dataset, while the remaining 30% were allocated for testing and evaluation. The classification framework was devised and trained to categorize risk as a binary outcome: 1 indicating high risk and 0 indicating low risk.
Results: The efficacy of each classifier model was assessed by comparing its results with those from clinical risk assessments. A logistic regression classifier was found to perform best and was used to develop the final model. This comparison allowed for the calculation of the positive predictive value (mean 0.74, SD 0.059; 95% CI 0.70-0.77), negative predictive value (mean 0.73, SD 0.024; 95% CI 0.72-0.75), sensitivity (mean 0.75, SD 0.017; 95% CI 0.74-0.76), F1-score (mean 0.74, SD 0.033; 95% CI 0.72-0.76), and accuracy, which was measured using the Youden index (mean 0.73, SD 0.035; 95% CI 0.71-0.76).
Conclusions: The risk identification tool successfully determined the correct risk category of veterans from a large sample of clinical notes. Future studies should investigate whether this tool can detect more nuanced differences in risk and be generalizable across data sources.
Volume 4, e64898. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12501529/pdf/
Citations: 0
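The metrics named in the risk-tool abstract above (positive and negative predictive value, sensitivity, F1-score, accuracy, Youden index) all derive from a binary confusion matrix. The following is a minimal sketch using the textbook definitions. The abstract does not publish its confusion-matrix counts, so the counts below are invented purely for illustration, and `ConfusionCounts` and `binary_metrics` are hypothetical names rather than anything from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (1 = high risk, 0 = low risk)."""
    tp: int  # predicted high risk, clinically high risk
    fp: int  # predicted high risk, clinically low risk
    tn: int  # predicted low risk, clinically low risk
    fn: int  # predicted low risk, clinically high risk


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute standard metrics from confusion-matrix counts."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    npv = c.tn / (c.tn + c.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        # Youden's J in its conventional form: sensitivity + specificity - 1.
        "youden_j": sensitivity + specificity - 1,
    }


# Illustrative counts only (NOT from the study).
example = ConfusionCounts(tp=75, fp=27, tn=73, fn=25)
for name, value in binary_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

Under the conventional definition, Youden's J and accuracy are distinct quantities, so readers comparing the reported values with their own reimplementation may want to check which definition the full paper uses.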
Transparency in Kidney Transplant Recipient Selection Criteria: A Nationwide Analysis Using AI.
IF 2
JMIR AI Pub Date : 2025-09-20 DOI: 10.2196/74066
Belen Rivera, Stalin Canizares, Gabriel Cojuc-Konigsberg, Olena Holub, Alex Nakonechnyi, Ritah R Chumdermpadetsuk, Keren Ladin, Devin E Eckhoff, Rebecca Allen, Aditya Pawar
Abstract
Background: Choosing a transplant program impacts a patient's likelihood of receiving a kidney transplant. Most patients are unaware of the factors influencing their candidacy. As patients increasingly rely on online resources for healthcare decisions, this study quantifies the available online patient-level information on kidney transplant recipient (KTR) selection criteria across U.S. transplant centers.
Objective: We aimed to use natural language processing (NLP) and a large language model (LLM) to quantify the available online patient-level information regarding guideline-recommended KTR selection criteria reported by U.S. transplant centers.
Methods: A cross-sectional study using natural language processing and a large language model was conducted to review U.S. kidney transplant centers' websites from June to August 2024. Links were explored up to three levels deep, and information on 31 guideline-recommended KTR selection criteria was collected from each transplant center.
Results: A total of 255 U.S. kidney transplant centers were analyzed, comprising 10,508 webpages and 9,113,753 words. Among the guideline-recommended KTR selection criteria, only 2.6% of the information was present on the transplant centers' webpages. Socioeconomic and behavioral criteria were mentioned more than those related to patient medical conditions and comorbidities. Of the 31 criteria, finances and health insurance was the most frequently mentioned, appearing in 25.5% of the transplant centers. Other socioeconomic and behavioral criteria, such as family and social support systems, adherence, and psychosocial assessment, were addressed in less than 4%. No information was found on any webpage for 14 of the criteria. Geographically, disparities in reporting were observed, with the South Atlantic division showing the highest number of distinct criteria, while New England had the fewest.
Conclusions: Most transplant center websites do not disclose online patient-level KTR selection criteria. Lack of transparency in the evaluation and listing process for kidney transplantation may limit patients from choosing their most suitable transplant center and successfully receiving a kidney transplant.
https://doi.org/10.2196/74066
Citations: 0
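The kidney transplant abstract above says that links were explored up to three levels deep. One common way to implement such a limit is a breadth-first traversal that records each page's link distance from the start page. The sketch below runs on a made-up in-memory link graph rather than real websites, and it assumes the start page counts as depth 0; the abstract does not say how the authors counted levels or which crawler they used.

```python
from collections import deque


def crawl_to_depth(start: str, links: dict[str, list[str]], max_depth: int = 3) -> dict[str, int]:
    """Breadth-first traversal returning {page: depth} for every page
    reachable within max_depth link hops of the start page."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth_of[page] == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in depth_of:  # visit each page once
                depth_of[target] = depth_of[page] + 1
                queue.append(target)
    return depth_of


# Hypothetical site structure, for illustration only.
toy_site = {
    "home": ["transplant", "contact"],
    "transplant": ["kidney", "home"],
    "kidney": ["eligibility"],
    "eligibility": ["insurance-faq"],
}

reached = crawl_to_depth("home", toy_site, max_depth=3)
print(reached)
```

In this toy graph the page `insurance-faq` sits four hops from `home`, so it is left out. The same cutoff would hide any selection criteria that a real center publishes deeper than the limit.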
Large Language Model-Supported Identification of Intellectual Disabilities in Clinical Free-Text Summaries: Mixed Methods Study.
IF 2
JMIR AI Pub Date : 2025-09-18 DOI: 10.2196/72256
Aleksandra Edwards, Antonio F Pardiñas, George Kirov, Elliott Rees, Jose Camacho-Collados
Abstract
Background: Free-text clinical data are unstructured and narrative in nature, providing a rich source of patient information, but extracting research-quality clinical phenotypes from these data remains a challenge. Manually reviewing and extracting clinical phenotypes from free-text patient notes is a time-consuming process and not suitable for large-scale datasets. On the other hand, automatically extracting clinical phenotypes can be challenging because medical researchers lack gold-standard annotated references and other purpose-built resources, including software. Recent large language models (LLMs) can understand natural language instructions, which help them adapt to different domains and tasks without the need for specific training data. This makes them suitable for clinical applications, though their use in this field is limited.
Objective: We aimed to develop an LLM pipeline based on the few-shot learning framework that could extract clinical information from free-text clinical summaries. We assessed the performance of this pipeline for classifying individuals with confirmed or suspected comorbid intellectual disability (ID) from clinical summaries of patients with severe mental illness and performed genetic validation of the results by testing whether individuals with LLM-defined ID carried more genetic variants known to confer risk of ID when compared with individuals without LLM-defined ID.
Methods: We developed novel approaches for performing classification, based on an intermediate information extraction (IE) step and human-in-the-loop techniques. We evaluated two models: Fine-Tuned Language Text-To-Text Transfer Transformer (Flan-T5) and Large Language Model Architecture (LLaMA). The dataset comprised 1144 free-text clinical summaries, of which 314 were manually annotated and used as a gold standard for evaluating automated methods. We also used published genetic data from 547 individuals to perform a genetic validation of the classification results; Firth's penalized logistic regression framework was used to test whether individuals with LLM-defined ID carry significantly more de novo variants in known developmental disorder risk genes than individuals without LLM-defined ID.
Results: The results demonstrate that a 2-stage approach, combining IE with manual validation, can effectively identify individuals with suspected IDs from free-text patient records, requiring only a single training example per classification label. The best-performing method based on the Flan-T5 model and incorporating the IE step achieved an F1-score of 0.867. Individuals classified as having ID by the best-performing model were significantly enriched for de novo variants in known developmental disorder risk genes (odds ratio 29.1, 95% CI 7.36-107; P=2.1×10^-5).
Conclusions: LLMs and in-context learning techniques combined with human-in-the-loop ap...
Volume 4, e72256. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12445779/pdf/
Citations: 0
Leveraging Smart Bed Technology to Detect COVID-19 Symptoms: Case Study.
IF 2
JMIR AI Pub Date : 2025-09-17 DOI: 10.2196/64018
Gary Garcia-Molina, Dmytro Guzenko, Susan DeFranco, Mark Aloia, Rajasi Mills, Faisal Mushtaq, Virend K Somers
Abstract
Background: Pathophysiological responses to viral infections such as COVID-19 significantly affect sleep duration, sleep quality, and concomitant cardiorespiratory function. The widespread adoption of consumer smart bed technology presents a unique opportunity for unobtrusive, real-world, longitudinal monitoring of sleep and physiological signals, which may be valuable for infectious illness surveillance and early detection. During the COVID-19 pandemic, scalable and noninvasive methods for identifying subtle early symptoms in naturalistic settings became increasingly important. Existing digital health studies have largely relied on wearables or patient self-reports, with limited adherence and recall bias. In contrast, smart bed-derived signals enable high-frequency objective data capture with minimal user burden.
Objective: The aim of this study was to leverage objective, longitudinal biometric data captured using ballistocardiography signals from a consumer smart bed platform, along with predictive modeling, to detect and monitor COVID-19 symptoms at an individual level.
Methods: A retrospective cohort of 1725 US adults with sufficient longitudinal data and completed surveys reporting COVID-19 test outcomes was identified from users of a smart bed system. Smart bed ballistocardiography-derived metrics included nightly pulse rate, respiratory rate, total sleep time, sleep stages, and movement patterns. Participants served as their own controls, comparing reference (baseline) and symptomatic periods. A two-stage analytical pipeline was used: (1) a gradient-boosted decision-tree "symptom detection model" independently classified each sleep session as symptomatic or not, and (2) an "illness-symptom progression model" using a Gaussian Mixture Hidden Markov Model estimated the probability of symptomatic states across contiguous sleep sessions by leveraging the temporal relationship in the data. Statistical analyses evaluated within-subject changes, and the model's ability to discriminate illness windows was quantified using receiver operating characteristic metrics.
Results: Out of 122 participants who tested positive for COVID-19, symptoms were detected by the model in 104 cases. Across the cohort, the model captured significant deviations in sleep and cardiorespiratory metrics during symptomatic periods compared to baseline, with an area under the receiver operating characteristic curve of 0.80, indicating high discriminatory performance. Limitations included reliance on self-reported symptoms and test status, as well as the demographic makeup of the smart bed user base.
Conclusions: Smart beds represent a valuable resource for passively collecting objective, longitudinal sleep and physiological data. The findings support the feasibility of using these data and machine learning models for real-time detection and tracking of COVID-19 and r...
Volume 4, e64018. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12452045/pdf/
Citations: 0
Comparative Diagnostic Performance of a Multimodal Large Language Model Versus a Dedicated Electrocardiogram AI in Detecting Myocardial Infarction From Electrocardiogram Images: Comparative Study.
IF 2
JMIR AI Pub Date : 2025-09-17 DOI: 10.2196/75910
Haemin Lee, Sooyoung Yoo, Joonghee Kim, Youngjin Cho, Dongbum Suh, Keehyuck Lee
Abstract
Background: Accurate and timely electrocardiogram (ECG) interpretation is critical for diagnosing myocardial infarction (MI) in emergency settings. Recent advances in multimodal large language models (LLMs), such as ChatGPT (OpenAI) and Gemini (Google DeepMind), have shown promise in clinical interpretation for medical imaging. However, whether these models analyze waveform patterns or simply rely on text cues remains unclear, underscoring the need for direct comparisons with dedicated ECG artificial intelligence (AI) tools.
Objective: This study aimed to evaluate the diagnostic performance of ChatGPT and Gemini, both general-purpose LLMs, in detecting MI from ECG images and to compare their performance with that of ECG Buddy (ARPI Inc), a dedicated AI-driven ECG analysis tool.
Methods: This retrospective study evaluated and compared AI models for classifying MI using a publicly available 12-lead ECG dataset from Pakistan, categorizing cases into MI-positive (239 images) and MI-negative (689 images). ChatGPT (GPT-4o, version November 20, 2024) and Gemini (Gemini 2.5 Pro) were queried with 5 MI confidence options, whereas ECG Buddy for Microsoft Windows analyzed the images based on ST-elevation MI, acute coronary syndrome, and myocardial injury biomarkers.
Results: Among 928 ECG recordings (239/928, 25.8% MI-positive), ChatGPT achieved an accuracy of 65.95% (95% CI 62.80-69.00), area under the curve (AUC) of 57.34% (95% CI 53.44-61.24), sensitivity of 36.40% (95% CI 30.30-42.85), and specificity of 76.2% (95% CI 72.84-79.33). With Gemini 2.5 Pro, accuracy dropped to 29.63% (95% CI 26.71-32.69), AUC to 51.63% (95% CI 50.22-53.04), and sensitivity rose to 97.07% (95% CI 94.06-98.81), but specificity fell sharply to 6.24% (95% CI 4.55-8.31). However, ECG Buddy reached an accuracy of 96.98% (95% CI 95.67-97.99), AUC of 98.8% (95% CI 98.3-99.43), sensitivity of 96.65% (95% CI 93.51-98.54), and specificity of 97.10% (95% CI 95.55-98.22). DeLong test confirmed that ECG Buddy significantly outperformed ChatGPT (all P<.001). In a qualitative error analysis of LLMs' diagnostic explanations, GPT-4o produced fully accurate explanations in only 5% of cases (2/40), was partially accurate in 38% (15/40), and completely inaccurate in 58% (23/40). By contrast, Gemini 2.5 Pro yielded fully accurate explanations in 32% of cases (12/37), was partially accurate in 14% (5/37), and completely inaccurate in 54% (20/37).
Conclusions: LLMs, such as ChatGPT and Gemini, underperform relative to specialized tools such as ECG Buddy in ECG image-based MI diagnosis. Further training may improve LLMs; however, domain-specific AI remains essential for clinical accuracy. The high performance of ECG Buddy underscores the importance of specialized models for achieving reliable and robust diagnostic outcomes.
Volume 4, e75910. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12443349/pdf/
Citations: 0
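The accuracy figures in the ECG abstract above follow from the reported sensitivity, specificity, and class sizes (239 MI-positive and 689 MI-negative images). The sketch below is a consistency check rather than the authors' code. The true-positive and true-negative counts are not stated in the abstract; they are back-calculated here by rounding, which is an assumption, and `reconstruct` is a hypothetical helper.

```python
N_POS, N_NEG = 239, 689  # class sizes stated in the abstract


def reconstruct(sens_pct: float, spec_pct: float) -> tuple[int, int, float]:
    """Back-calculate true positives, true negatives, and accuracy (%)
    from percentage sensitivity and specificity.
    Assumption: counts are the nearest integers to rate x class size."""
    tp = round(sens_pct / 100 * N_POS)
    tn = round(spec_pct / 100 * N_NEG)
    accuracy_pct = 100 * (tp + tn) / (N_POS + N_NEG)
    return tp, tn, accuracy_pct


# (sensitivity %, specificity %, reported accuracy %) from the abstract.
reported = {
    "ChatGPT (GPT-4o)": (36.40, 76.2, 65.95),
    "Gemini 2.5 Pro": (97.07, 6.24, 29.63),
    "ECG Buddy": (96.65, 97.10, 96.98),
}

for model, (sens, spec, acc_reported) in reported.items():
    tp, tn, acc = reconstruct(sens, spec)
    print(f"{model}: TP={tp}, TN={tn}, accuracy={acc:.2f}% (reported {acc_reported}%)")
```

The check also makes the Gemini result easier to read: with roughly three quarters of the images MI-negative, a model that labels almost everything MI-positive gains sensitivity but loses most of its accuracy.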
Trade-Off Analysis of Classical Machine Learning and Deep Learning Models for Robust Brain Tumor Detection: Benchmark Study.
IF 2
JMIR AI Pub Date : 2025-09-15 DOI: 10.2196/76344
Yuting Tian
Background: Medical image analysis plays a critical role in brain tumor detection, but training deep learning models often requires large, labeled datasets, which can be time-consuming and costly to produce. This study presents a comparative analysis of machine learning and deep learning models for brain tumor classification, focusing on whether deep learning models are necessary for small medical datasets and whether self-supervised learning can reduce annotation costs.

Objective: The primary goal is to evaluate trade-offs between traditional machine learning and deep learning, including self-supervised models, on small medical image datasets. The secondary goal is to assess model robustness, transferability, and generalization through evaluation on unseen data, both within-domain and cross-domain.

Methods: Four models were compared: (1) support vector machine (SVM) with histogram of oriented gradients (HOG) features, (2) a convolutional neural network based on ResNet18, (3) a transformer-based model using vision transformer (ViT-B/16), and (4) a self-supervised learning approach using Simple Contrastive Learning of Visual Representations (SimCLR). These models were selected to represent diverse paradigms. SVM+HOG represents traditional feature engineering with low computational cost, ResNet18 serves as a well-established convolutional neural network with strong baseline performance, ViT-B/16 leverages self-attention to capture long-range spatial features, and SimCLR enables learning from unlabeled data, potentially reducing annotation costs. The primary dataset consisted of 2870 brain magnetic resonance images across 4 classes: glioma, meningioma, pituitary, and nontumor. All models were trained under consistent settings, including data augmentation, early stopping, and 3 independent runs using different random seeds to account for performance variability. Performance metrics included accuracy, precision, recall, F1-score, and convergence. To assess robustness and generalization capability, evaluation was performed on unseen test data from both the primary and cross datasets. No retraining or test augmentations were applied to the external data, thereby reflecting realistic deployment conditions. The models demonstrated consistently strong performance in both within-domain and cross-domain evaluations.

Results: The results revealed distinct trade-offs. ResNet18 achieved the highest validation accuracy (mean 99.77%, SD 0.00%) and the lowest validation loss, along with a weighted test accuracy of 99% within-domain and 95% cross-domain. SimCLR reached a mean validation accuracy of 97.29% (SD 0.86%) and achieved up to 97% weighted test accuracy within-domain and 91% cross-domain, despite requiring a 2-stage training process involving contrastive pretraining followed by linear evaluation. ViT-B/16 reached a mean validation accuracy of 97.36% (SD 0.11%), with a weighted test…

Journal: JMIR AI, volume 4, e76344. Publication date: 2025-09-15. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12456844/pdf/
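The evaluation described in this abstract reports precision, recall, F1-score, and support-weighted test scores over 4 classes, plus a mean and SD across 3 seeded runs. A minimal sketch of how such figures are typically computed follows. The function names (`per_class_metrics`, `weighted_f1`, `mean_sd`) are hypothetical, the abstract does not state which tooling or averaging code the author used, and no data from the study appears here.

```python
from statistics import mean, stdev

# Class names taken from the abstract; everything else is illustrative.
CLASSES = ["glioma", "meningioma", "pituitary", "nontumor"]


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall, f1, support)} from two label lists."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1, tp + fn)
    return metrics


def weighted_f1(metrics):
    """Average per-class F1, weighting each class by its support."""
    total = sum(m[3] for m in metrics.values())
    if total == 0:
        return 0.0
    return sum(m[2] * m[3] for m in metrics.values()) / total


def mean_sd(values):
    """Mean and sample standard deviation, e.g. over runs with different seeds."""
    return mean(values), stdev(values)
```

Called on the true and predicted labels of a held-out test set, `weighted_f1(per_class_metrics(y_true, y_pred))` gives one support-weighted score per run, and `mean_sd` summarizes those scores across seeds.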
Citations: 0
Machine-Learning Predictive Tool for the Individualized Prediction of Outcomes of Hematopoietic Cell Transplantation for Sickle Cell Disease: Registry-Based Study.
IF 2
JMIR AI Pub Date : 2025-09-15 DOI: 10.2196/64519
Rajagopal Subramaniam Chandrasekar, Michael Kane, Lakshmanan Krishnamurti
Background: Disease-modifying therapies ameliorate the severity of sickle cell disease (SCD), but hematopoietic cell transplantation (HCT) and, more recently, autologous gene therapy are the only treatments that have curative potential for SCD. While registry-based studies provide population-level estimates, they do not address the uncertainty regarding individual outcomes of HCT. Computational machine learning (ML) has the potential to identify generalizable predictive patterns and quantify uncertainty in estimates, thereby improving clinical decision-making. There is no existing ML model for SCD, and ML models for HCT for other diseases focus on single outcomes rather than all relevant outcomes.

Objective: This study aims to address the existing knowledge gap by developing and validating an individualized ML prediction model, SPRIGHT (Sickle Cell Predicting Outcomes of Hematopoietic Cell Transplantation), incorporating multiple relevant pre-HCT features to make predictions of key post-HCT clinical outcomes.

Methods: We applied a supervised random forest ML model to clinical parameters in a deidentified Center for International Blood and Marrow Transplant Research (CIBMTR) dataset of 1641 patients who underwent HCT between 1991 and 2021 and were followed for a median of 42.5 (IQR 52.5; range 0.3-312.9) months. We applied forward and reverse feature selection methods to optimize a set of predictive variables. To counter the imbalance bias toward predicting positive outcomes due to the small number of negative outcomes, we constructed a training dataset, taking each outcome as the variable of interest, and performed 2-times repeated 10-fold cross-validation. SPRIGHT is a web-based individualized prediction tool accessible by smartphone, tablet, or personal computer. It incorporates predictive variables of age, age group, Karnofsky or Lansky score, comorbidity index, recipient cytomegalovirus seropositivity, history of acute chest syndrome, need for exchange transfusion, occurrence and frequency of vaso-occlusive crisis (VOC) before HCT, and either a published or custom chemotherapy or radiation conditioning, serotherapy, and graft-versus-host disease prophylaxis. SPRIGHT makes individualized predictions of overall survival (OS), event-free survival, graft failure, acute graft-versus-host disease (AGVHD), chronic graft-versus-host disease (CGVHD), and occurrence of VOC or stroke post-HCT.

Results: The model's ability to distinguish between positive and negative classes, that is, discrimination, was evaluated using the area under the curve, accuracy, and balanced accuracy. Discrimination met or exceeded published predictive benchmarks, with area under the curve for OS (0.7925), event-free survival (0.7900), graft failure (0.8024), acute graft-versus-host disease (0.6793), chronic graft-versus-host disease (0.7320), and VOC post-HCT (0.8779). SPRIGHT revealed good c…

Journal: JMIR AI, volume 4, e64519. Publication date: 2025-09-15. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12435087/pdf/
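The abstract above mentions 2-times repeated 10-fold cross-validation and balanced accuracy for imbalanced outcomes. The sketch below illustrates both ideas in plain Python under stated assumptions: the helper names `repeated_kfold` and `balanced_accuracy` are hypothetical, the splitting is unstratified (the abstract does not say whether stratification was used), and the code is not the SPRIGHT implementation.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=2, seed=0):
    """Yield (train_idx, test_idx) for n_repeats independently shuffled k-fold passes."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Deal the shuffled indices round-robin into n_splits folds.
        folds = [idx[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            test = folds[k]
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, test


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = event, 0 = no event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    sensitivity = tp / pos if pos else 0.0
    specificity = tn / neg if neg else 0.0
    return (sensitivity + specificity) / 2
```

Balanced accuracy matters with rare outcomes because a classifier that always predicts the majority class can score high plain accuracy while its balanced accuracy stays at 0.5, the level of an uninformative predictor.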
Citations: 0