{"title":"Development and Validation of a Large Language Model-Powered Chatbot for Neurosurgery: Mixed Methods Study on Enhancing Perioperative Patient Education.","authors":"Chung Man Ho, Shaowei Guan, Prudence Kwan-Lam Mok, Candice Hw Lam, Wai Ying Ho, Calvin Hoi-Kwan Mak, Harry Qin, Arkers Kwan Ching Wong, Vivian Hui","doi":"10.2196/74299","DOIUrl":"https://doi.org/10.2196/74299","url":null,"abstract":"<p><strong>Background: </strong>Perioperative education is crucial for optimizing outcomes in neuroendovascular procedures, where inadequate understanding can heighten patient anxiety and hinder care plan adherence. Current education models, reliant on traditional consultations and printed materials, often lack scalability and personalization. Artificial intelligence (AI)-powered chatbots have demonstrated efficacy in various health care contexts; however, their role in neuroendovascular perioperative support remains underexplored. Given the complexity of neuroendovascular procedures and the need for continuous, tailored patient education, AI chatbots have the potential to offer tailored perioperative guidance to improve patient education in this specialty.</p><p><strong>Objective: </strong>We aimed to develop, validate, and assess NeuroBot, an AI-driven system that uses large language models (LLMs) with retrieval-augmented generation to deliver timely, accurate, and evidence-based responses to patient inquiries in neurosurgery, ultimately improving the effectiveness of patient education.</p><p><strong>Methods: </strong>A mixed methods approach was used, consisting of 3 phases. In the first phase, internal validation, we compared the performance of Assistants API, ChatGPT, and Qwen by evaluating their responses to 306 bilingual neuroendovascular-related questions. The accuracy, relevance, and completeness of the responses were evaluated using a Likert scale; statistical analyses included ANOVA and paired t tests. In the second phase, external validation, 10 neurosurgical experts rated the responses generated by NeuroBot using the same evaluation metrics applied in the internal validation phase. The consistency of their ratings was measured using the intraclass correlation coefficient. Finally, in the third phase, a qualitative study was conducted through interviews with 18 health care providers, which helped identify key themes related to the NeuroBot's usability and perceived benefits. Thematic analysis was performed using NVivo and interrater reliability was confirmed through Cohen κ.</p><p><strong>Results: </strong>The Assistants API outperformed both ChatGPT and Qwen, achieving a mean accuracy score of 5.28 out of 6 (95% CI 5.21-5.35), with a statistically significant result (P<.001). External expert ratings for NeuroBot demonstrated significant improvements, with scores of 5.70 out of 6 (95% CI 5.46-5.94) for accuracy, 5.58 out of 6 (95% CI 5.45-5.94) for relevance, and 2.70 out of 3 (95% CI 2.73-2.97) for completeness. Qualitative insights highlighted NeuroBot's potential to reduce staff workload, enhance patient education, and deliver evidence-based responses.</p><p><strong>Conclusions: </strong>NeuroBot, leveraging LLMs with the retrieval-augmented generation technique, demonstrates the potential of LLM-based chatbots in perioperative neuroendovascular care, offering scalable and continuous support. 
By integrating domain-specific knowledg","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e74299"},"PeriodicalIF":5.8,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144637267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
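The RAG design described above retrieves passages from a curated knowledge base and grounds the LLM's answer in them. The sketch below illustrates that retrieve-then-prompt loop with TF-IDF retrieval over a toy corpus; the corpus, prompt template, and helper names are invented for illustration, and the actual NeuroBot pipeline (built on the Assistants API) is not reproduced here.

```python
# Minimal retrieval-augmented generation (RAG) sketch using TF-IDF retrieval.
# Illustrative only: the corpus, the retrieve/build_prompt helpers, and the
# prompt wording below are hypothetical stand-ins, not NeuroBot's knowledge base.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [  # stand-in for vetted perioperative education documents
    "Fast for at least 6 hours before a neuroendovascular procedure.",
    "Mild groin soreness at the catheter site is common after embolization.",
    "Resume anticoagulants only when instructed by your care team.",
]

vectorizer = TfidfVectorizer().fit(corpus)
doc_matrix = vectorizer.transform(corpus)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [corpus[i] for i in top]

def build_prompt(question: str) -> str:
    """Ground the LLM's answer in retrieved passages (the core RAG step)."""
    context = "\n".join(f"- {p}" for p in retrieve(question))
    return (
        "Answer the patient's question using ONLY the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

print(build_prompt("Can I eat before my procedure?"))
# The assembled prompt would then be sent to whichever LLM backs the chatbot.
```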
{"title":"Correction: Large Language Model-Assisted Risk-of-Bias Assessment in Randomized Controlled Trials Using the Revised Risk-of-Bias Tool: Evaluation Study.","authors":"Jmir Editorial Office","doi":"10.2196/80519","DOIUrl":"10.2196/80519","url":null,"abstract":"<p><p>[This corrects the article DOI: 10.2196/70450.].</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e80519"},"PeriodicalIF":5.8,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144626518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Multimodal Analysis of Online Information Foraging in Health-Related Topics Based on Stimulus-Engagement Alignment: Observational Feasibility Study
Szilvia Zörgő, Gjalt-Jorn Peters, Anna Jeney, Szilárd Dávid Kovács, Rik Crutzen
J Med Internet Res. 2025;27:e64901. doi:10.2196/64901. Published July 14, 2025.

Background: The recent increase in online health information-seeking has prompted extensive user appraisal of encountered content. Information consumption depends crucially on the quality of the encountered information and on the user's ability to evaluate it; yet, within the context of web-based, organic search behavior, few studies take both aspects into account simultaneously.

Objective: We aimed to explore a method that bridges these two aspects and gives equal consideration to the stimulus (web page content) and the user (ability to appraise encountered content). We examined novices and experts in information retrieval and appraisal to demonstrate a novel approach to studying information foraging theory: stimulus-engagement alignment (SEA).

Methods: We sampled experts and novices in information retrieval and assessment, asking participants to conduct a 10-minute search task with a specific information goal. We used an observational and a retrospective think-aloud protocol to collect data within the framework of an interview. Data from 3 streams (think-aloud, human-computer interaction, and screen content) were manually coded in the Reproducible Open Coding Kit standard and subsequently aligned and represented in tabular format with the R package {rock}. SEA scores were derived from designated code co-occurrences in specific segments of data within the stimulus data stream versus the think-aloud and human-computer interaction data streams.

Results: SEA scores enabled a meaningful comparison of what participants encountered and what they engaged with. Operationalizing codes as either "present" or "absent" in a particular data stream allowed us to inspect not only which credibility cues participants engaged with most frequently, but also whether participants noticed the absence of cues. Code co-occurrence frequencies could thus indicate case-, time-, and context-sensitive information appraisal that also takes into account the quality of the information encountered.

Conclusions: Using SEA allowed us to retain epistemic access to idiosyncratic manifestations of both stimuli and engagement. In addition, by using the same coding scheme and designated co-occurrences across participants, we were able to pinpoint trends within our sample and subsamples. We believe our approach offers a powerful analysis encompassing both the breadth and depth of the data in the effort to understand organic, web-based search behavior.
Performance of Large Language Models in Numerical Versus Semantic Medical Knowledge: Cross-Sectional Benchmarking Study on Evidence-Based Questions and Answers
Eden Avnat, Michal Levy, Daniel Herstain, Elia Yanko, Daniel Ben Joya, Michal Tzuchman Katz, Dafna Eshel, Sahar Laros, Yael Dagan, Shahar Barami, Joseph Mermelstein, Shahar Ovadia, Noam Shomron, Varda Shalev, Raja-Elie E Abdulnour
J Med Internet Res. 2025;27:e64452. doi:10.2196/64452. Published July 14, 2025.

Background: Clinical problem-solving requires processing of semantic medical knowledge, such as illness scripts, and of numerical medical knowledge of diagnostic tests for evidence-based decision-making. Although large language models (LLMs) show promising results in many aspects of language-based clinical practice, their ability to generate nonlanguage, evidence-based answers to clinical questions is inherently limited by tokenization.

Objective: This study aimed to evaluate LLMs' performance on two question types, numeric (correlating findings) and semantic (differentiating entities), while examining differences within and between LLMs in medical aspects and comparing their performance to that of humans.

Methods: To generate straightforward multiple-choice questions and answers (Q and As) based on evidence-based medicine (EBM), we used a comprehensive medical knowledge graph (containing data from more than 50,000 peer-reviewed studies) and created the EBM questions and answers (EBMQA) dataset. EBMQA comprises 105,222 Q and As, categorized by medical topics (eg, medical disciplines) and nonmedical topics (eg, question length) and classified as numerical or semantic. We benchmarked a subset of 24,000 Q and As on two state-of-the-art LLMs, GPT-4 (OpenAI) and Claude 3 Opus (Anthropic). We evaluated each LLM's accuracy on semantic and numerical question types and across sublabeled topics. In addition, we examined the LLMs' question-answering rate by allowing them to abstain from responding to questions. For validation, we compared results for 100 unrelated numerical EBMQA questions between six human medical experts and the two models.

Results: In an analysis of 24,542 Q and As, Claude 3 and GPT-4 performed better on semantic Q and As (68.7%, n=1593 and 68.4%, n=1709, respectively) than on numerical Q and As (61.3%, n=8583 and 56.7%, n=12,038, respectively), with Claude 3 outperforming GPT-4 in numerical accuracy (P<.001). A median accuracy gap of 7% (IQR 5%-10%) was observed between the best and worst sublabels per topic, with different LLMs excelling in different sublabels. Among the Medical Discipline sublabels, Claude 3 performed well in neoplastic disorders but struggled with genitourinary disorders (69%, n=676 vs 58%, n=464; P<.0001), while GPT-4 excelled in cardiovascular disorders but struggled with neoplastic disorders (60%, n=1076 vs 53%, n=704; P=.0002). Furthermore, humans (82.3%) surpassed both Claude 3 (64.3%; P<.001) and GPT-4 (55.8%; P<.001) in the validation test. The Spearman correlation between question-answering rate and accuracy was not significant for either Claude 3 or GPT-4 (ρ=0.12, P=.69; ρ=0.43, P=.13).

Conclusions: Both LLMs excelled more in semantic than in numerical Q and As, with Claude 3 surpassing GPT-4 on numerical Q and As. However, both LLMs showed inter- and intramodel gaps in dif…
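Two of the analyses reported above lend themselves to a short sketch: comparing sublabel accuracies as proportions, and correlating question-answering rate with accuracy via Spearman ρ. The sketch below uses the neoplastic-versus-genitourinary counts quoted in the abstract for the first test, with assumed denominators, and fabricated per-topic answering rates for the second; it illustrates the statistics, not the study's data.

```python
# Sketch of the two statistics reported above, on toy numbers.
from scipy.stats import spearmanr
from statsmodels.stats.proportion import proportions_ztest

# Two-proportion z-test, eg, Claude 3 on neoplastic vs genitourinary disorders.
# Numerators follow the abstract; the denominators here are assumed.
correct = [676, 464]
total = [980, 800]  # assumed totals giving roughly 69% vs 58%
stat, p = proportions_ztest(correct, total)
print(f"two-proportion z = {stat:.2f}, P = {p:.4f}")

# Abstention analysis: per-topic answering rate vs accuracy (fabricated values).
answer_rate = [0.95, 0.88, 0.92, 0.80, 0.99]
accuracy = [0.61, 0.66, 0.58, 0.70, 0.63]
rho, p = spearmanr(answer_rate, accuracy)
print(f"Spearman rho = {rho:.2f}, P = {p:.2f}")
```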
{"title":"Large Language Model Synergy for Ensemble Learning in Medical Question Answering: Design and Evaluation Study.","authors":"Han Yang, Mingchen Li, Huixue Zhou, Yongkang Xiao, Qian Fang, Shuang Zhou, Rui Zhang","doi":"10.2196/70080","DOIUrl":"https://doi.org/10.2196/70080","url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, including medical question-answering (QA). However, individual LLMs often exhibit varying performance across different medical QA datasets. We benchmarked individual zero-shot LLMs (GPT-4, Llama2-13B, Vicuna-13B, MedLlama-13B, and MedAlpaca-13B) to assess their baseline performance. Within the benchmark, GPT-4 achieves the best 71% on MedMCQA (medical multiple-choice question answering dataset), Vicuna-13B achieves 89.5% on PubMedQA (a dataset for biomedical question answering), and MedAlpaca-13B achieves the best 70% among all, showing the potential for better performance across different tasks and highlighting the need for strategies that can harness their collective strengths. Ensemble learning methods, combining multiple models to improve overall accuracy and reliability, offer a promising approach to address this challenge.</p><p><strong>Objective: </strong>To develop and evaluate efficient ensemble learning approaches, we focus on improving performance across 3 medical QA datasets through our proposed two ensemble strategies.</p><p><strong>Methods: </strong>Our study uses 3 medical QA datasets: PubMedQA (1000 manually labeled and 11,269 test, with yes, no, or maybe answered for each question), MedQA-USMLE (Medical Question Answering dataset based on the United States Medical Licensing Examination; 12,724 English board-style questions; 1272 test, 5 options), and MedMCQA (182,822 training/4183 test questions, 4-option multiple choice). We introduced the LLM-Synergy framework, consisting of two ensemble methods: (1) a Boosting-based Weighted Majority Vote ensemble, refining decision-making by adaptively weighting each LLM and (2) a Cluster-based Dynamic Model Selection ensemble, dynamically selecting optimal LLMs for each query based on question-context embeddings and clustering.</p><p><strong>Results: </strong>Both ensemble methods outperformed individual LLMs across all 3 datasets. Specifically comparing the best individual LLM, the Boosting-based Majority Weighted Vote achieved accuracies of 35.84% on MedMCQA (+3.81%), 96.21% on PubMedQA (+0.64%), and 37.26% (tie) on MedQA-USMLE. The Cluster-based Dynamic Model Selection yields even higher accuracies of 38.01% (+5.98%) for MedMCQA, 96.36% (+1.09%) for PubMedQA, and 38.13% (+0.87%) for MedQA-USMLE.</p><p><strong>Conclusions: </strong>The LLM-Synergy framework, using 2 ensemble methods, represents a significant advancement in leveraging LLMs for medical QA tasks. 
Through effectively combining the strengths of diverse LLMs, this framework provides a flexible and efficient strategy adaptable to current and future challenges in biomedical informatics.</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e70080"},"PeriodicalIF":5.8,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144637240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
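At inference time, a weighted majority vote reduces to summing per-model weights over each answer option and returning the option with the largest total. A minimal sketch follows, with hand-set weights standing in for the adaptively learned ones the paper describes; models, weights, and predictions are illustrative.

```python
# Minimal sketch of the weighted-majority-vote idea behind LLM-Synergy.
from collections import defaultdict

def weighted_majority_vote(predictions: dict[str, str],
                           weights: dict[str, float]) -> str:
    """predictions: model name -> chosen option; weights: model name -> weight.
    Returns the option whose supporting models carry the most total weight."""
    tally: dict[str, float] = defaultdict(float)
    for model, option in predictions.items():
        tally[option] += weights.get(model, 0.0)
    return max(tally, key=tally.get)

# Hand-set weights standing in for boosting-derived ones.
weights = {"GPT-4": 0.71, "Vicuna-13B": 0.60, "MedAlpaca-13B": 0.70}
predictions = {"GPT-4": "B", "Vicuna-13B": "A", "MedAlpaca-13B": "A"}
print(weighted_majority_vote(predictions, weights))  # "A" (0.60 + 0.70 > 0.71)
```

The Cluster-based Dynamic Model Selection method instead chooses which models answer each query, based on question-context embeddings and clustering; that selection step is omitted from this sketch.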
{"title":"Remote Patient Monitoring System for Polypathological Older Adults at High Risk for Hospitalization: Retrospective Cohort Study.","authors":"Damien Testa, Israa Salma, Vincent Iborra, Victoire Roussel, Mireille Dutech, Etienne Minvielle, Elise Cabanes","doi":"10.2196/71527","DOIUrl":"https://doi.org/10.2196/71527","url":null,"abstract":"<p><strong>Background: </strong>Health care systems are increasingly facing challenges posed by the aging of populations. In particular, hospitalization, both initial and subsequent, is often observed among older adult patients. However, research suggests that nearly 23% of all hospitalizations could be avoided. In this perspective, remote patient monitoring (RPM) systems are emerging as a promising solution, enabling professionals to detect and manage patient complexities early within home-based care settings.</p><p><strong>Objective: </strong>This study aims to provide additional analyses regarding the impact of the EPOCA RPM system for polypathological older adult patients on the total number of unplanned hospitalization days and admissions, as well as emergency department (ED) visits. In a prior study, we evaluated the impact when the operator of the RPM system is a geriatrician. In this study, we assess the impact when the general practitioner is the operator.</p><p><strong>Methods: </strong>We used a retrospective, before-and-after cohort design. Polypathological older adult patients aged 70 and older, who benefited from the EPOCA RPM system for at least 1 year (between February 2022 and August 2024), were included in the analysis. We compared the outcomes between the previous year (Y-1) and the follow-up year (Y) by the EPOCA RPM system. Statistical analyses were significant at P value <.05.</p><p><strong>Results: </strong>In total, 80 patients were included in the analysis, with an average age of 87. The results showed a significant reduction (P<.001) between Y-1 and Y in the total number of unplanned hospital admissions (by 57%), hospitalization days (by 49%), and ED visits (by 62%). Our findings reflected a significant decrease per patient from 0.99 to 0.42 in hospital admissions, from 0.99 to 0.37 in ED visits, and a reduction of 9.7 hospitalization days per year (P<.001). Additional analyses stratifying by hospitalization history, disability level, and caregiver status showed that the greatest effect of the RPM system was on patients with high risk and severe disability. Finally, there was no observed increase in mortality or transfers to intensive care units.</p><p><strong>Conclusions: </strong>Our findings are consistent with our previous results regarding the potential benefits of the EPOCA RPM system in managing care for polypathological older adult patients, this time with general practitioners as system operators. They also support existing evidence on the promise of RPM in improving care and health outcomes for older adult patients while alleviating hospital burdens by reducing unplanned hospitalizations and ED visits. 
It is, therefore, essential to incorporate reimbursement policies for these RPM initiatives so as to facilitate their adoption within health care systems and enhance their impact on health outcomes.</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e71527"},"PeriodicalIF":5.8,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144637242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
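The before-and-after design compares each patient's utilization in Y-1 with that same patient's utilization in Y, which calls for a paired test. The abstract does not state which test was used; the sketch below applies a Wilcoxon signed-rank test to fabricated per-patient admission counts, since count outcomes are rarely normal.

```python
# Toy illustration of the paired before-and-after comparison: per-patient
# counts of unplanned hospital admissions in the year before (Y-1) and the
# year under remote monitoring (Y). All counts are simulated, not study data.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
before = rng.poisson(1.0, size=80)                          # admissions in Y-1
after = np.maximum(before - rng.poisson(0.6, size=80), 0)   # admissions in Y

stat, p = wilcoxon(before, after, zero_method="wilcox")     # drops zero diffs
print(f"mean Y-1 = {before.mean():.2f}, mean Y = {after.mean():.2f}, P = {p:.4f}")
```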
{"title":"Implementing Large Language Models in Health Care: Clinician-Focused Review With Interactive Guideline.","authors":"HongYi Li, Jun-Fen Fu, Andre Python","doi":"10.2196/71916","DOIUrl":"https://doi.org/10.2196/71916","url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs) can generate outputs understandable by humans, such as answers to medical questions and radiology reports. With the rapid development of LLMs, clinicians face a growing challenge in determining the most suitable algorithms to support their work.</p><p><strong>Objective: </strong>We aimed to provide clinicians and other health care practitioners with systematic guidance in selecting an LLM that is relevant and appropriate to their needs and facilitate the integration process of LLMs in health care.</p><p><strong>Methods: </strong>We conducted a literature search of full-text publications in English on clinical applications of LLMs published between January 1, 2022, and March 31, 2025, on PubMed, ScienceDirect, Scopus, and IEEE Xplore. We excluded papers from journals below a set citation threshold, as well as papers that did not focus on LLMs, were not research based, or did not involve clinical applications. We also conducted a literature search on arXiv within the same investigated period and included papers on the clinical applications of innovative multimodal LLMs. This led to a total of 270 studies.</p><p><strong>Results: </strong>We collected 330 LLMs and recorded their application frequency in clinical tasks and frequency of best performance in their context. On the basis of a 5-stage clinical workflow, we found that stages 2, 3, and 4 are key stages in the clinical workflow, involving numerous clinical subtasks and LLMs. However, the diversity of LLMs that may perform optimally in each context remains limited. GPT-3.5 and GPT-4 were the most versatile models in the 5-stage clinical workflow, applied to 52% (29/56) and 71% (40/56) of the clinical subtasks, respectively, and they performed best in 29% (16/56) and 54% (30/56) of the clinical subtasks, respectively. General-purpose LLMs may not perform well in specialized areas as they often require lightweight prompt engineering methods or fine-tuning techniques based on specific datasets to improve model performance. Most LLMs with multimodal abilities are closed-source models and, therefore, lack of transparency, model customization, and fine-tuning for specific clinical tasks and may also pose challenges regarding data protection and privacy, which are common requirements in clinical settings.</p><p><strong>Conclusions: </strong>In this review, we found that LLMs may help clinicians in a variety of clinical tasks. However, we did not find evidence of generalist clinical LLMs successfully applicable to a wide range of clinical tasks. Therefore, their clinical deployment remains challenging. On the basis of this review, we propose an interactive online guideline for clinicians to select suitable LLMs by clinical task. 
With a clinical perspective and free of unnecessary technical jargon, this guideline may be used as a reference to successfully apply LLMs in clinical settings.</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e71916"},"PeriodicalIF":5.8,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144612178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
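An interactive selection guideline of this kind can be thought of as a lookup from clinical workflow stage and subtask to candidate models ranked by how often they were applied and performed best. The sketch below encodes that idea with invented entries; it is a hypothetical data structure, not the authors' guideline or its data.

```python
# Hypothetical sketch of a task-to-model lookup behind an interactive
# LLM-selection guideline. Stage names, subtasks, and counts are placeholders.
from dataclasses import dataclass

@dataclass
class Candidate:
    model: str
    times_applied: int   # how often the model was applied to this subtask
    times_best: int      # how often it performed best on this subtask

GUIDELINE = {
    ("stage_3_diagnosis", "radiology_report_summarization"): [
        Candidate("GPT-4", times_applied=12, times_best=7),
        Candidate("GPT-3.5", times_applied=9, times_best=3),
    ],
}

def suggest(stage: str, subtask: str) -> list[str]:
    """Rank candidate models by how often they performed best on the subtask."""
    cands = GUIDELINE.get((stage, subtask), [])
    return [c.model for c in sorted(cands, key=lambda c: c.times_best,
                                    reverse=True)]

print(suggest("stage_3_diagnosis", "radiology_report_summarization"))
```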
Prediction of Percutaneous Coronary Intervention Success in Patients With Moderate to Severe Coronary Artery Calcification Using Machine Learning Based on Coronary Angiography: Prospective Cohort Study
Zixiang Ye, Zhangyu Lin, Enmin Xie, Chenxi Song, Rui Zhang, Hao-Yu Wang, Shanshan Shi, Lei Feng, Kefei Duo
J Med Internet Res. 2025;27:e70943. doi:10.2196/70943. Published July 11, 2025.

Background: Given the challenges of percutaneous coronary intervention (PCI) for heavily calcified lesions, accurately predicting PCI success is crucial for enhancing patient outcomes and optimizing procedural strategies.

Objective: This study aimed to use machine learning (ML) to identify coronary angiographic vascular characteristics and PCI procedures associated with immediate procedural success of PCI in patients with moderate to severe coronary artery calcification (MSCAC).

Methods: This study included patients who underwent PCI between January 2017 and December 2018 in a cardiovascular hospital: 3271 patients with MSCAC and 17,998 with no or mild coronary artery calcification. Six ML models (k-nearest neighbor, gradient boosting decision tree, Extreme Gradient Boosting [XGBoost], logistic regression, random forest, and support vector machine) were developed and validated, with the synthetic minority oversampling technique (SMOTE) used to address class imbalance. Model performance was compared across multiple metrics, and the optimal algorithm was selected. Model interpretability was provided by Shapley Additive Explanations (SHAP), which identified the top 6 coronary angiographic features with the highest SHAP values; the importance of different PCI procedures was also elucidated via SHAP values. Testing validation was performed in a separate cohort of 1437 patients with MSCAC from 2013, and external validation was conducted in a general hospital cohort of 204 patients with MSCAC from 2021. Sensitivity analyses were conducted in patients with acute coronary syndrome and chronic coronary syndrome.

Results: In the development cohort, 7.6% (n=248) of patients with MSCAC experienced PCI failure, compared with 4.3% (n=774) of patients with no or mild coronary artery calcification. The XGBoost model demonstrated superior performance, achieving the highest area under the receiver operating characteristic curve (AUC) of 0.984, average precision (AP) of 0.986, F1-score of 0.970, and G-mean of 0.970. Calibration curves indicated reliable predictive accuracy. The key predictive factors were lesion length, minimum lumen diameter, thrombolysis in myocardial infarction flow grade, chronic total occlusion, reference vessel diameter, and diffuse lesion (SHAP values 1.65, 1.40, 0.92, 0.60, 0.54, and 0.47, respectively). The use of modified balloons for calcified lesions had a positive effect on PCI success in patients with MSCAC (SHAP value 0.16). Sensitivity analyses showed consistent model performance across subgroups, with similar top 5 coronary angiographic variables. The optimized XGBoost model maintained robust predictive performance in the testing cohort (AUC 0.972, AP 0.962, F1-score 0.940) and in the external validation set (AUC 0.810, AP 0.957, F1-score 0.892).

Conclusions: This study …
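The modeling pipeline described above (SMOTE rebalancing, XGBoost classification, AUC evaluation, SHAP interpretation) can be sketched end to end on synthetic data. Everything below is a stand-in: the features are random, the class ratio only mimics the roughly 8% failure rate, and the hyperparameters are arbitrary, so the sketch shows the workflow rather than the study's model.

```python
# End-to-end sketch: SMOTE on the training split only, XGBoost, AUC on a
# held-out split, and SHAP attributions for feature importance.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier
import shap

# Synthetic stand-in for angiographic features; ~8% minority "PCI failure" class.
X, y = make_classification(n_samples=3000, n_features=10, weights=[0.92],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Oversample only the training data so the test set stays untouched.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_tr, y_tr)
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_res, y_res)

print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

explainer = shap.TreeExplainer(model)       # per-feature SHAP attributions
shap_values = explainer.shap_values(X_te)
print("mean |SHAP| per feature:", abs(shap_values).mean(axis=0).round(3))
```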
Standardizing Survey Data Collection to Enhance Reproducibility: Development and Comparative Evaluation of the ReproSchema Ecosystem
Yibei Chen, Dorota Jarecka, Sanu Ann Abraham, Remi Gau, Evan Ng, Daniel M Low, Isaac Bevers, Alistair Johnson, Anisha Keshavan, Arno Klein, Jon Clucas, Zaliqa Rosli, Steven M Hodge, Janosch Linkersdörfer, Hauke Bartsch, Samir Das, Damien Fair, David Kennedy, Satrajit S Ghosh
J Med Internet Res. 2025;27:e63343. doi:10.2196/63343. Published July 11, 2025.

Background: Inconsistencies in survey-based (eg, questionnaire) data collection across the biomedical, clinical, behavioral, and social sciences pose challenges to research reproducibility. ReproSchema is an ecosystem that standardizes survey design and facilitates reproducible data collection through a schema-centric framework, a library of reusable assessments, and computational tools for validation and conversion. Unlike conventional survey platforms, which primarily offer graphical user interface-based survey creation, ReproSchema provides a structured, modular approach to defining and managing survey components, enabling interoperability and adaptability across diverse research settings.

Objective: This study examines ReproSchema's role in enhancing research reproducibility and reliability. We introduce its conceptual and practical foundations, compare it against 12 platforms to assess its effectiveness in addressing inconsistencies in data collection, and demonstrate its application through 3 use cases: standardizing required mental health survey common data elements, tracking changes in longitudinal data collection, and creating interactive checklists for neuroimaging research.

Methods: We describe ReproSchema's core components: its schema-based design; a reusable assessment library with more than 90 assessments; and tools to validate data, convert survey formats (eg, REDCap [Research Electronic Data Capture] and Fast Healthcare Interoperability Resources), and build protocols. We compared 12 platforms (Center for Expanded Data Annotation and Retrieval, formr, KoboToolbox, Longitudinal Online Research and Imaging System, MindLogger, OpenClinica, Pavlovia, PsyToolkit, Qualtrics, REDCap, SurveyCTO, and SurveyMonkey) against 14 findability, accessibility, interoperability, and reusability (FAIR) criteria and assessed their support for 8 survey functionalities (eg, multilingual support and automated scoring). Finally, we applied ReproSchema to 3 use cases (NIMH-Minimal; the Adolescent Brain Cognitive Development and HEALthy Brain and Child Development studies; and the Committee on Best Practices in Data Analysis and Sharing checklist) to illustrate its versatility.

Results: ReproSchema provides a structured framework for standardizing survey-based data collection while ensuring compatibility with existing survey tools. In our comparison, ReproSchema met all 14 FAIR criteria and supported 6 of the 8 key survey functionalities: provision of standardized assessments, multilingual support, multimedia integration, data validation, advanced branching logic, and automated scoring. The 3 use cases illustrate ReproSchema's flexibility: standardizing essential mental health assessments (NIMH-Minimal), systematically tracking changes in longitudinal studies (Adolescent Brain Cognitive Development and HEALthy Brain and Child Development), and creating an interactive checklist for neuroimaging research (Committee on Best Practices in Data Analysis and Sharing).
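The property doing the work in a schema-centric ecosystem is that every survey item is a structured, machine-validatable record rather than free text in a form builder. The sketch below illustrates that idea with a deliberately minimal, invented item format and validator; it does not reproduce ReproSchema's actual JSON-LD vocabulary or its validation tooling.

```python
# Simplified illustration of schema-centric survey definition: each item is a
# structured record checked against a minimal schema before deployment, which
# is what makes collection reproducible across studies. The field names and
# rules here are invented for the example, not ReproSchema's real format.
REQUIRED_KEYS = {"id", "question", "response_type"}
ALLOWED_TYPES = {"integer", "string", "choice"}

def validate_item(item: dict) -> list[str]:
    """Return a list of human-readable schema violations (empty = valid)."""
    errors = [f"missing key: {k}" for k in REQUIRED_KEYS - item.keys()]
    if item.get("response_type") not in ALLOWED_TYPES:
        errors.append(f"bad response_type: {item.get('response_type')!r}")
    if item.get("response_type") == "choice" and not item.get("choices"):
        errors.append("choice item needs a non-empty 'choices' list")
    return errors

phq_item = {
    "id": "phq9_q1",
    "question": "Little interest or pleasure in doing things?",
    "response_type": "choice",
    "choices": ["Not at all", "Several days",
                "More than half the days", "Nearly every day"],
}
print(validate_item(phq_item))  # [] -> item conforms to the minimal schema
```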
{"title":"Digital Psychosocial Interventions Tailored for People in Opioid Use Disorder Treatment: Scoping Review.","authors":"Madison Scialanca, Karen Alexander, Babak Tofighi","doi":"10.2196/69538","DOIUrl":"https://doi.org/10.2196/69538","url":null,"abstract":"<p><strong>Background: </strong>A total of 60% of patients with opioid use disorder (OUD) leave treatment early. Psychosocial interventions can enhance treatment retention by addressing behavioral and mental health needs related to early treatment discontinuation, but intervention engagement is key. If well-designed, digital platforms can increase the engagement, reach, and accessibility of psychosocial interventions. Prior reviews of OUD treatment have predominantly focused on outcomes, such as reductions in substance use, without identifying the underlying behavior change principles that drive the effectiveness of interventions.</p><p><strong>Objective: </strong>This scoping review aims to document and describe recent digital psychosocial interventions, including their behavior change strategies, for patients receiving medication for OUD (MOUD).</p><p><strong>Methods: </strong>Predefined search terms were used to search Ovid, CINAHL, and PubMed databases for peer-reviewed literature published in the last 10 years. The database search resulted in 1381 relevant studies, and 16 of them remained after applying the inclusion criteria. Studies were included if they (1) evaluated a digital intervention with behavioral, psychosocial, or counseling components for people in OUD treatment and (2) were published in English in peer-reviewed journals.</p><p><strong>Results: </strong>The 16 studies reviewed comprised 6 randomized controlled trials, 6 pilot studies, 2 qualitative studies, and 2 retrospective cohort studies. Smartphone apps (n=8) were the most prevalent intervention delivery method, with other studies using telemedicine (n=3), virtual reality (n=1), telephone calls (n=1), or text messaging (n=3) to deliver psychosocial interventions in either a synchronous (n=7) or asynchronous (n=9) manner. The digital interventions reviewed predominately delivered cognitive behavioral therapy education through a phone call (n=1), a text message (n=2), a smartphone app (n=7), or tele-counseling (n=1). The predominant behavior change strategies implemented were self-monitoring, feedback and reinforcement, psychoeducation, cue awareness, and providing instruction. One intervention reviewed uses the evidence base of mindfulness-oriented recovery enhancement.</p><p><strong>Conclusions: </strong>Participants in the studies reviewed indicated a preference for digital, flexible, patient-centered psychosocial interventions that emphasized improved patient-provider relationships. While randomized controlled trials comprised a significant portion of the studies, the inclusion of pilot studies and qualitative research highlights the field's ongoing exploration of feasibility and effectiveness. 
These findings underscore the growing role of digital health solutions in psychosocial care, though further research is needed to optimize engagement, delivery, and long-term outcomes.</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e69538"},"PeriodicalIF":5.8,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144612177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}