Mahmud Omar, Reem Agbareia, Benjamin S Glicksberg, Girish N Nadkarni, Eyal Klang
{"title":"Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study.","authors":"Mahmud Omar, Reem Agbareia, Benjamin S Glicksberg, Girish N Nadkarni, Eyal Klang","doi":"10.2196/66917","DOIUrl":"10.2196/66917","url":null,"abstract":"<p><strong>Background: </strong>The ability of large language models (LLMs) to assess their own confidence when answering questions in the biomedical domain remains underexplored.</p><p><strong>Objective: </strong>This study evaluates the confidence levels of 12 LLMs across 5 medical specialties to assess LLMs' ability to accurately judge their own responses.</p><p><strong>Methods: </strong>We used 1965 multiple-choice questions that assessed clinical knowledge in the following areas: internal medicine, obstetrics and gynecology, psychiatry, pediatrics, and general surgery. Models were prompted to provide answers and to also provide their confidence for the correct answers (score: range 0%-100%). We calculated the correlation between each model's mean confidence score for correct answers and the overall accuracy of each model across all questions. The confidence scores for correct and incorrect answers were also analyzed to determine the mean difference in confidence, using 2-sample, 2-tailed t tests.</p><p><strong>Results: </strong>The correlation between the mean confidence scores for correct answers and model accuracy was inverse and statistically significant (r=-0.40; P=.001), indicating that worse-performing models exhibited paradoxically higher confidence. For instance, a top-performing model, GPT-4o, had a mean accuracy of 74% (SD 9.4%), with a mean confidence of 63% (SD 8.3%), whereas a low-performing model, Qwen2-7B, showed a mean accuracy of 46% (SD 10.5%) but a mean confidence of 76% (SD 11.7%). The mean difference in confidence between correct and incorrect responses was low for all models, ranging from 0.6% to 5.4%, with GPT-4o having the highest mean difference (5.4%, SD 2.3%; P=.003).</p><p><strong>Conclusions: </strong>Better-performing LLMs show overall confidence levels that are more closely aligned with their accuracy. However, even the most accurate models still show minimal variation in confidence between right and wrong answers. This may limit their safe use in clinical settings. Addressing overconfidence could involve refining calibration methods, performing domain-specific fine-tuning, and involving human oversight when decisions carry high risks. Further research is needed to improve these strategies before broader clinical adoption of LLMs.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e66917"},"PeriodicalIF":3.1,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12101789/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144082433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junyan Zhang, Junchen Zhou, Liqin Zhou, Zhichao Ba
{"title":"Extracting Multifaceted Characteristics of Patients With Chronic Disease Comorbidity: Framework Development Using Large Language Models.","authors":"Junyan Zhang, Junchen Zhou, Liqin Zhou, Zhichao Ba","doi":"10.2196/70096","DOIUrl":"10.2196/70096","url":null,"abstract":"<p><strong>Background: </strong>As populations age, research on chronic multimorbidity has increasingly become a focal point. Many studies in this area require detailed patient characteristic information. However, the current methods for extracting such information are complex, time-consuming, and prone to errors. The challenge of quickly and accurately extracting patient characteristics has become a common issue in the study of chronic disease comorbidities.</p><p><strong>Objective: </strong>Our objective was to establish a comprehensive framework for extracting demographic and disease characteristics of patients with multimorbidity. This framework leverages large language models (LLMs) to extract feature information from unstructured and semistructured electronic health records pertaining to these patients. We investigated the model's proficiency in extracting feature information across 7 dimensions: basic information, disease details, lifestyle habits, family medical history, symptom history, medication recommendations, and dietary advice. In addition, we identified the strengths and limitations of this framework.</p><p><strong>Methods: </strong>We used data sourced from a grassroots community health service center in China. We developed a multifaceted feature extraction framework tailored for patients with multimorbidity, which consists of several integral components: feasibility testing, preprocessing, determination of the features to extract, prompt modeling based on LLMs, postprocessing, and midterm evaluation. Within this framework, 7 types of feature information were extracted as straightforward features, and 3 types of features were identified as intricate features. On the basis of the straightforward features, we calculated patients' age, BMI, and 12 disease risk factors. Rigorous manual verification experiments were conducted 100 times for straightforward features and 200 times for intricate features, followed by comprehensive quantitative and qualitative assessments of the experimental outcomes.</p><p><strong>Results: </strong>The framework achieved an overall F<sub>1</sub>-score of 99.6% for the 7 straightforward feature extractions, with the highest F<sub>1</sub>-score of 100% for basic information. In addition, the framework demonstrated an overall F<sub>1</sub>-score of 94.4% for the 3 intricate feature extractions. Our analysis of the results revealed that accurate extraction of information content is a substantial advantage of this framework, whereas ensuring consistency in the format of extracted information remains one of its challenges.</p><p><strong>Conclusions: </strong>The framework incorporates electronic health record information from 1225 patients with multimorbidity, covering a diverse range of 41 chronic diseases, and can seamlessly accommodate the inclusion of additional diseases. This underscores its scalability and adaptability as a method for extracting patient-specific characteristics, effective","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e70096"},"PeriodicalIF":3.1,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123238/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144082434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Advanced Reasoning Capabilities of Large Language Models for Detecting Contraindicated Options in Medical Exams.","authors":"Yuichiro Yano, Mizuki Ohashi, Taiju Miyagami, Hirotake Mori, Yuji Nishizaki, Hiroyuki Daida, Toshio Naito","doi":"10.2196/68527","DOIUrl":"10.2196/68527","url":null,"abstract":"<p><strong>Unlabelled: </strong>Enhancing clinical reasoning and reducing diagnostic errors are essential in medical practice; OpenAI-o1, with advanced reasoning capabilities, performed better than GPT-4 on 15 Japanese National Medical Licensing Examination questions (accuracy: 100% vs 80%; contraindicated option detection: 87% vs 73%), though findings are preliminary due to the small sample size.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e68527"},"PeriodicalIF":3.1,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12088613/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144014124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mai N Nguyen-Huynh, Janet Alexander, Zheng Zhu, Melissa Meighan, Gabriel Escobar
{"title":"Effects of the National Institutes of Health Stroke Scale and Modified Rankin Scale on Predictive Models of 30-Day Nonelective Readmission and Mortality After Ischemic Stroke: Cohort Study.","authors":"Mai N Nguyen-Huynh, Janet Alexander, Zheng Zhu, Melissa Meighan, Gabriel Escobar","doi":"10.2196/69102","DOIUrl":"10.2196/69102","url":null,"abstract":"<p><strong>Background: </strong>Patients with stroke have high rates of all-cause readmission and case fatality. Limited information is available on how to predict these outcomes.</p><p><strong>Objective: </strong>We aimed to assess whether adding the initial National Institutes of Health Stroke Scale (NIHSS) score or modified Rankin scale (mRS) score at discharge improved predictive models of 30-day nonelective readmission or 30-day mortality poststroke.</p><p><strong>Methods: </strong>Using a cohort of patients with ischemic stroke in a large multiethnic integrated health care system from June 15, 2018, to April 29, 2020, we tested 2 predictive models for a composite outcome (30-day nonelective readmission or death). One model was based on administrative data (Length of Stay, Acuity, Charlson Comorbidities, Emergency Department Use score; LACE), and the other was a comprehensive model (Transition Support Level; TSL). The models, the initial NIHSS score, and the mRS score at discharge were tested independently and in combination with age and sex. We assessed model performance using the area under the receiver operating characteristic curve (c-statistic), Nagelkerke pseudo-R<sup>2</sup>, and Brier score.</p><p><strong>Results: </strong>The study cohort included 4843 patients with 5014 stroke hospitalizations. Mean age was 71.9 (SD 14) years, 50.6% (2537/5014) were female, and 52.1% (2614/5014) were White. Median initial NIHSS score was 4 (IQR 2-8). There were 538 (10.7%) nonelective readmissions and 150 (3.9%) deaths within 30 days. The logistic models revealed that the best-performing models were TSL (c-statistic=0.69) and TSL plus mRS score at discharge (c-statistic=0.69).</p><p><strong>Conclusions: </strong>We found that neither the initial NIHSS score nor the mRS score at discharge significantly enhanced the predictive ability of the LACE or TSL models. Future efforts at prediction of short-term stroke outcomes will need to incorporate new data elements.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e69102"},"PeriodicalIF":3.1,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12083732/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144043048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transformer-Based Language Models for Group Randomized Trial Classification in Biomedical Literature: Model Development and Validation.","authors":"Elaheh Aghaarabi, David Murray","doi":"10.2196/63267","DOIUrl":"10.2196/63267","url":null,"abstract":"<p><strong>Background: </strong>For the public health community, monitoring recently published articles is crucial for staying informed about the latest research developments. However, identifying publications about studies with specific research designs from the extensive body of public health publications is a challenge with the currently available methods.</p><p><strong>Objective: </strong>Our objective was to develop a fine-tuned pretrained language model that can accurately identify publications from clinical trials that use a group- or cluster-randomized trial (GRT), individually randomized group-treatment trial (IRGT), or stepped wedge group- or cluster-randomized trial (SWGRT) design within the biomedical literature.</p><p><strong>Methods: </strong>We fine-tuned the BioMedBERT language model using a dataset of biomedical literature from the Office of Disease Prevention at the National Institutes of Health. The model was trained to classify publications into 3 categories of clinical trials that use nested designs. Model performance was evaluated on unseen data.</p><p><strong>Results: </strong>When our proposed model was tested for generalizability with unseen data, it delivered high sensitivity and specificity for each class as follows: negatives (0.95 and 0.93), GRTs (0.94 and 0.90), IRGTs (0.81 and 0.97), and SWGRTs (0.96 and 0.99), respectively.</p><p><strong>Conclusions: </strong>Our work demonstrates the potential of fine-tuned, domain-specific language models to accurately identify publications reporting on complex and specialized study designs, addressing a critical need in the public health research community. This model offers a valuable tool for the public health community to directly identify publications from clinical trials that use one of the 3 classes of nested designs.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e63267"},"PeriodicalIF":3.1,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12148241/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144053945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Interoperable Digital Medication Records on Fast Healthcare Interoperability Resources: Development and Technical Validation of a Minimal Core Dataset.","authors":"Eduardo Salgado-Baez, Raphael Heidepriem, Renate Delucchi Danhier, Eugenia Rinaldi, Vishnu Ravi, Akira-Sebastian Poncette, Iris Dahlhaus, Daniel Fürstenau, Felix Balzer, Sylvia Thun, Julian Sass","doi":"10.2196/64099","DOIUrl":"10.2196/64099","url":null,"abstract":"<p><strong>Background: </strong>Medication errors represent a widespread, hazardous, and costly challenge in health care settings. The lack of interoperable medication data within and across hospitals not only creates an administrative burden through redundant data entry but also increases the risk of errors due to human mistakes, imprecise data transformations, and misinterpretations. While digital solutions exist, fragmented systems and nonstandardized data hinder effective medication management.</p><p><strong>Objective: </strong>This study aimed to assess medication data available across the multiple systems of a large university hospital, identify a minimum dataset with the most relevant information, and propose a standard interoperable FHIR-based solution that can import and transfer information from a standardized drug master database to various target systems.</p><p><strong>Methods: </strong>Medication data from all relevant departments of a large German hospital were thoroughly analyzed. To ensure interoperability, data elements for developing a minimum dataset were defined based on relevant medication identifiers, the Health Level 7 Fast Healthcare Interoperability Resources (HL7 FHIR) standard, and the German Medical Informatics Initiative (MII) specifications. To enhance medication identification accuracy, the dataset was further enriched with information from Germany's most comprehensive drug database and European Standard Drug Terms (EDQM). Finally, data on 60 medications frequently used in the institution were systematically extracted from its multiple medication systems and integrated into a new structured, dedicated database.</p><p><strong>Results: </strong>The analysis of all the available medication datasets within the institution identified 7964 drugs. However, limited interoperability was observed due to a fragmented local IT infrastructure and challenges in medication data standardization. The new structured medication dataset, which integrates key elements to ensure identification accuracy and interoperability, successfully enabled the generation of medication order messages, supporting medication interoperability and standardized data exchange.</p><p><strong>Conclusions: </strong>Our approach addresses the lack of interoperability in medication data and the need for standardized data exchange. We propose a minimum set of data elements aligned with German and international coding systems to be used in combination with the FHIR standard for processes such as the digital transfer of discharge medication prescriptions from intensive care units to general wards, which can help to reduce medication errors and enhance patient safety.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":" ","pages":"e64099"},"PeriodicalIF":3.1,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12102619/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143525343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brigitte Stephan, Kathrin Gehrdau, Christina Sorbe, Matthias Augustin, Martin Scherer, Anne Kis
{"title":"Benefits and Limitations of Teledermatology in German Correctional Facilities: Cross-Sectional Analysis.","authors":"Brigitte Stephan, Kathrin Gehrdau, Christina Sorbe, Matthias Augustin, Martin Scherer, Anne Kis","doi":"10.2196/58712","DOIUrl":"10.2196/58712","url":null,"abstract":"<p><strong>Background: </strong>Teledermatology consultations offer the advantage of rapid diagnosis and care. Since 2019, our institute at the University Medical Center Hamburg-Eppendorf has been part of an interdisciplinary team providing teledermatology support in German prisons as an alternative to transporting patients to external facilities.</p><p><strong>Objective: </strong>This study aims to analyze the benefits and limitations of teledermatology for patients with limited access to medical specialties.</p><p><strong>Methods: </strong>We conducted a descriptive cross-sectional analysis of 651 teleconsultations from prisons from February 2020 to April 2023. All cases were handled in store-and-forward (asynchronous) mode, with an optional hybrid live (synchronous) consultation for the patient or in-house staff.</p><p><strong>Results: </strong>The main advantage of this approach was the avoidance of external transport. Of the 651 teleconsultations, 608 (93.4%) could be finalized with telemedical support and 43 (6.6%) required additional workup, including tumors whose type had to be verified by biopsy (n=22, 51%) and unresolved inflammatory (n=11, 26%) or infectious (n=5, 12%) skin conditions. Digital imaging of the skin lesions improved with the experience of the personnel but remained a challenge, with photo quality depending on the available technical devices or broadband supply.</p><p><strong>Conclusions: </strong>Hybrid teledermatology consultation represents an effective and resource-saving method of providing specialized care to patients in situations with limited access to medical specialties. The video consultations with experts and the exchange of knowledge about the cases presented provided an opportunity to support and train intramural colleagues. The quality of digital imaging and transmission remains one of the main challenges.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e58712"},"PeriodicalIF":3.1,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12280115/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144058612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retraction: \"Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning\".","authors":"","doi":"10.2196/76833","DOIUrl":"https://doi.org/10.2196/76833","url":null,"abstract":"","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e76833"},"PeriodicalIF":3.1,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144053507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Catriona Miller, Theo Portlock, Denis M Nyaga, Greg D Gamble, Justin M O'Sullivan
{"title":"Code Error in \"Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning\".","authors":"Catriona Miller, Theo Portlock, Denis M Nyaga, Greg D Gamble, Justin M O'Sullivan","doi":"10.2196/66556","DOIUrl":"10.2196/66556","url":null,"abstract":"","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e66556"},"PeriodicalIF":3.1,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12138136/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144024250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Min Jiang, Shuhua Zhao, Yun Mei, Zhiying Fu, Yannan Yuan, Jie Ai, Yuan Sheng, Ying Gong, Jingjing Chen
{"title":"Real-Time, Risk-Based Clinical Trial Quality Management in China: Development of a Digital Monitoring Platform.","authors":"Min Jiang, Shuhua Zhao, Yun Mei, Zhiying Fu, Yannan Yuan, Jie Ai, Yuan Sheng, Ying Gong, Jingjing Chen","doi":"10.2196/64114","DOIUrl":"https://doi.org/10.2196/64114","url":null,"abstract":"<p><strong>Background: </strong>With the improvement of the drug evaluation system in China, an increasing number of clinical trials have been launched in Chinese hospitals. However, traditional clinical trial quality management models largely rely on human monitoring and counting, which is time-consuming and prone to errors and biases. There is an urgent need to improve the efficiency and accuracy of clinical trial quality monitoring systems in hospital-based research institutions within China.</p><p><strong>Objective: </strong>The objective of this study was to develop a digital monitoring platform that, on the basis of historical clinical trial quality control (QC) findings, monitors and detects risk points in real time and issues warnings about them throughout the entire life cycle of clinical trials.</p><p><strong>Methods: </strong>Applying a risk-based quality management approach, we built a digital dynamic monitoring platform by using big data analysis and automatic quantitative technology. Data from clinical trial QC reports generated during 2019 to 2023 in Beijing University Cancer Hospital, China, were used to train the automated classification tool, establish warning thresholds, and validate threshold values. Quality findings from the early-stage, interim-stage, and conclusion-stage QC rounds of clinical trials were rated by using 3 severity grades (minor, major, or critical) and classified into 5 categories (with 4 taxonomy levels under each category). QC report text was processed by using an automated natural language processing tool. All QC reports were grouped into 2 clusters via hierarchical clustering analysis. QC findings from the relatively high-risk cluster (reports that were more likely to have major and critical findings, as determined by experienced QC analysts) were used to determine warning threshold values for the monitoring platform (ie, the lowest number of findings was set as the threshold value for each specific study stage, Level-3 taxonomy, and severity grade combination).</p><p><strong>Results: </strong>The most frequently reported Level-3 taxonomies in QC reports from 2019 to 2022 were \"Standard Procedure and Process,\" \"Safety Reporting,\" and \"Source Data Collection and/or Recording.\" In total, 189 warning threshold values were established based on data from 1380 QC reports generated during 2019 to 2022, covering 3 severity grades, 21 Level-3 taxonomies, and 3 QC rounds. The warning thresholds were applied to 211 QC reports generated in 2023, of which 19.9% (n=42) triggered warnings. Similar patterns of QC findings, including the most frequently noted Level-3 QC findings, were observed between reports generated in 2023 and those from 2019 to 2022.</p><p><strong>Conclusions: </strong>In clinical practice, our tool would enable the automated monitoring and detection of risk points throughout all clinical trial stages; accurately identify the most relevant ","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e64114"},"PeriodicalIF":3.1,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12047851/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144056972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}