Journal of Biomedical Informatics: Latest Articles

ADENER: A syntax-augmented grid-tagging model for Adverse Drug Event extraction in social media.

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics Pub Date : 2025-10-22 DOI: 10.1016/j.jbi.2025.104944

Weiru Fu, Hao Li, Ling Luo, Hongfei Lin

Objective: Adverse Drug Event (ADE) extraction from social media is a critical yet challenging task due to the semantic similarity between adverse effects and therapeutic indications, as well as the prevalence of overlapping and discontinuous mentions often caused by comorbid conditions. This study aims to develop a robust model for accurate ADE extraction from noisy and irregular social media texts.

Methods: We propose ADENER, a grid-tagging architecture that models ADE extraction as multi-label word-pair classification. ADENER incorporates two core encoding mechanisms: the convolutional capture layer fuses multi-dimensional textual features, captures long-range word-pair dependencies via dilated convolutions, and enhances interactions through semantic association matrices for social media text irregularities; the syntactic affine layer integrates path-level dependency information to enhance global logic understanding, enabling the model to distinguish between therapeutic symptom entities and ADE entities through syntactic cues. The decoding stage uses four-type relational labels to uniformly decode flat, overlapping, and discontinuous ADE mentions.

Results: We evaluated ADENER on three widely used ADE extraction datasets: CADEC, CADECv2, SMM4H. The model achieved F1 scores of 74.64%, 77.97%, 61.73% on these datasets, respectively, outperforming all compared baseline models while maintaining competitive computational efficiency. The results demonstrate the effectiveness of our model in addressing the challenges posed by irregular and noisy social media data.

Conclusion: ADENER offers a unified and effective solution for ADE extraction from social media, capable of handling flat, overlapping, and discontinuous entity mentions and correctly distinguishing ADE entities from therapeutic symptom entities. By incorporating convolutional capture layers for semantic word-pair interactions and syntactic affine layers for dependency-based logic understanding, our approach significantly improves extraction accuracy, providing a valuable tool for pharmacovigilance research and real-world drug safety monitoring.

Volume 171, Article 104944. Publication type: Journal Article.
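The abstract does not spell out ADENER's four relational labels, so the following is only a rough sketch of the general idea of word-pair grid tagging. It assumes a simplified two-label scheme (a hypothetical "NNW" link between consecutive words of a mention and a "THW" link from a mention's last word back to its first), which may differ from the labels the authors actually use. The function names and the example sentence are illustrative only.

```python
def encode_grid(n_tokens, mentions):
    """Build an n x n grid of label sets from mentions given as token-index tuples.

    Assumed (hypothetical) labels, not necessarily ADENER's:
      "NNW" at [i][j]: word j follows word i inside some mention
      "THW" at [tail][head]: a mention starts at head and ends at tail
    """
    grid = [[set() for _ in range(n_tokens)] for _ in range(n_tokens)]
    for idxs in mentions:
        for a, b in zip(idxs, idxs[1:]):
            grid[a][b].add("NNW")
        grid[idxs[-1]][idxs[0]].add("THW")
    return grid


def decode_grid(grid):
    """Recover mentions by walking NNW links from each head to its tail."""
    n = len(grid)
    found = []

    def walk(path, tail):
        cur = path[-1]
        if cur == tail:
            found.append(tuple(path))
            return
        for nxt in range(cur + 1, tail + 1):
            if "NNW" in grid[cur][nxt]:
                walk(path + [nxt], tail)

    for tail in range(n):
        for head in range(tail + 1):
            if "THW" in grid[tail][head]:
                walk([head], tail)
    return sorted(found)


# Illustrative sentence with two overlapping mentions, one of them discontinuous:
# "pain in arms" (tokens 0,1,2) and "pain in ... legs" (tokens 0,1,4).
tokens = ["pain", "in", "arms", "and", "legs"]
mentions = [(0, 1, 2), (0, 1, 4)]
decoded = decode_grid(encode_grid(len(tokens), mentions))
```

In a real model the grid would be predicted by a classifier over word pairs rather than built from gold mentions; the decoding walk is the part that lets one representation cover flat, overlapping, and discontinuous mentions.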

Citations: 0

Discovering signature disease trajectories in pancreatic cancer and soft-tissue sarcoma from longitudinal patient records.

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics Pub Date : 2025-10-19 DOI: 10.1016/j.jbi.2025.104935

Liwei Wang, Rui Li, Andrew Wen, Qiuhao Lu, Jinlian Wang, Xiaoyang Ruan, Adriana Gamboa, Neha Malik, Christina L Roland, Matthew H G Katz, Heather Lyu, Hongfang Liu

Background: Most clinicians have limited experience with rare diseases, making diagnosis and treatment challenging. Large real-world data sources, such as electronic health records (EHRs), provide a massive amount of information that can potentially be leveraged to determine the patterns of diagnoses and treatments for rare tumors that can serve as clinical decision aids.

Objectives: We aimed to discover signature disease trajectories of 3 rare cancer types: pancreatic cancer, soft-tissue sarcoma (STS) of the trunk and extremity (STS-TE), and STS of the abdomen and retroperitoneum (STS-AR).

Materials and methods: Leveraging the IQVIA Oncology Electronic Medical Record, we identified significant diagnosis pairs across 3 years in patients with these cancers through matched cohort sampling, statistical computation, and a right-tailed binomial hypothesis test, and then visualized trajectories of up to 3 progressions. We further conducted systematic validation of the discovered trajectories with the UTHealth Electronic Health Records (EHR).

Results: Results included 266 significant diagnosis pairs for pancreatic cancer, 130 for STS-TE, and 118 for STS-AR. We further found 44 2-hop (i.e., 2-progression) and 136 3-hop trajectories before pancreatic cancer, 36 2-hop and 37 3-hop trajectories before STS-TE, and 17 2-hop and 5 3-hop trajectories before STS-AR. Meanwhile, we found 54 2-hop and 129 3-hop trajectories following pancreatic cancer, 11 2-hop and 17 3-hop trajectories following STS-TE, and 5 2-hop and 0 3-hop trajectories following STS-AR. For example, pain in joint and gastro-oesophageal reflux disease occurred before pancreatic cancer in 64 (0.5%) patients, pain in joint and "pain in limb, hand, foot, fingers and toes" occurred before STS-TE in 40 (0.9%) patients, and agranulocytosis secondary to cancer chemotherapy and neoplasm related pain occurred after pancreatic cancer in 256 (1.9%) patients. Systematic validation using the UTHealth EHR confirmed the validity of the discovered trajectories.

Conclusion: We identified signature disease trajectories for the studied rare cancers by leveraging large-scale EHR data and trajectory mining approaches. These disease trajectories could serve as potential resources for clinicians to deepen their understanding of the temporal progression of conditions preceding and following these rare cancers, further informing patient-care decisions.

Article 104935. Publication type: Journal Article.
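The right-tailed binomial hypothesis test mentioned in the methods can be illustrated with a small sketch. The abstract does not give the exact test setup, so the framing below is an assumption: `k` is the number of exposed patients in whom the second diagnosis follows the first, `n` is the number of exposed patients, and `p0` is the rate expected from the matched comparison cohort. The function name and the example numbers are hypothetical.

```python
from math import comb


def right_tailed_binomial_p(k: int, n: int, p0: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p0).

    A small value suggests the observed count k is higher than the
    null rate p0 would explain.
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 12 of 200 patients show the diagnosis pair,
# against an assumed background rate of 2%.
p_value = right_tailed_binomial_p(12, 200, 0.02)
is_significant = p_value < 0.05
```

When many diagnosis pairs are screened this way, a multiple-testing correction is normally applied to the threshold; the abstract does not say which one, if any, was used.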

Citations: 0

A non-interactive Online Medical Pre-Diagnosis system on encrypted vertically partitioned data.

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics Pub Date : 2025-10-17 DOI: 10.1016/j.jbi.2025.104940

Min Tang, Yuhao Zhang, Ronghua Liang, Guoqiang Deng

Objective: In medical environments, patient records are stored as heterogeneous features across various institutions, and legal or institutional constraints prohibit raw data sharing. This fragmentation presents challenges for Online Medical Pre-Diagnosis (OMPD) systems. Existing methods (such as federated learning) require multiple rounds of interaction among all participating parties (hospitals and cloud servers), resulting in frequent communication. Moreover, because they share global gradients, they are vulnerable to inference attacks, leading to information leakage. In this paper, we propose a secure and efficient OMPD system framework to address the problem of vertical data fragmentation, aiming to resolve the contradiction between medical data isolation and model collaboration.

Methods: We propose PPNLR, a secure framework for building OMPD systems. This framework combines functional encryption and blinding factors to design the sample-feature dimension encryption algorithm and the privacy-preserving vectorization training algorithm. Decoupling sample computation from model training enables cross-client data aggregation with only a single communication between hospitals and cloud servers.

Results: Security analysis shows that PPNLR is resistant to semi-honest inference attacks and collusion attacks. Evaluation results based on six real-world medical datasets (text and images) show that: (i) the inference accuracy is close to that of the centralized plaintext training benchmark; (ii) the computational efficiency is at least 3.6× higher than that of comparable approaches; (iii) the communication complexity is significantly reduced by eliminating dependencies on iteration count.

Conclusion: PPNLR achieves data protection through cryptographic primitives, maintaining high diagnostic accuracy while ensuring the security of medical data and model parameters. Its single-communication architecture significantly reduces the deployment threshold in resource-constrained scenarios, providing a practical framework for building privacy-friendly OMPD systems.

Article 104940. Publication type: Journal Article.

Citations: 0

Synthetic-to-real attentive deep learning for Alzheimer's assessment: A domain-agnostic framework for ROCF scoring.

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics Pub Date : 2025-10-17 DOI: 10.1016/j.jbi.2025.104929

Kassem Anis Bouali, Elena Šikudová

Objective: Early diagnosis of Alzheimer's disease depends on accessible cognitive assessments, such as the Rey-Osterrieth Complex Figure (ROCF) test. However, manual scoring of this test is labor-intensive and subjective, which introduces experimental biases. Additionally, deep learning models face challenges due to the limited availability of annotated clinical data, particularly for assessments like the ROCF test. This scarcity of data restricts model generalization and exacerbates domain shifts across different populations.

Methods: We propose a novel framework comprising a data synthesis pipeline and ROCF-Net, a deep learning model specifically designed for ROCF scoring. The synthesis pipeline is lightweight and capable of generating realistic, diverse, and annotated ROCF drawings. ROCF-Net, on the other hand, is a cross-domain scoring model engineered to address domain discrepancies in stroke texture and line artifacts. It maintains high scoring accuracy through a novel line-specific attention mechanism tailored to the unique characteristics of ROCF drawings.

Results: Unlike conventional synthetic medical imaging methods, our approach generates ROCF drawings that accurately reflect Alzheimer's-specific abnormalities with minimal computational cost. Our scoring model achieves SOTA performance across differently sourced datasets, with a Mean Absolute Error (MAE) of 3.53 and a Pearson Correlation Coefficient (PCC) of 0.86. This demonstrates both high predictive accuracy and computational efficiency, outperforming existing ROCF scoring methods that rely on Convolutional Neural Networks (CNNs) while avoiding the overhead of parameter-heavy transformer models. We also show that training on our synthetic data generalizes as well as training on real clinical data, where the difference in performance was minimal (MAE differed by 1.43 and PCC by 0.07), indicating no statistically significant performance gap.

Conclusion: Our work introduces four contributions: (1) a cost-effective pipeline for generating synthetic ROCF data, reducing dependency on clinical datasets; (2) a domain-agnostic model for automated ROCF scoring across diverse drawing styles; (3) a lightweight attention mechanism aligning model decisions with clinical scoring for transparency; and (4) a bias-aware framework using synthetic data to reduce demographic disparities, promoting fair cognitive assessment across populations.

Article 104929. Publication type: Journal Article.
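For readers less familiar with the two metrics reported above, the sketch below shows how Mean Absolute Error and the Pearson Correlation Coefficient are computed from paired clinician and model scores. The score lists are made-up illustrative values, not data from the paper, and the function names are chosen here for clarity.

```python
from math import sqrt


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Made-up example: clinician scores vs. model predictions for five drawings.
clinician = [30.0, 22.5, 15.0, 28.0, 9.5]
model = [27.0, 24.0, 18.5, 26.0, 12.0]
mae = mean_absolute_error(clinician, model)
pcc = pearson_correlation(clinician, model)
```

MAE is in the same units as the score itself, so it reads as "points off on average", whereas PCC only measures how well the predictions track the ranking and spread of the reference scores.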

Citations: 0

Towards a Biological Evaluation Framework for Oversampling (BEFO) gene expression data.

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics Pub Date : 2025-10-17 DOI: 10.1016/j.jbi.2025.104932

Kevin Fee, Suneil Jain, Ross G Murphy, Anna Jurek-Loughrey

Abstract: Machine learning (ML) techniques are increasingly used in biomedical research to improve diagnostic and prognostic accuracy when used in conjunction with a clinician as a decision support system. However, many datasets used in biomedical research suffer from severe class imbalance due to small population sizes, which causes machine learning models to become biased toward majority-class samples. Current oversampling methods primarily focus on balancing datasets without adequately validating the biological relevance of synthetic data, risking the clinical applicability of downstream model predictions. To address these shortcomings, we propose the Biological Evaluation Framework for Oversampling (BEFO), designed to ensure that synthetic gene expression samples accurately reflect the biological patterns present in original datasets. This innovation not only mitigates bias but also enhances the trustworthiness of predictive models in clinical scenarios. We have developed a ranking method for synthetic samples based on this and evaluated each sample's inclusion based on its rank. This ranking method calculates the WGCNA gene co-expression clusters on the original dataset. Several random forests are constructed to assess the alignment of each synthetic sample to each cluster. Only synthetic samples more important than real samples are included in a study. The experimental results demonstrate that our proposed ML oversampling framework can improve the biological feasibility of oversampled datasets by an average of 11%, leading to improved classification performance by an average of 9% when compared against five state-of-the-art (SOTA) oversampling methods and ten classification algorithms across six real-world gene expression datasets. This establishes a new standard for synthetic data evaluation in biomedical ML applications.

Article 104932. Publication type: Journal Article.

Citations: 0

TriMedPrompt: A unified prompting framework for realistic and layout-conformant clinical progress note synthesis.

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics Pub Date : 2025-10-17 DOI: 10.1016/j.jbi.2025.104927

Garapati Keerthana, Manik Gupta

Abstract: Clinical progress notes are critical artifacts for modeling patient trajectories, auditing clinical decision-making, and powering downstream applications in clinical natural language processing (NLP). However, public resources such as MIMIC-III provide limited progress notes, constraining the development of robust and generalizable machine learning models. This work proposes a novel hybrid prompting framework, TriMedPrompt, to generate high-quality, structurally and semantically coherent synthetic progress notes using large language models (LLMs). Our approach conditions the LLMs on a triad of complementary biomedical signals: (1) real-world progress notes from MIMIC-III, (2) clinically aligned case reports from the PMC Patients dataset, selected via embedding-based retrieval, and (3) structured disease-centric knowledge from PrimeKG. We design a multi-source, layout-aware prompting pipeline that dynamically integrates structured and unstructured information to produce notes across standard clinical formats (e.g., SOAP, BIRP, PIE, DAP). Through rigorous evaluations, including layout adherence, entity extraction comparisons, semantic similarity analysis, and controlled ablations, we demonstrate that our generated notes achieve a 98.6% semantic entity alignment score with real clinical notes, while maintaining high structural fidelity. Ablation studies further confirm the critical role of combining structured biomedical knowledge and unstructured narrative data in improving note quality. In addition, we illustrate the potential of our synthetic notes in privacy-preserving clinical NLP, offering a safe alternative for model development and benchmarking in sensitive healthcare settings. This work establishes a scalable, controllable paradigm for clinical text synthesis, significantly expanding access to realistic, diverse progress notes and laying the foundation for advancing trustworthy clinical NLP research.

Article 104927. Publication type: Journal Article.

Citations: 0

Integrated analysis for electronic health records with structured and sporadic missingness.

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics Pub Date : 2025-10-16 DOI: 10.1016/j.jbi.2025.104933

Jianbin Tan, Yan Zhang, Chuan Hong, T Tony Cai, Tianxi Cai, Anru R Zhang

Objectives: We propose a novel imputation method tailored for Electronic Health Records (EHRs) with structured and sporadic missingness. Such missingness frequently arises in the integration of heterogeneous EHR datasets for downstream clinical applications. By addressing these gaps, our method provides a practical solution for integrated analysis, enhancing data utility and advancing the understanding of population health.

Materials and methods: We begin by demonstrating structured and sporadic missing mechanisms in the integrated analysis of EHR data. Following this, we introduce a novel imputation framework, Macomss, specifically designed to handle structurally and heterogeneously occurring missing data. We establish theoretical guarantees for Macomss, ensuring its robustness in preserving the integrity and reliability of integrated analyses. To assess its empirical performance, we conduct extensive simulation studies that replicate the complex missingness patterns observed in real-world EHR systems, complemented by validation using EHR datasets from the Duke University Health System (DUHS).

Results: Simulation studies show that our approach consistently outperforms existing imputation methods. Using datasets from three hospitals within DUHS, Macomss achieves the lowest imputation errors for missing data in most cases and provides superior or comparable downstream prediction performance compared to benchmark methods.

Discussion: The proposed method effectively addresses critical missingness patterns that arise in the integrated analysis of EHR datasets, enhancing the robustness and generalizability of clinical predictions.

Conclusions: We provide a theoretically guaranteed and practically meaningful method for imputing structured and sporadic missing data, enabling accurate and reliable integrated analysis across multiple EHR datasets. The proposed approach holds significant potential for advancing research in population health.

Article 104933. Publication type: Journal Article.

Citations: 0

Advancing healthcare analytics: a thematic review of machine learning, health informatics, and real-world data applications.

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics Pub Date : 2025-10-16 DOI: 10.1016/j.jbi.2025.104934

Maria I Arias, Lorena Cadavid, Juan D Velásquez

Objective: To map the conceptual and methodological landscape of healthcare analytics by identifying dominant thematic clusters, synthesizing key trends, and outlining translational challenges and research opportunities in the field.

Methods: A total of 2,281 Scopus-indexed publications were analyzed using unsupervised text mining and clustering techniques. The analysis focused on identifying recurring themes, methodological innovations, and gaps within healthcare analytics literature across clinical, administrative, and public health contexts.

Results: Eight dominant themes were identified: intelligent systems for predictive healthcare, patient-centered health analytics, adaptive AI for clinical insights, demographic health analytics, digital mental health surveillance, ethical analytics for health surveillance, personalized care through data analytics, and AI-driven insights for outbreak response. These reflect a transition toward real-time, multimodal, and ethically grounded analytics ecosystems. Persistent challenges include data interoperability, algorithmic opacity, standardization of evaluation, and demographic bias.

Conclusions: The review highlights emerging priorities, including explainable AI, federated learning, and context-aware modeling, as well as ethical considerations related to data privacy and digital equity. Practical recommendations include co-designing with healthcare professionals, investing in infrastructure, and deploying real-time clinical decision support. Healthcare analytics is positioned as a foundational pillar of learning health systems with broad implications for translational research and precision health.

Article 104934. Publication type: Journal Article.

Citations: 0

Interpretable statistical modeling of patient flow in emergency departments

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics Pub Date : 2025-10-14 DOI: 10.1016/j.jbi.2025.104937

Hugo Álvarez-Chaves, María D. R-Moreno

Objective: This paper aims to develop a data-driven simulation framework for modeling patient flow in a hospital Emergency Department using interpretable methods throughout the entire process in the absence of system resource data. The goal is to improve understanding of system dynamics and support decision-making processes through transparent simulations, even when resource data are unavailable.

Methods: We developed a simulation framework using anonymized medical records from a Spanish hospital’s Emergency Department. The model captures patient flow by triage level, identifying routes and measuring the transition times between the stages along each route. We estimated these transitions using both parametric (theoretical) distributions and non-parametric Kernel Density Estimation (KDE). Patient admission times are modeled using probability distributions. We enhanced realism through an iterative refinement process guided by tolerance thresholds and quantitative metrics. This process refined the synthetic data to match the original distributions.

Results: Our approach produces highly realistic patient flow simulations with low tolerance values in the iterative method. The process gradually converges toward the original data. Distance and divergence metrics, together with statistical test results, indicate a high degree of similarity between the simulations and the real data, passing the Mann–Whitney U and Kolmogorov–Smirnov tests simultaneously in 100% of the generated samples when the tolerance threshold is low.

Conclusion: The experimental results demonstrate that our simulation method effectively reproduces patient flow dynamics with a high level of realism and flexibility, even in the absence of information related to service resources. Its interpretable design and adjustable parameters enable safe data analysis and the exploration of alternative management strategies (e.g., modifying potential patient routes or restricting some transitions). These features position the methodology as a valuable tool for supporting informed decision-making and suggest its potential for use in other hospitals with suitable data, pending validation on external datasets.

Volume 171, Article 104937. Publication type: Journal Article.
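As a rough illustration of two ingredients named in this abstract, the sketch below draws synthetic transition times from a Gaussian Kernel Density Estimate and compares them with the observed times using the two-sample Kolmogorov–Smirnov statistic. It is not the authors' pipeline: the bandwidth rule (a normal-reference rule of thumb), the function names, and the waiting-time values are all assumptions made for the example, and the iterative tolerance-based refinement is not reproduced.

```python
import random
from statistics import stdev


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth: 1.06 * sample std * n^(-1/5)."""
    return 1.06 * stdev(data) * len(data) ** (-1 / 5)


def sample_kde(data, size, rng, bandwidth=None):
    """Sample from a Gaussian KDE: pick an observation, add Gaussian noise."""
    h = rule_of_thumb_bandwidth(data) if bandwidth is None else bandwidth
    return [rng.choice(data) + rng.gauss(0.0, h) for _ in range(size)]


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


# Made-up triage-to-consultation times in minutes (illustrative only).
observed = [12.0, 18.5, 25.0, 31.0, 40.5, 44.0, 52.5, 60.0]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
synthetic = sample_kde(observed, 200, rng)
distance = ks_statistic(observed, synthetic)
```

A smaller `distance` means the synthetic times follow the observed distribution more closely; turning it into a pass/fail test additionally needs a p-value or critical value, which libraries such as SciPy provide.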

Citations: 0

A comprehensive evaluation framework for synthetic medical tabular data generation

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics Pub Date : 2025-10-14 DOI: 10.1016/j.jbi.2025.104939

Anastasia Kurakova, Hajar Homayouni

Abstract: Machine learning (ML) applications have enabled significant advancements in healthcare, such as predicting pandemics, personalizing treatments, and developing life-saving drugs. However, ML model training requires large datasets, which are difficult to obtain in healthcare due to privacy concerns. Synthetic data generation offers a promising solution by providing access to large-scale training data while protecting patient privacy. Our research focuses on tabular medical data, the predominant format for Electronic Health Records (EHRs), and introduces a comprehensive evaluation framework that assesses synthetic data in four critical dimensions: quality, privacy, usability, and computational complexity of the data generation process. The framework ensures that synthetic data maintains sufficient similarity to real data for ML applications while preserving patient confidentiality. To validate our approach, we applied six state-of-the-art (SOTA) generative models to generate synthetic medical datasets and evaluated them within our framework. In contrast to conventional approaches that focus primarily on statistical similarity, our framework provides a broader assessment that incorporates outlier detection, privacy risks, and domain-specific constraints. Our findings demonstrate that our framework can identify critical shortcomings in synthetic data generation models, such as the amplification of duplicate rows and the generation of out-of-range values, which are overlooked by traditional statistical evaluation methods. Our implementation of the framework is available at: https://github.com/akurakova/SDE_Framework

Volume 171, Article 104939. Publication type: Journal Article.
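The two shortcomings named in this abstract, amplified duplicate rows and out-of-range values, lend themselves to simple checks. The sketch below is an illustration written for this listing and is not taken from the linked SDE_Framework repository; the function names, the column layout (age, systolic blood pressure), and the plausible age range are assumptions.

```python
from collections import Counter


def duplicate_rate(rows):
    """Fraction of rows that repeat an earlier identical row."""
    counts = Counter(tuple(r) for r in rows)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(rows)


def out_of_range(rows, column, low, high):
    """Indices of rows whose value in `column` falls outside [low, high]."""
    return [i for i, r in enumerate(rows) if not (low <= r[column] <= high)]


# Made-up rows of (age in years, systolic blood pressure in mmHg).
real = [(34, 120), (51, 135), (67, 142), (34, 120)]
synthetic = [(34, 120), (34, 120), (34, 120), (-2, 130)]

# A generator amplifies duplicates if its rate exceeds the real data's rate.
amplified = duplicate_rate(synthetic) > duplicate_rate(real)
# Assumed domain constraint for this example: age must lie in [0, 120].
bad_age_rows = out_of_range(synthetic, column=0, low=0, high=120)
```

Checks like these complement distribution-level similarity metrics, which can look acceptable even when individual rows violate domain constraints.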

Citations: 0