{"title":"An analytics-based framework for early detection of cervical cancer using predictive modeling","authors":"Wirapong Chansanam , Kittichai Nilubol , Pichayada Suphajaroonshab , Chunqiu Li","doi":"10.1016/j.health.2025.100442","DOIUrl":"10.1016/j.health.2025.100442","url":null,"abstract":"<div><div>This study aims to develop and evaluate advanced machine learning (ML) models for accurate and scalable early detection of cervical cancer, addressing critical limitations in current diagnostic practices. In leveraging exploratory data analysis (EDA), rigorous data preprocessing, and multiple ML techniques—including Random Forest, ANN, SVM, XGBoost, and ensemble models—we systematically analyzed a comprehensive dataset from the UCI repository comprising demographic, clinical, and behavioral features. Results indicated that the Random Forest model achieved the highest performance, with an accuracy of 98.4 %, a sensitivity of 99.3 %, and a specificity of 97.6 %, substantially surpassing the other evaluated models. Despite limitations related to dataset homogeneity and potential biases introduced by synthetic oversampling methods, these findings represent significant methodological and practical advancements. By offering an interpretable and robust diagnostic tool, the study significantly contributes to the improvement of cervical cancer detection, particularly benefitting low-resource clinical environments where effective, scalable screening methods are urgently needed. The proposed framework—developed and evaluated solely on the UCI tabular cervical cancer dataset—achieved high discriminative performance with the Random Forest model (accuracy = 98.4 %, sensitivity = 99.3 %, specificity = 97.6 %). A previously published imaging-based ResNet-50 model (AUC = 0.97) is referenced for contextual comparison only and was not part of our experimental work. 
However, deployment in resource-constrained environments will require further optimization and cost-efficiency analyses to confirm feasibility.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"9 ","pages":"Article 100442"},"PeriodicalIF":0.0,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145791821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A frequency-driven quantum and graph-based method for robust brain tumor analysis","authors":"Ripon Kumar Debnath, Al Musabbir, Md. Motaharul Islam","doi":"10.1016/j.health.2026.100451","DOIUrl":"10.1016/j.health.2026.100451","url":null,"abstract":"<div><div>Brain tumor segmentation remains a significant challenge in medical image analytics due to the limited ability of current models to detect small lesions, capture spectral information, and represent anatomical context effectively. This study introduces the Frequency-Quantum-Graph Network (FQG-Net), an analytical framework that integrates quantum computing principles, adaptive frequency-domain processing, and graph-based contextual learning to enhance segmentation precision. The model employs quantum entanglement and superposition effects to enrich feature representation, an adaptive frequency enhancement mechanism to amplify tumor-specific spectral characteristics, and a graph neural contextual memory to preserve spatial and anatomical relationships. Multimodal MRI data are processed through selective quantum residual blocks that dynamically activate network components based on analytical requirements, ensuring both efficiency and stability. Empirical evaluations across multiple benchmark datasets demonstrate that FQG-Net delivers consistent improvements over state-of-the-art segmentation models, achieving higher accuracy, stronger generalization across datasets, and superior performance in detecting small and heterogeneous tumor regions. 
These findings highlight the analytical strength of quantum-enhanced deep learning and its potential to advance precision diagnostics in healthcare imaging.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"9 ","pages":"Article 100451"},"PeriodicalIF":0.0,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145977605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An unsupervised learning approach to optimising tumour therapy through clinical data mining","authors":"Eleni Rozaki , Christina Skourou , David Fitzpatrick , Michael Amoo , Eoin Minnock , Caroline Hayhurst , Nachi Palaniappan , Mohsen Javadpour","doi":"10.1016/j.health.2026.100450","DOIUrl":"10.1016/j.health.2026.100450","url":null,"abstract":"<div><div>Clinical data mining and healthcare analytics enable systematic evaluation of treatment strategies in precision oncology. This study analysed a harmonised multi-centre dataset of 150 patients treated with linac-based stereotactic radiosurgery for vestibular schwannoma across three institutions in Ireland and the UK. A data-driven framework combining descriptive analytics, unsupervised clustering (K-means and Gaussian Mixture Models), and Random Forest modelling was used to assess treatment plan consistency, explore dose–response patterns, and estimate clinical outcomes. Clustering identified four treatment plan groups with distinct profiles of tumour reduction and organ-at-risk exposure. Random Forest models linked these clusters and dosimetric factors with tumour control and functional preservation. While internal performance was high, results are interpreted cautiously due to the limited sample size and absence of external validation. By integrating unsupervised learning with interpretable predictive modelling, this study provides a reproducible approach to characterising dose–response heterogeneity across centres. 
The findings support the future development of decision-support tools, while recognising that prescriptive optimisation requires further causal or optimisation-based modelling beyond the present work.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"9 ","pages":"Article 100450"},"PeriodicalIF":0.0,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146173152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Bayesian framework for enhancing health data accuracy in pooled cross-sectional analysis","authors":"Romuald Daniel Boy-ngbogbele , Oscar Ngesa , Thomas Mageto , Célestin C. Kokonendji","doi":"10.1016/j.health.2026.100448","DOIUrl":"10.1016/j.health.2026.100448","url":null,"abstract":"<div><div>The analysis of pooled cross-sectional data plays a vital role in various disciplines, including epidemiology, economics, and the social sciences, by enabling the identification of trends and patterns over time. This study develops statistical models specifically designed to analyze pooled cross-sectional data while accounting for measurement error, with a particular focus on estimating the prevalence of malnutrition among children under five years of age in Cameroon. Measurement error is a persistent issue in surveys, especially in resource-limited settings where data collection accuracy may be compromised. To address this, the research employs logistic regression within a Bayesian framework to reduce the impact of measurement error on malnutrition prevalence estimates, thereby providing more reliable information for policymakers and public health professionals. Through both simulation studies and application to real-world data from Cameroon, the study demonstrates the effectiveness of the proposed models in improving the accuracy and precision of estimates, offering deeper insights into childhood malnutrition in the country. 
This work advances statistical methodologies for survey data analysis by providing robust tools to address measurement error and support evidence-based interventions to combat malnutrition in Cameroon and similar contexts worldwide.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"9 ","pages":"Article 100448"},"PeriodicalIF":0.0,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146023111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An automated analytics approach for diabetic retinopathy detection with ensemble deep learning models in healthcare","authors":"Md Saykot Khandakar, Md Samsuddoha, Sohely Jahan, Rahat Hossain Faisal","doi":"10.1016/j.health.2026.100449","DOIUrl":"10.1016/j.health.2026.100449","url":null,"abstract":"<div><div>Diabetic Retinopathy (DR) is a leading complication of prolonged diabetes, which poses a significant threat to vision and may lead to permanent blindness. Early identification and timely intervention are crucial to preventing disease progression. Traditionally, DR diagnosis relies on medical examination of retinal fundus images by expert ophthalmologists, which is time-consuming and resource-intensive. However, deep learning techniques, particularly in medical imaging, have demonstrated remarkable performance in the automated detection and classification of DR. This study proposes two ensemble-based deep learning frameworks using feature-level fusion stacking for automated DR detection from retinal fundus images: ReXInDen, which integrates four complementary convolutional neural networks, and ReXDen, which integrates three. These frameworks extract high-level features from each backbone, concatenate them into a unified representation, and classify it using a feedforward neural network. Three datasets were used to validate the models, including a region-specific dataset collected from Bangladeshi medical sources. The proposed ReXInDen model achieved accuracies of 98.27% and 98.69% on Dataset 1 and Dataset 2, respectively, while ReXDen achieved the highest accuracy of 99.05% on Dataset 3. These results indicate a substantial improvement over individual models and demonstrate the potential of the ensemble approach to support early-stage DR detection. 
Moreover, these models show promise for integration into automated DR screening tools that can aid in reducing the global burden of diabetic vision loss.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"9 ","pages":"Article 100449"},"PeriodicalIF":0.0,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146023161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An unsupervised machine learning approach for defining surge levels in emergency medical services","authors":"Qixuan Zhao , Adair Collins , Judah Goldstein , Onur Pakkanlilar , Peter Vanberkel","doi":"10.1016/j.health.2025.100443","DOIUrl":"10.1016/j.health.2025.100443","url":null,"abstract":"<div><div>A surge period occurs when demand significantly exceeds available capacity, creating operational strain in emergency medical services (EMS) and leading to measurable declines in system performance. Although surge levels are a critical metric for EMS operations, no established method exists for their objective definition. This study introduces a genetic algorithm-based unsupervised clustering model designed to define surge levels using EMS operational data. Unlike the National Emergency Department Overcrowding Scale, which depends on subjective assessments, the proposed approach objectively categorizes surge levels and supports regional customization through hyperparameter tuning and feature selection. The model's adaptability allows healthcare leaders to determine the desired number of surge-level categories and tailor the feature set to local operational needs. A case study in Nova Scotia, Canada, demonstrates the model's effectiveness, accurately identifying 88.96 % of busy periods with recall and precision of 96.49 % and 78.57 %, respectively. 
These results indicate that the approach provides a robust and flexible tool for defining surge levels, enabling data-driven decision-making in EMS system management.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"9 ","pages":"Article 100443"},"PeriodicalIF":0.0,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145791820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hierarchical Bayesian approach for predictive analytics of depression severity in medical students","authors":"Zequn Chen , Dianhao Liu , Wesley J. Marrero , Nicholas C. Jacobson , Thomas Thesen","doi":"10.1016/j.health.2026.100453","DOIUrl":"10.1016/j.health.2026.100453","url":null,"abstract":"<div><div>Medical students experience disproportionately high rates of depression due to intense academic pressures and clinical demands. Without timely, targeted intervention, they face increased risks of academic underperformance and adverse outcomes. Existing predictive models often adopt a one-size-fits-all approach to predict depression for the entire student population. This approach may perform poorly for sparsely represented subgroups, such as medical students. To address this limitation, we propose a hierarchical Bayesian predictive model that estimates medical students' depression severity, even when medical students comprise only a small fraction of the overall dataset. Our hierarchical Bayesian modeling framework generalizes across subgroups via partial pooling, offering a novel analytical contribution to healthcare modeling in which subgroup imbalance is prevalent. Based on data from nearly 168,000 students, our model reduces the mean absolute error of predictions by at least 31% compared to baseline models, including XGBoost and deep neural networks. Statistical analysis using Wilcoxon Rank-Sum Tests with Bonferroni correction across more than 650 previously unseen medical students confirms that our model's performance is significantly superior to established baselines. Beyond improved predictive accuracy, our model identifies key depression-related stressors, including financial hardship, international student status, smoking frequency, and eating disorders. 
Accurate predictions and identified stressors help clinicians and academic administrators to recognize at-risk medical students and deliver timely, targeted interventions.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"9 ","pages":"Article 100453"},"PeriodicalIF":0.0,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146173153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An analytical evaluation of imputation methods for enhancing cardiac care data integrity","authors":"Rajarajeswari Ganesan , Carlijn M.A. Buck , Chang Sun , Marcel van’t Veer , Lukas R.C. Dekker , Frans N. van de Vosse , Wouter Huberts","doi":"10.1016/j.health.2026.100452","DOIUrl":"10.1016/j.health.2026.100452","url":null,"abstract":"<div><div>Electronic Health Records (EHRs) comprise digitally stored patient and population health data. Unfortunately, EHRs are often far from complete, a problem referred to as missingness. Missingness in EHRs hinders the use of Machine Learning (ML) for data mining and the development of decision support applications. Missingness also limits EHRs’ reusability for retrospective clinical studies. In fact, missingness adversely affects the accuracy and reliability of ML models and clinical studies. Imputation is an effective approach to deal with missing values and to improve the reliability of ML models and clinical studies. However, previous imputation studies are spread across different healthcare datasets and are not universally applicable. In addition, there is a lack of studies focusing on the rationale for the imputation of healthcare datasets. Moreover, the quality of imputation methods is often assessed without considering the medical interpretation. In this study, we therefore aim to characterize how different imputation methods affect accuracy for cardiac EHRs, from both an ML and a medical perspective. Two cardiac EHR datasets with missing values for cardiovascular diseases (CVDs) are used. Several imputation methods (mean, median, K-nearest neighbor, and multiple variants of iterative imputation) are considered. From an ML perspective, the post-imputation effects are assessed by quantifying the ML models’ capability to classify CVDs. The distribution of clinically interesting variables is evaluated for clinical comprehension. 
Our study shows that information in missingness and magnitude of variable missingness are the key factors in the selection of imputation methods for diverse EHR-based applications.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"9 ","pages":"Article 100452"},"PeriodicalIF":0.0,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146023163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparative analysis of predictive analytics approaches to uncovering subtypes of acute inflammation using machine learning","authors":"Roopashri Shetty, Aditi Shrivastava, Shwetha Rai, Geetha M.","doi":"10.1016/j.health.2025.100446","DOIUrl":"10.1016/j.health.2025.100446","url":null,"abstract":"<div><div>Early prediction of acute cystitis and acute pyelonephritis plays a critical role in improving patient outcomes. This study develops predictive analytics models for these conditions using a pre-processed Acute Inflammation Dataset and four classification algorithms: Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF). In addition, two clustering techniques, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and K-Means, are employed to uncover latent structures within the data. Both random sampling and stratified sampling are applied to ensure balanced data representation across clinical classes. The performance of the classification models is evaluated using accuracy, precision, recall, and the F1-score, while clustering performance is assessed using the Silhouette score. The results show that stratified sampling improves the performance of the DT, SVM, and LR classifiers, whereas the RF classifier achieves optimal performance under random sampling. Clustering analysis identifies two disease subclasses, with DBSCAN achieving a maximum Silhouette score of 1.0 for MinPts = 5 and epsilon values of 0.5, 1, and 2 using both Euclidean and Manhattan distance metrics. The K-Means algorithm achieves its best performance with a Silhouette score of 0.67 for K = 5 using the Minkowski distance metric. 
Overall, the findings demonstrate the effectiveness of machine learning and data mining techniques in enhancing diagnostic modeling and clinical decision-making for acute inflammatory conditions, contributing to more timely and accurate patient care.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"9 ","pages":"Article 100446"},"PeriodicalIF":0.0,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145926224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An ensemble learning approach for predicting hospital stay in transplant patients","authors":"Zahra Gharibi","doi":"10.1016/j.health.2025.100444","DOIUrl":"10.1016/j.health.2025.100444","url":null,"abstract":"<div><div>The rising incidence of heart and lung failure has increased the demand for effective transplant management strategies. Predicting Hospital Length of Stay (HLOS) is essential for reducing cost variability, optimizing resource utilization, and supporting patient recovery. This study uses data from the United Network for Organ Sharing (UNOS) to develop and validate an Ensemble Meta Stacked (EMS) model for predicting hospitalization duration after heart and lung transplantation. Expert-informed feature engineering incorporates donor and recipient compatibility measures, while a hybrid two-stage feature selection process combines expert evaluation with the Boruta algorithm to identify key predictors across demographic, clinical, behavioral, and geographical domains. Twelve predictive models are developed, including five base learners for each organ type and an EMS model that integrates their outputs through a Random Forest (RF) meta learner. Among the base learners, RF achieves the highest accuracy, but the EMS consistently outperforms all individual models. Sensitivity analysis confirms the robustness of model performance under different feature sources and scaling procedures, while paired statistical tests confirm that the improvement in predictive accuracy of EMS compared to the base learners is not due to random variation. The study also links predictive metrics to stakeholder priorities: policymakers and payers benefit from stable forecasts that control financial variability, hospital administrators rely on consistent prediction accuracy for capacity planning and resource allocation, and clinicians depend on bias-related metrics to guide safer discharge decisions. 
The EMS framework advances data-driven management in transplantation, supporting more efficient, equitable, and clinically responsible care.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"9 ","pages":"Article 100444"},"PeriodicalIF":0.0,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145791819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}