{"title":"A robust transfer learning approach with histopathological images for lung and colon cancer detection using EfficientNetB3","authors":"Raquel Ochoa-Ornelas , Alberto Gudiño-Ochoa , Julio Alberto García-Rodríguez , Sofia Uribe-Toscano","doi":"10.1016/j.health.2025.100391","DOIUrl":"10.1016/j.health.2025.100391","url":null,"abstract":"<div><div>Lung and colon cancers are among the deadliest diseases worldwide, necessitating early and accurate detection to improve patient outcomes. This study utilizes the EfficientNetB3 model, a state-of-the-art transfer learning approach, to enhance the detection of colon and lung cancers from histopathological images. The research leverages the LC25000 dataset, comprising 25,000 histopathological images evenly distributed across five classes: colon adenocarcinoma, benign colon tissue, lung adenocarcinoma, lung squamous cell carcinoma, and benign lung tissue. The EfficientNetB3 model initially achieved an impressive accuracy of 99.39% across all classes. To further validate and enhance the model’s robustness and generalizability, we augmented the dataset by replacing 1,000 cancerous class images with new Genomic Data Commons (GDC) Data Portal - National Cancer Institute images, simulating more diverse clinical scenarios. This modification resulted in an accuracy of 99.39%, with equally high performance across other metrics, including precision, recall, and F1-Score, all reaching 99.39%, and a Matthews Correlation Coefficient (MCC) of 99.24%. The Gradient-weighted Class Activation Mapping (Grad-CAM) technique was utilized to visually interpret the model’s decisions, enhancing its transparency and reliability. These findings demonstrate that EfficientNetB3 is an effective and generalizable end-to-end framework for histopathological image analysis with minimal preprocessing. 
The promising results underscore the potential of EfficientNetB3 to advance automated cancer detection, thereby contributing to earlier diagnosis and more effective treatment strategies.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100391"},"PeriodicalIF":0.0,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143806834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A clustering-based federated deep learning approach for enhancing diabetes management with privacy-preserving edge artificial intelligence","authors":"Xinyi Yang, Juan Li","doi":"10.1016/j.health.2025.100392","DOIUrl":"10.1016/j.health.2025.100392","url":null,"abstract":"<div><div>The increasing prevalence of diabetes necessitates innovative glucose prediction methods that prioritize patient privacy. While edge artificial intelligence (AI) offers potential, its limitations in resource-constrained devices can be mitigated through federated learning (FL). However, challenges remain in accounting for patient variability and optimizing FL for glucose prediction. This research introduces a novel personalized clustering-based federated deep learning (Clu-FDL) model to address these challenges. We develop tailored models that enhance prediction accuracy by clustering patients based on carbohydrate (CHO) intake patterns. Utilizing Simple Recurrent Neural Network (SimpleRNN) and Gated Recurrent Unit (GRU) methods, the study evaluates the performance of local patients who contribute to training the cluster and global (non-cluster) models. The results show that the Clu-FDL approach achieves high precision (0.93), recall (0.96), and F1 scores (0.95), along with low Root Mean Square Error (RMSE) values (11.08 ± 1.77 mg/dL). Additionally, for new patients with different data durations, analysis based on 0.25–3 days of data indicates that Clu-FDL models exhibit greater stability, with smaller RMSE and higher precision, recall, and F1 scores compared to non-clustering models. The study identifies that SimpleRNN and GRU models are most effective for new patients with 9 and 6 days of data. 
This privacy-preserving, clustering-based personalized approach empowers patients to manage their diabetes effectively.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100392"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143760594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparative study of explainable machine learning models with Shapley values for diabetes prediction","authors":"Keona Pang","doi":"10.1016/j.health.2025.100390","DOIUrl":"10.1016/j.health.2025.100390","url":null,"abstract":"<div><div>Over the years, numerous machine learning models have been developed, leading to successful applications across various fields. This study uses a large dataset related to type 2 diabetes prediction from the Centers for Disease Control and Prevention (CDC) in the United States. The dataset has 70,692 samples, 21 input features, and one output (non-diabetes or diabetes). In addition to health indicators like Body Mass Index (BMI), blood pressure, and cholesterol level, the features include socioeconomic factors (e.g., income, education) and lifestyle factors such as diet and physical activity. This paper aims to study how these features influence diabetes risk. Of the dataset, 80% is used for training and 20% for testing. Six machine learning models, as well as the Multivariate Adaptive Regression Splines (MARS) model, were used in the investigation. A detailed comparison of the performance of these models is given. Shapley values, visualized as color-coded graphs, are used to explain the behavior of the various machine learning models and to demonstrate their reliability. This paper shows how Shapley values can improve the explainability and interpretability of these models for diabetes prediction. 
We leverage the SHapley Additive exPlanations (SHAP) scores to provide information about the relative importance of each predictive feature, and these results shed light on the relationship between the features and the risk of developing type 2 diabetes.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100390"},"PeriodicalIF":0.0,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A machine learning model for automated contact tracing during disease outbreaks","authors":"Zeyad Aklah , Amean Al-Safi , Marwa H. Abdali , Khalid Al-jabery","doi":"10.1016/j.health.2025.100389","DOIUrl":"10.1016/j.health.2025.100389","url":null,"abstract":"<div><div>This study aims to develop and evaluate a conceptual model for assessing the Risk of Infection (ROI) within the context of automated digital contact tracing during pandemics. The proposed model incorporates five input parameters: distance, overlap time, contamination interval, incubation time, and contact facility size. These parameters capture various aspects of disease transmission dynamics. The model employs logistic functions to quantify the influence of each parameter on the overall ROI. The evaluation of the model involves two methods: a partial evaluation to observe the impact of parameter pairs on ROI, and a full evaluation, which is trained on a dataset of 24,000 simulated scenarios to identify central clusters for high, medium, and low-risk categories using K-means and the Hidden Markov Model. Additionally, the model is tested on another 16,000 simulated scenarios to assess its overall performance. Results indicate that the Hidden Markov Model categorizes 63.8% of the testing dataset as low risk, 20.7% as medium risk, and 15.5% as high risk. In contrast, K-means classifies 44.3% as low risk, 30.7% as medium risk, and 25% as high risk. The evaluation metrics favor the Hidden Markov Model, which demonstrates higher performance in terms of Log-Likelihood, with a value of 50,688, as well as in the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), with values of -101,365.6430 and -101,319.5609, respectively. In both evaluations, the results validate the model’s ability to automate digital contact tracing based on the input parameters. Future studies could explore classification accuracy using real contact tracing datasets. 
The proposed approach enhances the efficiency of public health authorities by directing their efforts toward individuals with the highest risk of infection, rather than applying the same level of intervention indiscriminately to everyone.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100389"},"PeriodicalIF":0.0,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143619509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A machine learning and neural network approach for classifying multidrug-resistant bacterial infections","authors":"Preeda Mengsiri , Ratchadaporn Ungcharoen , Sethavidh Gertphol","doi":"10.1016/j.health.2025.100388","DOIUrl":"10.1016/j.health.2025.100388","url":null,"abstract":"<div><div>Antimicrobial resistance (AMR) represents a major public health challenge, significantly complicating infection prevention and treatment. This study employs machine learning and neural network techniques to classify multidrug-resistant Gram-negative bacterial (MDR-GNB) infections using electronic health records from 624 patients at Thatphanom Crown Prince Hospital in Thailand. We compared several algorithms, including Logistic Regression, Random Forest, Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), and Light Gradient Boosting Machine (LightGBM), with the MLP model exhibiting the highest accuracy and specificity. Performance was further enhanced by integrating feature selection methods such as Sequential Forward Selection (SFS), Recursive Feature Elimination with Cross-Validation (RFE-CV), and SelectKBest with data augmentation techniques, including ADASYN and SMOTE variants. Utilizing SHapley Additive exPlanations (SHAP) provided valuable insights into the most influential predictors for MDR-GNB. 
Notably, the MLP model achieved an AUC of 0.70, surpassing prior studies and highlighting its potential to advance clinical decision-making in managing MDR-GNB infections.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100388"},"PeriodicalIF":0.0,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143510183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An exploration of the interplay between treatment and vaccination in an Age-Structured Malaria Model using non-linear ordinary differential equations","authors":"Mahmudul Bari Hridoy, Angela Peace","doi":"10.1016/j.health.2025.100386","DOIUrl":"10.1016/j.health.2025.100386","url":null,"abstract":"<div><div>Malaria continues to be a significant global health challenge, particularly in tropical regions. Resistance to key antimalarial drugs is spreading, complicating treatment efforts. While progress toward eradication has been slow, the development and introduction of novel malaria vaccines offer hope for reducing the disease burden in endemic areas. To address these challenges, we develop an extended Susceptible–Exposed–Infected–Recovered (SEIR) age-structured model incorporating malaria vaccination for children, drug-sensitive and drug-resistant strains, and interactions between human hosts and mosquitoes. Our research evaluates how malaria vaccination coverage influences disease prevalence and transmission dynamics. We derive both strains’ basic, intervention, and invasion reproduction numbers and conduct sensitivity analysis to identify key parameters affecting infection prevalence. Our findings reveal that model outcomes are primarily influenced by scale factors that reduce transmission and natural recovery rates for the resistant strain, as well as by drug treatment and vaccination efficacies and mosquito death rates. Numerical simulations indicate that while treatment reduces the malaria disease burden, it also increases the proportion of drug-resistant cases. Conversely, higher vaccination efficacy correlates with lower infection cases for both strains. 
These results suggest that a synergistic approach involving vaccination and treatment could effectively decrease the overall proportion of the infected population.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100386"},"PeriodicalIF":0.0,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143480475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A data-driven approach to pricing models for balanced public–private healthcare systems","authors":"Aydin Teymourifar , Onur Kaya , Gurkan Ozturk","doi":"10.1016/j.health.2025.100385","DOIUrl":"10.1016/j.health.2025.100385","url":null,"abstract":"<div><div>This study focuses on a real-world healthcare system with coexisting public and private hospitals with distinct characteristics. While public hospitals have lower costs, they also suffer from long waiting times, which diminish patients’ perceived quality of care. Conversely, despite their higher fees, private hospitals offer shorter waiting times, leading to a more favorable perception of quality. A balanced healthcare system could provide societal benefits. Pricing strategies greatly influence a patient’s hospital selection. For instance, reduced fees in private hospitals attract more patients, consequently reducing overcrowding in public facilities and elevating the overall quality of services provided. This study aims to develop pricing models to foster a balanced and socially advantageous healthcare system. In this system, private hospital pricing is determined through contract mechanisms with the government. Thus, we delve into the ramifications of various contract models between the government and private hospitals on social utility. Our findings underscore the communal advantages of contract mechanisms. 
Furthermore, we generalize the proposed models to apply to similar systems.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100385"},"PeriodicalIF":0.0,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143430002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An interpretable machine learning study for developing a binary classifier for predicting rehospitalization from skilled nursing facilities","authors":"Zhouyang Lou , Zachary Hass , Nan Kong","doi":"10.1016/j.health.2025.100387","DOIUrl":"10.1016/j.health.2025.100387","url":null,"abstract":"<div><div>Reducing hospital readmissions for older adults discharged to a skilled nursing facility (SNF) is important to the United States (U.S.) from both financial and care quality perspectives. To identify potential risk factors, researchers have used data from claims, national surveys, and administrative databases to train models that predict hospital readmissions that occur within 30 days of discharge. Machine learning techniques hold promise for this binary classification task. However, analysis pipelines are underdeveloped in data balancing, feature selection, and model interpretability. In this paper, we utilized individual resident-level data from the Long-Term Care Minimum Data Set (MDS) collected from SNFs in a midwestern U.S. state (n = 93,058). We further triangulated this data with publicly available facility quality and staffing data from the Nursing Home Compare tool on Medicare.gov and facility neighborhood data from the National Neighborhood Data Archive. We compared several machine learning models, data balancing techniques, and feature selection methods for the prediction task. We found that XGBoost, with Synthetic Minority Oversampling Edited Nearest Neighbor (SMOTE-ENN) to balance the data and hierarchical clustering based on Spearman correlation to select the features, produces the best prediction performance. 
We then used SHapley Additive exPlanations (SHAP) values to identify features that contribute most to the performance and used partial dependence plots to examine curvilinear and moderating relationships between features and the risk of 30-day rehospitalization.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100387"},"PeriodicalIF":0.0,"publicationDate":"2025-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143480474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A recommender system with multi-objective hybrid Harris Hawk optimization for feature selection and disease diagnosis","authors":"Madhusree Kuanr, Puspanjali Mohapatra","doi":"10.1016/j.health.2025.100384","DOIUrl":"10.1016/j.health.2025.100384","url":null,"abstract":"<div><div>This study proposes a health recommender system to analyze health risk and disease prediction by identifying the most responsible disease-causing factors using a hybrid Genetic–Harris Hawk optimization multi-objective feature selection approach. The proposed recommender system uses the Tree-based Pipeline Optimization Tool (TPOT) automated machine learning model to recommend the most suitable machine learning prediction model with the best classifier in terms of classification accuracy for a disease with the selected features. It also recommends the top three disease-causing features for a particular disease that can be utilized to analyze a person’s health risk. The proposed system has also been compared with the competing prediction approaches using Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and Autoencoders. We show that the proposed system outperforms competing approaches in terms of classification accuracy.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100384"},"PeriodicalIF":0.0,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143172046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An application of natural language processing for hypoglycemic event identification in patients with diabetes mellitus","authors":"J.E. Camacho-Cogollo , Cristhian Felipe Patiño Zambrano , Christian Lochmuller , Claudia C. Colmenares-Mejia , Nicolas Rozo , Mario A. Isaza-Ruget , Paul Rodriguez , Andrés García","doi":"10.1016/j.health.2024.100381","DOIUrl":"10.1016/j.health.2024.100381","url":null,"abstract":"<div><div>The therapeutic goal for diabetes mellitus is to maintain normal blood glucose levels, but in some cases, hypoglycemia may occur as a consequence of treatment. Identifying patients with hypoglycemia is critical to preventing adverse events and mortality. However, hypoglycemic events are often not accurately documented in electronic health records (EHRs). This study presents a retrospective analysis of the EHRs of patients with diabetes mellitus. We hypothesize that text analytics and machine learning can identify possible hypoglycemic incidents from unstructured physician notes in electronic health records. Our analysis applies these techniques using the Python programming language as a tool. It also considers words that describe symptoms related to hypoglycemia. The analysis involves searching physicians' notes for keywords and applying supervised classification methods to 146,542 records. Natural language processing (NLP) and machine learning algorithms are used to identify possible hypoglycemic events and related symptoms in physicians’ notes. A multi-layer perceptron (MLP) model produces the best classification performance among all the models tested in this study, with an obtained accuracy of 0.87. 
We show that the NLP approach can effectively identify and automate the text-based detection process of potential hypoglycemic events, and can subsequently be used to make informed decisions about potential patient risks.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100381"},"PeriodicalIF":0.0,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143172047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}