{"title":"An automated information extraction model for unstructured discharge letters using large language models and GPT-4","authors":"Robert M. Siepmann , Giulia Baldini , Cynthia S. Schmidt , Daniel Truhn , Gustav Anton Müller-Franzes , Amin Dada , Jens Kleesiek , Felix Nensa , René Hosch","doi":"10.1016/j.health.2024.100378","DOIUrl":"10.1016/j.health.2024.100378","url":null,"abstract":"<div><div>The administrative burden of manually extracting clinical information from discharge letters is a common challenge in healthcare. This study aims to explore the use of Large Language Models (LLMs), specifically Generative Pretrained Transformer 4 (GPT-4) by OpenAI, for automated extraction of diagnoses, medications, and allergies from discharge letters. Data for this study were sourced from two healthcare institutions in Germany, comprising discharge letters for ten patients from each institution. The first experiment is conducted using a standardized prompt for information extraction. However, challenges were encountered, and the prompt was fine-tuned in a second experiment to improve the results. We further tested whether open-source LLMs can achieve similar results. In the first experiment, primary diagnoses were identified with 85% accuracy and secondary diagnoses with 55.8%. Medications and allergies were extracted with 85.9% and 100% accuracy, respectively. The International Classification of Diseases, 10th revision (ICD-10) codes for the identified diagnoses achieved an accuracy of 85% for primary diagnoses and 60.7% for secondary diagnoses. Anatomical Therapeutic Chemical (ATC) codes were identified with an accuracy of 78.8%. On the other hand, open-source LLMs did not provide similar levels of accuracy and could not consistently fill the template. With prompt fine-tuning in the second experiment, the primary diagnoses, secondary diagnoses, and medications could be predicted with 95%, 88.9%, and 92.2% accuracy, respectively. 
GPT-4 shows excellent potential for automated extraction of crucial diagnostic and medication information from discharge letters, presumably lowering the administrative burden for healthcare professionals and improving patient outcomes.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100378"},"PeriodicalIF":0.0,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143172049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An optimal control model with sensitivity analysis for COVID-19 transmission using logistic recruitment rate","authors":"Jonner Nainggolan , Moch. Fandi Ansori , Hengki Tasman","doi":"10.1016/j.health.2024.100375","DOIUrl":"10.1016/j.health.2024.100375","url":null,"abstract":"<div><div>This study proposes an optimal control model for COVID-19 spread, incorporating a logistic recruitment rate. The observations show that the disease-free equilibrium exists when the population-existing threshold exceeds 1. The stability of this equilibrium is determined by the basic reproduction number <span><math><msub><mrow><mi>R</mi></mrow><mrow><mn>0</mn></mrow></msub></math></span>: the equilibrium is stable when <span><math><msub><mrow><mi>R</mi></mrow><mrow><mn>0</mn></mrow></msub></math></span> is less than or equal to 1 and unstable when it is greater than 1. Furthermore, an endemic equilibrium exists and is stable when <span><math><msub><mrow><mi>R</mi></mrow><mrow><mn>0</mn></mrow></msub></math></span> exceeds 1. To identify influential factors in COVID-19 spread, sensitivity index and sensitivity analyses of <span><math><msub><mrow><mi>R</mi></mrow><mrow><mn>0</mn></mrow></msub></math></span> are conducted. The model integrates both prevention and treatment controls. Numerical simulations show that the prevention control is more effective than the treatment control in reducing COVID-19 spread. Moreover, the simultaneous implementation of prevention and treatment controls outperforms individual control methods in mitigating COVID-19 spread. 
Finally, sensitivity analysis conducted with constant controls shows the contributions of the controls to disease dynamics.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100375"},"PeriodicalIF":0.0,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143172048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deterministic compartmental model for optimal control strategies of Giardiasis infection with saturating incidence and environmental dynamics","authors":"Stephen Edward , Nyimvua Shaban","doi":"10.1016/j.health.2025.100383","DOIUrl":"10.1016/j.health.2025.100383","url":null,"abstract":"<div><div>This study develops a deterministic compartmental model that tracks Giardiasis’s direct and indirect transmission dynamics. The study begins by constructing a model incorporating four constant controls: health education, screening, hospitalization, and sanitation. The analytical results of the model are investigated and presented. The positivity of the solutions and the existence of invariant regions were established. The model exhibits a unique disease-free equilibrium and multiple endemic equilibria. The effective reproduction number was derived using the Next-Generation Matrix (NGM) approach, and its implications for the stability of the equilibria were explored. Local stability of the disease-free equilibrium was confirmed using the Routh–Hurwitz criteria, while global stability results were also presented. Sensitivity analysis was conducted based on the effective reproduction number, identifying the most influential parameters. We introduce an optimal control problem to curb the spread of Giardiasis. We rigorously establish the existence of optimal control solutions and analytically characterize these solutions using Pontryagin’s Maximum Principle. We conduct numerical simulations to evaluate the effectiveness of various control strategies. 
The results are promising, showing that the simultaneous implementation of all four control measures, education, screening, treatment, and sanitation, can lead to a significant reduction in disease cases, thereby offering a reassuring solution to the spread of Giardiasis.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100383"},"PeriodicalIF":0.0,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An exploration of machine learning approaches for early Autism Spectrum Disorder detection","authors":"Nawshin Haque, Tania Islam, Md Erfan","doi":"10.1016/j.health.2024.100379","DOIUrl":"10.1016/j.health.2024.100379","url":null,"abstract":"<div><div>Autism Spectrum Disorder is a neurodevelopmental condition impacting an individual’s repetitive behaviours, social skills, verbal and nonverbal communication abilities, and capacity for acquiring new knowledge. Manifesting typically in early childhood, specifically between 6 months and 5 years, the symptoms of autism exhibit a progressive nature over time. This study explores the application of Logistic Regression, Support Vector Classifier, K-Nearest Neighbour, Decision Tree, and Random Forest for predicting Autism in children and toddlers by leveraging advancements in machine learning. The efficacy of these techniques is evaluated using publicly accessible datasets specific to both age groups. The findings indicate remarkable performance, with the toddler dataset achieving a mean Intersection over Union (mIoU) of 100% for Support Vector Classifier and 99.80% for Logistic Regression. Similarly, the children dataset demonstrates outstanding results, achieving an mIoU of 100% for Support Vector Classifier and 99.96% for Logistic Regression. Furthermore, all algorithms achieved 100% accuracy on the children (age 4–11) dataset collected from real-world sources. Logistic Regression, Random Forest, Support Vector Classifier, and Decision Tree attained 100% accuracy and mIoU with the real-world dataset. 
These results underscore the potential of machine learning in aiding the early detection of ASD in children and toddlers, offering promising avenues for future research and clinical applications.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100379"},"PeriodicalIF":0.0,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143169862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A large-scale risk assessment and classification model for pneumococcus using Finnish national health data","authors":"Viljami Männikkö , Juha Turunen , Heidi Åhman , Esa Harju","doi":"10.1016/j.health.2025.100382","DOIUrl":"10.1016/j.health.2025.100382","url":null,"abstract":"<div><div><em>Streptococcus pneumoniae</em>, or pneumococcus, poses a significant health risk, particularly to infants, the elderly, and individuals with underlying medical conditions. In Finland, pneumococcal vaccination is part of the national immunization program, with vaccination provided to young children and only selected at-risk adult populations included. This study aims to leverage the Finnish national electronic health record system, Kanta, to analyze treatment histories and identify individuals at increased risk for disease to improve vaccination strategies. Kanta provides a comprehensive, nationwide database of patient treatment histories, which can be utilized to track individual risk factors and disease episodes. We analyzed health data from 96,200 Finnish residents with risk factors for pneumococcal disease following guidelines from the Finnish Institute for Health and Welfare and the World Health Organization. We prioritize vaccination for those at the greatest risk by categorizing individuals based on their identified risk factors. This study demonstrates the potential for using national health record data to conduct large-scale risk analyses, allowing for more targeted and efficient vaccination strategies. 
The novelty of our approach lies in the automatic identification of high-risk individuals, which can inform public health initiatives and enhance the monitoring of pneumococcal disease risk at a population level.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100382"},"PeriodicalIF":0.0,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparative assessment of machine learning models and algorithms for osteosarcoma cancer detection and classification","authors":"Amoakoh Gyasi-Agyei","doi":"10.1016/j.health.2024.100380","DOIUrl":"10.1016/j.health.2024.100380","url":null,"abstract":"<div><div>Osteosarcoma is a bone-forming tumor that is more common in children and young adults than in adults. Timely detection and classification of its type is crucial to its proper treatment and possible survival. Machine learning (ML) models trained on disease datasets are more effective in detection and classification than the conventional methods with hand-crafted features highly dependent on pathologists’ expertise. A publicly available raw osteosarcoma dataset was explored and then preprocessed using different combinations of data denoising techniques (including principal component analysis, mutual information gain, analysis of variance and Kendall’s rank correlation analysis) and data augmentation to <em>derive</em> seven different datasets. Using the seven derived datasets and eight ML algorithms, this study designed and performed an extensive comparative analysis of seven sets of ML models (altogether over 160 models) with their hyperparameters optimized using grid search. The performance differences between the learned ML models were then validated using repeated stratified 10-fold cross-validation and 5x2 cross-validation paired <em>t</em>-tests to select the best model for our task. The empirical model based on the extra trees algorithm and fitted to class-balanced dataset via random oversampling and multicollinearity removed via principal component analysis proved to be the best, as it detected and classified osteosarcoma cancer in 10 ms with 97.8% area under the receiver operating characteristics curve and acceptably low false alarm and misdetection. 
Thus, the proposed models can be cutting-edge techniques for automated detection and classification of osteosarcoma tumors to aid timely diagnosis, prognosis, and treatment.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100380"},"PeriodicalIF":0.0,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143169863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient blood supply chain network model with multiple echelons for managing outdated products","authors":"Agus Mansur , Ivan Darma Wangsa , Novrianty Rizky , Iwan Vanany","doi":"10.1016/j.health.2024.100377","DOIUrl":"10.1016/j.health.2024.100377","url":null,"abstract":"<div><div>This study examines the lack of coordination between blood production and inventories in the blood supply chain networks. Prior studies neglect to optimize operational costs through blood production, inventory, and waste. We propose a mixed-integer linear programming approach addressing multiple echelons, types of blood, and blood bag shelf lifetime. The model is developed by determining the facility locations, assigning regional blood banks, and allocating the right products. Indonesia's blood supply chain is used as a case study to evaluate the applicability of the proposed model using optimization software. A sensitivity analysis is performed on production rate and patient demand to assess how these factors affect the overall cost of expired products. The results show that the proposed method's total cost and expired products are 4.69%–5.60% and 4.71%–5.75%, respectively.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100377"},"PeriodicalIF":0.0,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143169864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An enhanced machine learning approach with stacking ensemble learner for accurate liver cancer diagnosis using feature selection and gene expression data","authors":"Amena Mahmoud , Eiko Takaoka","doi":"10.1016/j.health.2024.100373","DOIUrl":"10.1016/j.health.2024.100373","url":null,"abstract":"<div><div>Liver cancer is a significant global health concern, necessitating accurate and timely diagnosis for effective treatment. In recent years, machine learning approaches have emerged as promising tools for improving liver cancer classification using gene expression data. This study presents an advanced machine learning approach for liver cancer diagnosis using gene expression data, combining feature selection techniques with a stacking ensemble learning model. Our method addresses the challenges of high dimensionality and complex patterns in genomic data to improve diagnostic accuracy and interpretability. We employed a feature selection process to identify the most relevant gene expressions associated with liver cancer. This approach reduced the dimensionality of the data while preserving crucial biological information. The selected features were then used to train a stacking ensemble model, which combined multiple base learners, including Multi-Layer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), with an Extreme Gradient Boosting (XGBoost) meta-learner to make final predictions. The stacking ensemble achieved an accuracy of 97%, outperforming individual machine learning algorithms and traditional diagnostic methods. 
Furthermore, the model demonstrated high sensitivity (96.8%) and specificity (98.1%), crucial for early detection and minimizing false positives.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100373"},"PeriodicalIF":0.0,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An integrated stacked convolutional neural network and the levy flight-based grasshopper optimization algorithm for predicting heart disease","authors":"Syed Muhammad Salman Bukhari , Muhammad Hamza Zafar , Syed Kumayl Raza Moosavi , Majad Mansoor , Filippo Sanfilippo","doi":"10.1016/j.health.2024.100374","DOIUrl":"10.1016/j.health.2024.100374","url":null,"abstract":"<div><div>Cardiovascular disease is the leading cause of death worldwide, including critical conditions such as blood vessel blockage, heart failure, and stroke. Accurate and early prediction of heart disease remains a significant challenge due to the complexity of symptoms and the variability of contributing factors. This study proposes a novel hybrid model integrating a Stacked Convolutional Neural Network (SCNN) with the Levy Flight-based Grasshopper Optimization Algorithm (LFGOA) to address this challenge. The SCNN provides robust feature extraction, while LFGOA enhances the model by optimizing hyperparameters, improving classification accuracy, and reducing overfitting. The proposed approach is evaluated using four publicly available heart disease datasets, each representing diverse clinical and demographic features. Compared to traditional classifiers, including Regression Trees, Support Vector Machine, Logistic Regression, K-Nearest Neighbors, and standard Neural Networks, the SCNN-LFGOA consistently outperforms these methods. The results highlight that the SCNN-LFGOA achieves an average accuracy of 99%, with significant improvements in specificity, sensitivity, and F1-Score, showcasing its adaptability and robustness across datasets. This study highlights the SCNN-LFGOA's potential as a transformative tool for early and accurate heart disease prediction, contributing to improved patient outcomes and more efficient healthcare resource utilization. 
By combining deep learning with an advanced optimization technique, this research introduces a scalable and effective solution to a critical healthcare problem.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100374"},"PeriodicalIF":0.0,"publicationDate":"2024-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized early fusion of handcrafted and deep learning descriptors for voice pathology detection and classification","authors":"Roohum Jegan, R. Jayagowri","doi":"10.1016/j.health.2024.100369","DOIUrl":"10.1016/j.health.2024.100369","url":null,"abstract":"<div><div>This study presents an automated noninvasive voice disorder detection and classification approach using an optimized fusion of modified glottal source estimation and deep transfer learning neural network descriptors. A new set of modified descriptors based on a glottal source estimator and pre-trained Inception-ResNet-v2 convolutional neural network-based features are proposed for the speech disorder detection and classification task. The modified feature set is obtained using mel-cepstral coefficients, harmonic model, phase discrimination means, distortion deviation descriptors, conventional wavelet, and glottal source estimation features. Early descriptor-level fusion is employed in this study for performance enhancement; however, the fusion results in higher feature vector dimensionality. A nature-inspired slime mould algorithm is used to remove redundant features and select the most discriminative ones. Finally, the classification is performed using the K-nearest neighbor (KNN) classifier. The proposed algorithm was evaluated using extensive experiments with different feature combinations, with and without feature selection, and with two popular datasets: the Arabic Voice Pathology Database (AVPD) and the Saarbrucken Voice Database (SVD). We show that the proposed optimized fusion method attained an enhanced voice pathology detection accuracy of 98.46%, encompassing a wide spectrum of voice disorders on the SVD database. 
Furthermore, compared to traditional handcrafted and deep neural network-based techniques, the proposed method demonstrates competitive performance with fewer features.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"6 ","pages":"Article 100369"},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142742998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}