Soheila Molaei, Nima Ghanbari Bousejin, Ghadeer O Ghosheh, Anshul Thakur, Vinod Kumar Chauhan, Tingting Zhu, David A Clifton
{"title":"CliqueFluxNet: Unveiling EHR Insights with Stochastic Edge Fluxing and Maximal Clique Utilisation Using Graph Neural Networks.","authors":"Soheila Molaei, Nima Ghanbari Bousejin, Ghadeer O Ghosheh, Anshul Thakur, Vinod Kumar Chauhan, Tingting Zhu, David A Clifton","doi":"10.1007/s41666-024-00169-2","DOIUrl":"10.1007/s41666-024-00169-2","url":null,"abstract":"Electronic Health Records (EHRs) play a crucial role in shaping predictive models, yet they encounter challenges such as significant data gaps and class imbalances. Traditional Graph Neural Network (GNN) approaches either have limitations in fully leveraging neighbourhood data or demand intensive computation for regularisation. To address these challenges, we introduce CliqueFluxNet, a novel framework that constructs a patient similarity graph to maximise cliques, thereby highlighting strong inter-patient connections. At the heart of CliqueFluxNet lies its stochastic edge fluxing strategy, a dynamic process involving random edge addition and removal during training. This strategy aims to enhance the model's generalisability and mitigate overfitting. Our empirical analysis, conducted on the MIMIC-III and eICU datasets, focuses on the tasks of mortality and readmission prediction. It demonstrates significant progress in representation learning, particularly in scenarios with limited data availability. Qualitative assessments further underscore CliqueFluxNet's effectiveness in extracting meaningful EHR representations, solidifying its potential for advancing GNN applications in healthcare analytics.","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"8 3","pages":"555-575"},"PeriodicalIF":5.4,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11310186/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141918532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cathy Shyr, Yan Hu, L. Bastarache, Alex Cheng, Rizwan Hamid, Paul Harris, Hua Xu
{"title":"Identifying and Extracting Rare Diseases and Their Phenotypes with Large Language Models","authors":"Cathy Shyr, Yan Hu, L. Bastarache, Alex Cheng, Rizwan Hamid, Paul Harris, Hua Xu","doi":"10.1007/s41666-023-00155-0","DOIUrl":"https://doi.org/10.1007/s41666-023-00155-0","url":null,"abstract":"","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"54 2","pages":"1-24"},"PeriodicalIF":0.0,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139381790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contextual Word Embedding for Biomedical Knowledge Extraction: a Rapid Review and Case Study","authors":"Dinithi Vithanage, Ping Yu, Lei Wang, Chao Deng","doi":"10.1007/s41666-023-00157-y","DOIUrl":"https://doi.org/10.1007/s41666-023-00157-y","url":null,"abstract":"","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"22 7","pages":"1-22"},"PeriodicalIF":0.0,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139389543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brain Activity is Influenced by How High Dimensional Data are Represented: An EEG Study of Scatterplot Diagnostic (Scagnostics) Measures","authors":"Ronak Etemadpour, Sonali Shintree, A. D. Shereen","doi":"10.1007/s41666-023-00145-2","DOIUrl":"https://doi.org/10.1007/s41666-023-00145-2","url":null,"abstract":"","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"65 9","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139009866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ban Al-Sahab, Alan Leviton, Tobias Loddenkemper, Nigel Paneth, Bo Zhang
{"title":"Biases in Electronic Health Records Data for Generating Real-World Evidence: An Overview","authors":"Ban Al-Sahab, Alan Leviton, Tobias Loddenkemper, Nigel Paneth, Bo Zhang","doi":"10.1007/s41666-023-00153-2","DOIUrl":"https://doi.org/10.1007/s41666-023-00153-2","url":null,"abstract":"","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"58 32","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134902772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tomas M. Bosschieter, Zifei Xu, Hui Lan, Benjamin J. Lengerich, Harsha Nori, Ian Painter, Vivienne Souter, Rich Caruana
{"title":"Interpretable Predictive Models to Understand Risk Factors for Maternal and Fetal Outcomes","authors":"Tomas M. Bosschieter, Zifei Xu, Hui Lan, Benjamin J. Lengerich, Harsha Nori, Ian Painter, Vivienne Souter, Rich Caruana","doi":"10.1007/s41666-023-00151-4","DOIUrl":"https://doi.org/10.1007/s41666-023-00151-4","url":null,"abstract":"Although most pregnancies result in a good outcome, complications are not uncommon and can be associated with serious implications for mothers and babies. Predictive modeling has the potential to improve outcomes through better understanding of risk factors, heightened surveillance for high-risk patients, and more timely and appropriate interventions, thereby helping obstetricians deliver better care. We identify and study the most important risk factors for four types of pregnancy complications: (i) severe maternal morbidity, (ii) shoulder dystocia, (iii) preterm preeclampsia, and (iv) antepartum stillbirth. We use an Explainable Boosting Machine (EBM), a high-accuracy glass-box learning method, for prediction and identification of important risk factors. We undertake external validation and perform an extensive robustness analysis of the EBM models. EBMs match the accuracy of black-box ML methods such as deep neural networks and random forests, and outperform logistic regression, while being more interpretable. EBMs prove to be robust. The interpretability of the EBM models reveals surprising insights into the features contributing to risk (e.g., maternal height is the second most important feature for shoulder dystocia) and may have potential for clinical application in the prediction and prevention of serious complications in pregnancy.","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135858010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lindsey E. Scierka, Brooklyn A. Bradley, Earl Glynn, Sierra Davis, Mark Hoffman, Jade B. Tam-Williams, Carlos Mena-Hurtado, Kim G. Smolderen
{"title":"Chronic Cough: Characterizing and Quantifying Burden in Adults Using a Nationwide Electronic Health Records Database","authors":"Lindsey E. Scierka, Brooklyn A. Bradley, Earl Glynn, Sierra Davis, Mark Hoffman, Jade B. Tam-Williams, Carlos Mena-Hurtado, Kim G. Smolderen","doi":"10.1007/s41666-023-00150-5","DOIUrl":"https://doi.org/10.1007/s41666-023-00150-5","url":null,"abstract":"","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clinical Feature Ranking Based on Ensemble Machine Learning Reveals Top Survival Factors for Glioblastoma Multiforme","authors":"Gabriel Cerono, Ombretta Melaiu, Davide Chicco","doi":"10.1007/s41666-023-00138-1","DOIUrl":"https://doi.org/10.1007/s41666-023-00138-1","url":null,"abstract":"Glioblastoma multiforme (GM) is a malignant tumor of the central nervous system considered to be highly aggressive and often carrying a terrible survival prognosis. An accurate prognosis is therefore pivotal for deciding a good treatment plan for patients. In this context, computational intelligence applied to electronic health record (EHR) data of patients diagnosed with this disease can be useful to predict the patients’ survival time. In this study, we evaluated different machine learning models to predict survival time in patients suffering from glioblastoma and further investigated which features were the most predictive of survival time. We applied our computational methods to three independent open datasets of EHRs of patients with glioblastoma: the Shieh dataset of 84 patients, the Berendsen dataset of 647 patients, and the Lammer dataset of 60 patients. Our survival time prediction techniques obtained concordance index (C-index) = 0.583 in the Shieh dataset, C-index = 0.776 in the Berendsen dataset, and C-index = 0.64 in the Lammer dataset, as best results in each dataset. Since the original studies regarding the three datasets analyzed here did not provide insights about the most predictive clinical features for survival time, we investigated the feature importance among these datasets. To this end, we utilized Random Survival Forests, a decision tree-based algorithm that can model non-linear interactions between different features and might better capture the highly complex clinical and genetic status of these patients. Our discoveries can impact clinical practice, aiding clinicians and patients alike to decide which therapy plan is best suited for their unique clinical status.","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"171 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136308334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hazal Türkmen, Oğuz Dikenelli, Cenk Eraslan, Mehmet Cem Çallı, Süha Süreyya Özbek
{"title":"BioBERTurk: Exploring Turkish Biomedical Language Model Development Strategies in Low-Resource Setting.","authors":"Hazal Türkmen, Oğuz Dikenelli, Cenk Eraslan, Mehmet Cem Çallı, Süha Süreyya Özbek","doi":"10.1007/s41666-023-00140-7","DOIUrl":"10.1007/s41666-023-00140-7","url":null,"abstract":"Pretrained language models augmented with in-domain corpora show impressive results in biomedicine and clinical Natural Language Processing (NLP) tasks in English. However, there has been minimal work in low-resource languages. Although some pioneering works have shown promising results, many scenarios still need to be explored to engineer effective pretrained language models in biomedicine for low-resource settings. This study introduces the BioBERTurk family, four pretrained models in Turkish for biomedicine. To evaluate the models, we also introduced a labeled dataset to classify radiology reports of head CT examinations. Two parts of the reports, impressions and findings, are evaluated separately to observe the performance of models on longer and less informative text. We compared the models with the Turkish BERT (BERTurk) pretrained with general domain text, multilingual BERT (mBERT), and LSTM+attention-based baseline models. The first model, initialized from BERTurk and then further pretrained with a biomedical corpus, performs statistically better than BERTurk, multilingual BERT, and the baseline for both datasets. The second model continues to pretrain the BERTurk model by using only radiology Ph.D. theses to test the effect of task-related text. This model slightly outperformed all models on the impression dataset and showed that using only radiology-related data for continual pre-training could be effective. The third model continues to pretrain by adding radiology theses to the biomedical corpus but does not show a statistically meaningful difference for either dataset. The final model combines radiology and biomedicine corpora with the corpus of BERTurk and pretrains a BERT model from scratch. This model is the worst-performing model of the BioBERTurk family, even worse than BERTurk and multilingual BERT.","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"7 4","pages":"433-446"},"PeriodicalIF":5.4,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10620363/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71490985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vera C Kaelin, Andrew D Boyd, Martha M Werler, Natalie Parde, Mary A Khetani
{"title":"Natural Language Processing to Classify Caregiver Strategies Supporting Participation Among Children and Youth with Craniofacial Microsomia and Other Childhood-Onset Disabilities.","authors":"Vera C Kaelin, Andrew D Boyd, Martha M Werler, Natalie Parde, Mary A Khetani","doi":"10.1007/s41666-023-00149-y","DOIUrl":"10.1007/s41666-023-00149-y","url":null,"abstract":"Customizing participation-focused pediatric rehabilitation interventions is an important but also complex and potentially resource-intensive process, which may benefit from automated and simplified steps. This research aimed at applying natural language processing to develop and identify a best-performing predictive model that classifies caregiver strategies into participation-related constructs, while filtering out non-strategies. We created a dataset including 1,576 caregiver strategies obtained from 236 families of children and youth (11-17 years) with craniofacial microsomia or other childhood-onset disabilities. These strategies were annotated to four participation-related constructs and a non-strategy class. We experimented with manually created features (i.e., speech and dependency tags, predefined likely sets of words, dense lexicon features (i.e., Unified Medical Language System (UMLS) concepts)) and three classical methods (i.e., logistic regression, naïve Bayes, support vector machines (SVM)). We tested a series of binary and multinomial classification tasks, applying 10-fold cross-validation on the training set (80%) and then testing the best-performing model on the held-out test set (20%). SVM using term frequency-inverse document frequency (TF-IDF) was the best-performing model for all four classification tasks, with accuracy ranging from 78.10 to 94.92% and a macro-averaged F1-score ranging from 0.58 to 0.83. Manually created features only increased model performance when filtering out non-strategies. Results suggest pipelined classification tasks (i.e., filtering out non-strategies; classification into intrinsic and extrinsic strategies; classification into participation-related constructs) for implementation into participation-focused pediatric rehabilitation interventions like Participation and Environment Measure Plus (PEM+) among caregivers who complete the Participation and Environment Measure for Children and Youth (PEM-CY). Supplementary information: The online version contains supplementary material available at 10.1007/s41666-023-00149-y.","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"7 4","pages":"480-500"},"PeriodicalIF":0.0,"publicationDate":"2023-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10620347/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71490987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}