{"title":"Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.","authors":"Ibna Kowsar, Shourav B Rabbani, Manar D Samad","doi":"10.1109/ichi61247.2024.00030","DOIUrl":"10.1109/ichi61247.2024.00030","url":null,"abstract":"<p><p>The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. While IMV methods are developed in biostatistics and recently in machine learning, deep learning-based solutions have shown limited success in learning tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values leveraging between-feature (self-attention) or between-sample attentions. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision-tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and 52.6% to 82.6% on two electronic health records data sets. The proposed attention-based missing value imputation method shows superior performance across a wide range of missingness (10% to 50%) when the values are missing completely at random.</p>","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. 
IEEE International Conference on Healthcare Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463999/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142395730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An average-case efficient two-stage algorithm for enumerating all longest common substrings of minimum length <math><mi>k</mi></math> between genome pairs.","authors":"Mattia Prosperi, Simone Marini, Christina Boucher","doi":"10.1109/ichi61247.2024.00020","DOIUrl":"10.1109/ichi61247.2024.00020","url":null,"abstract":"<p><p>A problem extension of the longest common substring (LCS) between two texts is the enumeration of all LCSs given a minimum length <math><mi>k</mi></math> (ALCS- <math><mi>k</mi></math> ), along with their positions in each text. In bioinformatics, an efficient solution to the ALCS- <math><mi>k</mi></math> problem for very long texts (genomes or metagenomes) can provide useful insights to discover genetic signatures responsible for biological mechanisms. The ALCS- <math><mi>k</mi></math> problem has two additional requirements compared to the LCS problem: one is the minimum length <math><mi>k</mi></math> , and the other is that all common strings longer than <math><mi>k</mi></math> must be reported. We present an efficient, two-stage ALCS- <math><mi>k</mi></math> algorithm exploiting the spectrum of text substrings of length <math><mi>k</mi></math> ( <math><mi>k</mi></math> -mers). Our approach yields a worst-case time complexity loglinear in the number of <math><mi>k</mi></math> -mers for the first stage, and an average-case time complexity loglinear in the number of common <math><mi>k</mi></math> -mers for the second stage (several orders of magnitude smaller than the total <math><mi>k</mi></math> -mer spectrum). The space complexity is linear in the first phase (disk-based), and on average linear in the second phase (disk- and memory-based). 
Tests performed on genomes for different organisms (including viruses, bacteria and animal chromosomes) show that run times are consistent with our theoretical estimates; further, comparisons with MUMmer4 show an asymptotic advantage with divergent genomes.</p>","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11412151/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142302596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing a computational representation of human physical activity and exercise using open ontology-based approach: a Tai Chi use case.","authors":"Eloisa Nguyen, Rebecca Z Lin, Yang Gong, Cui Tao, Muhammad Tuan Amith","doi":"10.1109/ichi61247.2024.00012","DOIUrl":"10.1109/ichi61247.2024.00012","url":null,"abstract":"<p><p>Many studies have examined the impact of exercise and other physical activities in influencing the health outcomes of individuals. These physical activities entail an intricate sequence and series of physical anatomy, physiological movement, movement of the anatomy, etc. To better understand how these components interact with one another and their downstream impact on health outcomes, there needs to be an information model that conceptualizes all entities involved. In this study, we introduced our early development of an ontology model to computationally describe human physical activities and the various entities that compose each activity. We developed an open-source biomedical ontology called the Kinetic Human Movement Ontology that reuses OBO Foundry terminologies and is encoded in OWL2. We applied this ontology in modeling and linking a specific Tai Chi movement. The contribution of this work could enable modeling of information relating to human physical activity, like exercise, and lead towards information standardization of human movement for analysis. Future work will include expanding our ontology to include more expressive information and completely modeling entire sets of movement from human physical activity.</p>","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. 
IEEE International Conference on Healthcare Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11503552/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142514161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing Social Factors to Enhance Suicide Prevention Across Population Groups.","authors":"Richard Li Xu, Song Wang, Zewei Wang, Yuhan Zhang, Yunyu Xiao, Jyotishman Pathak, David Hodge, Yan Leng, S Craig Watkins, Ying Ding, Yifan Peng","doi":"10.1109/ichi61247.2024.00032","DOIUrl":"10.1109/ichi61247.2024.00032","url":null,"abstract":"<p><p>Social factors like family background, education level, financial status, and stress can impact public health outcomes, such as suicidal ideation. However, the analysis of social factors for suicide prevention has been limited by the lack of up-to-date suicide reporting data, variations in reporting practices, and small sample sizes. In this study, we analyzed 172,629 suicide incidents from 2014 to 2020 utilizing the National Violent Death Reporting System Restricted Access Database (NVDRS-RAD). Logistic regression models were developed to examine the relationships between demographics and suicide-related circumstances. Trends over time were assessed, and Latent Dirichlet Allocation (LDA) was used to identify common suicide-related social factors. Mental health, interpersonal relationships, mental health treatment and disclosure, and school/work-related stressors were identified as the main themes of suicide-related social factors. This study also identified systemic disparities across various population groups, particularly concerning Black individuals, young people aged under 24, healthcare practitioners, and those with limited education backgrounds, which shed light on potential directions for demographic-specific suicidal interventions.</p>","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. 
IEEE International Conference on Healthcare Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11450796/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142382637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Generative Models in Medical Imaging.","authors":"Liyue Fan, Ashley Bang, Luca Bonomi","doi":"10.1109/ichi61247.2024.00084","DOIUrl":"10.1109/ichi61247.2024.00084","url":null,"abstract":"<p><p>Data synthesis can address important data availability challenges in biomedical informatics. Quantitative evaluation of generative models may help understand their applications to synthesizing biomedical data. This poster paper examines state-of-the-art generative models used in medical imaging, such as StyleGAN and DDPM models, and evaluates their performance in learning data manifolds and in the visible features of generated samples. Results show that existing generative models have much to improve based on the studied measures.</p>","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11508590/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142514162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mitigating Membership Inference in Deep Survival Analyses with Differential Privacy.","authors":"Liyue Fan, Luca Bonomi","doi":"10.1109/ichi57859.2023.00022","DOIUrl":"10.1109/ichi57859.2023.00022","url":null,"abstract":"<p><p>Deep neural networks have been increasingly integrated in healthcare applications to enable accurate predictive analyses. Sharing trained deep models not only facilitates knowledge integration in collaborative research efforts but also enables equitable access to computational intelligence. However, recent studies have shown that an adversary may leverage a shared model to learn the participation of a target individual in the training set. In this work, we investigate privacy-protecting model sharing for survival studies. Specifically, we pose three research questions. (1) Do deep survival models leak membership information? (2) How effective is differential privacy in defending against membership inference in deep survival analyses? (3) Are there other effects of differential privacy on deep survival analyses? Our study assesses the membership leakage in emerging deep survival models and develops differentially private training procedures to provide rigorous privacy protection. The experimental results show that deep survival models leak membership information and our approach effectively reduces membership inference risks. The results also show that differential privacy introduces a limited performance loss, and may improve the model robustness in the presence of noisy data, compared to non-private models.</p>","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. 
IEEE International Conference on Healthcare Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10751041/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139049861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An LSTM-based Gesture-to-Speech Recognition System.","authors":"Riyad Bin Rafiq, Syed Araib Karim, Mark V Albert","doi":"10.1109/ichi57859.2023.00062","DOIUrl":"10.1109/ichi57859.2023.00062","url":null,"abstract":"<p><p>Fast and flexible communication options are limited for speech-impaired people. Hand gestures coupled with fast, generated speech can enable a more natural social dynamic for those individuals - particularly individuals without the fine motor skills to type on a keyboard or tablet reliably. We created a mobile phone application prototype that generates audible responses associated with trained hand movements and collects and organizes the accelerometer data for rapid training to allow tailored models for individuals who may not be able to perform standard movements such as sign language. Six participants performed 11 distinct gestures to produce the dataset. A mobile application was developed that integrated a bidirectional LSTM network architecture which was trained from this data. After evaluation using nested subject-wise cross-validation, our integrated bidirectional LSTM model demonstrates an overall recall of 91.8% in recognition of these pre-selected 11 hand gestures, with recall at 95.8% when two commonly confused gestures were not assessed. This prototype is a step in creating a mobile phone system capable of capturing new gestures and developing tailored gesture recognition models for individuals in speech-impaired populations. Further refinement of this prototype can enable fast and efficient communication with the goal of further improving social interaction for individuals unable to speak.</p>","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. 
IEEE International Conference on Healthcare Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10894657/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139974844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Benchmarking Transformer-Based Models for Identifying Social Determinants of Health in Clinical Notes.","authors":"Xiaoyu Wang, Dipankar Gupta, Michael Killian, Zhe He","doi":"10.1109/ichi57859.2023.00102","DOIUrl":"10.1109/ichi57859.2023.00102","url":null,"abstract":"<p><p>Electronic health records (EHR) have been widely used in building machine learning models for health outcomes prediction. However, many EHR-based models are inherently biased due to lack of risk factors on social determinants of health (SDoH), which are responsible for up to 40% of preventable deaths. As SDoH information is often captured in clinical notes, recent efforts have been made to extract such information from notes with natural language processing and append it to other structured data. In this work, we benchmark 7 pre-trained transformer-based models, including BERT, ALBERT, BioBERT, BioClinicalBERT, RoBERTa, ELECTRA, and RoBERTa-MIMIC-Trial, for recognizing SDoH terms using a previously annotated corpus of MIMIC-III clinical notes. Our study shows that the BioClinicalBERT model performs best on F-1 scores (0.911, 0.923) under both strict and relaxed criteria. This work shows the promise of using transformer-based models for recognizing SDoH information from clinical notes.</p>","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. 
IEEE International Conference on Healthcare Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10795706/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139492901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An End-to-end <i>In-Silico</i> and <i>In-Vitro</i> Drug Repurposing Pipeline for Glioblastoma.","authors":"Ko-Hong Lin, Jay-Jiguang Zhu, Judith A Smith, Yejin Kim, Xiaoqian Jiang","doi":"10.1109/ichi57859.2023.00135","DOIUrl":"10.1109/ichi57859.2023.00135","url":null,"abstract":"<p><p>Our study aims to address the challenges in drug development for glioblastoma, a highly aggressive brain cancer with poor prognosis. We propose a computational framework that utilizes machine learning-based propensity score matching to estimate counterfactual treatment effects and predict synergistic effects of drug combinations. Through our <i>in-silico</i> analysis, we identified promising drug candidates and drug combinations that warrant further investigation. To validate these computational findings, we conducted <i>in-vitro</i> experiments on two GBM cell lines, U87 and T98G. The experimental results demonstrated that some of the identified drugs and drug combinations indeed exhibit strong suppressive effects on GBM cell growth. Our end-to-end pipeline showcases the feasibility of integrating computational models with biological experiments to expedite drug repurposing and discovery efforts. By bridging the gap between <i>in-silico</i> analysis and <i>in-vitro</i> validation, we demonstrate the potential of this approach to accelerate the development of novel and effective treatments for glioblastoma.</p>","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. 
IEEE International Conference on Healthcare Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10956733/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140186468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Prediction of Late Symptoms using LSTM and Patient-reported Outcomes for Head and Neck Cancer Patients.","authors":"Yaohua Wang, Lisanne Van Dijk, Abdallah S R Mohamed, Mohamed Naser, Clifton David Fuller, Xinhua Zhang, G Elisabeta Marai, Guadalupe Canahuate","doi":"10.1109/ichi57859.2023.00047","DOIUrl":"10.1109/ichi57859.2023.00047","url":null,"abstract":"<p><p>Patient-Reported Outcomes (PRO) are collected directly from the patients using symptom questionnaires. In the case of head and neck cancer patients, PRO surveys are recorded every week during treatment with each patient's visit to the clinic and at different follow-up times after the treatment has concluded. PRO surveys can be very informative regarding the patient's status and the effect of treatment on the patient's quality of life (QoL). Processing PRO data is challenging for several reasons. First, missing data is frequent as patients might skip a question or a questionnaire altogether. Second, PROs are patient-dependent: a rating of 5 for one patient might be a rating of 10 for another patient. Finally, most patients experience severe symptoms during treatment, which usually subside over time. However, for some patients, late toxicities persist, negatively affecting the patient's QoL. These long-term severe symptoms are hard to predict and are the focus of this study. In this work, we model PRO data collected from head and neck cancer patients treated at the MD Anderson Cancer Center using the MD Anderson Symptom Inventory (MDASI) questionnaire as time series. We impute missing values with a combination of K nearest neighbor (KNN) and Long Short-Term Memory (LSTM) neural networks, and finally, apply LSTM to predict late symptom severity 12 months after treatment. We compare performance against clinical and ARIMA models. 
We show that the LSTM model combined with KNN imputation is effective in predicting late-stage symptom ratings for occurrence and severity under the AUC and F1 score metrics.</p>","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10853990/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139725194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}