{"title":"Predicting HIV Diagnosis Among Emerging Adults Using Electronic Health Records and Health Survey Data in All of Us Research Program.","authors":"Balu Bhasuran, Yiyang Liu, Mattia Prosperi, Karen MacDonell, Sylvie Naar, Zhe He","doi":"10.1109/bibm62325.2024.10822296","DOIUrl":"10.1109/bibm62325.2024.10822296","url":null,"abstract":"<p><p>The global decline in HIV incidence has not been mirrored in the United States, where young adults (ages 18-29) continue to account for a significant portion of new infections. In this study, we leverage the All of Us (AoU) Research Program's extensive electronic health records (EHRs) and health survey data to develop machine learning models capable of predicting HIV diagnoses at least three months before clinical identification. Among various models tested, the Support Vector Machine (SVM) model demonstrated a balanced performance, integrating clinically relevant features with robust predictive accuracy (AUC = 0.91). Risky drinking behaviors emerged as consistent top predictors across models, highlighting the importance of targeted interventions in this age group. Our findings underscore the potential of predictive analytics in enhancing HIV prevention strategies and informing public health efforts aimed at reducing HIV transmission among emerging adults.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024 ","pages":"5433-5440"},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823436/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143415967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal Explanation from Mild Cognitive Impairment Progression using Graph Neural Networks.","authors":"Arman Behnam, Muskan Garg, Xingyi Liu, Maria Vassilaki, Jennifer St Sauver, Ronald C Petersen, Sunghwan Sohn","doi":"10.1109/bibm62325.2024.10822848","DOIUrl":"10.1109/bibm62325.2024.10822848","url":null,"abstract":"<p><p>Mild Cognitive Impairment (MCI) is a transitional stage between normal cognitive aging and dementia. Some individuals with MCI revert to normal, while others progress to dementia. There are limited studies using explainable artificial intelligence on longitudinal data, particularly including genotypes, biomarkers and chronic diseases, to explore these differences. This study introduces a novel approach to understanding MCI progression using explainable graph neural networks. Utilizing longitudinal temporal data, we constructed a comprehensive graph representation of each individual in the study cohort. Our temporal graph convolutional network achieved 72.4% accuracy in predicting MCI transitions, while our causal explanation method outperformed existing explanation techniques in stability, accuracy, and faithfulness. We identified a causal subgraph with informative variables including hypertension, arrhythmia, congestive heart failure, coronary artery disease, stroke, lipid-related issues, and sex.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024 ","pages":"6349-6355"},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11803575/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143384106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpreting Lung Cancer Health Disparity between African American Males and European American Males.","authors":"Masrur Sobhan, Md Mezbahul Islam, Ananda Mohan Mondal","doi":"10.1109/bibm62325.2024.10822014","DOIUrl":"10.1109/bibm62325.2024.10822014","url":null,"abstract":"<p><p>Lung cancer remains a predominant cause of cancer-related deaths, with notable disparities in incidence and outcomes across racial and gender groups. This study addresses these disparities by developing a computational framework leveraging explainable artificial intelligence (XAI) to identify both patient- and cohort-specific biomarker genes in lung cancer. Specifically, we focus on two lung cancer subtypes, Lung Adenocarcinoma (LUAD) and Lung Squamous Cell Carcinoma (LUSC), examining distinct racial and sex-specific cohorts: African American males (AAMs) and European American males (EAMs). This study innovatively structures classification tasks based on disease conditions rather than racial labels to avoid race-specific imbalance. We constructed four classification tasks- one three-class problem (LUAD-LUSC-HEALTHY) and three two-class problems (LUAD-LUSC, LUAD-HEALTHY, LUSC-HEALTHY)- to interpret the disease behavior of the patients in terms of genes and pathways. This methodology allows a LUAD or LUSC patient to be analyzed via multiple classifications, yielding robust disparity information for every patient. This preliminary work reports the disparity information for LUAD only. Utilizing Transcriptome data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) projects, we processed samples for LUAD, LUSC, and HEALTHY cohorts. We applied machine learning models, including convolutional neural network (CNN), logistic regression (LR), naïve Bayesian classifier (NB), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost) for the classification. 
The SHapley Additive exPlanation (SHAP)-based interpretation of the best performing classification model uncovered cohort-specific genes and pathways related to health disparities between LUAD-AAM and LUAD-EAM cohorts.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024 ","pages":"7141-7143"},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11753458/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143026044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parsing Clinical Trial Eligibility Criteria for Cohort Query by a Multi-Input Multi-Output Sequence Labeling Model.","authors":"Shubo Tian, Pengfei Yin, Hansi Zhang, Arslan Erdengasileng, Jiang Bian, Zhe He","doi":"10.1109/bibm58861.2023.10385876","DOIUrl":"10.1109/bibm58861.2023.10385876","url":null,"abstract":"<p><p>To enable electronic screening of eligible patients for clinical trials, free-text clinical trial eligibility criteria should be translated to a computable format. Natural language processing (NLP) techniques have the potential to automate this process. In this study, we explored a supervised multi-input multi-output (MIMO) sequence labelling model to parse eligibility criteria into combinations of fact and condition tuples. Our experiments on a small manually annotated training dataset showed that the MIMO framework with a BERT-based encoder using all the input sequences achieved an overall lenient-level AUROC of 0.61. Although the performance is suboptimal, representing eligibility criteria as logical and semantically clear tuples can potentially make subsequent translation of these tuples into database queries more reliable.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2023 ","pages":"4426-4430"},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11251129/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141629519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Practical Approach to Disease Risk Prediction: Focus on High-Risk Patients via Highest-<i>k</i> Loss.","authors":"Hongyi Yang, Rich Gonzalez, Brahmajee K Nallamothu, Keith D Aaronson, Kevin R Ward, Alfred O Hero, Sardar Ansari","doi":"10.1109/bibm58861.2023.10385816","DOIUrl":"10.1109/bibm58861.2023.10385816","url":null,"abstract":"<p><p>Disease risk prediction models play an important role in preventing disease developments in modern healthcare. However, the lack of focus on high-risk patients has hindered the large-scale practical application of these models, especially considering the limitation of medical resources available for following up on patients who are deemed high-risk. In this study, we propose a novel and practical approach that focuses on minimizing the number of false positive observations among high-risk patients by introducing the <i>Highest</i>-<i>k Loss</i>. The solution is to estimate the weights of the highest <math><mi>k</mi></math> scores with a differentiable estimation of the sorting operation and apply the weights to the loss function. We extracted 253,680 survey responses from a public dataset of the U.S. health survey system to define a diabetes prediction task. This study employs nested cross-validation as well as an aggregated model applied to an independent test set to systematically evaluate the proposed method. Compared with traditional binary cross entropy loss and Focal loss, the Highest- <math><mi>k</mi></math> loss improved the precision (positive predictive value) for the highest 1% scores by 0.05 (95% CI: 0.041-0.055), the highest 5% scores by 0.03 (95% CI: 0.024-0.032), and the highest 10% scores by 0.02 (95% CI: 0.016-0.021). 
The introduced Highest- <math><mi>k</mi></math> loss function addresses the problem of prevailing risk prediction models and offers a practical solution that focuses on patients with the <math><mi>k</mi></math> highest predictive scores who can realistically receive an intervention as opposed to the entire patient population.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2023 ","pages":"3226-3233"},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11821551/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143415935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building Prediction Models for 30-Day Readmissions Among ICU Patients Using Both Structured and Unstructured Data in Electronic Health Records.","authors":"Alex Moerschbacher, Zhe He","doi":"10.1109/bibm58861.2023.10385612","DOIUrl":"10.1109/bibm58861.2023.10385612","url":null,"abstract":"<p><p>ICU readmissions are associated with poor outcomes for patients and poor performance of hospitals. Patients who are readmitted have an increased risk of in-hospital death; hospitals with a higher readmission rate have reduced profitability, due to an increase in cost and reduced payments from Medicare and Medicaid programs. Predicting a patient's likelihood of being readmitted to the ICU can help reduce early discharges and the risk of in-hospital deaths, and help increase profitability. In this study, we built and evaluated multiple machine learning models to predict 30-day readmission rates of ICU patients in the MIMIC-III database. We used both structured data, including demographics, laboratory tests, and comorbidities, and unstructured discharge summaries as the predictors and evaluated different combinations of features. The best performing model in this study, Logistic Regression, achieved an AUROC of 75.7%. This study shows the potential of leveraging machine learning and deep learning for predicting ICU readmissions.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2023 ","pages":"4368-4373"},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11271049/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141763104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Navigating Sex-Specific Disease Dynamics in Incident Dementia.","authors":"Muskan Garg, Xingyi Liu, Maria Vassilaki, Ronald C Petersen, Jennifer St Sauver, Sunghwan Sohn","doi":"10.1109/bibm58861.2023.10385324","DOIUrl":"10.1109/bibm58861.2023.10385324","url":null,"abstract":"<p><p>Dementia is among the leading causes of cognitive and functional loss and disability in older adults. Past studies suggested sex differences in health conditions and progression of cognitive decline. Existing studies on the temporal trajectory of health conditions for patient characterization after dementia diagnosis are scarce and ambiguous. To this end, we aim to analyze the shift in medical conditions and examine sex-specific changes in patterns of chronic health conditions after dementia diagnosis. We centered our analysis on a 15-year window around the point of dementia diagnosis, encompassing the 5 years leading up to the diagnosis and the 10 years following it. We introduce (i) MedMet, a network metric to quantify the contribution of each medical condition, and (ii) growth and decay functions for temporal trajectory analysis of medical conditions. Our experiments demonstrate that certain health conditions are more prevalent among females than males. Thus, our findings underscore the pressing need to examine differences between men and women, which could be important for healthcare utilization after a dementia diagnosis.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2023 ","pages":"4065-4072"},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10883293/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139974920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harnessing Transfer Learning for Dementia Prediction: Leveraging Sex-Different Mild Cognitive Impairment Prognosis.","authors":"Ziming Liu, Muskan Garg, Sunyang Fu, Surjodeep Sarkar, Maria Vassilaki, Ronald C Petersen, Jennifer St Sauver, Sunghwan Sohn","doi":"10.1109/bibm58861.2023.10385516","DOIUrl":"10.1109/bibm58861.2023.10385516","url":null,"abstract":"<p><p>This paper presents a machine learning-based prediction for dementia, leveraging transfer learning to reuse the knowledge learned from prediction of mild cognitive impairment, a precursor of dementia. We also examine the impacts of temporal aspects of longitudinal data and sex differences. The methodology encompasses key components such as setting the duration window, comparing different modeling strategies, conducting comprehensive evaluations, and examining the sex-specific impacts of simulated scenarios. The findings reveal that cognitive deficits in females, once detected at the mild cognitive impairment stage, tend to deteriorate over time, while males exhibit more diverse decline across various characteristics without highlighting specific ones. However, the underlying reasons for these sex differences remain unknown and warrant further investigation.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2023 ","pages":"2097-2100"},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10883588/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139974919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ASD-GResTM: Deep Learning Framework for ASD classification using Gramian Angular Field.","authors":"Fahad Almuqhim, Fahad Saeed","doi":"10.1109/bibm58861.2023.10385743","DOIUrl":"10.1109/bibm58861.2023.10385743","url":null,"abstract":"<p><p>Autism Spectrum Disorder (ASD) is a heterogeneous disorder in children, and the current clinical diagnosis is accomplished using behavioral, cognitive, developmental, and language metrics. These clinical metrics can be imperfect measures as they are subject to high test-retest variability and are influenced by assessment factors such as environment, social structure, or comorbid disorders. Advances in neuroimaging coupled with machine learning provide an opportunity to develop methods that are more quantifiable and reliable than existing clinical techniques. In this paper, we design and develop a deep-learning model that operates on functional magnetic resonance imaging (fMRI) data and can distinguish between ASD and neurotypical brains. We introduce a novel strategy to transform time-series data extracted from fMRI signals into Gramian Angular Field (GAF) images while locking in the temporal and spatial patterns in the data. Our motivation is to design and develop a novel framework that could encode the time-series, acquired from fMRI data, into images that can be used by deep-learning architectures that have been successful in computer vision. In our proposed framework called <i>ASD-GResTM</i>, we used a Convolutional Neural Network (CNN) to extract useful features from GAF images. We then used a Long Short-Term Memory (LSTM) layer to learn the activities between the regions. Finally, the output representations of the last LSTM layer are applied to a single-layer perceptron (SPL) to get the final classification. 
Our extensive experimentation demonstrates high accuracy across 4 centers, and the model outperforms state-of-the-art models on two centers, with accuracy increases of 17.58% and 6.7%, respectively. Our model achieved a maximum accuracy of 81.78% with a high degree of sensitivity and specificity. All training, validation, and testing were accomplished using the openly available ABIDE-I benchmarking dataset.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2023 ","pages":"2837-2843"},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11254319/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141636062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A generalizable physiological model for detection of Delayed Cerebral Ischemia using Federated Learning.","authors":"Ahmed Elhussein, Murad Megjhani, Daniel Nametz, Miriam Weiss, Jude Savarraj, Soon Bin Kwon, David J Roh, Sachin Agarwal, E Sander Connolly, Angela Velazquez, Jan Claassen, Huimahn A Choi, Gerrit A Schubert, Soojin Park, Gamze Gürsoy","doi":"10.1109/bibm58861.2023.10385383","DOIUrl":"10.1109/bibm58861.2023.10385383","url":null,"abstract":"<p><p>Delayed cerebral ischemia (DCI) is a complication seen in patients with subarachnoid hemorrhage stroke. It is a major predictor of poor outcomes and is detected late. Machine learning models have been shown to be useful for early detection; however, training such models suffers from small sample sizes due to the rarity of the condition. Here we propose a Federated Learning (FL) approach to train a DCI classifier across three institutions to overcome the challenges of sharing data across hospitals. We developed a framework for federated feature selection and built a federated ensemble classifier. We compared the performance of the FL model to that obtained by training separate models at each site. FL significantly improved performance at only two sites. We found that this was due to feature distribution differences across sites. FL improves performance in sites with similar feature distributions; however, it can worsen performance in sites with heterogeneous distributions. The results highlight both the benefit of FL and the need to assess dataset distribution similarity before conducting FL.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2023 ","pages":"1886-1889"},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10883332/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139934591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}