{"title":"Mitigating health disparities in EHR via deconfounder","authors":"Zheng Liu, Xiaohan Li, Philip S. Yu","doi":"10.1145/3535508.3545516","DOIUrl":"https://doi.org/10.1145/3535508.3545516","url":null,"abstract":"Health disparities, or inequalities between different patient demographics, are becoming a crucial issue in medical decision-making, especially in Electronic Health Record (EHR) predictive modeling. In order to ensure the fairness of sensitive attributes, conventional studies mainly adopt calibration or re-weighting methods to balance the performance on among different demographic groups. However, we argue that these methods have some limitations. First, these methods usually mean making a trade-off between the model's performance and fairness. Second, many methods attribute the existence of unfairness completely to the data collection process, which lacks substantial evidence. In this paper, we provide an empirical study to discover the possibility of using deconfounder to address the disparity issue in healthcare. Our study can be summarized in two parts. The first part is a pilot study demonstrating the exacerbation of disparity when unobserved confounders exist. The second part proposed a novel framework, Parity Medical Deconfounder (PriMeD), to deal with the disparity issue in healthcare datasets. Inspired by the deconfounder theory, PriMeD adopts a Conditional Variational Autoencoder (CVAE) to learn latent factors (substitute confounders) for observational data, and extensive experiments are provided to show its effectiveness.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133010738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sapana Bhandari, Nathan P. Whitener, Konghao Zhao, Natalia Khuri
{"title":"Multi-target integration and annotation of single-cell RNA-sequencing data","authors":"Sapana Bhandari, Nathan P. Whitener, Konghao Zhao, Natalia Khuri","doi":"10.1145/3535508.3545511","DOIUrl":"https://doi.org/10.1145/3535508.3545511","url":null,"abstract":"Cells are the building blocks of human tissues and organs, and the distributions of different cell-types change due to environmental or disease conditions and treatments. Single-cell RNA sequencing is used to study heterogeneity of cells in biological samples. To date, computational approaches aided in the discovery of dominant and rare cell-types and facilitated the construction of cell atlases. Integration of new data with the existing reference atlases is an emerging computational problem, and this paper proposes to frame it as a multi-target prediction task, solvable using supervised machine learning. We systematically and rigorously test 63 different predictors on synthetic benchmarks with different properties. The best performing predictor has high Cohen's Kappa scores and low mean absolute errors in single-batch and multi-batch integration experiments.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133155310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep sequence representation learning for predicting human proteins with liquid-liquid phase separation propensity and synaptic functions","authors":"Anqi Wei, Liangjiang Wang","doi":"10.1145/3535508.3545550","DOIUrl":"https://doi.org/10.1145/3535508.3545550","url":null,"abstract":"With advancements in next-generation sequencing techniques, the whole protein sequence repertoire has increased to a great extent. In the meantime, deep learning techniques have promoted the development of computational methods to interpret large-scale proteomic data and facilitate functional studies of proteins. Inferring properties from protein amino acid sequences has been a long-standing problem in Bioinformatics. Extensive studies have successfully applied natural language processing (NLP) techniques for the representation learning of protein sequences. In this paper, we applied the deep sequence model - UDSMProt, to fine-tune and evaluate two protein prediction tasks: (1) predict proteins with liquid-liquid phase separation propensity and (2) predict synaptic proteins. Our results have shown that, without prior domain knowledge and only based on protein sequences, the fine-tuned language models achieved high classification accuracies and outperformed baseline models using compositional k-mer features in both tasks. Hence, it is promising to apply the protein language model to some learning tasks and the fine-tuned models can be used to predict protein candidates for biological studies.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125396058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Electronic health records","authors":"Sudha Tushara Sadasivuni","doi":"10.1145/3552471","DOIUrl":"https://doi.org/10.1145/3552471","url":null,"abstract":"","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127029836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting acute events using the movement patterns of older adults: an unsupervised clustering method","authors":"Ramin Ramazi, M. Bowen, Rahmatollah Beheshti","doi":"10.1145/3535508.3545561","DOIUrl":"https://doi.org/10.1145/3535508.3545561","url":null,"abstract":"Timely identification of individuals with a high risk of imminent acute events in long-term care facilities can aid in reducing the frequency or severity of such events and lead to safer residential environments. Specifically, an interval-based classification of mobility behavior (i.e., the real-time pattern of walking and physical activities in older adults) has been used for early recognition and prevention of acute events such as falls, delirium, and urinary tract infections. It has also been shown that supplementing such temporal mobility behavior data with static cognitive condition information (such as test scores) can yield better prediction results. However, classifying such multi-modal (static+time-series) data is a challenging task as it requires simultaneously taking different similarity relationships into account. In this work, we present an unsupervised clustering technique for classifying this type of multi-modal data points via jointly optimizing separate objective functions associated with the static and time-series parts. We show that our customized deep learning pipeline achieves competitive or superior results compared to several recent clustering baselines when studied on a few generic tasks aiming at clustering time-series data using both static and time-series data. Following this, we show that our clustering model can be used to cluster movement patterns into clinically meaningful clusters that can effectively capture the risk of near future acute events.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115463357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ashley Babjac, T. Royalty, A. D. Steen, Scott J. Emrich
{"title":"A comparison of dimensionality reduction methods for large biological data","authors":"Ashley Babjac, T. Royalty, A. D. Steen, Scott J. Emrich","doi":"10.1145/3535508.3545536","DOIUrl":"https://doi.org/10.1145/3535508.3545536","url":null,"abstract":"Large-scale data often suffer from the curse of dimensionality and the constraints associated with it; therefore, dimensionality reduction methods are often performed prior to most machine learning pipelines. In this paper, we directly compare autoencoders performance as a dimensionality reduction technique (via the latent space) to other established methods: PCA, LASSO, and t-SNE. To do so, we use four distinct datasets that vary in the types of features, metadata, labels, and size to robustly compare different methods. We test prediction capability using both Support Vector Machines (SVM) and Random Forests (RF). Significantly, we conclude that autoencoders are an equivalent dimensionality reduction architecture to the previously established methods, and often outperform them in both prediction accuracy and time performance when condensing large, sparse datasets.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125007248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep learning for assembly of haplotypes and viral quasispecies from short and long sequencing reads","authors":"Ziqi Ke, H. Vikalo","doi":"10.1145/3535508.3545524","DOIUrl":"https://doi.org/10.1145/3535508.3545524","url":null,"abstract":"Information about genetic variations in either individual genomes or viral populations provides insight in genetic signatures of diseases and suggests directions for medical and pharmaceutical research. State-of-the-art sequencing platforms generate massive amounts of reads, with length varying from one technology to another, that provide data needed for the reconstruction of haplotypes and viral quasispecies. On the one hand, high-throughput platforms are capable of providing enormous amounts of highly accurate but relatively short reads; inability to bridge long genetic distances renders the reconstruction with such reads challenging. On the other hand, the latest generation of sequencing technologies is capable of generating much longer reads but those reads suffer from sequencing errors at a rate higher than the error rate of short reads. This motivates search for reconstruction methods capable of leveraging both the high accuracy of short reads and the phase resolving power of long reads. We present a deep learning framework that relies on convolutional auto-encoders with a clustering layer to reconstruct individual haplotypes or viral populations from hybrid data sources. First, an auto-encoder for haplotype assembly / viral population reconstruction from short reads is pre-trained separately from another one utilizing long reads for the same task. The pre-trained models are then retrained simultaneously to enable decision fusion. Results on realistic synthetic as well as experimental data demonstrate that the proposed framework outperforms state-of-the-art techniques for haplotype assembly and viral quasispecies reconstruction, and achieves significantly higher accuracy on those tasks than methods utilizing only one type of reads. Code is available at https://github.com/WuLoli/HybSeq.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122946749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised 3D neural networks to track iPS cell division in label-free phase contrast time series images","authors":"A. Peskin, J. Chalfoun, M. Halter, A. Plant","doi":"10.1145/3535508.3545532","DOIUrl":"https://doi.org/10.1145/3535508.3545532","url":null,"abstract":"In order to predict cell population behavior, it is important to understand the dynamic characteristics of individual cells. Individual induced pluripotent stem (iPS) cells in colonies have been difficult to track over long times, both because segmentation is challenging due to close proximity of cells and because cell morphology at the time of cell division does not change dramatically in phase contrast images; image features do not provide sufficient discrimination for 2D neural network models of label-free images. However, these cells do not move significantly during division, and they display a distinct temporal pattern of morphologies. As a result, we can detect cell division with images overlaid in time. Using a combination of a 3D neural network applied over time-lapse data to find regions of cell division activity, followed by a 2D neural network for images in these selected regions to find individual dividing cells, we developed a robust detector of iPS cell division. We created an initial 3D neural network to find 3D image regions in (x,y,t) in which identified cell divisions occurred, then used semi-supervised training with additional stacks of images to create a more refined 3D model. These regions were then inferenced with our 2D neural network to find the location and time immediately before cells divide when they contain two sets of chromatin, information needed to track the cells after division. False positives from the 3D inferenced results were identified and removed with the addition of the 2D model. We successfully identified 37 of the 38 cell division events in our manually annotated test image stack, and specified the time and (x,y) location of each cell just before division within an accuracy of 10 pixels.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123495084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Medical imaging","authors":"S. Nabavi","doi":"10.1145/3552476","DOIUrl":"https://doi.org/10.1145/3552476","url":null,"abstract":"","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122611511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuanfang Ren, Aisharjya Sarkar, Aysegül Bumin, Kejun Huang, P. Veltri, A. Dobra, Tamer Kahveci
{"title":"Identification of co-existing embeddings of a motif in multilayer networks","authors":"Yuanfang Ren, Aisharjya Sarkar, Aysegül Bumin, Kejun Huang, P. Veltri, A. Dobra, Tamer Kahveci","doi":"10.1145/3535508.3545528","DOIUrl":"https://doi.org/10.1145/3535508.3545528","url":null,"abstract":"Interactions among molecules, also known as biological networks, are often modeled as binary graphs, where nodes and edges represent the molecules and the interaction among those molecules, such as signal transmission, genes-regulation, and protein-protein interactions. Subgraph patterns which are recurring in these networks, called motifs, describe conserved biological functions. Although traditional binary graph provides a simple model to study biological interactions, it lacks the expressive power to provide a holistic view of cell behavior as the interaction topology alters and adopts under different stress conditions as well as genetic variations. Multilayer network model captures the complexity of cell functions for such systems. Unlike the classic binary network model, multilayer network model provides an opportunity to identify conserved functions in cell among varying conditions. In this paper, we introduce the problem of co-existing motifs in multilayer networks. These motifs describe the dual conservation of the functions of cells within a network layer (i.e., cell condition) as well as across different layers of networks. We propose a new algorithm to solve the co-existing motif identification problem efficiently and accurately. Our experiments on both synthetic and real datasets demonstrate that our method identifies all co-existing motifs at near 100 % accuracy for all networks we tested on, while competing method's accuracy varies greatly between 10 to 95 %. Furthermore, our method runs at least an order of magnitude faster than state of the art motif identification methods for binary network models.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126216929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}