{"title":"Session details: Structural bioinformatics","authors":"Ramgopal R. Mettu","doi":"10.1145/3552474","DOIUrl":"https://doi.org/10.1145/3552474","url":null,"abstract":"","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116807530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CytoEMD","authors":"Haidong Yi, Natalie Stanley","doi":"10.1145/3535508.3545525","DOIUrl":"https://doi.org/10.1145/3535508.3545525","url":null,"abstract":"Modern single-cell technologies, such as Cytometry by Time of Flight (CyTOF), measure the simultaneous expression of multiple protein markers per cell and have enabled the characterization of the immune system at unparalleled depths across numerous clinical applications. Despite the success of a variety of developed bioinformatics techniques for automatically characterizing cells into particular immune cell-types, methods to encode variation across heterogeneous cellular landscapes and with respect to a clinical outcome of interest are still lacking. To summarize and unravel the immunological variation across multiple samples profiled with CyTOF, we developed CytoEMD, a fast and scalable metric-based method to encode a compact vector representation for each profiled sample. CytoEMD uses earth mover's distance (EMD) to quantify the differences between pairs of profiled samples, which can be further projected into a latent space for visualization and interpretation. We compared CytoEMD to gating-based and deep-learning based set autoencoder methods and found that the CytoEMD approach 1) correctly captures between-patient variation, and 2) is more efficient and requires significantly fewer parameters. CytoEMD further promotes interpretability by providing insight into the cell-types driving variation between samples. 
CytoEMD is available as an open-source Python package at https://github.com/CompCy-lab/CytoEMD.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114773413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of machine learning for patient response prediction to cardiac resynchronization therapy","authors":"Brendan E. Odigwe, Alireza Bagheri Rajeoni, Celestine I. Odigwe, F. Spinale, H. Valafar","doi":"10.1145/3535508.3545513","DOIUrl":"https://doi.org/10.1145/3535508.3545513","url":null,"abstract":"Heart failure (HF) is a leading cause of morbidity, mortality, and substantial health care costs. Prolonged conduction through the myocardium can occur with HF, and a device-driven approach, termed cardiac resynchronization therapy (CRT), can improve left ventricular (LV) myocardial conduction patterns. We used machine learning methods for classifying HF patients, namely Decision Trees and Artificial Neural Networks (ANNs), to develop predictive models of individual outcomes following CRT. Clinical, functional, and biomarker data were collected in HF patients before and following CRT. A prospective 6-month endpoint of a reduction in LV volume was defined as a CRT response. Using this approach on 764 subjects (368 responders, 396 non-responders), each with 53 parameters, we could classify HF patients based on their response to CRT with more than 72% success. We also explored the use of machine learning techniques to predict the magnitude of LV volume 3 months after CRT placement. Using techniques such as linear regression and artificial neural networks, we can predict the 3-month LV volume within a 17% median margin of error. We have demonstrated that machine learning approaches can identify HF patients with a high probability of a positive CRT response. 
Developing these approaches into a clinical algorithm to assist in clinical decision-making regarding the use of CRT in HF patients would potentially improve outcomes and reduce health care costs.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125611132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data adequacy bias impact in a data-blinded semi-supervised GAN for privacy-aware COVID-19 chest X-ray classification","authors":"Javier Pastorino, A. Biswas","doi":"10.1145/3535508.3545560","DOIUrl":"https://doi.org/10.1145/3535508.3545560","url":null,"abstract":"Supervised machine learning models are, by definition, data-sighted: they need to view all or most of the labeled training dataset. This paradigm presents two intertwined bottlenecks: the risk of exposing sensitive data samples to the machine learning engineers at a third-party site, and the time-consuming, laborious, bias-prone nature of data annotation by personnel at the data source site. In this paper we studied the learning impact of data adequacy as a source of bias in a data-blinded semi-supervised learning model for COVID-19 chest X-ray classification. Data-blindedness was applied to a semi-supervised generative adversarial network that generates synthetic data from only a few labeled data samples and concurrently learns to classify targets. We designed and developed a data-blind COVID-19 patient classifier that determines whether an individual is suffering from COVID-19 or another type of illness, with the ultimate goal of producing a system to assist in labeling large datasets. However, the availability of labels in the training data had an impact on model performance, and when a new disease spreads, as COVID-19 did in 2019, access to labeled data may be limited. Here, we studied how bias in the per-class distribution of labeled samples affected classification performance for three models: a Convolutional Neural Network-based classifier (CNN), a semi-supervised GAN using the source data (SGAN), and our proposed data-blinded semi-supervised GAN (BSGAN). Data-blinding prevents machine learning engineers from directly accessing the source data during training, thereby ensuring data confidentiality. 
This was achieved by training the proposed model on synthetic data samples generated by a separate generative model. Our model achieved comparable performance, with a trade-off of 0.05 in AUC score between the privacy-aware model and a traditionally trained model, and it remained stable, following the same learning performance as the data distribution was changed.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133513157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AT[N]-net: multimodal spatiotemporal network for subtype identification in Alzheimer's disease","authors":"Jingwen Zhang, Enze Xu, Minghan Chen","doi":"10.1145/3535508.3545103","DOIUrl":"https://doi.org/10.1145/3535508.3545103","url":null,"abstract":"Alzheimer's disease (AD) is a heterogeneous, multifactorial neurodegenerative disorder, where beta-amyloid (A), pathologic tau (T), neurodegeneration ([N]), and structural brain network (Net) are four major indicators of AD progression. Most current studies on AD rely on a single-source modality and ignore complex biological interactions at the molecular level. In this study, we propose a novel multimodal spatiotemporal stratification network (MSSN) that is built upon the fusion of multiple data modalities and the combined power of systems biology and deep learning. Altogether, our stratification approach could (1) ameliorate limitations caused by insufficient longitudinal imaging data, (2) extract important spatiotemporal feature vectors from imaging data, (3) exploit the subject-specific longitudinal prediction of a holistic biomarker set, and (4) generate symptom-related fine-grained subtype classification.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130437954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing algorithms for sorting by strip swaps","authors":"A. Asaithambi, Chandrika Rao, Swapnoneel Roy","doi":"10.1145/3535508.3545566","DOIUrl":"https://doi.org/10.1145/3535508.3545566","url":null,"abstract":"Genome rearrangement problems in computational biology have been modeled as combinatorial optimization problems related to the familiar problem of sorting, namely transforming arbitrary permutations to the identity permutation. When a permutation is viewed as the string of integers from 1 through n, any substring in it that is also a substring in the identity permutation will be called a strip. The objective in the combinatorial optimization problems arising from the applications is to obtain the identity permutation from an arbitrary permutation using the fewest applications of a chosen strip operation. Among the strip operations which have been investigated thus far in the literature are strip moves, transpositions, reversals, and block interchanges. However, it is important to note that most of the existing research on sorting by strip operations has focused on obtaining hardness results or designing approximation algorithms, with little work carried out thus far on the implementation of the proposed approximation algorithms. In this paper, two new algorithms for sorting by strip swaps are presented. The first algorithm takes a greedy approach and selects at each step a strip swap that reduces the number of strips the most and puts the maximum number of strips in their correct positions. The second algorithm brings the closest consecutive pairs together at each step. 
Approximation ratios for these two algorithms are experimentally estimated.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133722915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Clinical trials & outcome prediction","authors":"Brendan E. Odigwe","doi":"10.1145/3552479","DOIUrl":"https://doi.org/10.1145/3552479","url":null,"abstract":"","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122178911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trajectory-based and sound-based medical data clustering","authors":"Maria Mannone, V. Distefano","doi":"10.1145/3535508.3545102","DOIUrl":"https://doi.org/10.1145/3535508.3545102","url":null,"abstract":"Challenges in medicine are often faced as interdisciplinary endeavors. In such an interdisciplinary view, sonification of medical data provides an additional sensory dimension to highlight often hard-to-find information and details. Some examples of sonification of medical data include Covid genome mapping [5], auditory representations of three-dimensional objects such as the brain [4], and enhancement of medical imagery through the use of sound [1]. Here, we focus on kidney filtering-efficiency time-evolution data. We consider the estimated glomerular filtration rate (eGFR), the main indicator of kidney efficiency in diabetic kidney disease patients. We propose a technique to sonify the eGFR trajectories with time, frequency, and timbre to distinguish amongst patients (Figure 1). Multiple pitch trajectories can be formally investigated with the tools of counterpoint (Figure 2), and computationally analyzed with sound-processing techniques. Patients who present similar patterns of eGFR behavior can be more easily spotted through musical similarities. We use the Fréchet distance, which evaluates the shape similarity between curves [2], to cluster patients with similar eGFR behavior. We thus compare the information gathered through sonification and shape-based analysis. We find the mean curves in each trajectory cluster and we compare them with the characteristics of sonified curves. Clustering methods have also been applied to sound analysis: for example, k-means has been used to cluster sound data [3]. The Fréchet-based clustering technique is a development of k-means that takes shape into account. Thus, we sketch a sound-based clustering approach for medical data, as an additional tool to find patterns of behavior. 
This study can foster new research between computer science, medicine, and sound processing.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"194 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114237503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting synchronization in brain activity","authors":"Gangadhar Katuri, E. Rosa, Rosangela Follmann","doi":"10.1145/3535508.3545106","DOIUrl":"https://doi.org/10.1145/3535508.3545106","url":null,"abstract":"Billions of neurons make up our brains, and the emergence of synchronous behavior among them is one of the most fundamental questions in the field of neuroscience. In a system as complex as the human brain, synchronization of neuronal activity can be useful and necessary, as during sleep cycles and in the consolidation of memory, but can also be problematic and undesirable in disorders such as epilepsy and Parkinson's disease. The goal in this study is to shed light on a particular type of neuronal synchronization associated with epileptic seizures that result from a central nervous system disorder characterized by abnormal brain activity. The approach consists of analyzing electroencephalogram (EEG) data containing information about neuronal electrical activity of epileptic patients before, during and after a seizure. The database includes EEG recordings of 14 patients obtained from the Unit of Neurology and Neurophysiology of the University of Siena, with electrical activity collected from 29 brain areas through electrodes placed on the scalp of the patients [1]. The data is initially preprocessed using filters to reduce the noise level [3], and the phase of the filtered signal is extracted using the Hilbert Transform and the Phase Estimation by Means of Frequency (PEMF) methods [2]. The phase of each of the 29 signals is then compared over time with each of the other 28 signals to verify whether or not their phases are in synchrony. We compute the phase locking value (PLV) to quantify the level of synchronization between pairs of signals and obtain color maps for graphical visualization of the overall behavior of the brain electrical activity (Fig. 1, top panel). 
The functional connectivity before, during, and after a seizure in one patient is depicted in Fig. 1, bottom panel. Each line represents a functional connection where the PLV was greater than 0.95. Our preliminary results show that, across patients, more channels are synchronized during the seizure than before or after it. Additionally, neurons of certain areas of the brain tend to be more synchronous than others during the epileptic seizure. The approach considered in this work can be extended beyond epilepsy, with potential application to other neurological disorders such as schizophrenia and Parkinson's disease.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126804444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Team building without boundaries","authors":"A. Perkins, J. Peckham, Tayo Obafemi-Ajayi, Xiuzhen Huang","doi":"10.1145/3535508.3545596","DOIUrl":"https://doi.org/10.1145/3535508.3545596","url":null,"abstract":"Team building can be challenging even when participants are from the same discipline or sub-discipline, and it needs special attention when participants use different vocabularies and have different cultural views on what constitutes viable problems and solutions. Essential to No Boundary Thinking (NBT) teams is proper formulation of the problem to be solved, and a basic tenet is that the NBT team must come together with diverse perspectives to decide on the problem before solutions can be considered. Given that participants come with different views on problem formulation and solution, it is important to consider a robust process for team formation and maintenance. This takes extra effort and time, but scholars studying teams of experts with diverse training have found that they are better positioned to succeed in solving even deep and difficult problems, especially if they have learned to work well with each other. At this workshop we will discuss principles that scholars who have worked in NBT teams have found to be effective. 
We will then engage with the workshop participants to discuss these principles and brainstorm other approaches.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115567011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}