{"title":"Semi-supervised classification of disease prognosis using CR images with clinical data structured graph","authors":"Jun Bai, Bingjun Li, S. Nabavi","doi":"10.1145/3535508.3545548","DOIUrl":"https://doi.org/10.1145/3535508.3545548","url":null,"abstract":"Fast-growing global connectivity and urbanisation increase the risk of diseases spreading worldwide. The worldwide spread of SARS-CoV-2 has strained healthcare systems, especially intensive care units. Predicting a patient's need for intensive care at the hospital admission stage is therefore a priority for efficient resource allocation. Early in hospitalization, patient chest radiography and clinical data are routinely collected for diagnosis. Hence, we propose a clinical data structured graph Markov neural network embedded with computed radiography exam features (CGMNN) to predict intensive care unit demand for COVID patients. The study used chest computed radiography and clinical data from 1,342 patients in a public dataset. The proposed CGMNN outperforms baseline models, with an accuracy of 0.82, a sensitivity of 0.82, a precision of 0.81, and an F1 score of 0.76.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117292289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geographic ensembles of observations using randomised ensembles of autoregression chains: ensemble methods for spatio-temporal time series forecasting of influenza-like illness","authors":"Nooriyah Poonawala-Lohani, Patricia J. Riddle, Mehnaz Adnan, Jörg Simon Wicker","doi":"10.1145/3535508.3545562","DOIUrl":"https://doi.org/10.1145/3535508.3545562","url":null,"abstract":"Influenza is a communicable respiratory illness that can cause serious public health hazards. Flu surveillance in New Zealand tracks case counts from various District health boards (DHBs) in the country to monitor the spread of influenza in different geographic locations. Many factors contribute to the spread of the influenza across a geographic region, and it can be challenging to forecast cases in one region without taking into account case numbers in another region. This paper proposes a novel ensemble method called Geographic Ensembles of Observations using Randomised Ensembles of Autoregression Chains (GEO-Reach). GEO-Reach is an ensemble technique that uses a two layer approach to utilise interdependence of historical case counts between geographic regions in New Zealand. This work extends a previously published method by the authors [11] called Randomized Ensembles of Auto-regression chains (Reach). State-of-the-art forecasting models look at studying the spread of the virus. They focus on accurate forecasting of cases for a location using historical case counts for the same location and other data sources based on human behaviour such as movement of people across cities/geographic regions. This new approach is evaluated using Influenza like illness (ILI) case counts in 7 major regions in New Zealand from the years 2015-2019 and compares its performance with other standard methods such as Dante, ARIMA, Autoregression and Random Forests. The results demonstrate that the proposed method performed better than baseline methods when applied to this multi-variate time series forecasting problem.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114159873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Connectome transformer with anatomically inspired attention for Parkinson's diagnosis","authors":"D. Machado-Reyes, Mansu Kim, Hanqing Chao, Li Shen, Pingkun Yan","doi":"10.1145/3535508.3545544","DOIUrl":"https://doi.org/10.1145/3535508.3545544","url":null,"abstract":"Parkinson's disease (PD) is the second most prevalent neurodegenerative disease in the United States. The structural or functional connectivity between regions of interest (ROIs) in the brain and their changes captured in brain connectomes could be potential biomarkers for PD. To effectively model the complex non-linear characteristic connectomic patterns related to PD and exploit the long-range feature interactions between ROIs, we propose a connectome transformer model for PD patient classification and biomarker identification. The proposed connectome transformer learns the key connectomic patterns by leveraging the global scope of the attention mechanism guided by an additional skip-connection from the input connectome and the local level focus of the CNN techniques. Our proposed model significantly outperformed the benchmarking models in the classification task and was able to visualize key feature interactions between ROIs in the brain.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115292636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal separation of high dimensional transcriptome for complex multigenic traits","authors":"Aisharjya Sarkar, Aaditya Singh, Richard Bailey, A. Dobra, Tamer Kahveci","doi":"10.1145/3535508.3545506","DOIUrl":"https://doi.org/10.1145/3535508.3545506","url":null,"abstract":"The plight of navigating high-dimensional transcription datasets remains a persistent problem. This problem is further amplified for complex disorders, such as cancer, as these disorders are often multigenic traits with multiple subsets of genes collectively affecting the type, stage, and severity of the trait. We are often faced with a trade-off between reducing the dimensionality of our datasets and maintaining the integrity of our data. Almost exclusively, researchers apply techniques commonly known as dimensionality reduction to reduce the dimensions of the feature space to allow classifiers to work in more appropriately sized input spaces. As the number of dimensions is reduced, however, the ability to distinguish classes from one another reduces as well. Thus, to accomplish both tasks simultaneously for very high dimensional transcriptome for complex multigenic traits, we propose a new supervised technique, Class Separation Transformation (CST). CST accomplishes both tasks simultaneously by significantly reducing the dimensionality of the input space into a one-dimensional transformed space that provides optimal separation between the differing classes. We compare our method with existing state-of-the-art methods using both real and synthetic datasets, demonstrating that CST is the more accurate, robust, and scalable technique relative to existing methods. Code used in this paper is available on https://github.com/aisharjya/CST","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123462200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Genomic variation","authors":"S. Nabavi","doi":"10.1145/3552480","DOIUrl":"https://doi.org/10.1145/3552480","url":null,"abstract":"","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123534091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining spectral clustering and large cut algorithms to find compensatory functional modules from yeast physical and genetic interaction data with GLASS","authors":"Blessing Kolawole, L. Cowen","doi":"10.1145/3535508.3545509","DOIUrl":"https://doi.org/10.1145/3535508.3545509","url":null,"abstract":"Various algorithmic and statistical approaches have been proposed to uncover functionally coherent network motifs consisting of sets of genes that may occur as compensatory pathways (called Between Pathway Modules, or BPMs) in a high-throughput S. cerevisiae genetic interaction network. We extend our previous Local-Cut/Genecentric method to also make use of a spectral clustering of the physical interaction network, and uncover some interesting new fault-tolerant modules.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"119 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131580204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting diabetes in imbalanced datasets using neural networks","authors":"H. Guan, Chonghao Zhang","doi":"10.1145/3535508.3545540","DOIUrl":"https://doi.org/10.1145/3535508.3545540","url":null,"abstract":"Diabetes is a long-standing disease caused by high blood sugar over a long period of time and one in every ten Americans has diabetes. The neural networks have gained attention in large-scale genetic research because of its ability in non-linear relationships. However, the data imbalance problem, which is caused by the disproportion between the number of disease samples and the number of healthy samples, will decrease the prediction accuracy. In this project, we tackle the data imbalance problem when predicting diabetes with genotype SNP data and phenotype data provided by UK BioBank. The dataset is highly skewed with healthy samples with the ratio of 20. We build a phenotype neural network and a genotype neural network, which uses two sampling techniques and a data augmentation method by generative adversarial neural network (GAN) to counter the data imbalance problem before feeding the data to the neural networks. We found out that the phenotype neural network outperforms the genotype neural network and achieves 90% accuracy. We reach the conclusion that undersampling performs better than both oversampling and the GAN, and the phenotype is better than the genotype in terms of predicting diabetes. We have identified key phenotype and genotype features that contributed to the effectiveness of the prediction.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131871635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: COVID-19","authors":"Yuanda Zhu","doi":"10.1145/3552478","DOIUrl":"https://doi.org/10.1145/3552478","url":null,"abstract":"","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131910855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying graph convolution neural network in digital breast tomosynthesis for cancer classification","authors":"Jun Bai, Annie Jin, Andre Jin, Tianyu Wang, Clifford Yang, S. Nabavi","doi":"10.1145/3535508.3545549","DOIUrl":"https://doi.org/10.1145/3535508.3545549","url":null,"abstract":"Digital breast tomosynthesis, or 3D mammography, has advanced the field of breast imaging diagnosis. It has been rapidly replacing the traditional full-field digital mammography because of its diagnostic superiority. However, automatic detection of breast cancer using digital breast tomosynthesis images has remained challenging, mainly due to their high resolution, high volume, and complexity. In this study, we developed a novel model for more precise detection of cancerous 3D mammogram images. The proposed model first, represents 3D mammograms as graphs, then employs a self-attention graph convolutional neural network model to effectively and efficiently learn the features of 3D mammograms, and finally, using the extracted features, identifies the cancerous 3D mammograms. We trained and evaluated the performance of the proposed model using public and private datasets. We compared the performance of the proposed model with those of multiple state-of-the-art CNN-based models as baseline models. The results show that the proposed model outperforms all the baseline models in terms of accuracy, precision, sensitivity, F1, and AUC.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116739131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Determining population structure from k-mer frequencies","authors":"Y. Hrytsenko, Noah M. Daniels, R. Schwartz","doi":"10.1145/3535508.3545100","DOIUrl":"https://doi.org/10.1145/3535508.3545100","url":null,"abstract":"Determining population structure helps us understand connections among different populations and how they evolve over time. This knowledge is important for studies ranging from evolutionary biology to large-scale variant-trait association studies, such as Genome-Wide Association Studies (GWAS). Current approaches to determining population structure include model-based approaches, statistical approaches, and distance-based ancestry inference approaches. In this work, we outline an approach that identifies population structure from k-mer frequencies using principal component analysis (PCA). This approach can be classified as statistical; however, while prior work has employed PCA, here we analyze k-mer frequencies rather than multilocus genotype data (SNPs, microsatellites, or haplotypes). K-mer frequencies can be viewed as a summary statistic of a genome and have the advantage of being easily derived from a genome by counting the number of times a k-mer occurred in a sequence. No genetic assumptions must be met to generate k-mers. Current population differentiation approaches, such as structure, depend on several genetic assumptions and go through the process of a careful selection of ancestry informative markers that can be used to identify populations. In this work, we show that PCA is able to detect population structure just from the number of k-mers found in the genome. Application of PCA together with a clustering algorithm to k-mer profiles of genomes provides an easy approach to detecting a number of populations (clusters) present in the dataset. We describe the method and show that the results are comparable to those found by a model-based approach using genetic markers. We validate our method using 48 human genomes from populations identified by the 1000 Human Genomes Project. We also compared our results to those from mash, which determines relationships among individuals using the number of matched k-mers between sequences. We compare the outputs between the two approaches and discuss the sensitivity of population structure identification of both methods. This study shows that PCA is able to detect population structure from k-mer frequencies and can separate samples of admixed and non-admixed origin, whereas mash showed to be highly sensitive to the parameters of k-mer length and sketch size.","PeriodicalId":354504,"journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"178 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133747233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}