Lei Du, Kefei Liu, Xiaohui Yao, Shannon L Risacher, Junwei Han, Lei Guo, Andrew J Saykin, Li Shen
{"title":"Fast Multi-Task SCCA Learning with Feature Selection for Multi-Modal Brain Imaging Genetics.","authors":"Lei Du, Kefei Liu, Xiaohui Yao, Shannon L Risacher, Junwei Han, Lei Guo, Andrew J Saykin, Li Shen","doi":"10.1109/BIBM.2018.8621298","DOIUrl":"https://doi.org/10.1109/BIBM.2018.8621298","url":null,"abstract":"<p><p>Brain imaging genetics studies the genetic basis of brain structures and functions by integrating genotypic data, such as single nucleotide polymorphisms (SNPs), with imaging quantitative traits (QTs). In this area, both multi-task learning (MTL) and sparse canonical correlation analysis (SCCA) methods are widely used since they are superior to independent, pairwise univariate analyses. MTL methods generally incorporate a few QTs and are not designed for feature selection from a large number of QTs, while existing SCCA methods typically employ only one modality of QTs to study its association with SNPs. Both MTL and SCCA encounter computational challenges as the number of SNPs increases. In this paper, combining the merits of MTL and SCCA, we propose a novel multi-task SCCA (MTSCCA) learning framework to identify bi-multivariate associations between SNPs and multi-modal imaging QTs. MTSCCA could make use of the complementary information carried by different imaging modalities. Using the <i>G</i> <sub>2,1</sub>-norm regularization, MTSCCA treats all SNPs in the same group together to enforce sparsity at the group level. The <math> <mrow><msub><mi>l</mi> <mrow><mn>2</mn> <mo>,</mo> <mn>1</mn></mrow> </msub> </mrow> </math> -norm penalty is used to jointly select features across multiple tasks for SNPs, and across multiple modalities for QTs. A fast optimization algorithm is proposed using the grouping information of SNPs. Compared with conventional SCCA methods, MTSCCA obtains improved performance regarding both correlation coefficients and canonical weight patterns. 
In addition, our method runs very fast and is easy to implement, and thus could provide a powerful tool for genome-wide brain-wide imaging genetic studies.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2018 ","pages":"356-361"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/BIBM.2018.8621298","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37065392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhenzhi Li, Yiming Zuo, Chaohui Xu, Rency S Varghese, Habtom W Ressom
{"title":"INDEED: R package for network based differential expression analysis.","authors":"Zhenzhi Li, Yiming Zuo, Chaohui Xu, Rency S Varghese, Habtom W Ressom","doi":"10.1109/BIBM.2018.8621426","DOIUrl":"https://doi.org/10.1109/BIBM.2018.8621426","url":null,"abstract":"<p><p>With recent advances in omics technologies, fueled by decreased cost and an increased number of available datasets, computational methods for differential expression analysis are sought to identify disease-associated biomolecules. Conventional differential expression analysis methods (e.g., Student's t-test, ANOVA) focus on assessing the mean and variance of biomolecules in each biological group. On the other hand, network-based approaches take into account the interactions between biomolecules in choosing differentially expressed ones. These interactions are typically evaluated by correlation methods that tend to generate over-complicated networks due to many seemingly indirect associations. In this paper, we introduce a new R/Bioconductor package INDEED that allows users to construct a sparse network based on partial correlation, and to identify biomolecules that have significant changes both at individual expression and pairwise interaction levels. We applied INDEED for analysis of two omic datasets acquired in a cancer biomarker discovery study to help rank disease-associated biomolecules. We believe biomolecules selected by INDEED lead to improved sensitivity and specificity in detecting disease status compared to those selected by conventional statistical methods. Also, INDEED's framework is amenable to further expansion to integrate networks from multi-omic studies, thereby allowing selection of reliable disease-associated biomolecules or disease biomarkers.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2018 ","pages":"2709-2712"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/BIBM.2018.8621426","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37313557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chengsheng Mao, Yiheng Pan, Zexian Zeng, Liang Yao, Yuan Luo
{"title":"Deep Generative Classifiers for Thoracic Disease Diagnosis with Chest X-ray Images.","authors":"Chengsheng Mao, Yiheng Pan, Zexian Zeng, Liang Yao, Yuan Luo","doi":"10.1109/BIBM.2018.8621107","DOIUrl":"https://doi.org/10.1109/BIBM.2018.8621107","url":null,"abstract":"<p><p>Thoracic diseases are very serious health problems that plague a large number of people. Chest X-ray is currently one of the most popular methods to diagnose thoracic diseases, playing an important role in the healthcare workflow. However, reading the chest X-ray images and giving an accurate diagnosis remain challenging tasks for expert radiologists. With the success of deep learning in computer vision, a growing number of deep neural network architectures have been applied to chest X-ray image classification. However, most of the previous deep neural network classifiers were based on deterministic architectures, which are usually very noise-sensitive and are likely to aggravate the overfitting issue. In this paper, to make a deep architecture more robust to noise and to reduce overfitting, we propose using deep generative classifiers to automatically diagnose thorax diseases from the chest X-ray images. Unlike the traditional deterministic classifier, a deep generative classifier has a distribution middle layer in the deep neural network. A sampling layer then draws a random sample from the distribution layer and inputs it to the following layer for classification. The classifier is generative because the class label is generated from samples of a related distribution. Through training the model with a certain amount of randomness, the deep generative classifiers are expected to be robust to noise, reduce overfitting, and thus achieve good performance. We implemented our deep generative classifiers based on a number of well-known deterministic neural network architectures, and tested our models on the ChestX-ray14 dataset. 
The results demonstrated the superiority of deep generative classifiers compared with the corresponding deep deterministic classifiers.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2018 ","pages":"1209-1214"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/BIBM.2018.8621107","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41223004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early Prediction of Acute Kidney Injury in Critical Care Setting Using Clinical Notes.","authors":"Yikuan Li, Liang Yao, Chengsheng Mao, Anand Srivastava, Xiaoqian Jiang, Yuan Luo","doi":"10.1109/bibm.2018.8621574","DOIUrl":"10.1109/bibm.2018.8621574","url":null,"abstract":"<p><p>Acute kidney injury (AKI) in critically ill patients is associated with significant morbidity and mortality. Development of novel methods to identify patients with AKI earlier will allow for testing of novel strategies to prevent or reduce the complications of AKI. We developed data-driven prediction models to estimate the risk of new AKI onset. We generated models from clinical notes within the first 24 hours following intensive care unit (ICU) admission extracted from Medical Information Mart for Intensive Care III (MIMIC-III). From the clinical notes, we generated clinically meaningful word and concept representations and embeddings, respectively. Five supervised learning classifiers and a knowledge-guided deep learning architecture were used to construct prediction models. The best configuration yielded a competitive AUC of 0.779. Our work suggests that natural language processing of clinical notes can be applied to assist clinicians in identifying the risk of incident AKI onset in critically ill patients upon admission to the ICU.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2018 ","pages":"683-686"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768909/pdf/nihms-1656128.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38762863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jin Lu, Jiangwen Sun, Xinyu Wang, Henry R Kranzler, Joel Gelernter, Jinbo Bi
{"title":"Collaborative Phenotype Inference from Comorbid Substance Use Disorders and Genotypes.","authors":"Jin Lu, Jiangwen Sun, Xinyu Wang, Henry R Kranzler, Joel Gelernter, Jinbo Bi","doi":"10.1109/BIBM.2017.8217681","DOIUrl":"10.1109/BIBM.2017.8217681","url":null,"abstract":"<p><p>Data in large-scale genetic studies of complex human diseases, such as substance use disorders, are often incomplete. Despite great progress in genotype imputation, e.g., the IMPUTE2 method, considerably less progress has been made in inferring phenotypes. We designed a novel approach to integrate individuals' comorbid conditions with their genotype data to infer missing (unreported) diagnostic criteria of a disorder. The premise of our approach derives from correlations among symptoms and the shared biological bases of concurrent disorders such as co-dependence on cocaine and opioids. We describe a matrix completion method to construct a bi-linear model based on the interactions of genotypes and known symptoms of related disorders to infer unknown values of another set of symptoms or phenotypes. An efficient stochastic and parallel algorithm based on the linearized alternating direction method of multipliers was developed to solve the proposed optimization problem. Empirical evaluation of the approach in comparison with other advanced data matrix completion methods via a case study shows that it both significantly improves imputation accuracy and provides greater computational efficiency.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2017 ","pages":"392-397"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5947969/pdf/nihms913259.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36094670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variable Selection in Heterogeneous Datasets: A Truncated-rank Sparse Linear Mixed Model with Applications to Genome-wide Association Studies.","authors":"Haohan Wang, Bryon Aragam, Eric P Xing","doi":"10.1109/BIBM.2017.8217687","DOIUrl":"10.1109/BIBM.2017.8217687","url":null,"abstract":"<p><p>A fundamental and important challenge in modern datasets of ever increasing dimensionality is variable selection, which has taken on renewed interest recently due to the growth of biological and medical datasets with complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations, when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of individual relationships in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our model.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2017 ","pages":"431-438"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5889139/pdf/nihms874620.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35986011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hansi Zhang, Yi Guo, Qian Li, Thomas J George, Elizabeth A Shenkman, Jiang Bian
{"title":"Data Integration through Ontology-Based Data Access to Support Integrative Data Analysis: A Case Study of Cancer Survival.","authors":"Hansi Zhang, Yi Guo, Qian Li, Thomas J George, Elizabeth A Shenkman, Jiang Bian","doi":"10.1109/BIBM.2017.8217849","DOIUrl":"https://doi.org/10.1109/BIBM.2017.8217849","url":null,"abstract":"<p><p>To improve cancer survival rates and prognosis, one of the first steps is to improve our understanding of contributory factors associated with cancer survival. Prior research has suggested that cancer survival is influenced by multiple factors from multiple levels. Most existing analyses of cancer survival used data from a single source. Nevertheless, there are key challenges in integrating variables from different sources. Data integration is a daunting task because data from different sources can be heterogeneous in syntax, schema, and particularly semantics. Thus, we propose to adopt a semantic data integration approach that generates a universal conceptual representation of \"information\" including data and their relationships. This paper describes a case study of semantic data integration linking three data sets that cover both individual and contextual level factors for the purpose of assessing the association of the predictors of interest with cancer survival using Cox proportional hazards models.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2017 ","pages":"1300-1303"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/BIBM.2017.8217849","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36054115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhe He, Yehoshua Perl, Gai Elhanan, Yan Chen, James Geller, Jiang Bian
{"title":"Auditing the Assignments of Top-Level Semantic Types in the UMLS Semantic Network to UMLS Concepts.","authors":"Zhe He, Yehoshua Perl, Gai Elhanan, Yan Chen, James Geller, Jiang Bian","doi":"10.1109/BIBM.2017.8217840","DOIUrl":"https://doi.org/10.1109/BIBM.2017.8217840","url":null,"abstract":"<p><p>The Unified Medical Language System (UMLS) is an important terminological system. By the policy of its curators, each concept of the UMLS should be assigned the most specific Semantic Types (STs) in the UMLS Semantic Network (SN). Hence, the Semantic Types of most UMLS concepts are assigned at or near the bottom (leaves) of the UMLS Semantic Network. While most ST assignments are correct, some errors do occur. Therefore, Quality Assurance efforts of UMLS curators for ST assignments should concentrate on automatically detected sets of UMLS concepts with higher error rates than random sets. In this paper, we investigate the assignments of top-level semantic types in the UMLS semantic network to concepts, identify potentially erroneous assignments, define four categories of errors, and thus provide assistance to curators of the UMLS to avoid these assignment errors. Human experts analyzed samples of concepts assigned 10 of the top-level semantic types and categorized the erroneous ST assignments into these four logical categories. Two thirds of the concepts assigned these 10 top-level semantic types are erroneous. Our results demonstrate that reviewing top-level semantic type assignments to concepts provides an effective way for UMLS quality assurance, compared with reviewing a random selection of semantic type assignments.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2017 ","pages":"1262-1269"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/BIBM.2017.8217840","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35772366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ricardo A Calix, Ravish Gupta, Matrika Gupta, Keyuan Jiang
{"title":"Deep Gramulator: Improving Precision in the Classification of Personal Health-Experience Tweets with Deep Learning.","authors":"Ricardo A Calix, Ravish Gupta, Matrika Gupta, Keyuan Jiang","doi":"10.1109/BIBM.2017.8217820","DOIUrl":"https://doi.org/10.1109/BIBM.2017.8217820","url":null,"abstract":"<p><p>Health surveillance is an important task to track the happenings related to human health, and one of its areas is pharmacovigilance. Pharmacovigilance tracks and monitors safe use of pharmaceutical products. Pharmacovigilance involves tracking side effects that may be caused by medicines and other health-related drugs. Medical professionals have a difficult time collecting this information. It is anticipated that social media could help to collect this data and track side effects. Twitter data can be used for this task given that users post their personal health-related experiences online. One problem with Twitter data, however, is that it contains a lot of noise. Therefore, an approach is needed to remove the noise. In this paper, several machine learning algorithms including deep neural nets are used to build classifiers that can help to detect these Personal Experience Tweets (PETs). Finally, we propose a method called the Deep Gramulator that improves results. Results of the analysis are presented and discussed.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2017 ","pages":"1154-1159"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/BIBM.2017.8217820","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36286319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning-based MSMS Spectra Reduction in Support of Running Multiple Protein Search Engines on Cloud.","authors":"Majdi Maabreh, Basheer Qolomany, Izzat Alsmadi, Ajay Gupta","doi":"10.1109/bibm.2017.8217951","DOIUrl":"10.1109/bibm.2017.8217951","url":null,"abstract":"<p><p>The diversity of the available protein search engines with respect to the utilized matching algorithms, the low overlap ratios among their results, and the disparity of their coverage encourage the proteomics community to utilize ensemble solutions of different search engines. Advances in cloud computing technology and the availability of distributed processing clusters can also provide support for this task. However, data transfer and the combining of results could, in this case, be the major bottleneck. The flood of billions of observed mass spectra, hundreds of Gigabytes or potentially Terabytes of data, could easily cause congestion, increase the risk of failure, degrade performance, add computational cost, and waste available resources. Therefore, in this study, we propose a deep learning model in order to mitigate traffic over the cloud network and thus reduce the cost of cloud computing. The model, which depends on the top 50 intensities and their m/z values of each spectrum, removes any spectrum that is predicted not to pass the majority voting of the participating search engines. Our results, using three search engines (pFind, Comet, and X!Tandem) and four different datasets, are promising and encourage investment in deep learning to solve this type of big data problem.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. 
IEEE International Conference on Bioinformatics and Biomedicine","volume":"2017 ","pages":"1909-1914"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8382039/pdf/nihms-1728667.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39355075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}