Youngsoon Kim, Jie Hao, Yadu Gautam, T. Mersha, Mingon Kang
{"title":"DiffGRN: differential gene regulatory network analysis","authors":"Youngsoon Kim, Jie Hao, Yadu Gautam, T. Mersha, Mingon Kang","doi":"10.1504/IJDMB.2018.10016325","DOIUrl":"https://doi.org/10.1504/IJDMB.2018.10016325","url":null,"abstract":"Identification of differential gene regulators with significant changes under disparate conditions is essential to understand complex biological mechanism in a disease. Differential Network Analysis (DiNA) examines different biological processes based on gene regulatory networks that represent regulatory interactions between genes with a graph model. While most studies in DiNA have considered correlation-based inference to construct gene regulatory networks from gene expression data due to its intuitive representation and simple implementation, the approach lacks in the representation of causal effects and multivariate effects between genes. In this paper, we propose an approach named Differential Gene Regulatory Network (DiffGRN) that infers differential gene regulation between two groups. We infer gene regulatory networks of two groups using Random LASSO, and then we identify differential gene regulations by the proposed significance test. The advantages of DiffGRN are to capture multivariate effects of genes that regulate a gene simultaneously, to identify causality of gene regulations, and to discover differential gene regulators between regression-based gene regulatory networks. We assessed DiffGRN by simulation experiments and showed its outstanding performance than the current state-of-the-art correlation-based method, DINGO. DiffGRN is applied to gene expression data in asthma. The DiNA with asthma data showed a number of gene regulations, such as ADAM12 and RELB, reported in biological literature.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2018-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49202748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Youngsoon Kim, Jie Hao, Yadu Gautam, Tesfaye B Mersha, Mingon Kang
{"title":"DiffGRN: differential gene regulatory network analysis.","authors":"Youngsoon Kim, Jie Hao, Yadu Gautam, Tesfaye B Mersha, Mingon Kang","doi":"10.1504/IJDMB.2018.094891","DOIUrl":"https://doi.org/10.1504/IJDMB.2018.094891","url":null,"abstract":"Identification of differential gene regulators with significant changes under disparate conditions is essential to understand complex biological mechanism in a disease. Differential Network Analysis (DiNA) examines different biological processes based on gene regulatory networks that represent regulatory interactions between genes with a graph model. While most studies in DiNA have considered correlation-based inference to construct gene regulatory networks from gene expression data due to its intuitive representation and simple implementation, the approach lacks in the representation of causal effects and multivariate effects between genes. In this paper, we propose an approach named Differential Gene Regulatory Network (DiffGRN) that infers differential gene regulation between two groups. We infer gene regulatory networks of two groups using Random LASSO, and then we identify differential gene regulations by the proposed significance test. The advantages of DiffGRN are to capture multivariate effects of genes that regulate a gene simultaneously, to identify causality of gene regulations, and to discover differential gene regulators between regression-based gene regulatory networks. We assessed DiffGRN by simulation experiments and showed its outstanding performance than the current state-of-the-art correlation-based method, DINGO. DiffGRN is applied to gene expression data in asthma. The DiNA with asthma data showed a number of gene regulations, such as ADAM12 and RELB, reported in biological literature.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2018.094891","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36999358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neda Zarayeneh, Euiseong Ko, Jung Hun Oh, Sang Suh, Chunyu Liu, Jean Gao, Donghyun Kim, Mingon Kang
{"title":"Integration of multi-omics data for integrative gene regulatory network inference.","authors":"Neda Zarayeneh, Euiseong Ko, Jung Hun Oh, Sang Suh, Chunyu Liu, Jean Gao, Donghyun Kim, Mingon Kang","doi":"10.1504/IJDMB.2017.10008266","DOIUrl":"https://doi.org/10.1504/IJDMB.2017.10008266","url":null,"abstract":"Gene regulatory networks provide comprehensive insights and indepth understanding of complex biological processes. The molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data in most research. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylations. The recent rapid advances of high-throughput omics technologies enable one to measure multiple types of omics data, called 'multi-omics data', that represent the various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expressions, copy number variations and DNA methylations were considered for multi-omics data in this paper. The intensive experiments were carried out with simulation data, where iGRN's capability that infers the integrative gene regulatory network is assessed. Through the experiments, iGRN shows its better performance on model representation and interpretation than other integrative methods in gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5771269/pdf/nihms912092.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35754483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jingshan Huang, K. Eilbeck, Barry Smith, J. Blake, D. Dou, Weili Huang, D. Natale, A. Ruttenberg, Jun Huan, Michael T. Zimmermann, Guoqian Jiang, Yu Lin, Bin Wu, Harrison J. Strachan, Nisansa de Silva, M. V. Kasukurthi, V. Jha, Y. He, Shaojie Zhang, Xiaowei Wang, Zixing Liu, G. Borchert, M. Tan
{"title":"The development of non-coding RNA ontology","authors":"Jingshan Huang, K. Eilbeck, Barry Smith, J. Blake, D. Dou, Weili Huang, D. Natale, A. Ruttenberg, Jun Huan, Michael T. Zimmermann, Guoqian Jiang, Yu Lin, Bin Wu, Harrison J. Strachan, Nisansa de Silva, M. V. Kasukurthi, V. Jha, Y. He, Shaojie Zhang, Xiaowei Wang, Zixing Liu, G. Borchert, M. Tan","doi":"10.1504/IJDMB.2016.077072","DOIUrl":"https://doi.org/10.1504/IJDMB.2016.077072","url":null,"abstract":"Identification of non-coding RNAs (ncRNAs) has been significantly improved over the past decade. On the other hand, semantic annotation of ncRNA data is facing critical challenges due to the lack of a comprehensive ontology to serve as common data elements and data exchange standards in the field. We developed the Non-Coding RNA Ontology (NCRO) to handle this situation. By providing a formally defined ncRNA controlled vocabulary, the NCRO aims to fill a specific and highly needed niche in semantic annotation of large amounts of ncRNA biological and clinical data.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2016.077072","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of consensus string matching in the diagnosis of allelic heterogeneity involving transposition mutation","authors":"F. Zohora, Mohammad Sohel Rahman","doi":"10.1504/IJDMB.2015.072756","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.072756","url":null,"abstract":"In this paper, an algorithm is proposed that detects the existence of a common ancestor gene sequence for non-overlapping transposition metric given two input DNA sequences. We consider two cases: fixed length transposition and all length transposition. For the first one, the algorithm has the time complexity of O(n^3), where n is the length of input sequences. In case of all length transposition, theoretical worst case time complexity of the algorithm is proven to be O(n^4). However, practically the worst case and the average case time complexity for all length transposition are found to be O(n^3) and O(n^2) respectively. This work is motivated by the purpose of diagnosing unknown genetic disease that shows allelic heterogeneity, a case where a normal gene mutates in different orders resulting in two different gene sequences causing two different genetic diseases. The algorithm can be useful as well in the study of breed-related hereditary to determine the genetic spread of a defective gene in the population.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.072756","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome-wide discovery of miRNAs using ensembles of machine learning algorithms and logistic regression","authors":"Benjamin Ulfenborg, K. Klinga-Levan, B. Olsson","doi":"10.1504/IJDMB.2015.072755","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.072755","url":null,"abstract":"In silico prediction of novel miRNAs from genomic sequences remains a challenging problem. This study presents a genome-wide miRNA discovery software package called GenoScan and evaluates two hairpin classification methods. These methods, one ensemble-based and one using logistic regression were benchmarked along with 15 published methods. In addition, the sequence-folding step is addressed by investigating the impact of secondary structure prediction methods and the choice of input sequence length on prediction performance. Both the accuracy of secondary structure predictions and the miRNA prediction are evaluated. In the benchmark of hairpin classification methods, the regression model achieved highest classification accuracy. Of the structure prediction methods evaluated, ContextFold achieved the highest agreement between predicted and experimentally determined structures. However, both the choice of secondary structure prediction method and input sequence length had limited impact on hairpin classification performance.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.072755","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaofeng Song, Lizhen Hu, P. Han, Xuejiang Guo, J. Sha
{"title":"In silico identification and functional annotation of yeast E3 ubiquitin ligase Rsp5 substrates","authors":"Xiaofeng Song, Lizhen Hu, P. Han, Xuejiang Guo, J. Sha","doi":"10.1504/IJDMB.2015.072754","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.072754","url":null,"abstract":"Rsp5, E3 ligases conserved from yeast to mammals, plays a key role in diverse processes in yeast. However, many of Rsp5 substrates are still unclear. Therefore we proposed an in silico method to recognise new substrates of Rsp5. To investigate the molecular determinants that affect the interaction between Rsp5 and its substrate, we have systematically analysed many features that perhaps correlated with the Rsp5 substrate recognition. It is found that PPxY motif, transmembrane region, disorder region and N-linked glycosylation modification are the most important features for substrate recognition. We have constructed an SVM-based classifier to recognise Rsp5 substrates, obtaining 81.5% sensitivity and 74.1% specificity averagely on ten independent testing dataset. We also applied the model on the whole yeast proteome, and identified ~66 new Rsp5 substrates. Functional annotation reveals that half of these novel substrates function in the Rsp5 involved cell processes as Rsp5-interacting proteins.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.072754","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yan Wang, Tingting He, Xingpeng Jiang, Jie Yuan, Xianjun Shen
{"title":"Weighted fusion regularisation and predicting microbial interactions with vector autoregressive model","authors":"Yan Wang, Tingting He, Xingpeng Jiang, Jie Yuan, Xianjun Shen","doi":"10.1504/IJDMB.2015.072757","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.072757","url":null,"abstract":"In this paper, we develop a novel regularisation method for MVAR via weighted fusion which considers the correlation among variables. In theory, we discuss the grouping effect of weighted fusion regularisation for linear models. By virtue of the probability method, we show that coefficients corresponding to highly correlated predictors have small differences. A quantitative estimate for such small differences is given regardless of the coefficients signs. The estimate is also improved when consider empirical approximation error if the model fit the data well. We then apply the proposed model on several time series data sets especially a time series dataset of human gut microbiomes. The experimental results indicate that the new approach has better performance than several other VAR-based models and we also demonstrate its capability of extracting relevant microbial interactions.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.072757","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning multiple distributed prototypes of semantic categories for named entity recognition","authors":"Aron Henriksson","doi":"10.1504/IJDMB.2015.072766","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.072766","url":null,"abstract":"The scarcity of large labelled datasets comprising clinical text that can be exploited within the paradigm of supervised machine learning creates barriers for the secondary use of data from electronic health records. It is therefore important to develop capabilities to leverage the large amounts of unlabelled data that, indeed, tend to be readily available. One technique utilises distributional semantics to create word representations in a wholly unsupervised manner and uses existing training data to learn prototypical representations of predefined semantic categories. Features describing whether a given word belongs to a certain category are then provided to the learning algorithm. It has been shown that using multiple distributional semantic models, each employing a different word order strategy, can lead to enhanced predictive performance. Here, another hyperparameter is also varied--the size of the context window--and an experimental investigation shows that this leads to further performance gains.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.072766","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
W. Shoombuatong, Panuwat Mekha, Jeerayut Chaijaruwanich
{"title":"Sequence based human leukocyte antigen gene prediction using informative physicochemical properties","authors":"W. Shoombuatong, Panuwat Mekha, Jeerayut Chaijaruwanich","doi":"10.1504/IJDMB.2015.072072","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.072072","url":null,"abstract":"Prediction of different classes within the human leukocyte antigen (HLA) gene family can provide insight into the human immune system and its response to viral pathogens. Therefore, it is desirable to develop an efficient and easily interpretable method for predicting HLA gene class compared to existing methods. We investigated the HLA gene prediction problem as follows: (a) establishing a dataset (HLA262) such that the sequence identity of the complete HLA dataset was reduced to 30%; (b) proposing a feature set of informative physicochemical properties that cooperate with SVM (named HLAPred) to achieve high accuracy and sensitivity (90.04% and 82.99%, respectively) compared with existing methods; and (c) analysing the informative physicochemical properties to understand the physicochemical properties and molecular mechanisms of the HLA gene family.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.072072","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}