Sriram Sridharan, A. Datta, Jijayanagaram Venkatraj
{"title":"Boolean model to experimental validation: A preliminary attempt","authors":"Sriram Sridharan, A. Datta, Jijayanagaram Venkatraj","doi":"10.1109/GENSIPS.2013.6735945","DOIUrl":"https://doi.org/10.1109/GENSIPS.2013.6735945","url":null,"abstract":"There are several publications detailing modeling of biological systems, especially in the post-genomic era. However, there is a real dearth of work testing and validating or invalidating mathematical models with experimental data. This work is one of the first attempts to bridge this expanding gap. Oxidative stress is a consequence of both normal and abnormal cellular metabolism and is linked to cell proliferation, differentiation and apoptosis through both genetic and epigenetic changes leading to the development of human diseases. Oxidative stress itself is a consequence of the imbalance between pro- and anti-oxidative factors generated by cells in response to internal and external cues. A common mechanism by which chemotherapeutic agents induce cell death is the induction of free-radical generation, leading to an excess of free radicals. Although the exact mechanism of the molecular signaling that it entails is still being worked out, it is clear that this varies with the stage and type of cancer and the drug and dosage used. Key genes in the oxidative stress response pathways were earlier modeled by us using multivariate Boolean network modeling. Here we studied the response of well-accepted progressive breast cancer cell lines, the MCF10A series, to Adriamycin and Cyclophosphamide, two well-known and commonly used chemotherapeutic drugs. 
We provide evidence that the strategy of using Boolean modeling and laboratory testing of the model, although not a perfect match, is certainly a reasonable one.","PeriodicalId":336511,"journal":{"name":"2013 IEEE International Workshop on Genomic Signal Processing and Statistics","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114931845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A structure-based approach to predicting in vitro transcription factor-DNA interaction","authors":"Zhenzhu Gao, Jianhua Ruan","doi":"10.1109/GENSIPS.2013.6735915","DOIUrl":"https://doi.org/10.1109/GENSIPS.2013.6735915","url":null,"abstract":"Summary form only given. Understanding the mechanism of transcriptional regulation remains an inspiring area of molecular biology. Among the popular methods for modeling TFBS, position-specific weight matrix and k-mer based approaches have gained great success. However, both approaches fail to consider the structural properties of a binding site. Recently, a novel TFBS modeling and prediction approach was presented by Bauer et al. (2010), in which the sequence-specific chemical and structural features of DNA are applied. However, the in vivo protein-DNA interactions observed in ChIP-chip assays, which were used in that study, are not necessarily direct, as some TFs tend to interact with DNA extensively through other partners. Therefore, an evaluation on a proper in vitro dataset would be more appropriate to reveal the benefit of such physicochemical features in modeling TF-DNA interactions. Recently, the in vitro protein-binding microarray (PBM) experiment has greatly improved the understanding of transcription factor-DNA interaction. It is a high-throughput experiment used to measure the in vitro binding affinity of a given TF to the sequences on the probe array. Because typical confounding factors such as transcription co-factors present in ChIP-based experiments are eliminated, PBM data provide an excellent information source to develop structural models for TF-DNA interactions. On the other hand, direct mapping of the 3-mer or 4-mer based meta-features to the candidate DNA binding sequences, as in their work, may not reflect the nature of TF-DNA binding, since a TFBS is usually 8 to 12 base pairs long. 
As a result, conventional machine-learning algorithms, which rely on well-structured feature vector and label pairs, may not work well in modeling PBM data. In this paper we propose a novel approach to predict in vitro transcription factor binding based on the structural properties of DNA using a so-called multiple-instance learning algorithm. Compared to conventional (single-instance based) learning algorithms, our multi-instance learning-based algorithm does not require the knowledge of the actual binding site within a candidate probe sequence, yet can still take full advantage of the physicochemical properties in modeling and predicting TF-DNA interactions. Evaluation on in vitro protein-binding microarray data for twenty mouse TFs shows that our new model performs significantly better than several k-mer or structure-based single-instance learning algorithms. It indicates that combining multi-instance learning and structural properties of DNA has promising potential for studying biological regulatory networks.","PeriodicalId":336511,"journal":{"name":"2013 IEEE International Workshop on Genomic Signal Processing and Statistics","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121705408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved branch-and-bound algorithm for U-curve optimization","authors":"E. Atashpaz-Gargari, U. Braga-Neto, E. Dougherty","doi":"10.1109/GENSIPS.2013.6735948","DOIUrl":"https://doi.org/10.1109/GENSIPS.2013.6735948","url":null,"abstract":"The U-curve branch-and-bound algorithm for optimization was introduced recently by Ris and collaborators. In this paper we introduce an improved algorithm for finding the optimal set of features based on the U-curve assumption. Synthetic experiments are used to assess the performance of the proposed algorithm, and compare it to exhaustive search and the original algorithm. The results show that the modified U-curve BB algorithm makes fewer evaluations and is more robust than the original algorithm.","PeriodicalId":336511,"journal":{"name":"2013 IEEE International Workshop on Genomic Signal Processing and Statistics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122365320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lingjia Kong, Kaisa-Leena Aho, Kirsi J. Granberg, Christophe Roos, R. Autio
{"title":"DBComposer: An R package for integrative analysis and management of gene expression microarray data","authors":"Lingjia Kong, Kaisa-Leena Aho, Kirsi J. Granberg, Christophe Roos, R. Autio","doi":"10.1109/GENSIPS.2013.6735944","DOIUrl":"https://doi.org/10.1109/GENSIPS.2013.6735944","url":null,"abstract":"DBComposer is an R package with a graphical user interface (GUI) to analyze and integrate human gene expression microarray data. With DBComposer, the data can be easily annotated, preprocessed and analyzed in several ways. DBComposer can also serve as a personal expression microarray database allowing users to store multiple datasets together for later retrieval or data analysis. It takes advantage of many R packages for statistics and visualizations, and provides a flexible framework to implement custom workflows to extend the data analysis capabilities.","PeriodicalId":336511,"journal":{"name":"2013 IEEE International Workshop on Genomic Signal Processing and Statistics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128754822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Bayesian MMSE estimation of the coefficient of determination for discrete prediction","authors":"Ting-Ju Chen, U. Braga-Neto","doi":"10.1109/GENSIPS.2013.6735933","DOIUrl":"https://doi.org/10.1109/GENSIPS.2013.6735933","url":null,"abstract":"The coefficient of determination (CoD) has significant applications in genomics, for example, in the inference of gene regulatory networks. In previous publications, we have studied several nonparametric CoD estimators, based upon the resubstitution, leave-one-out, cross-validation, and bootstrap error estimators, and one parametric maximum-likelihood (ML) CoD estimator that allows the incorporation of available prior knowledge, from a frequentist perspective. However, none of these CoD estimators are rigorously optimized based on statistical inference across a family of possible distributions. Therefore, by following the idea of Bayesian error estimation for classification, we define a Bayesian CoD estimator that minimizes the mean-square error (MSE), based on a parametrized family of joint distributions between predictors and target as a function of random parameters characterized by assumed prior distributions. We derive an exact formulation of the sample-based Bayesian MMSE CoD estimator. Numerical experiments are carried out to estimate performance metrics of the Bayesian CoD estimator and compare them against those of resubstitution, leave-one-out, bootstrap and cross-validation CoD estimators over all the distributions, by employing the Monte Carlo sample method. 
Results show that the Bayesian CoD estimator has the best performance, displaying zero bias, small variance, and least root mean-square error (RMS).","PeriodicalId":336511,"journal":{"name":"2013 IEEE International Workshop on Genomic Signal Processing and Statistics","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114919627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of drug-target interactions using popular Collaborative Filtering methods","authors":"A. Koohi","doi":"10.1109/GENSIPS.2013.6735931","DOIUrl":"https://doi.org/10.1109/GENSIPS.2013.6735931","url":null,"abstract":"Computational approaches for predicting drug-protein interactions have gained more attention in recent years. The main reason is that a correct prediction based on screening a database of small molecules against a certain class of protein can potentially accelerate drug discovery. In this paper a popular prediction method, collaborative filtering in recommender systems, is evaluated for the prediction of drug-protein interaction. The drug-protein interaction matrix and the user-item rating matrix are similar, and in both cases only a small subset of the entries is known. The CF (collaborative filtering) methods are evaluated on four classes of proteins, and AUC (area under the receiver operating characteristic curve) and AUPR (area under the precision-recall curve) are reported. It is shown that collaborative filtering methods can be effective in the prediction of drug-target interaction based on the known interaction matrix. These results highlight the importance of using the known interaction matrix in order to achieve high accuracy and precision in prediction.","PeriodicalId":336511,"journal":{"name":"2013 IEEE International Workshop on Genomic Signal Processing and Statistics","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128129725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Tchagang, Sieu Phan, Fazel Famili, Youlian Pan, A. Cutler, Jitao Zou
{"title":"A generic model of transcriptional regulatory networks: Application to plants under abiotic stress","authors":"A. Tchagang, Sieu Phan, Fazel Famili, Youlian Pan, A. Cutler, Jitao Zou","doi":"10.1109/GENSIPS.2013.6735922","DOIUrl":"https://doi.org/10.1109/GENSIPS.2013.6735922","url":null,"abstract":"Understanding the relationships between transcription factors (TFs) and genes in plants under abiotic stress responses, tolerance and adaptation to adverse environments is very important in developing resilient crop varieties. While experimental methods to characterize stress responsive TFs and their targets are highly accurate, identification and characterization of the role of a given gene in a given stress response event are often laborious and time consuming. Computational approaches, on the other hand, offer a platform to identify new knowledge by integrating high throughput omics data and mathematical methods/models. In this research, we have developed a generic linear model of transcriptional regulatory networks (TRNs) and a companion algorithm to identify and to characterize stress responsive genes and their roles in a given stress response event. The proposed methodology was applied to plants, by using Arabidopsis thaliana as an example, under abiotic stress. 
Well-known interactions were inferred, as well as putative novel ones that may play important roles in plants under abiotic stress conditions, as confirmed by statistical and literature evidence.","PeriodicalId":336511,"journal":{"name":"2013 IEEE International Workshop on Genomic Signal Processing and Statistics","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134275984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian multivariate Poisson model for RNA-seq classification","authors":"J. Knight, I. Ivanov, E. Dougherty","doi":"10.1109/GENSIPS.2013.6735946","DOIUrl":"https://doi.org/10.1109/GENSIPS.2013.6735946","url":null,"abstract":"High dimensional data and small samples make genomic/proteomic classifier design and error estimation virtually impossible without the use of prior information [1]. Dalton and Dougherty utilize prior biological knowledge via a Bayesian approach that considers a prior distribution on an uncertainty class of feature-label distributions [2], [3]. While their general framework is very broad, they focus their attention on multinomial and Gaussian models, for which they derive closed-form solutions of the minimum mean squared error (MMSE) error estimate, the MSE of the error estimate, and an optimal Bayesian classifier (OBC) relative to the prior distribution. Sequencing datasets consist of the number of reads found to map to specific regions of a reference genome. As such, they are often modeled with a discrete distribution, such as the Poisson. For this reason, Gaussian and multinomial distributions are not ideal for sequence-based datasets. Thus, we introduce a multivariate Poisson model (MP) and the associated MP OBC for classifying samples using sequencing data. Lacking closed-form solutions, we employ a Markov chain Monte Carlo (MCMC) approach to perform classification. 
We demonstrate superior classification performance for more complex synthetic datasets and comparable performance to the top classifiers in other simpler synthetic datasets.","PeriodicalId":336511,"journal":{"name":"2013 IEEE International Workshop on Genomic Signal Processing and Statistics","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125172740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effect of separate sampling on classification and the minimax criterion","authors":"M. S. Esfahani, E. Dougherty","doi":"10.1109/GENSIPS.2013.6735935","DOIUrl":"https://doi.org/10.1109/GENSIPS.2013.6735935","url":null,"abstract":"It is commonplace in bioinformatics (and elsewhere) to build a classifier from sample data in which the sample sizes of the classes are not random; that is, they are selected prior to sampling. The result is that there is no estimate of the prior class probabilities available from the data. In this paper, we find an analytic result for the minimax solution for the class prior probabilities for a general Neyman-Pearson induced classifier. From that we derive Anderson's classical minimax prior probability “estimate.” Using synthetic and real data, we demonstrate the degradation in classifier performance from using inaccurate values for the prior probabilities.","PeriodicalId":336511,"journal":{"name":"2013 IEEE International Workshop on Genomic Signal Processing and Statistics","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121609436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ying-Wooi Wan, John Nagorski, Genevera I. Allen, Zhaohui Li, Zhandong Liu
{"title":"Identifying cancer biomarkers through a network regularized Cox model","authors":"Ying-Wooi Wan, John Nagorski, Genevera I. Allen, Zhaohui Li, Zhandong Liu","doi":"10.1109/GENSIPS.2013.6735924","DOIUrl":"https://doi.org/10.1109/GENSIPS.2013.6735924","url":null,"abstract":"A central problem in cancer genomics is to identify interpretable biomarkers for better disease prognosis. Many of the biomarkers identified through Cox Proportional Hazard (PH) models are biologically uninterpretable. We propose the use of a graph Laplacian regularized Cox PH model to integrate biological networks into the feature selection problem in survival analysis. Simulation studies demonstrate that the performance of the proposed algorithm is superior to L1 and L1+L2 regularized Cox PH models. Utility of this algorithm is also validated by its ability to identify key known biomarkers such as p53 and myc in estrogen receptor positive breast cancer patients using genomic aberration data generated by the Cancer Genome Atlas consortium. With the rapid expansion of our knowledge of biological networks, this approach will become increasingly useful for mining high-throughput genomic datasets.","PeriodicalId":336511,"journal":{"name":"2013 IEEE International Workshop on Genomic Signal Processing and Statistics","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133838646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}