{"title":"Designing enhanced classifiers using prior process knowledge: Regularized maximum-likelihood","authors":"M. S. Esfahani, A. Zollanvari, Byung-Jun Yoon, E. Dougherty","doi":"10.1109/GENSiPS.2011.6169451","DOIUrl":"https://doi.org/10.1109/GENSiPS.2011.6169451","url":null,"abstract":"We propose a novel optimization-based paradigm for designing enhanced classifiers. The proposed paradigm allows us to incorporate available prior process knowledge into classifier design, thereby improving the performance of the resulting classifiers. In this work, we focus on dynamical systems that can be represented as finite-state multi-dimensional stochastic processes that possess labeled steady-state distributions. Given prior operational knowledge of the process, our goal is to build a classifier that can accurately label future observations obtained from the steady-state, by utilizing both the available prior knowledge and the training data. Simulation results show that the proposed paradigm yields improved classifiers that outperform traditional classifiers that use only training data.","PeriodicalId":181666,"journal":{"name":"2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115461694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimum description length based selection of reference sequences for comparative assemblers","authors":"B. Wajid, E. Serpedin","doi":"10.1109/GENSiPS.2011.6169487","DOIUrl":"https://doi.org/10.1109/GENSiPS.2011.6169487","url":null,"abstract":"Genome sequences are the most basic, yet most essential, pieces of data in all biological analysis. A genome sequence is the solution to the genome assembly problem, which reconstructs the entire sequence from a set of reads that are unordered and very small in size. The genome assembly problem is therefore quite complex and is broadly divided into de novo and comparative assembly. Comparative assembly takes the aid of a reference sequence, closely related to the unassembled genome, to determine the relative order of the reads with respect to one another, and then joins them together to form the sequence. This paper explores all variants of Minimum Description Length (MDL) to find the best reference sequence for comparative assembly. The paper looked at two-part MDL, Sophisticated MDL, and MiniMax Regret and found that Sophisticated MDL performs better than two-part MDL; MiniMax Regret, however, was unsuitable owing to the nature of the problem. The proposed scheme is prior-free and can be incorporated in the data preprocessing stage for all comparative assemblers, allowing the assembly process to make use of the best reference sequence available.","PeriodicalId":181666,"journal":{"name":"2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123982130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transcriptomic analysis using SVD clustering and SVM classification","authors":"Hong Cai, Yufeng Wang","doi":"10.1109/GENSiPS.2011.6169476","DOIUrl":"https://doi.org/10.1109/GENSiPS.2011.6169476","url":null,"abstract":"The classification performance using support vector machines (SVMs) for transcriptomic analysis can be limited due to the high dimensionality of the data. This limitation is most problematic in the case of small training sets. A general solution is to employ a dimension reduction method before SVM classification. In this paper, we propose a novel singular value decomposition (SVD) based method for dual purposes: firstly, to reduce the dimensionality, and secondly to cluster the transcriptional profiles. The kernel functions of SVM were modified based on the Riemannian geometrical structure which can achieve a better spatial resolution. The proposed approach was applied to the yeast time series microarray dataset and outperformed the traditional SVM kernels.","PeriodicalId":181666,"journal":{"name":"2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125227146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel approach for alignments output storage problem facing clinical scenarios","authors":"Yiqi Lu, Yanghua Xiao, Yaoliang Chen, Danfeng Xu, F. Yu","doi":"10.1109/GENSiPS.2011.6169473","DOIUrl":"https://doi.org/10.1109/GENSiPS.2011.6169473","url":null,"abstract":"With ever-developing sequencing technologies, gaps between sequence mapping research work and commercial applications in clinical scenarios are narrowing. However, storing alignment results remains a big challenge. From both storage and application perspectives, traditional approaches to alignment output files still leave much to be desired. In this paper, we explore these issues and propose a novel idea to tackle the alignment output storage problem in clinical scenarios. Experimental results show that when coverage is high, our method has an obvious advantage over BAM files. In addition, our method outperforms BAM files on genomic position queries on reads.","PeriodicalId":181666,"journal":{"name":"2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129832073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pathway analysis in the context of Bayesian networks - mathematical modeling of master and canalizing genes","authors":"Chen Zhao, I. Ivanov, M. Bittner, E. Dougherty","doi":"10.1109/GENSiPS.2011.6169444","DOIUrl":"https://doi.org/10.1109/GENSiPS.2011.6169444","url":null,"abstract":"We utilize a tree-structured Bayesian network to characterize and detect master and canalizing genes via the coefficient of determination (CoD). Master genes possess strong regulation over groups of genes, whereas canalizing genes take over the regulation of large cohorts under certain cell conditions. While related, the two concepts are not the same and the analytic measures we employ reveal that difference. We also consider hypothesis testing for successful drug intervention in the framework of the Bayesian model.","PeriodicalId":181666,"journal":{"name":"2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122296398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A cubature Kalman filter approach for inferring gene regulatory networks using time series data","authors":"Amina Noor, E. Serpedin, M. Nounou, H. Nounou","doi":"10.1109/GENSiPS.2011.6169432","DOIUrl":"https://doi.org/10.1109/GENSiPS.2011.6169432","url":null,"abstract":"A novel technique for the inference of gene regulatory networks is proposed that uses a cubature Kalman filter (CKF). The gene network is modeled using the state-space approach. A non-linear model for the evolution of gene expression is considered and the microarray data is assumed to follow a linear Gaussian model. CKF is used to estimate the hidden states as well as the unknown static parameters of the model. These parameters provide an insight into the regulatory relations among the genes. The proposed algorithm delivers better performance than the linearization-based extended Kalman filter (EKF) for synthetic as well as real world biological data.","PeriodicalId":181666,"journal":{"name":"2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"181 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127004156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personal genome privacy protection with feature-based hierarchical dual-stage encryption","authors":"X. Zou, Peng Liu, J. Chen","doi":"10.1109/GENSiPS.2011.6169474","DOIUrl":"https://doi.org/10.1109/GENSiPS.2011.6169474","url":null,"abstract":"Personal Genomic information is becoming increasingly important to both scientific research and clinical practice. However, security breach, misuse, or unintended disclosure of this information may result in severe privacy breaches. Traditional privacy preservation of personal genome information is implemented in an “all-or-none” manner, i.e., an entire genome being controlled as either fully accessible or fully inaccessible. In this paper, we propose a new fine-grained privacy protection method for flexible multi-level genome information protection and access. The method can make use of any user-defined hierarchical knowledge structure to define privacy levels and control cryptography-based hierarchical access. It also implements dual-stage encryptions to allow efficient definition, addition, and update of feature-based privacy protections. The experiments show that it can be effectively implemented to deal with real personal genome data sets in the future.","PeriodicalId":181666,"journal":{"name":"2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132727319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering gene expression data using probabilistic non-negative matrix factorization","authors":"Belhassen Bayar, N. Bouaynaya, R. Shterenberg","doi":"10.1109/GENSiPS.2011.6169465","DOIUrl":"https://doi.org/10.1109/GENSiPS.2011.6169465","url":null,"abstract":"Non-negative matrix factorization (NMF) has proven to be a useful decomposition for multivariate data. Specifically, NMF appears to have advantages over other clustering methods, such as hierarchical clustering, for identification of distinct molecular patterns in gene expression profiles. The NMF algorithm, however, is deterministic. In particular, it does not take into account the noisy nature of the measured genomic signals. In this paper, we extend the NMF algorithm to the probabilistic case, where the data is viewed as a stochastic process. We show that the probabilistic NMF can be viewed as a weighted regularized matrix factorization problem, and derive the corresponding update rules. Our simulation results show that the probabilistic non-negative matrix factorization (PNMF) algorithm is more accurate and more robust than its deterministic homologue in clustering cancer subtypes in a leukemia microarray dataset.","PeriodicalId":181666,"journal":{"name":"2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132831457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient cancer therapy using Boolean networks and Max-SAT-based ATPG","authors":"P. Lin, S. Khatri","doi":"10.1109/GENSiPS.2011.6169450","DOIUrl":"https://doi.org/10.1109/GENSiPS.2011.6169450","url":null,"abstract":"Cancer and other gene related diseases are usually caused by a failure in the signaling pathway between genes and cells. These failures can occur in different areas of the gene regulatory network, but can be abstracted as faults in the regulatory function. For effective cancer treatment, it is imperative to identify faults and select appropriate drugs to treat the fault. In this paper, we present an extensible Max-SAT based automatic test pattern generation (ATPG) algorithm for cancer therapy. This ATPG algorithm is based on Boolean Satisfiability (SAT) and utilizes the stuck-at fault model for representing signalling faults. A weighted partial Max-SAT formulation is used to enable selection of the most effective drug. Several use cases are presented for fault identification and drug selection. These include the identification of testable faults, optimal drug selection for single/multiple known faults, and optimal drug selection for overall fault coverage. Experimental results on growth factor (GF) signaling pathways demonstrate that our algorithm is flexible, and can yield an exact solution for each feature in much less than 1 second.","PeriodicalId":181666,"journal":{"name":"2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134114749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic modeling of dynamic effects of copy number alterations upon gene expression levels","authors":"Fang-Han Hsu, E. Serpedin, Yidong Chen, E. Dougherty","doi":"10.1109/GENSiPS.2011.6169449","DOIUrl":"https://doi.org/10.1109/GENSiPS.2011.6169449","url":null,"abstract":"DNA copy number alterations (CNAs) are known to be related to genetic diseases, including cancer. Our previous study proposed a stochastic model to investigate the relationship between copy number and gene expression values. The simulation results revealed that the relationship is not generally linear except when the ratio of transcription factor (TF) arrival rate to TF departure rate is large. However, only a transcription productive (ON) state was considered in the previous study. Under certain environmental conditions, the demand for mRNA is limited and hence the transcription is turned off and strictly regulated. In this study, an alternative (OFF) state of transcription is proposed, in which, bound TFs are assumed to be shut down, or unloaded, immediately after stimulating a transcription. Using the Laplace-Stieltjes transform and numerical analysis, the relationship between DNA copy number and gene expression level is evaluated. The stochastic models show that CNAs would potentially alter, or even reverse, the amplitude changes of gene expression levels between the productive (ON) state and the alternative (OFF) state.","PeriodicalId":181666,"journal":{"name":"2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129241044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}