{"title":"Principal components analysis filters functionally significant peroxidase motions","authors":"M. Laberge, Istvan Kovesi","doi":"10.1109/CIBCB.2010.5510723","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510723","url":null,"abstract":"Molecular dynamics simulations of explicitly solvated horseradish peroxidase and of its Ca-depleted form have been carried out, and the trajectories have been analyzed by the essential dynamics method. The results indicate that the motion of the native species is defined by a few preferred directions identified by the first four eigenvectors. The eigenvectors are significantly sampled and reveal that collective motions are perturbed in the absence of calcium. The destabilization of HRP and the corresponding decrease in the catalytic activity of the enzyme are due to perturbed collective motions primarily in the region located around the proximal calcium site.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"154 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116507746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Missing value imputation methods for gene-sample-time microarray data analysis","authors":"Yifeng Li, A. Ngom, L. Rueda","doi":"10.1109/CIBCB.2010.5510349","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510349","url":null,"abstract":"With the recent advances in microarray technology, the expression levels of genes with respect to the samples can be monitored synchronically over a series of time points. Such three-dimensional microarray data, termed gene-sample-time microarray data or GST data for short, may contain missing values. Current microarray analysis methods require complete data sets, and thus, either each row, column or tube containing missing values must be removed from the original GST data, or these missing values must be estimated before analysis. Imputation of missing values is, however, more recommended than removal of data in order to increase the effectiveness of analysis algorithms. In this paper, we extend automated imputation methods, devised for two-dimensional microarray data, to GST data. We implemented imputation methods for GST data based on Singular Value Decomposition (3SVDimpute), K-Nearest Neighbor (3KNNimpute), and gene and sample average methods (3Aimpute), and show that methods based on KNN yield the best results with the lowest normalized root mean squared error.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129020925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Super-resolution of mammograms","authors":"Jun Zheng, O. Fuentes, M. Leung","doi":"10.1109/CIBCB.2010.5510384","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510384","url":null,"abstract":"High-quality mammography is the most effective technology presently available for breast cancer screening. High resolution mammograms usually lead to more accurate diagnoses; however, they require large doses of radiation, which may have harmful effects. In this paper, we present a method to synthesize high-resolution mammograms from low-resolution inputs, which offers the potential of allowing accurate diagnoses while minimizing risks to patients. Our algorithm combines statistical machine learning methods and stochastic search to learn the mapping from low-resolution to high-resolution mammograms using a large dataset of training image pairs. Experimental results show that the super-resolution algorithm can generate high-quality, high-resolution breast mammograms from low-resolution input with no human intervention.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"205 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116394989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling treatment and drug effects at the molecular level using hybrid system theory","authors":"Xiangfang Li, Lijun Qian, E. Dougherty","doi":"10.1109/CIBCB.2010.5510440","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510440","url":null,"abstract":"In this paper, we propose to study the treatment and drug effects at the molecular level using a hybrid system model. Specifically, we propose a generic piecewise linear model to analyze drug effects on the state of the genes in a genetic regulatory network. We intend to answer the following question: given an initial state, and assuming that the concentration level of the drug remains constant, would a treatment or drug (control input) drive the target gene to a new desired state that is not reachable without the treatment or drug? In other words, we try to identify whether there is a chance that the treatment or drug will be effective for changing gene expressions at all. We provide detailed analysis for two cases. In the first case, there is only one target gene; while in the second case, there is also another gene interacting with the target gene. The relationships between various parameters (of the genetic regulatory network and the design of the drug) and the convergence and the steady state of the controlled genes are derived analytically and discussed in detail. Simulations are performed using MATLAB/SIMULINK and the results confirmed our analytical findings.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125600033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speeding up subcellular localization by extracting informative regions of protein sequences for profile alignment","authors":"Wei Wang, M. Mak, S. Kung","doi":"10.1109/CIBCB.2010.5510320","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510320","url":null,"abstract":"The functions of proteins are closely related to their subcellular locations. In the post-proteomics era, the amount of gene and protein data grows exponentially, which necessitates the prediction of subcellular localization by computational means. This paper proposes mitigating the computation burden of alignment-based approaches to subcellular localization prediction by using the information provided by the N-terminal sorting signals. To this end, a cascaded fusion of cleavage site prediction and profile alignment is proposed. Specifically, the informative segments of protein sequences are identified by a cleavage site predictor. Then, only the informative segments are applied to a homology-based classifier for predicting the subcellular locations. Experimental results on a newly constructed dataset show that the method can make use of the best property of both approaches and can attain an accuracy higher than using the full-length sequences. Moreover, the method can reduce the computation time 20-fold. We advocate that the method will be important for biologists to conduct large-scale protein annotation or for bioinformaticians to perform preliminary investigations on new algorithms that involve pairwise alignments.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127023168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Support vectors based correlation coefficient for gene and sample selection in cancer classification","authors":"P. Mundra, Jagath Rajapakse","doi":"10.1109/CIBCB.2010.5510689","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510689","url":null,"abstract":"Correlation is a very widely used filter criterion for gene selection in cancer classification. However, it uses all the training samples in ranking, which may not be equally important for the classification. Using support vectors, we demonstrate that classical correlation coefficient based gene selection is biased because of the sample points away from classification margin. To remove such bias, we use only the support vectors for computation of correlation coefficient and propose a backward elimination based SVcc-RFE algorithm. The proposed method is tested on several benchmark cancer gene expression datasets and the results show improvement in classification performance compared to other state-of-the-art methods.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115081550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A regression tree-based Gibbs sampler to learn the regulation programs in a transcription regulatory module network","authors":"Jianlong Qi, T. Michoel, G. Butler","doi":"10.1109/CIBCB.2010.5510433","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510433","url":null,"abstract":"Many algorithms have been proposed to learn transcription regulatory networks from gene expression data. Bayesian networks have obtained promising results, in particular, the module network method. The genes in a module share a regulation program (regression tree), consisting of a set of parents and conditional probability distributions. Hence, the method significantly decreases the search space of models and consequently avoids overfitting. The regulation program of a module is normally learned by a deterministic search algorithm, which performs a series of greedy operations to maximize the Bayesian score. The major shortcoming of the deterministic search algorithm is that its result may only represent one of several possible regulation programs. In order to account for the model uncertainty, we propose a regression tree-based Gibbs sampling algorithm for learning regulation programs in module networks. The novelty of this work is that a set of tree operations is defined for generating new regression trees from a given tree and we show that the set of tree operations is sufficient to generate a well mixing Gibbs sampler even in large data sets. The effectiveness of our algorithm is demonstrated by the experiments in synthetic data and real biological data.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133804093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional data classification for temporal gene expression data with kernel-induced random forests","authors":"Guangzhe Fan, Jiguo Cao, Jiheng Wang","doi":"10.1109/CIBCB.2010.5510482","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510482","url":null,"abstract":"Scientists and others today often collect samples of curves and other functional data. The multivariate data classification methods cannot be directly used for functional data classification because of the curse of dimensionality and the difficulty of taking into account the correlation and order of functional data. We extend the kernel-induced random forest method for discriminating functional data by defining kernel functions of two curves. This method is demonstrated by classifying the temporal gene expression data. The simulation study and applications show that the kernel-induced random forest method improves classification accuracy over the available methods. The kernel-induced random forest method is easy to implement by naive users. It is also appealing in its flexibility to allow users to choose different curve estimation methods and appropriate kernel functions.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"473 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113998802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Discrete Fourier Transform method for alignment of visual evoked potentials","authors":"Ismet Sahin, N. Yilmazer","doi":"10.1109/CIBCB.2010.5510704","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510704","url":null,"abstract":"In this paper, we consider alignment of visual evoked potentials (EP) in the Discrete Fourier Transform (DFT) domain. Visual EPs have important clues for diagnosing medical problems such as multiple sclerosis and optic neuritis. The amplitude of visual EPs is usually smaller than that of spontaneous EPs, which causes difficulties in reliably finding the latencies and amplitudes of important positive and negative peaks in the evoked responses. Therefore, noise cancellation becomes important for determining the features of interest in these waveforms. A well-known noise cancellation method is averaging multiple evoked potentials. Averaging after alignment of EP waveforms can improve the waveform quality substantially since usually evoked potentials have different characteristics and therefore have different latencies and amplitudes in response to the same visual stimulus. In this paper, we use a time alignment method which simultaneously reduces the spectral differences between all waveforms by minimizing the linearly phase shifted forms of the DFTs of these waveforms. We demonstrate that this method successfully aligns multiple visual EPs and achieves a smooth averaged waveform with reduced noise.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116164497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying essential features for the classification of real and pseudo microRNAs precursors using fuzzy decision trees","authors":"Na'el Abu-halaweh, R. Harrison","doi":"10.1109/CIBCB.2010.5510430","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510430","url":null,"abstract":"MicroRNAs play an important role in post-transcriptional gene regulation. Experimental approaches to identify microRNAs are expensive and time-consuming. Computational approaches have proven to be useful for identifying microRNA candidates. Most approaches rely on features extracted from microRNA precursors (pre-microRNA) and their secondary structure. Selecting the appropriate set of features plays a critical role in improving the prediction accuracy of pre-microRNA candidates. This work aims to investigate the triplet elements encoding scheme and to identify essential features needed for the correct classification of pre-microRNAs. To achieve these goals, an extension of the triplet elements encoding scheme is introduced. Features extracted using the extended scheme were combined with global features introduced in the literature, and fuzzy decision tree (FDT) is used as a classification and a feature selection tool. Unlike previous machine-learning-based approaches, FDT produces a human comprehensible classification model. The interpretability of the classification model provides a means to identify the essential features needed to recognize microRNA candidates and offers a better understanding of this problem. Our results indicate that the triplet elements scheme is not superior to any of its proposed extensions. Further analysis revealed that including the features extracted using triplet elements scheme does not add any value to this classification problem but rather introduces some noisy features, and comparable classification results can be achieved by using only the six global features identified by FDT.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125963395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}