{"title":"Issues with the PipeAlign phylogenomics toolkit in identifying protein subfamilies","authors":"Christine Kehyayan, G. Butler","doi":"10.1109/CIBCB.2010.5510344","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510344","url":null,"abstract":"Automated protein function annotation is extremely important in computational biology for its low cost. Standard sequence similarity comparison methods for annotation have limited specificity in identifying orthologs and paralogs. Phylogenomic methods are gaining popularity for their role in identifying orthologs and paralogs with the help of evolutionary information and sequence data. Pipelines have been developed for phylogenomic classification of proteins. Two such pipelines are PhyloFacts and PipeAlign. Given a protein of interest, these pipelines identify functional subfamilies for the protein superfamily. Subfamilies hold orthologs and paralogs and can later be used to identify orthologous groups. We evaluate the performance of PipeAlign with respect to both consistency in the generated subfamilies and phylogeny. We use the predefined subfamilies of PhyloFacts as a reference to compare the generated subfamilies of related reference sequences in PipeAlign. In the consistency analysis, we compare the compositions of the generated functional subfamilies with different related reference sequences, and use the predefined PhyloFacts subfamilies for the corresponding sequences as a measure of consistency. In the phylogenetic analysis, we compare the evolutionary distances of the members of the same and different generated subfamilies from PipeAlign.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125016455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Side effect machines for quaternary edit metric decoding","authors":"J. A. Brown, S. Houghten, D. Ashlock","doi":"10.1109/CIBCB.2010.5510422","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510422","url":null,"abstract":"DNA edit metric codes are used as labels to track the origin of sequence data. This study is the first to treat sophisticated decoders for these error-correcting codes. Side effect machines can provide efficient decoding algorithms for such codes. Two methods for automatically producing decoding algorithms are presented. Side Effect Machines (SEMs), generalizations of finite state automata, are used in both. Single Classifier Machines (SCMs) use a single side effect machine to classify all words within a code. Locking Side Effect Machines (LSEMs) use multiple side effect machines to create a tree structured iterated classification. This study examines these techniques and provides new decoders for existing codes. Presented are ideas for best practises for the creation of these two types of new edit metric decoders. Codes of the form (n,M,d)4 are used in testing due to their suitability for bioinformatics problems. A group of (12, 54–56, 7)4 codes are used as an example of the process.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127451514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Additive noise analysis on microarray data via SVM classification","authors":"Z. Ding, Yanqing Zhang","doi":"10.1109/CIBCB.2010.5510725","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510725","url":null,"abstract":"Microarray technology has been broadly used for monitoring the expression levels of thousands of genes simultaneously, providing the opportunities of identifying disease-related genes by finding differentially expressed genes in different conditions. However, a great challenge of analyzing microarray data is the significant noise brought by different experimental settings, laboratory procedures, genetic heterogeneity among samples, and environmental variations among different patients, and so on. This paper attempts to analyze the influence of these noises on each gene by measuring the changes of classification performance. We assume each gene in microarray data includes an independently distributed unknown uniform noise. Thus, we add a compensated noise back to each gene and test whether the classification accuracy of a linear support vector machine (SVM) improves. If the accuracy does increase, then we believe such noise does exist and degenerate the relation of this gene to the disease status. Through extensive experiments on several public microarray data, we found such added noises can improve the classification accuracy in several genes and the results are relatively consistent, indicating our method can be used to analyze the noise pattern in microarray experiments, and also discover potential important gene markers.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129251264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved prediction of transcription binding sites from chromatin modification data","authors":"Kengo Sato, Thomas Whitington, T. Bailey, P. Horton","doi":"10.1109/CIBCB.2010.5510323","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510323","url":null,"abstract":"In this paper we apply machine learning to the task of predicting transcription factor binding sites by combining information on multiple forms of chromatin modification with the binding strength DNA site predicted by a position weight matrix. We additionally explore the effect of incorporating auxiliary features such as the distance of the site to the nearest gene's transcription start site and the degree to which the site is conserved among related species. We approach the task as a classification problem, and show that both Naïve Bayes and Random Forests can provide substantial increases in the accuracy of predicted binding sites. Our results extend previous work which simply filtered candidate sites based on H3K4Me3 chromatin modification scores. In addition we apply feature selection to explore which forms of chromatin modification and which auxiliary features have predictive value for which transcription factors.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124351804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved PCR design for mouse DNA by training finite state machines","authors":"S. Yadav, S. Corns","doi":"10.1109/CIBCB.2010.5510701","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510701","url":null,"abstract":"This project presents an updated method for classification of polymerase chain reaction primers in mice using finite state classifiers. This is done to compensate for many lab, organism and chemical specific factors that are costly. Using Finite State Classifiers can help decrease the number of primers that fail to amplify correctly. For training these classifiers, five different evolutionary algorithms that use an incremental fitness reward are used. Variations to the number of generations and the values in the fitness reward are examined, and the resulting designs are presented. By controlling the fitness reward correctly, there is a potential to develop classifiers with a high likelihood of accepting only good primers. The proposed tool can act as a post-production add-on to the standard primer picking algorithm for gene expression detection in mice to compensate for local factors that may induce errors.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123080160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nearest neighbor training of side effect machines for sequence classification","authors":"D. Ashlock, Andrew McEachern","doi":"10.1109/CIBCB.2010.5510426","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510426","url":null,"abstract":"Side effect machines operate by associating side effects with the states of a finite state machine. The use of side effect machines permits the researcher to leverage information stored in the state transition structure, making machines that might be identical as recognizers behave differently as classifiers. The side effect machines in this study associate a counter with each state so that the number of times each state is visited becomes a numerical feature associated with each state. The key to effective use of these numerical features is to locate side effect machines for which the count vectors are good feature sets. In this study side effect machines are selected with an evolutionary algorithm. The Rand index of nearest neighbor classification of the count vectors serves as the fitness function for selecting side effect machines. A parameter study is performed on simple synthetic data and then side effect machines are trained to classify two sets of biological sequences. The first set comprises two categories of HLA sequences from the human major histocompatibility complex. The second are positive and negative examples of human endogenous retroviral sequences taken from the human genome. The retroviral sequences are challenging but good results are obtained. The HLA data is classified with complete accuracy.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126568034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying neural networks to classify influenza virus antigenic types and hosts","authors":"P. Attaluri, Zhengxin Chen, G. Lu","doi":"10.1109/CIBCB.2010.5510726","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510726","url":null,"abstract":"Influenza viruses continue to evolve rapidly and are responsible for seasonal epidemics and occasional, but catastrophic, pandemics. We recently demonstrated the use of decision tree and support vector machine methods in classifying pandemic swine flu viral strains with high accuracy. Here, we applied the technique of artificial neural networks for the prediction of important influenza virus antigenic types (H1, H3, and H5) and hosts (Human, Avian, and Swine), which fulfills a critical need for a computational system for influenza surveillance. A comprehensive experiment on different k-mers and different binary encoding types showed classification based upon frequencies of k-mer nucleotide strings performed better than transformed binary data of nucleotides. It has been found for the first time that the accuracy of virus classification varies from host to host and from gene segment to gene segment. In particular, compared to avian and swine viruses, human influenza viruses can be classified with high accuracy, which indicates influenza virus strains might have become well adapted to their human host and hence less variation occurs in human viruses. In addition, the accuracy of host classification varies from genome segment to segment, achieving the highest values when using the HA and NA segments for human host classification. This research, along with our previous studies, shows machine learning techniques play an indispensable role in virus classification.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128198855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning approaches for customized docking scores: Modeling of inhibition of Mycobacterium tuberculosis enoyl acyl carrier protein reductase","authors":"G. Fogel, Jonathan Tran, Stephen Johnson, David Hecht","doi":"10.1109/CIBCB.2010.5510700","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510700","url":null,"abstract":"Machine learning algorithms were used for feature selection and model generation of customized docking score functions for known inhibitors of Mycobacterium tuberculosis enoyl acyl carrier protein reductase. The features included small molecule descriptors derived from MOE, Accord, and Molegro as well as in silico docking energies/scores from GOLD and Autodock. The resulting models can be used to identify key descriptors for enoyl acyl carrier protein reductase inhibition and are useful for high-throughput screening of novel drug compounds. This paper also evaluates and contrasts several strategies for model generation for quantitative structure-activity relationships.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124367878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fitness-independent evolvability measure for evolutionary developmental systems","authors":"Yaochu Jin, J. Trommler","doi":"10.1109/CIBCB.2010.5510475","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510475","url":null,"abstract":"Evolvability refers to the organism's ability to create heritable new phenotypes that potentially facilitate the organism's survival and reproduction. In this paper, a general evolvability measure for a computational model of evolutionary development is proposed. The measure is able to quantify individuals' evolvability, including robustness and innovation, independent of the fitness function of the evolutionary system. Empirical studies are performed to check the evolvability of individuals in in silico evolution of oscillatory behavior using the proposed evolvability measure. Our preliminary results suggest that evolvability of the developmental system can evolve without an explicit selection pressure on evolvability, confirming findings revealed in other artificial evolutionary systems.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"198 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121881575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring structural modeling of proteins for kernel-based enzyme discrimination","authors":"Marco A. Alvarez, Changhui Yan","doi":"10.1109/CIBCB.2010.5510588","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510588","url":null,"abstract":"Computational methods play an important role in investigating the relationships between protein structure and function. In this study, we evaluate different graph representations of protein structures for kernel-based protein function prediction. We use shortest path graph kernels and support vector machines to predict whether a protein is an enzyme or not. We present three different and straightforward strategies for modeling protein structures. Accuracy averages for 10-fold cross-validation range from 84.31% to 86.97% for different modeling strategies, outperforming state-of-the-art work.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130008982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}