2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology: Latest Papers, Page 3

Issues with the PipeAlign phylogenomics toolkit in identifying protein subfamilies

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510344

Christine Kehyayan, G. Butler

Abstract: Automated protein function annotation is important in computational biology because of its low cost. Standard sequence-similarity methods for annotation have limited specificity in identifying orthologs and paralogs. Phylogenomic methods are gaining popularity because they identify orthologs and paralogs using evolutionary information together with sequence data. Pipelines have been developed for phylogenomic classification of proteins; two such pipelines are PhyloFacts and PipeAlign. Given a protein of interest, these pipelines identify functional subfamilies for the protein superfamily. Subfamilies hold orthologs and paralogs and can later be used to identify orthologous groups. We evaluate the performance of PipeAlign with respect to both the consistency of the generated subfamilies and phylogeny. We use the predefined subfamilies of PhyloFacts as a reference against which to compare the subfamilies that PipeAlign generates for related reference sequences. In the consistency analysis, we compare the compositions of the functional subfamilies generated from different related reference sequences, and use the predefined PhyloFacts subfamilies for the corresponding sequences as a measure of consistency. In the phylogenetic analysis, we compare the evolutionary distances between members of the same and of different subfamilies generated by PipeAlign.
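The consistency analysis described in this abstract can be illustrated with a small sketch. It assumes, hypothetically, that each pipeline run is represented as a mapping from a subfamily label to the set of member sequence IDs. The function names, the choice of the Jaccard index, and the toy IDs are illustrative assumptions; the abstract does not state which overlap measure the authors used.

```python
def jaccard(a, b):
    """Jaccard index of two collections; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_consistency(run_a, run_b):
    """Mean, over subfamilies in run_a, of the best Jaccard overlap found in run_b.

    A value of 1.0 means every subfamily of run_a reappears unchanged in run_b.
    This is an assumed stand-in measure, not the one from the paper.
    """
    if not run_a:
        return 1.0 if not run_b else 0.0
    scores = []
    for members in run_a.values():
        best = max((jaccard(members, other) for other in run_b.values()),
                   default=0.0)
        scores.append(best)
    return sum(scores) / len(scores)


# Toy data with hypothetical sequence IDs: subfamilies produced when the
# pipeline is started from two related query sequences.
run_from_query_1 = {"sf1": {"P1", "P2", "P3"}, "sf2": {"P4", "P5"}}
run_from_query_2 = {"sfA": {"P1", "P2"}, "sfB": {"P3", "P4", "P5"}}

score = best_match_consistency(run_from_query_1, run_from_query_2)
print(f"consistency = {score:.3f}")
```

Note that the score is not symmetric in its two arguments; averaging both directions would be one way to obtain a symmetric value.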

Citations: 0

Side effect machines for quaternary edit metric decoding

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510422

J. A. Brown, S. Houghten, D. Ashlock

Citations: 11

Additive noise analysis on microarray data via SVM classification

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510725

Z. Ding, Yanqing Zhang

Abstract: Microarray technology has been broadly used to monitor the expression levels of thousands of genes simultaneously, providing the opportunity to identify disease-related genes by finding genes that are differentially expressed under different conditions. However, a major challenge in analyzing microarray data is the significant noise introduced by different experimental settings, laboratory procedures, genetic heterogeneity among samples, environmental variation among patients, and so on. This paper attempts to analyze the influence of this noise on each gene by measuring changes in classification performance. We assume that each gene in microarray data includes an independently distributed, unknown uniform noise. We therefore add a compensated noise back to each gene and test whether the classification accuracy of a linear support vector machine (SVM) improves. If the accuracy does increase, we believe that such noise exists and degrades the relation of this gene to the disease status. Through extensive experiments on several public microarray data sets, we found that such added noise can improve the classification accuracy for several genes and that the results are relatively consistent, indicating that our method can be used to analyze the noise pattern in microarray experiments and to discover potentially important gene markers.
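The per-gene procedure in this abstract (perturb one gene, re-measure classification accuracy, compare with the baseline) can be sketched as follows. Two things here are assumptions rather than the authors' method: the abstract does not say how the compensated noise is constructed, so the sketch simply adds zero-mean uniform noise, and a leave-one-out nearest-centroid classifier stands in for the linear SVM so that the example needs only the standard library. All names and the toy data are hypothetical.

```python
import random


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def loo_accuracy(samples, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Stand-in for the linear SVM named in the abstract (assumption).
    """
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        by_class = {}
        for j, (xj, yj) in enumerate(zip(samples, labels)):
            if j != i:
                by_class.setdefault(yj, []).append(xj)
        centroids = {lab: centroid(rows) for lab, rows in by_class.items()}
        predicted = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )
        correct += predicted == y
    return correct / len(samples)


def noise_effect_per_gene(samples, labels, amplitude, trials=20, seed=0):
    """Return (baseline accuracy, mean accuracy change per gene).

    For each gene (column), uniform noise in [-amplitude, amplitude] is added
    to that column only, and the accuracy is averaged over `trials` draws.
    """
    rng = random.Random(seed)
    baseline = loo_accuracy(samples, labels)
    deltas = []
    for g in range(len(samples[0])):
        total = 0.0
        for _ in range(trials):
            noisy = [row[:] for row in samples]
            for row in noisy:
                row[g] += rng.uniform(-amplitude, amplitude)
            total += loo_accuracy(noisy, labels)
        deltas.append(total / trials - baseline)
    return baseline, deltas


# Toy expression matrix: rows are samples, columns are two hypothetical genes.
samples = [[1.0, 5.0], [1.2, 1.0], [0.9, 4.0],
           [3.0, 2.0], [3.1, 5.5], [2.9, 0.5]]
labels = [0, 0, 0, 1, 1, 1]

baseline, deltas = noise_effect_per_gene(samples, labels, amplitude=1.0)
print("baseline accuracy:", baseline)
print("mean accuracy change per gene:", deltas)
```

A positive entry in `deltas` would correspond to the case the abstract treats as evidence of noise in that gene; with a data set this small the numbers are only illustrative.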

Citations: 5

Improved prediction of transcription binding sites from chromatin modification data

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510323

Kengo Sato, Thomas Whitington, T. Bailey, P. Horton

Citations: 0

Improved PCR design for mouse DNA by training finite state machines

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510701

S. Yadav, S. Corns

Citations: 8

Nearest neighbor training of side effect machines for sequence classification

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510426

D. Ashlock, Andrew McEachern

Abstract: Side effect machines operate by associating side effects with the states of a finite state machine. The use of side effect machines permits the researcher to leverage information stored in the state transition structure, making machines that might be identical as recognizers behave differently as classifiers. The side effect machines in this study associate a counter with each state, so that the number of times each state is visited becomes a numerical feature associated with that state. The key to effective use of these numerical features is to locate side effect machines whose count vectors are good feature sets. In this study, side effect machines are selected with an evolutionary algorithm. The Rand index of nearest neighbor classification of the count vectors serves as the fitness function for selecting side effect machines. A parameter study is performed on simple synthetic data, and side effect machines are then trained to classify two sets of biological sequences. The first set comprises two categories of HLA sequences from the human major histocompatibility complex. The second comprises positive and negative examples of human endogenous retroviral sequences taken from the human genome. The retroviral sequences are challenging, but good results are obtained. The HLA data are classified with complete accuracy.
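The fitness evaluation described in this abstract (state-visit counts as features, nearest neighbor classification, Rand index as the score) can be sketched as below. Several details are assumptions, since the abstract does not give them: the machine is stored as a transition table indexed by state and symbol, a state is counted each time it is entered after reading a symbol, distances are squared Euclidean, and classification is leave-one-out. The evolutionary search over machines is not shown; all names and the toy machine are hypothetical.

```python
from itertools import combinations


def count_vector(transitions, start, sequence, alphabet="ACGT"):
    """Run the machine over `sequence` and count visits to each state.

    transitions[state][symbol_index] is the next state. A state is counted
    when it is entered after reading a symbol (assumed convention).
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    counts = [0] * len(transitions)
    state = start
    for ch in sequence:
        state = transitions[state][index[ch]]
        counts[state] += 1
    return counts


def nearest_neighbor_labels(vectors, labels):
    """Leave-one-out 1-nearest-neighbor prediction for every vector."""
    predicted = []
    for i, v in enumerate(vectors):
        best = min(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(v, vectors[j])),
        )
        predicted.append(labels[best])
    return predicted


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which the two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def fitness(transitions, start, sequences, labels):
    """Rand index between true labels and nearest-neighbor predictions."""
    vectors = [count_vector(transitions, start, s) for s in sequences]
    return rand_index(labels, nearest_neighbor_labels(vectors, labels))


# Toy two-state machine: A/T lead to state 0, C/G lead to state 1, so the
# count vector is simply (number of A/T, number of C/G).
toy_machine = [[0, 1, 1, 0],
               [0, 1, 1, 0]]
sequences = ["AATTAT", "ATATTA", "GCGCCG", "CCGGCG"]
labels = [0, 0, 1, 1]

print(count_vector(toy_machine, 0, "ACGT"))
print(fitness(toy_machine, 0, sequences, labels))
```

In an evolutionary algorithm, `fitness` would be evaluated for each candidate transition table, and machines with a higher Rand index would be preferred.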

Citations: 7

Applying neural networks to classify influenza virus antigenic types and hosts

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510726

P. Attaluri, Zhengxin Chen, G. Lu

Abstract: Influenza viruses continue to evolve rapidly and are responsible for seasonal epidemics and occasional, but catastrophic, pandemics. We recently demonstrated the use of decision tree and support vector machine methods in classifying pandemic swine flu viral strains with high accuracy. Here, we apply artificial neural networks to the prediction of important influenza virus antigenic types (H1, H3, and H5) and hosts (human, avian, and swine), which fulfills a critical need for a computational system for influenza surveillance. A comprehensive experiment on different k-mers and different binary encoding types showed that classification based on frequencies of k-mer nucleotide strings performed better than classification based on binary-transformed nucleotide data. We found for the first time that the accuracy of virus classification varies from host to host and from gene segment to gene segment. In particular, compared with avian and swine viruses, human influenza viruses can be classified with high accuracy, which indicates that influenza virus strains might have become well adapted to their human host, so that less variation occurs in human viruses. In addition, the accuracy of host classification varies from genome segment to segment, achieving the highest values when the HA and NA segments are used for human host classification. This research, along with our previous studies, shows that machine learning techniques play an indispensable role in virus classification.
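The two input representations contrasted in this abstract can be sketched as follows: k-mer frequency vectors and a binary encoding of nucleotides. The abstract does not specify which binary encodings were tried, so the one-hot scheme here is an assumption, as are the function names and the choice to skip windows containing ambiguous symbols such as N. The neural network itself is not shown.

```python
from itertools import product


def kmer_frequencies(sequence, k, alphabet="ACGT"):
    """Relative frequency of every k-mer over `alphabet`.

    The vector has len(alphabet) ** k entries, ordered as itertools.product
    generates them. Windows containing other symbols (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


def one_hot_encoding(sequence):
    """Four bits per nucleotide; unknown symbols become all zeros.

    One possible binary encoding, chosen here only for illustration.
    """
    table = {
        "A": [1, 0, 0, 0],
        "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0],
        "T": [0, 0, 0, 1],
    }
    bits = []
    for ch in sequence:
        bits.extend(table.get(ch, [0, 0, 0, 0]))
    return bits


fragment = "ACGTN"  # toy fragment, not real influenza data
print(kmer_frequencies(fragment, 2))
print(one_hot_encoding(fragment))
```

One practical difference visible in the sketch is that the k-mer vector has a fixed length for any sequence, whereas the binary encoding grows with sequence length and so requires aligned or truncated inputs.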

Citations: 17

Machine learning approaches for customized docking scores: Modeling of inhibition of Mycobacterium tuberculosis enoyl acyl carrier protein reductase

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510700

G. Fogel, Jonathan Tran, Stephen Johnson, David Hecht

Citations: 6

A fitness-independent evolvability measure for evolutionary developmental systems

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510475

Yaochu Jin, J. Trommler

Citations: 12

Exploring structural modeling of proteins for kernel-based enzyme discrimination

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510588

Marco A. Alvarez, Changhui Yan

Citations: 3