2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology: Latest Papers, Page 4

Supervised learning of maternal cigarette-smoking signatures from placental gene expression data: A case study

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510587

Chengpeng Bi, C. Vyhlidal, J. Leeder

Citations: 3

Sequence transformation to a complex signature form for consistent phylogenetic tree using Extensible Markov Model

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510472

Rao M. Kotamarti, Michael Hahsler, Douglas W. Raiford, M. Dunham

Abstract: Phylogenetic tree analysis using molecular sequences continues to expand beyond the 16S rRNA marker. By addressing the multi-copy issue known as intra-heterogeneity, this paper restores the focus on using the 16S rRNA marker. Through use of a novel learning and model building algorithm, the multiple gene copies are integrated into a compact complex signature using the Extensible Markov Model (EMM). The method clusters related sequence segments while preserving their inherent order to create an EMM signature for a microbial organism. A library of EMM signatures is generated from which samples are drawn for phylogenetic analysis. By matching the components of two signatures, referred to as quasi-alignment, the differences are highlighted and scored. Scoring quasi-alignments is done using adapted Karlin-Altschul statistics to compute a novel distance metric. The metric satisfies the conditions of identity, symmetry, triangle inequality and the four-point rule required for a valid evolutionary distance metric. The resulting distance matrix is input to the PHYLogeny Inference Package (PHYLIP) to generate phylogenies using neighbor-joining algorithms. Through control of clustering in signature creation, the diversity of similar organisms and their placement in the phylogeny is explained. The experiments include analysis of the genus Burkholderia, a random microbial sample spanning several phyla, and a diverse sample that includes RNA of eukaryotic origin. The NCBI sequence data for 16S rRNA is used for validation.
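The abstract lists four conditions a valid evolutionary distance must satisfy: identity, symmetry, the triangle inequality and the four-point rule. The sketch below is not the authors' EMM-based metric; it only shows one way such conditions could be checked on an arbitrary distance matrix. The function names `is_metric` and `satisfies_four_point` and the two toy matrices are assumptions made for illustration.

```python
from itertools import combinations, permutations


def is_metric(d, tol=1e-9):
    """Check identity (zero diagonal), symmetry and the triangle inequality."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:  # identity: d(x, x) = 0
            return False
        for j in range(n):
            if abs(d[i][j] - d[j][i]) > tol:  # symmetry
                return False
    for i, j, k in permutations(range(n), 3):
        if d[i][k] > d[i][j] + d[j][k] + tol:  # triangle inequality
            return False
    return True


def satisfies_four_point(d, tol=1e-9):
    """Four-point rule: of the three pairwise sums, the two largest are equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if sums[2] - sums[1] > tol:
            return False
    return True


# Toy tree ((A,B),(C,D)) with all branch lengths equal to 1 (hypothetical data).
tree_like = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

# Corners of a unit square: a valid metric that no tree can represent exactly.
r2 = 2 ** 0.5
square = [
    [0, 1, r2, 1],
    [1, 0, 1, r2],
    [r2, 1, 0, 1],
    [1, r2, 1, 0],
]
```

A matrix that passes both checks can be represented exactly by a tree, which is why the four-point rule matters before handing distances to a tree-building program. Here `tree_like` passes both checks, while `square` is a metric but fails the four-point rule.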

Citations: 3

Detecting retroviruses using reading frame information and side effect machines

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510699

W. Ashlock, S. Datta

Abstract: This paper addresses the problem of distinguishing retroviruses from non-coding DNA sequences. Retroviruses have a distinctive reading frame structure that includes multiple reading frames that often overlap. This paper uses reading frame information generated from Fourier spectral analysis as input for Side Effect Machines (SEMs) that are evolved to create clusterings which separate the two types of sequences. The output from these SEMs is then used to train Support Vector Machines (SVMs) to perform the classification. The best classifier out of 100 replicates achieves 100% accuracy using complete retroviral genomes, and the average classifier achieves 85% accuracy. Using endogenous retroviral data that includes many mutations, the best classifier achieves 86% accuracy; the average achieves an accuracy of 71%. The method was also able to distinguish lentiviruses from other types of retroviruses with a best accuracy of 100% (average 93%). In order to better understand the evolved SEMs, classifiers trained on SEMs evolved using endogenous retroviral data were used to classify the complete unmutated retroviral genomes and vice versa. It was found that, regardless of which type of data was used to create the classifiers, their performance on the test data sets was similar. This suggests that SEMs are able to extract the distinctive retroviral reading frame structure from the Fourier spectra, but that in some of the endogenous retroviruses in our data set there were too many mutations for this structure to be discernible from the data using this method.
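The abstract does not say which spectral feature the authors feed to the SEMs. As a rough illustration of how Fourier analysis can expose reading frame structure, the sketch below computes the well-known period-3 signal: the spectral power of the base indicator sequences at frequency 1/3, which tends to be elevated in protein-coding DNA. The function name `period3_power` and the normalisation by sequence length are assumptions, not details from the paper.

```python
import cmath


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the four base indicators.

    For each base, the indicator sequence is 1 where that base occurs and 0
    elsewhere; its Fourier coefficient at frequency 1/3 is accumulated here.
    """
    seq = seq.upper()
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * pos / 3)
            for pos, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n  # normalise by length (an illustrative choice)


codon_like = "ATG" * 20   # strict period-3 repeat, strong signal
period_four = "ACGT" * 15  # period-4 repeat, signal cancels out
```

In practice one would likely slide a window along the genome and compare the signal across the three frames; that step is omitted here to keep the sketch short.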

Citations: 10

Classification of HIV-1 protease crystal structures using Random Forest, linear discriminant analysis and logistic regression

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510465

Gene M. Ko, A. Reddy, Sunil Kumar, S. A. Bailey, R. Garg

Abstract: The present study develops a classification model to correlate the binding pockets of 70 HIV-1 protease crystal structures, in terms of their structural descriptors, to their complexed HIV-1 protease inhibitors. The Random Forest classification model is used to reduce the chemical descriptor space from 456 to the 12 most relevant descriptors based on the Gini importance measure. The selected 12 descriptors are then used to develop classification models using linear discriminant analysis (LDA) and logistic regression (LR). The top eight descriptors were found to produce the best LDA model, with an overall error of 30% and a leave-one-out cross-validation error of 44.29%, while the top five descriptors were found to produce the best LR model, with an overall error of 28.57% and a leave-one-out cross-validation error of 41.43%. Hierarchical clustering was performed on the top five and eight descriptors to verify whether the descriptor selection of Random Forest can group together the binding pockets based on their complexed ligands. The selected descriptors would play a crucial role in understanding the HIV-1 protease binding pocket structure in terms of its chemical descriptors.
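The abstract reports both an overall error and a leave-one-out cross-validation (LOOCV) error. The sketch below shows how a LOOCV error rate is computed in general. It swaps in a simple nearest-centroid classifier rather than the LDA or LR models of the paper, and the function names and toy data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x (squared distance)."""
    sums, counts = {}, {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best = None
    for label, acc in sums.items():
        centroid = [value / counts[label] for value in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if best is None or dist < best[0]:
            best = (dist, label)
    return best[1]


def loocv_error(xs, ys):
    """Hold out each sample once, train on the rest, and count misclassifications."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)


# Toy one-descriptor data; the last sample sits inside the other class.
toy_x = [[0.0], [0.1], [0.2], [1.0], [1.1], [0.05]]
toy_y = ["a", "a", "a", "b", "b", "b"]
toy_error = loocv_error(toy_x, toy_y)
```

With 70 structures each held-out error contributes 1/70, so the reported LOOCV errors of 44.29% and 41.43% correspond to 31 and 29 misclassified structures respectively.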

Citations: 1

Expanded study of efn2 thermodynamic model performance on RnaPredict, an evolutionary algorithm for RNA folding

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510321

K. Wiese, A. Hendriks

Citations: 1

New approaches to clustering microarray time-series data using multiple expression profile alignment

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510385

N. Subhani, L. Rueda, A. Ngom, C. J. Burden

Citations: 1

Predicting chemical activities from structures by attributed molecular graph classification

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510690

Qian Xu, Derek Hao Hu, H. Xue, Qiang Yang

Abstract: Designing Quantitative Structure-Activity Relationship (QSAR) models has been a recurrent research interest for biologists and computer scientists. An example is to predict the toxicity of chemical compounds using their structural properties as features represented by graphs. A popular method to classify these graphs is to combine classifiers such as support vector machines (SVMs) with graph kernels that incorporate the sequential, structural and chemical information. Previous works have focused on designing specific graph kernels for this task, amongst which graph alignment kernels are one of the most popular approaches. Graph alignment kernels align the nodes of one graph to the nodes of the second graph so that the total overall similarity is maximized with respect to all possible alignments. However, taking both vertex and edge similarities into account makes the problem NP-hard. In this paper, we present a novel general graph-matching based method for QSAR. We view the problem of calculating optimal assignments of two attributed graphs from a different perspective. Instead of first designing an atom kernel function and a bond kernel function, we first provide a training set of pairs of graphs with their corresponding matchings. We then try to learn the compatibility function over atoms and use only the atom kernel function to compute graph matchings. Our algorithm has the advantage of being more general than previous approaches for the QSAR problem while remaining efficient. We evaluate our method on a set of chemical structure-activity prediction benchmark datasets, and show that our algorithm can achieve better or comparable accuracies over the optimal assignment kernel method.
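The abstract describes computing graph matchings from an atom-level compatibility function alone. The sketch below illustrates that idea in its simplest form: an exhaustive search for the atom assignment that maximises total atom similarity between two small molecules. It is not the learned compatibility function of the paper. The `atom_similarity` function (exact element match) is a hypothetical stand-in, and brute force is only workable for very small graphs.

```python
from itertools import permutations


def atom_similarity(a, b):
    """Hypothetical atom kernel: 1.0 for identical element symbols, else 0.0."""
    return 1.0 if a == b else 0.0


def optimal_assignment(atoms_a, atoms_b, sim=atom_similarity):
    """Best one-to-one mapping of atoms_a into atoms_b by total atom similarity.

    Assumes len(atoms_a) <= len(atoms_b). Returns (score, [(i, j), ...]),
    where atom i of the first molecule is matched to atom j of the second.
    """
    if len(atoms_a) > len(atoms_b):
        raise ValueError("atoms_a must not be longer than atoms_b")
    best_score, best_perm = -1.0, ()
    for perm in permutations(range(len(atoms_b)), len(atoms_a)):
        score = sum(sim(atoms_a[i], atoms_b[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, list(enumerate(best_perm))


# Element symbols only; bonds are ignored in this vertex-only sketch.
score, mapping = optimal_assignment(["C", "O", "N"], ["N", "C", "C", "O"])
```

Because only vertex similarities enter the score, the search reduces to a linear assignment problem, which a polynomial-time method such as the Hungarian algorithm could solve for larger molecules.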

Citations: 0

Modular clustering of protein-protein interaction networks

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510590

Nassim Sohaee, C. Forst

Citations: 3

Computation intelligence method to find generic non-coding RNA search models

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510341

Jennifer A. Smith

Citations: 2

Simulation of oscillatory dynamics of blood testosterone levels using the crossover method

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Pub Date : 2010-05-02 DOI: 10.1109/CIBCB.2010.5510490

A. Sabnis, R. Harrison

Citations: 2