2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology: Latest Papers

Supervised learning of maternal cigarette-smoking signatures from placental gene expression data: A case study
Chengpeng Bi, C. Vyhlidal, J. Leeder
{"title":"Supervised learning of maternal cigarette-smoking signatures from placental gene expression data: A case study","authors":"Chengpeng Bi, C. Vyhlidal, J. Leeder","doi":"10.1109/CIBCB.2010.5510587","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510587","url":null,"abstract":"This paper aims to conduct supervised learning of the cigarette-smoking signatures from the placental gene expression data sets under the neural network framework and build classifiers to identify the cigarette-smoking moms during pregnancy. First, a unified model for gene selection is proposed to single out a set of informative gene sets (up-or down-regulated genes). The selected signature gene sets are subject to refinement, and then so refined informative gene sets are fed into three supervised statistical learning algorithms, linear discriminant function (LDF), probabilistic neural network (PNN) and support vector machine (SVM) for training and testing. It shows that SVM is the best classifier in predicting the cigarette-smoking moms compared to other methods tested.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122713283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Sequence transformation to a complex signature form for consistent phylogenetic tree using Extensible Markov Model
Rao M. Kotamarti, Michael Hahsler, Douglas W. Raiford, M. Dunham
{"title":"Sequence transformation to a complex signature form for consistent phylogenetic tree using Extensible Markov Model","authors":"Rao M. Kotamarti, Michael Hahsler, Douglas W. Raiford, M. Dunham","doi":"10.1109/CIBCB.2010.5510472","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510472","url":null,"abstract":"Phylogenetic tree analysis using molecular sequences continues to expand beyond the 16S rRNA marker. By addressing the multi-copy issue known as the intra-heterogeneity, this paper restores the focus in using the 16S rRNA marker. Through use of a novel learning and model building algorithm, the multiple gene copies are integrated into a compact complex signature using the Extensible Markov Model (EMM). The method clusters related sequence segments while preserving their inherent order to create an EMM signature for a microbial organism. A library of EMM signatures is generated from which samples are drawn for phylogenetic analysis. By matching the components of two signatures, referred to as quasi-alignment, the differences are highlighted and scored. Scoring quasi-alignments is done using adapted Karlin-Altschul statistics to compute a novel distance metric. The metric satisfies conditions of identity, symmetry, triangular inequality and the four point rule required for a valid evolution distance metric. The resulting distance matrix is input to PHYLogeny Inference Package (PHYLIP) to generate phylogenies using neighbor joining algorithms. Through control of clustering in signature creation, the diversity of similar organisms and their placement in the phylogeny is explained. The experiments include analysis of genus Burkholderia, a random microbial sample spanning several phyla and a diverse sample that includes RNA of Eukaryotic origin. The NCBI sequence data for 16S rRNA is used for validation.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127422405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
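The abstract above lists four conditions for a valid evolutionary distance: identity, symmetry, the triangle inequality and the four-point rule. As an illustration only, the sketch below checks those conditions on a small distance matrix. The function name `check_distance_matrix`, the tolerance `tol` and the two example matrices are assumptions made for this sketch; it does not reproduce the paper's quasi-alignment scoring or its Karlin-Altschul-based metric.

```python
from itertools import combinations


def check_distance_matrix(d, tol=1e-9):
    """Check a square distance matrix against four metric conditions.

    Hypothetical helper for illustration; not taken from the paper.
    Returns a dict mapping each condition name to True or False.
    """
    idx = range(len(d))

    # Identity: distance is zero exactly when the two items are the same.
    identity = all((abs(d[i][j]) <= tol) == (i == j) for i in idx for j in idx)

    # Symmetry: d(i, j) equals d(j, i).
    symmetry = all(abs(d[i][j] - d[j][i]) <= tol for i in idx for j in idx)

    # Triangle inequality: d(i, k) <= d(i, j) + d(j, k).
    triangle = all(
        d[i][k] <= d[i][j] + d[j][k] + tol
        for i in idx for j in idx for k in idx
    )

    # Four-point rule: of the three pairwise sums over any four items,
    # the two largest must be equal (so the distances fit a tree).
    def four_point_ok(i, j, k, l):
        sums = sorted([
            d[i][j] + d[k][l],
            d[i][k] + d[j][l],
            d[i][l] + d[j][k],
        ])
        return sums[2] - sums[1] <= tol

    four_point = all(four_point_ok(*quad) for quad in combinations(idx, 4))

    return {
        "identity": identity,
        "symmetry": symmetry,
        "triangle": triangle,
        "four_point": four_point,
    }


# Distances read off a tree ((A,B),(C,D)) with unit-length edges.
tree_like = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

# Shortest-path distances on a 4-cycle: a metric, but not tree-like.
cycle_like = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

print(check_distance_matrix(tree_like))
print(check_distance_matrix(cycle_like))
```

The second matrix passes the first three checks and fails only the four-point rule, which is why that rule is listed separately: a distance can be a metric without being representable on a tree.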
Detecting retroviruses using reading frame information and side effect machines
W. Ashlock, S. Datta
{"title":"Detecting retroviruses using reading frame information and side effect machines","authors":"W. Ashlock, S. Datta","doi":"10.1109/CIBCB.2010.5510699","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510699","url":null,"abstract":"This paper addresses the problem of distinguishing retroviruses from non-coding DNA sequences. Retroviruses have a distinctive reading frame structure that includes multiple reading frames that often overlap. This paper uses reading frame information generated from Fourier spectral analysis as input for Side Effect Machines (SEMs) that are evolved to create clusterings which separate the two types of sequences. The output from these SEMs is then used to train Support Vector Machines (SVMs) to perform the classification. The best classifier out of 100 replicates achieves 100% accuracy using complete retroviral genomes and the average classifier achieves 85% accuracy. Using endogenous retroviral data that includes many mutations, the best classifier achieves 86% accuracy; the average achieves an accuracy of 71%. The method also was able to distinguish lentiviruses from other types of retroviruses with a best accuracy of 100% (average 93%). In order to better understand the evolved SEMs, classifiers trained on SEMs evolved using endogenous retroviral data were used to classify the complete unmutated retroviral genomes and vice versa. It was found that, regardless of which type of data was used to create the classifiers, their performance on the test data sets was similar. This suggests that SEMs are able to extract the distinctive retroviral reading frame structure from the Fourier spectra, but that in some of the endogenous retroviruses in our data set there were too many mutations for this structure to be discernable from the data using this method.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"28 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114028142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Classification of HIV-1 protease crystal structures using Random Forest, linear discriminant analysis and logistic regression
Gene M. Ko, A. Reddy, Sunil Kumar, S. A. Bailey, R. Garg
{"title":"Classification of HIV-1 protease crystal structures using Random Forest, linear discriminant analysis and logistic regression","authors":"Gene M. Ko, A. Reddy, Sunil Kumar, S. A. Bailey, R. Garg","doi":"10.1109/CIBCB.2010.5510465","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510465","url":null,"abstract":"The present study develops a classification model to correlate the binding pockets of 70 HIV-1 protease crystal structures in terms of their structural descriptors to their complexed HIV-1 protease inhibitors. The Random Forest classification model is used to reduce the chemical descriptor space from 456 to the 12 most relevant descriptors based on the Gini importance measure. The selected 12 descriptors are then used to develop classification models using linear discriminant analysis (LDA) and logistic regression (LR). The top eight descriptors were found to produce the best LDA model with an overall error of 30% and a leave-one-out cross validation error of 44.29%, while the top five descriptors were found to produce the best LR model with an overall error of 28.57% and a leave-one-out cross validation error of 41.43%. Hierarchical clustering was performed on the top five and eight descriptors to verify whether the descriptor selection of Random Forest can group together the binding pockets based on their complexed ligands. The selected descriptors would play a crucial role in understanding the HIV-1 protease binding pocket structure in terms of its chemical descriptors.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"22 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114110581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
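The error rates quoted in the abstract above can be read as whole numbers of misclassified structures, assuming each percentage is computed over all 70 crystal structures (the abstract does not state the denominator explicitly, so this is an assumption). The short sketch below does that arithmetic; the names `reported` and `implied_count` are illustrative only.

```python
N_STRUCTURES = 70  # number of crystal structures stated in the abstract

# Error rates (percent) as quoted in the abstract.
reported = {
    "LDA overall": 30.00,
    "LDA leave-one-out": 44.29,
    "LR overall": 28.57,
    "LR leave-one-out": 41.43,
}


def implied_count(error_percent, n=N_STRUCTURES):
    """Nearest whole number of misclassified items for an error rate.

    Assumes the rate was computed over all n items.
    """
    return round(error_percent / 100 * n)


counts = {name: implied_count(pct) for name, pct in reported.items()}

for name, k in counts.items():
    print(f"{name}: {k}/{N_STRUCTURES} = {100 * k / N_STRUCTURES:.2f}%")
```

Under that assumption the four rates correspond to 21, 31, 20 and 29 misclassified structures respectively, so the best LDA and LR models differ by one structure on the overall error and by two under leave-one-out cross validation.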
Expanded study of efn2 thermodynamic model performance on RnaPredict, an evolutionary algorithm for RNA folding
K. Wiese, A. Hendriks
{"title":"Expanded study of efn2 thermodynamic model performance on RnaPredict, an evolutionary algorithm for RNA folding","authors":"K. Wiese, A. Hendriks","doi":"10.1109/CIBCB.2010.5510321","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510321","url":null,"abstract":"The shape that organic molecules such as biopolymers form within organic systems largely determines the function said molecules perform. RNA is a biopolymer that plays a central part in several stages of protein synthesis, and also has structural, functional, and regulatory roles in the cell. In an ab initio case most common structure prediction techniques employ minimization of the free energy of a given RNA molecule via a thermodynamic model. RnaPredict is an evolutionary algorithm for RNA folding. This paper compares the performance of an advanced thermodynamic model, efn2, against the stacking-energy thermodynamic models INN and INN-HB on a test set containing 24 sequences from 4 rRNA subtypes. The prediction accuracy of efn2 is demonstrated on a majority of test sequences. A comparison is also made with the mfold prediction algorithm which demonstrated RnaPredict's comparable performance.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121114528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
New approaches to clustering microarray time-series data using multiple expression profile alignment
N. Subhani, L. Rueda, A. Ngom, C. J. Burden
{"title":"New approaches to clustering microarray time-series data using multiple expression profile alignment","authors":"N. Subhani, L. Rueda, A. Ngom, C. J. Burden","doi":"10.1109/CIBCB.2010.5510385","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510385","url":null,"abstract":"An important process in functional genomic studies is clustering microarray time-series data, where genes with similar expression profiles are expected to be functionally related. Clustering microarray time-series data via pairwise alignment of piecewise linear profiles has been recently introduced. In this paper, we propose a clustering approach based on a multiple profile alignment of natural cubic spline and piecewise linear representations of gene expression profiles. We combine these multiple alignment approaches with k-means. We ran our methods on a well-known data set of pre-clustered Saccharomyces cerevisiae gene expression profiles and a data set of 3315 Pseudomonas aeruginosa expression profiles. We assessed the validity of the resulting clusters and applied a c-nearest neighbor classifier for evaluating the performance of our approaches, obtaining accuracies of 89.51% and 86.12% respectively, on Saccharomyces cerevisiae data, and 90.90% and 93.71% accuracies for cubic spline and piecewise linear respectively on Pseudomonas aeruginosa data.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122914323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Predicting chemical activities from structures by attributed molecular graph classification
Qian Xu, Derek Hao Hu, H. Xue, Qiang Yang
{"title":"Predicting chemical activities from structures by attributed molecular graph classification","authors":"Qian Xu, Derek Hao Hu, H. Xue, Qiang Yang","doi":"10.1109/CIBCB.2010.5510690","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510690","url":null,"abstract":"Designing Quantitative Structure-Activity Relationship (QSAR) models has been a recurrent research interest for biologists and computer scientists. An example is to predict the toxicity of chemical compounds using their structural properties as features represented by graphs. A popular method to classify these graphs is to exploit classifiers such as support vector machines (SVMs) and graph kernels to incorporate the sequential, structural and chemical information. Previous works have focused on designing specific graph kernels for this task, amongst which graph alignment kernels are one of the most popular approaches. Graph alignment kernels align the nodes of one graph to the nodes of the second graph so that the total overall similarity is maximized with respect to all possible alignments. However, taking both vertex and edge similarities into account makes the problem NP-Hard. In this paper, we present a novel general graph-matching based method for QSAR. We view the problem of calculating optimal assignments of two attributed graphs from a different perspective. Instead of first designing an atom kernel function and a bond kernel function, we first provide a training set of pairs of graphs with their corresponding matchings. We then try to learn the compatibility function over atoms and use only the atom kernel function to compute graph matchings. Our algorithm has the advantage of being more general and yet efficient than previous approaches for the QSAR problem. We evaluate our method on a set of chemical structure-activity prediction benchmark datasets, and show that our algorithm can achieve better or comparable accuracies over the optimal assignment kernel method.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128173322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Modular clustering of protein-protein interaction networks
Nassim Sohaee, C. Forst
{"title":"Modular clustering of protein-protein interaction networks","authors":"Nassim Sohaee, C. Forst","doi":"10.1109/CIBCB.2010.5510590","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510590","url":null,"abstract":"Identifying the modular structures in protein-protein interaction networks is crucial to the understanding of the organization and function of biological systems. In this paper we introduce the concept of critical module in a network and propose an efficient algorithm to find all critical modules in a given network. Finally we tested the proposed algorithm on a yeast protein interaction data set.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123015184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Computation intelligence method to find generic non-coding RNA search models
Jennifer A. Smith
{"title":"Computation intelligence method to find generic non-coding RNA search models","authors":"Jennifer A. Smith","doi":"10.1109/CIBCB.2010.5510341","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510341","url":null,"abstract":"Fairly effective methods exist for finding new non-coding RNA genes using search models based on known families of ncRNA genes (for example covariance models). However, these models only find new members of the existing families and are not useful in finding potential members of novel ncRNA families. Other problems with family-specific search include large processing requirements, ambiguity in defining which sequences form a family and lack of sufficient numbers of known sequences to properly estimate model parameters. An ncRNA search model is proposed which includes a collection of non-overlapping RNA hairpin structure covariance models. The hairpin models are chosen from a hairpin-model list compiled from many families in the Rfam non-coding RNA families database. The specific hairpin models included and the overall score threshold for the search model is determined through the use of a genetic algorithm.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124954170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Simulation of oscillatory dynamics of blood testosterone levels using the crossover method
A. Sabnis, R. Harrison
{"title":"Simulation of oscillatory dynamics of blood testosterone levels using the crossover method","authors":"A. Sabnis, R. Harrison","doi":"10.1109/CIBCB.2010.5510490","DOIUrl":"https://doi.org/10.1109/CIBCB.2010.5510490","url":null,"abstract":"Blood testosterone levels oscillate periodically in humans. The in vivo dynamics of this biochemical system cannot be simulated in silico using a continuous deterministic solution of a previously reported mathematical model. The use of the stochastic simulation algorithm (SSA), however, has been reported to generate sustained oscillations that are qualitatively and quantitatively consistent with the experimental observations. Although the SSA is capable of accurately simulating a biochemical network, it is extremely inefficient from a computational standpoint. In this work, we have attempted to simulate the above mentioned model using a deterministic-stochastic crossover method, for three separate sets of parameters. Each time, not only did the results show the existence of sustained oscillations but also that the computational time was at least four times lower than the corresponding SSA solution. The crossover method can hence be proposed as a viable alternative to the SSA for simulating biochemical systems that are commonly encountered in systems biology applications.","PeriodicalId":340637,"journal":{"name":"2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"107 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130469007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
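For readers unfamiliar with the stochastic simulation algorithm (SSA) that the abstract above uses as its baseline, here is a minimal sketch of Gillespie's direct method on a toy birth-death system. Everything specific in it is an assumption made for illustration: the two reactions, the rate constants `k_prod` and `k_deg`, and the function name. It is not the testosterone model, and it does not implement the deterministic-stochastic crossover method the paper proposes.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, seed=0):
    """Gillespie direct-method SSA for a toy birth-death process.

    Reactions (illustrative only):
        production:  0 -> X   with propensity k_prod
        degradation: X -> 0   with propensity k_deg * x
    Returns the event times and the molecule count after each event.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]

    while True:
        a_prod = k_prod        # zeroth-order production
        a_deg = k_deg * x      # first-order degradation
        a_total = a_prod + a_deg
        if a_total == 0:
            break              # no reaction can fire any more

        # Waiting time to the next event is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break

        # Pick which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1

        times.append(t)
        counts.append(x)

    return times, counts


times, counts = gillespie_birth_death(k_prod=10.0, k_deg=0.1, x0=0, t_end=50.0)
print(f"{len(times) - 1} reaction events simulated, final count {counts[-1]}")
```

Each loop iteration simulates exactly one reaction event, so the cost grows with the total number of events. That per-event cost is the inefficiency the abstract refers to, and it is what hybrid schemes try to avoid by treating fast, high-copy-number reactions deterministically.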