{"title":"Intrinsic Disorder and Protein Modifications: Building an SVM Predictor for Methylation","authors":"Kenneth Daily, P. Radivojac, A. Dunker","doi":"10.1109/CIBCB.2005.1594957","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594957","url":null,"abstract":"Post-translational protein modifications play an important role in many protein pathways and interactions. It has been hypothesized that modifications to proteins occur in regions that are easily accessible, and many have been determined to be located within intrinsically disordered regions. However, identifying the precise locations of protein modifications involves expensive and time-consuming laboratory work. Thus, automated identification of these sites is helpful. This paper studies methylated proteins and describes methods of building a predictor for arginine and lysine methylation sites using support vector machines. Our results indicate that, based on current data, both arginine and lysine methylation sites are likely to be intrinsically disordered and that the accuracies of methylation site predictions are high enough to be useful for protein screening and for testing biological hypotheses.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130410703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selection Of Genetically Diverse Recombinant Inbreds With An Ordered Gene Evolutionary Algorithm","authors":"D. Ashlock, Ruth Swanson, P. Schnable","doi":"10.1109/CIBCB.2005.1594923","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594923","url":null,"abstract":"Recombinant inbreds are created by crossing two genetically distinct inbred lines and then inbreeding the resulting progeny multiple times. They are used to estimate associations of genes by co-inheritance of alleles from the two parent inbred types in the recombinant inbreds derived from the cross, in a process called genetic mapping. Typically, the recombinant inbred lines used in a genetic mapping study are relatively well studied, and so they are natural choices for microarray, proteomic, and metabolomic studies. These studies are quite costly and so typically use fewer individuals than are used in most genetic mapping studies. An evolutionary algorithm for selecting a subset of a collection of recombinant inbred lines with maximum genetic diversity in their mapping characters is described. The evolutionary algorithm is an ordered-gene algorithm with the first k genes in the ordered selection taken to be the subset. Ordered genes are a convenient representation for subset selection. It is found that the problem is not difficult and that, in a well-mixed mapping population of recombinant inbreds, the marginal increase in diversity obtained by evolutionary optimization is small but significant. In order to better understand the problem, synthetic data are also examined and suggest that the problem is easy in general, not only in the specific biological cases used.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123647081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolutionary Granular Kernel Trees and Applications in Drug Activity Comparisons","authors":"Bo Jin, Yanqing Zhang, Binghe Wang","doi":"10.1109/CIBCB.2005.1594907","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594907","url":null,"abstract":"Kernel methods, specifically support vector machines (SVMs), have been widely used in many fields for data classification and pattern recognition. The performance of SVMs is mainly affected by kernel functions. With the growing interest in biological and chemical data prediction, such as structure-property based molecule comparison, protein structure prediction and long DNA sequence comparison, more powerful and flexible kernels need to be designed in order to effectively express the prior knowledge and relationships within each data item. In this paper, the granular kernel concept is presented and related properties are described in detail. A hierarchical kernel design method is proposed to construct granular kernel trees (GKTs). For a particular problem, genetic algorithms (GAs) are used to find the optimum parameter settings of GKTs. In applications, SVMs with the new kernel trees are employed for the comparison of drug activities. The experimental results show that SVMs with GKTs and evolutionary GKTs can achieve better performance than SVMs with traditional RBF kernels in terms of prediction accuracy.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131128708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Single Genes Related to Immune-Relevant Processes","authors":"C. Spieth, F. Streichert, N. Speer, C. Sinzger, Kathrin Eberhard, A. Zell","doi":"10.1109/CIBCB.2005.1594955","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594955","url":null,"abstract":"In this paper we address the problem of predicting gene activities by finding gene regulatory dependencies in experimental DNA microarray data. Only a few approaches to inferring the dependencies of complete gene interconnectivity networks can be found in the literature. Due to the limited amount of available data, the inference problem is underdetermined and ambiguous. Therefore, we introduce a new algorithm to infer relationships only between selected genes and the unknown gene network. This method is able to predict gene activation by mathematically modeling and simulating the network. The parameters of the mathematical model are determined by optimization with evolutionary algorithms. In this paper we show that our approach is able to correctly predict gene responses in immune-related regulatory processes and correctly identify some of the true genomic relationships of these genes.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128597902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biclustering of Gene Expression Data Using Genetic Algorithm","authors":"Anupam Chakraborty, Hitashyam Maka","doi":"10.1109/CIBCB.2005.1594893","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594893","url":null,"abstract":"The biclustering problem for gene expression data deals with finding a subset of genes that exhibit similar expression patterns across a subset of conditions. Most current algorithms use a statistically predefined threshold as an input parameter for biclustering. This threshold defines the maximum allowable dissimilarity between the cells of a bicluster and is very hard to determine beforehand. Hence we propose two genetic algorithms that embed a greedy algorithm as a local search procedure and find the best biclusters independently of this threshold score. We also establish that the HScore of a bicluster under the additive model approximately follows a chi-square distribution. We found that these genetic algorithms outperformed other greedy algorithms on yeast and lymphoma datasets.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132860832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Class Protein Subcellular Localization Prediction using Support Vector Machines","authors":"Peng Wai Meng, Jagath Rajapakse","doi":"10.1109/CIBCB.2005.1594964","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594964","url":null,"abstract":"Prediction of protein subcellular localization from amino acid sequence is an important step towards elucidating the function of a protein. Here, we present an approach for predicting protein subcellular localizations from eukaryotic sequences using Support Vector Machines. Apart from using amino acid compositions, our prediction approach also considers biochemical characteristics of amino acids and their distribution patterns along the primary sequence of the query proteins. Consequently, improved predictive accuracy has been achieved on the Reinhardt and Hubbard’s dataset. For the four subcellular localizations of eukaryotic proteins, the total prediction accuracy obtained using the “leave-one-out” cross-validation test is 88.88%. To the best of our knowledge, our approach obtained by far the best prediction accuracy for mitochondrial proteins, which are notoriously difficult to predict among eukaryotic proteins. Performance comparison results also showed that our approach outperformed existing protein subcellular localization prediction methods based solely on amino acid composition.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129290266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feedback Memetic Algorithms for Modeling Gene Regulatory Networks","authors":"C. Spieth, F. Streichert, J. Supper, N. Speer, A. Zell","doi":"10.1109/CIBCB.2005.1594899","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594899","url":null,"abstract":"In this paper we address the problem of finding gene regulatory networks from experimental DNA microarray data. We focus on the evaluation of the performance of memetic algorithms on the inference problem. These algorithms are used to evolve an underlying quantitative mathematical model. The dynamics of the regulatory system are modeled with two commonly used approaches, namely linear weight matrices and S-systems. Due to the complexity of the inference problem, some researchers have suggested evolutionary algorithms for this purpose. We introduce memetic enhancements to this optimization process to infer the parameters of sparsely connected nonlinear systems from the observed data. Due to the limited amount of available data, the inference problem is underdetermined and ambiguous. Furthermore, the problem is often multimodal, and therefore appropriate optimization strategies become necessary. We propose a memetic method that separates the overall inference problem into two subproblems to find the correct network: first, the search for a valid topology, and second, the optimization of the parameters of the mathematical model. The performance and the properties of the proposed methods are evaluated and compared to standard algorithms found in the literature.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131820273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple Sequence Alignment Containing a Sequence of Regular Expressions","authors":"Abdullah N. Arslan","doi":"10.1109/CIBCB.2005.1594922","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594922","url":null,"abstract":"A classical algorithm for pairwise sequence alignment is the Smith-Waterman algorithm, which uses dynamic programming. The algorithm computes the maximum score of alignments that use insertions, deletions, and substitutions, with no consideration given to the composition of the alignments. However, biologists favor incorporating their knowledge about common structures or functions into the alignment process. For alignment of protein sequences, several methods have been suggested for taking into account the motifs (restricted regular expressions) from the PROSITE database to guide alignments. One method modifies the Smith-Waterman dynamic programming solution to reward alignments that contain matching motifs. Another method introduces the regular expression constrained sequence alignment problem, in which pairwise alignments are constrained to contain a given regular expression. This latter method constructs a weighted finite automaton from a given regular expression and presents a dynamic programming solution that simulates copies of this automaton in seeking an alignment with maximum score containing the regular expression. We generalize this approach: 1) we introduce a variation of the problem for multiple sequences, namely the regular expression constrained multiple sequence alignment, and present an algorithm for it; 2) we develop an algorithm for the case in which the alignments sought are required to contain a given sequence of regular expressions.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134305203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neuro-fuzzy Prediction of Biological Activity and Rule Extraction for HIV-1 Protease Inhibitors","authors":"Razvan Andonie, L. Fabry-Asztalos, Catharine Collar, Sarah Abdul-Wahid, N. Salim","doi":"10.1109/CIBCB.2005.1594906","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594906","url":null,"abstract":"A fuzzy neural network (FNN) and multiple linear regression (MLR) were used to predict the biological activities of 26 newly designed potential HIV-1 protease inhibitors. Molecular descriptors of 151 known inhibitors were used to train and test the FNN and to develop MLR models. The predictive ability of these two models was investigated and compared. We found the predictive ability of the FNN to be generally superior to that of MLR. Fuzzy IF/THEN rules were extracted from the trained network. These rules map chemical structure descriptors to predicted inhibitory values. The obtained rules can be used to analyze the influence of descriptors. Our results indicate that FNN and fuzzy IF/THEN rules are powerful modeling tools for QSAR studies.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114522839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LVQ Approach Using AA Indices for Protein Subcellular Localisation Prediction","authors":"Kok-Sin Toh, M. N. Nguyen, Jagath Rajapakse","doi":"10.1109/CIBCB.2005.1594932","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594932","url":null,"abstract":"Knowledge of the subcellular localisation of proteins is important in determining their function and involvement in different pathways. A wide variety of methods has been proposed in recent years to predict the subcellular localisation of proteins, mainly based on amino acid composition or single sequence inputs. We propose a Learning Vector Quantization (LVQ) method for protein subcellular localisation prediction based on N-terminal sorting signals, using information derived from the Amino Acid (AA) index database. The LVQ approach achieved overall prediction accuracies of 84.7% for 2427 eukaryotic protein sequences on the Reinhardt and Hubbard dataset and up to 86.8% on the non-plant (eukaryotic) dataset of 2738 sequences from the TargetP website, which are comparable to or better than the results of existing prediction methods.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114590036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}