{"title":"Flow Model of the Protein-protein Interaction Network for Finding Credible Interactions","authors":"Kinya Okada, K. Asai, Masanori Arita","doi":"10.1142/9781860947995_0034","DOIUrl":"https://doi.org/10.1142/9781860947995_0034","url":null,"abstract":"Large-scale protein-protein interactions (PPIs) detected by yeast-two-hybrid (Y2H) systems are known to contain many false positives. The separation of credible interactions from background noise is still an unavoidable task. In the present study, we propose the relative reliability score for PPI as an intrinsic characteristic of global topology in the PPI networks. Our score is calculated as the dominant eigenvector of an adjacency matrix and represents the steady state of the network flow. By using this reliability score as a cut-off threshold from noisy Y2H PPI data, the credible interactions were extracted with better or comparable performance of previously proposed methods which were also based on the network topology. The result suggests that the application of the network-flow model to PPI data is useful for extracting credible interactions from noisy experimental data.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"27 1","pages":"317-326"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74766164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Structural Similarity Search Based on Topology String Matching","authors":"Sung-Hee Park, D. Gilbert, K. Ryu","doi":"10.1142/9781860947995_0036","DOIUrl":"https://doi.org/10.1142/9781860947995_0036","url":null,"abstract":"We describe an abstract data model of protein structures by representing the geometry of proteins using spatial data types and present a framework for fast structural similarity search based on the matching of topology strings using bipartite graph matching. The system has been implemented on top of the Oracle 9i spatial database management system. The performance evaluation was conducted on 36 proteins from the Chew and Kedem data set and also on a subset of the PDB40. Our method performs well in terms of the quality of matching whilst having the advantage of fast execution and being able to compute similarity search in polynomial time. Thus, this work shows that the pre-computed string representation of topological properties between secondary structure elements using spatial relationships of spatial database management system is practical for fast structural similarity search.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"4 1","pages":"341-351"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88494807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inferring Gene Regulatory Networks by Machine Learning Methods","authors":"J. Supper, H. Fröhlich, C. Spieth, Andreas Dräger, A. Zell","doi":"10.1142/9781860947995_0027","DOIUrl":"https://doi.org/10.1142/9781860947995_0027","url":null,"abstract":"The ability to measure the transcriptional response after a stimulus has drawn much attention to the underlying gene regulatory networks. Several machine learning related methods, such as Bayesian networks and decision trees, have been proposed to deal with this difficult problem, but rarely a systematic comparison between different algorithms has been performed. In this work, we critically evaluate the application of multiple linear regression, SVMs, decision trees and Bayesian networks to reconstruct the budding yeast cell cycle network. The performance of these methods is assessed by comparing the topology of the reconstructed models to a validation network. This validation network is defined a priori and each interaction is specified by at least one publication. We also investigate the quality of the network reconstruction if a varying amount of gene regulatory dependencies is provided a priori.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"28 1","pages":"247-256"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87057540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"All Hits All The Time: Parameter Free Calculation of Seed Sensitivity","authors":"Denise Y. F. Mak, Gary Benson","doi":"10.1142/9781860947995_0035","DOIUrl":"https://doi.org/10.1142/9781860947995_0035","url":null,"abstract":"Standard search techniques for DNA repeats start by identifying seeds, that is, small matching words, that may inhabit larger repeats. Recent innovations in seed structure have led to the development of spaced seeds [8] and indel seeds [9] which are more sensitive than contiguous seeds (also known as k-mers, k-tuples, l-words, etc.). Evaluating seed sensitivity requires 1) specifying a homology model which describes types of alignments that can occur between two copies of a repeat, and 2) assigning probabilities to those alignments. Optimal seed selection is a resource intensive activity because essentially all alternative seeds must be tested [7]. Current methods require that the model and probability parameters be specified in advance. When the parameters change, the entire calculation has to be rerun. In this paper, we show how to eliminate the need for prior parameter specification. The ideas presented follow from a simple observation: given a homology model, the alignments hit by a particular seed remain the same regardless of the probability parameters. Only the weights assigned to those alignments change. Therefore, if we know all the hits, we can easily (and quickly) find optimal seeds. We describe a highly efficient preprocessing step, which is computed just once for each seed. In this calculation, strings which represent possible alignments are unweighted by any probability parameters. Then we show several increasingly efficient methods to find the optimal seed when given specific probability parameters. Indeed, we show how to determine exactly which seeds can never be optimal under any set of probability parameters. 
This leads to the startling observation that out of thousands of seeds, only a handful have any chance of being optimal. We then show how to find optimal seeds and the boundaries within probability space where they are optimal. We expect this method to greatly facilitate the study of seed space sensitivity, construction of multiple seed sets, and the use of alternative definitions of optimality.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"31 1","pages":"327-340"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77724482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised Pattern Learning for Extracting Relations from Bioscience Texts","authors":"Shilin Ding, Minlie Huang, Xiaoyan Zhu","doi":"10.1142/9781860947995_0033","DOIUrl":"https://doi.org/10.1142/9781860947995_0033","url":null,"abstract":"A variety of pattern-based methods have been exploited to extract biological relations from the literature. Many of them require significant domain-specific knowledge to build the patterns by hand, or a large amount of labeled data to learn the patterns automatically. In this paper, a semi-supervised model is presented to combine both unlabeled and labeled data for the pattern learning procedure. First, a large amount of unlabeled data is used to generate a raw pattern set. Then it is refined in the evaluating phase by incorporating the domain knowledge provided by a relatively small labeled data set. Comparative results show that labeled data, when used in conjunction with the inexpensive unlabeled data, can considerably improve the learning accuracy.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"25 1","pages":"307-316"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88422880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Genomes of Distantly Related Mammals","authors":"J. Graves","doi":"10.1142/9781860947995_0001","DOIUrl":"https://doi.org/10.1142/9781860947995_0001","url":null,"abstract":"There are three groups of extant mammals, two of which abound in Australia. Marsupials (kangaroos and their relatives) and monotremes (echidna and the fabulous platypus) have been evolving independently for most of mammalian history. The genomes of marsupial and monotreme mammals are particularly valuable because these alternative mammals fill a phylogenetic gap in vertebrate species lined up for exhaustive genomic study. Human and mice (∼70MY) are too close to distinguish signal, whereas mammal/bird comparisons (∼310MY) are too distant to allow alignment. Kangaroos (180 MY) and platypus (210 MY) are just right. Sequence has diverged sufficiently for stringent detection of homologies that can reveal coding regions and regulatory signals. Importantly, marsupials and monotremes share with humans many mammal-specific developmental pathways and regulatory systems such as sex determination, lactation and X chromosome inactivation. The ARC Centre for Kangaroo Genomics is characterizing the genome of the model Australian kangaroo Macropus eugenii (the tammar wallaby), which is being sequenced by AGRF in Australia, and Baylor (funded by NIH) in the US. We are developing detailed physical and linkage maps of the genome to complement sequencing, and will prepare and array cDNAs for functional studies, especially of reproduction and development. Complete sequencing of the distantly related Brazilian short-tailed opossum Monodelphis domestica by the NIH allows us to compare distantly related marsupials. Sequencing of the genome of the platypus, Ornithorhynchus anatinus by Washington University (funded by the NIH) is complete, and our lab is anchoring contigs to the physical map. 
We have isolated and completely characterized many BACs and cDNAs containing kangaroo and platypus genes of interest, and demonstrate the value of comparisons to reveal conserved genome organization and function, and new insights in the evolution of the mammalian genome, particularly sex chromosomes.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"2 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78925367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deriving Protein Structure Topology from the Helix Skeleton in Low Resolution Density Map using Rosetta","authors":"Y. Lu, Jing He, C. Strauss","doi":"10.1142/9781860947995_0017","DOIUrl":"https://doi.org/10.1142/9781860947995_0017","url":null,"abstract":"Electron cryo-microscopy (cryo-EM) is an experimental technique to determine the 3-dimensional structure for large protein complexes. Currently this technique is able to generate protein density maps at 6 to 9 Å resolution. Although secondary structures such as α-helix and β-sheet can be visualized from these maps, there is no mature approach to deduce their tertiary topology, the linear order of the secondary structures on the sequence. The problem is challenging because given N secondary structure elements, the number of possible orders is (2^N)*N!. We have developed a method to predict the topology of the secondary structures using ab initio structure prediction. The Rosetta structure prediction algorithm was used to make purely sequence based structure predictions for the protein. We produced 1000 of these ab initio models, and then screened the models produced by Rosetta for agreement with the helix skeleton derived from the density map. The method was benchmarked on 60 mainly alpha helical proteins, finding that for about 3/4 of all the proteins, the majority of the helices in the skeleton were correctly assigned by one of the top 10 suggested topologies from the method, while for about 1/3 of all the proteins the best topology assignment without errors was ranked the first. This approach also provides an estimate of the sequence alignment of the skeleton. For most of those true-positive assignments, the alignment was accurate to within +/-2 amino acids in the sequence.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... 
Asia-Pacific bioinformatics conference","volume":"192 1","pages":"143-151"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83024073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining N-grams and Alignment in G-protein Coupling Specificity Prediction","authors":"B. Cheng, J. Carbonell","doi":"10.1142/9781860947995_0038","DOIUrl":"https://doi.org/10.1142/9781860947995_0038","url":null,"abstract":"G-protein coupled receptors (GPCR) interact with G-proteins to regulate much of the cell’s response to external stimuli; abnormalities in which cause numerous diseases. We developed a new method to predict the families of G-proteins with which it interacts, given its residue sequence. We combine both alignment and n-gram features. The former captures long-range interactions but assumes the linear ordering of conserved segments is preserved. The latter makes no such assumption but cannot capture long-range interactions. By combining alignment and n-gram features, and using the entire GPCR sequence (instead of intracellular regions alone, as was done by others), our method outperformed the current state-of-the-art in precision, recall and F1, attaining 0.753 in F1 and 0.796 in accuracy on the PTbase 2004 dataset. Moreover, analysis of our results shows that the majority of coupling specificity information lies in the beginning of the 2nd intracellular loop and over the length of the 3rd.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"197 1","pages":"363-372"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73245716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing the Quartet Distance Between Evolutionary Trees of Bounded Degree","authors":"M. Stissing, Christian N. S. Pedersen, T. Mailund, G. Brodal, Rolf Fagerberg","doi":"10.1142/9781860947995_0013","DOIUrl":"https://doi.org/10.1142/9781860947995_0013","url":null,"abstract":"We present an algorithm for calculating the quartet distance between two evolutionary trees of bounded degree on a common set of n species. The previous best algorithm has running time O(d^2 n^2) when considering trees, where no node is of more than degree d. The algorithm developed herein has running time O(d^9 n log n) which makes it the first algorithm for computing the quartet distance between non-binary trees which has a sub-quadratic worst case running time.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"86 1","pages":"101-110"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73012520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Protein Structure-Structure Alignment with Discrete Fréchet Distance","authors":"Minghui Jiang, Ying Xu, B. Zhu","doi":"10.1142/9781860947995_0016","DOIUrl":"https://doi.org/10.1142/9781860947995_0016","url":null,"abstract":"","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"34 1","pages":"131-141"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79027320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}