{"title":"A new clustering method for microarray data analysis","authors":"Louxin Zhang, Song Zhu","doi":"10.1109/CSB.2002.1039349","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039349","url":null,"abstract":"A novel clustering approach is introduced to overcome missing data and inconsistency of gene expression levels under different conditions in the stage of clustering. It is based on the so-called smooth score, which is defined for measuring the deviation of the expression level of a gene from the average expression level of all the genes involved under a condition. We present an efficient greedy algorithm for finding clusters with a smooth score below a threshold after studying its computational complexity. The algorithm was tested intensively on random matrices and yeast data. It was shown to perform well in finding co-regulation patterns in a test with the yeast data.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"268-275"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039349","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Suffix trees (and relatives) come of age in bioinformatics","authors":"D. Gusfield","doi":"10.1109/CSB.2002.1039321","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039321","url":null,"abstract":"Summary form only given. In the past, several things limited the wider application of suffix trees: large memory requirements; limited locality of reference; the conceptual difficulty of the algorithms; lack of available code; and lack of general exposure in the bioinformatics community (and even the computer science community) to suffix trees. Much has changed since 1997. Suffix trees and close relatives are now widely taught in graduate level courses on computer algorithms and on bioinformatics; there are several good expositions on suffix tree algorithms and uses; the space requirements have been substantially reduced; machine memories have greatly increased; additional variants of suffix trees have been introduced that address some of their deficiencies; and suffix tree code is publicly available. As a result, and to some extent as a cause, there are now many more applications in bioinformatics of suffix trees and related data structures. The author addresses the wider uses of suffix trees in bioinformatics.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"3-"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039321","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62213950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualization of K-tuple distribution in procaryote complete genomes and their randomized counterparts","authors":"Huimin Xie, Bailin Hao","doi":"10.1109/CSB.2002.1039327","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039327","url":null,"abstract":"We (2000) previously developed a simple scheme to visualize the string composition of long DNA sequences in terms of two- and one-dimensional (2D and 1D) histograms. While the patterns in the 2D histograms have been well understood, the structure of the 1D histograms has not been analyzed in detail. It turns out that the structure of the 1D histograms of the genomic sequences and their randomized counterparts varies significantly depending on the G+C content of the genomes. In particular, the 1D histograms of some randomized sequences may show rich structure, a seemingly counterintuitive result. Three approaches are used to explain the phenomenon: (1) Monte Carlo simulation, (2) exact computation by using the Goulden-Jackson cluster method, and (3) a Poisson approximation method. The multi-modal phenomena in K-histograms are well elucidated by the last approach.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"31-42"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039327","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An application of a pathway alignment method to comparative analysis between genome and pathways","authors":"Shoko Miyake, Y. Tohsato, H. Matsuda","doi":"10.1109/CSB.2002.1039356","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039356","url":null,"abstract":"We present a method for the comparative analysis of genomes and metabolic pathways based on similarity between gene orders and enzymatic reactions. To measure the reaction similarity, we formalized a scoring system by using the functional hierarchy of the EC numbers of enzymes. We have used an alignment method between given pathways, which is based on the longest common subsequence algorithm with the scoring system. The similarity score between pathways is expressed as the information content of their alignment. By applying our algorithm to the metabolic pathway in Escherichia coli, we have found several common patterns among the purine, lysine and arginine biosynthesis and other amino acid related metabolic pathways. We have also compared the alignments with gene orders on the E. coli genome by using a heuristic graph comparison method. From the comparison, we have found that reaction orders and gene orders are conserved in the histidine and tryptophan biosynthesis pathways.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"329-"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039356","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-level text mining method to extract biological relationships","authors":"M. Palakal, Matthew J. Stephens, S. Mukhopadhyay, R. Raje, Simon Rhodes","doi":"10.1109/CSB.2002.1039333","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039333","url":null,"abstract":"Accurate and computationally efficient approaches in discovering relationships between biological objects from text documents are important for biologists to develop biological models. This paper presents a novel approach to extract relationships between multiple biological objects that are present in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extracting object-object relationships. Hidden Markov models (HMMs), dictionaries, and N-Gram models are used to set the framework to tackle the complex task of extracting object-object relationships. Experiments were carried out using a corpus of one thousand Medline abstracts. Intermediate results were obtained for the object identification process, synonym discovery, and finally the relationship extraction. For a corpus of one thousand abstracts, 53 relationships were extracted of which 43 were correct, giving a specificity of 81%. The approach is both adaptable and scalable to new problems as opposed to rule-based methods.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"97-108"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039333","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and sensitive alignment of large genomic sequences","authors":"M. Brudno, B. Morgenstern","doi":"10.1109/CSB.2002.1039337","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039337","url":null,"abstract":"Comparative analysis of syntenic genome sequences can be used to identify functional sites such as exons and regulatory elements. Here, the first step is to align two or several evolutionarily related sequences and, in recent years, a number of computer programs have been developed for alignment of large genomic sequences. Some of these programs are extremely fast but often time-efficiency is achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast heuristic identifies a chain of strong sequence similarities that serve as anchor points. In a second step, regions between these anchor points are aligned using a slower but more sensitive method. We present CHAOS, a novel algorithm for rapid identification of chains of local sequence similarities among large genomic sequences. Similarities identified by CHAOS are used as anchor points to improve the running time of the DIALIGN alignment program. Systematic test runs show that this method can reduce the running time of DIALIGN by more than 93% while affecting the quality of the resulting alignments by only 1%. The source code for CHAOS is available at http://www.stanford.edu/~brudno/chaos/.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"138-147"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039337","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome annotation and protein structure","authors":"S. Brenner","doi":"10.1109/CSB.2002.1039322","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039322","url":null,"abstract":"Summary form only given. Structural genomics aims to provide a good experimental structure or computational model of every tractable protein in a complete genome. Underlying this goal is the immense value of protein structure, especially in permitting recognition of distant evolutionary relationships for proteins whose sequence analysis has failed to find any significant homologue. A considerable fraction of the genes in all sequenced genomes have no known function, and structure determination provides a direct means of revealing homology that may be used to infer their putative molecular function. The solved structures are similarly useful for elucidating the biochemical or biophysical role of proteins that have been previously ascribed only phenotypic functions. More generally, knowledge of an increasingly complete repertoire of protein structures will aid structure prediction methods, improve understanding of protein structure, and ultimately lend insight into molecular interactions and pathways.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"4-"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039322","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62213994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network.","authors":"S. Imoto, SunYong Kim, Takao Goto, S. Aburatani, Kousuke Tashiro, S. Kuhara, S. Miyano","doi":"10.1109/CSB.2002.1039344","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039344","url":null,"abstract":"We propose a new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network. An essential point of Bayesian network construction is in the estimation of the conditional distribution of each random variable. We consider fitting nonparametric regression models with heterogeneous error variances to the microarray gene expression data to capture the nonlinear structures between genes. A problem still remains to be solved in selecting an optimal graph, which gives the best representation of the system among genes. We theoretically derive a new graph selection criterion from a Bayes approach in general situations. The proposed method includes previous methods based on Bayesian networks. We demonstrate the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"219-27"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039344","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallelizing a DNA simulation code for the Cray MTA-2","authors":"S. Bokhari, M. Glaser, H. Jordan, Y. Lansac, J. Sauer, B. Zeghbroeck","doi":"10.1109/CSB.2002.1039351","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039351","url":null,"abstract":"The Cray MTA-2 (Multithreaded Architecture) is an unusual parallel supercomputer that promises ease of use and high performance. We describe our experience on the MTA-2 with a molecular dynamics code, SIMU-MD, that we are using to simulate the translocation of DNA through a nanopore in a silicon based ultrafast sequencer. Our sequencer is constructed using standard VLSI technology and consists of a nanopore surrounded by field effect transistors (FETs). We propose to use the FETs to sense variations in charge as a DNA molecule translocates through the pore and thus differentiate between the four building block nucleotides of DNA. We were able to port SIMU-MD, a serial C code, to the MTA with only a modest effort and with good performance. Our porting process needed neither a parallelism support platform nor attention to the intimate details of parallel programming and interprocessor communication, as would have been the case with more conventional supercomputers.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"291-302"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039351","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of genetic switches with only positive feedback loops","authors":"Tetsuya J. Kobayashi, Luonan Chen, K. Aihara","doi":"10.1109/CSB.2002.1039338","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039338","url":null,"abstract":"We develop a new methodology to design synthetic genetic switch networks with multiple genes and time delays, by using monotone dynamical theory. We show that the networks with only positive feedback loops have no stable oscillation except equilibria whose stability is also independent of the time delays. Such systems have ideal properties for switch networks and can be designed without consideration of time delays, because the systems can be reduced from functional spaces to Euclidean spaces due to the independence from time delays. Specifically, we first prove the basic properties of the genetic networks composed of only positive feedback loops, and then propose a procedure to design the switches, which drastically simplifies analysis of the switches and makes theoretical analysis and designing tractable even for large scale systems. Finally, we demonstrate our theoretical results by designing a biologically plausible synthesized genetic switch with experimentally well investigated lacI, tetR, and cI genes.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"151-162"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039338","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}