Mahdi Belcaid, Anne Bergeron, A. Chateau, C. Chauve, Yannick Gingras, G. Poisson, M. Vendette
{"title":"Exploring Genome Rearrangements using Virtual Hybridization","authors":"Mahdi Belcaid, Anne Bergeron, A. Chateau, C. Chauve, Yannick Gingras, G. Poisson, M. Vendette","doi":"10.1142/9781860947995_0023","DOIUrl":"https://doi.org/10.1142/9781860947995_0023","url":null,"abstract":"Genomes evolve with both mutations and large scale events, such as inversions, translocations, duplications and losses, that modify the structure of a set of chromosomes. In order to study these types of large-scale events, the first task is to select, in different genomes, sub-sequences that are considered “equivalent”. Many approaches have been used to identify equivalent sequences, either based on biological experiments, gene annotations, or sequence alignments. These techniques suffer from a variety of drawbacks that often result in the impossibility, for independent researchers, to reproduce the datasets used in the studies, or to adapt them to newly sequenced genomes. In this paper, we show that carefully selected small probes can be efficiently used to construct datasets. Once a set of probes is identified ‐ and published ‐, datasets for whole genome comparisons can be produced, and reproduced, with elementary algorithms; decisions about what is considered an occurrence of a probe in a genome can be criticized and reevaluated; and the structure of a newly sequenced genome can be obtained rapidly, without the need of gene annotations or intensive computations.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"33 1","pages":"205-214"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76101912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subtle Motif Discovery for Detection of DNA Regulatory Sites","authors":"M. Comin, L. Parida","doi":"10.1142/9781860947995_0006","DOIUrl":"https://doi.org/10.1142/9781860947995_0006","url":null,"abstract":"We address the problem of detecting consensus motifs, that occur with subtle variations, across multiple sequences. These are usually functional domains in DNA sequences such as transcriptional binding factors or other regulatory sites. The problem in its generality has been considered difficult and various benchmark data serve as the litmus test for different computational methods. We present a method centered around unsupervised combinatorial pattern discovery. The parameters are chosen using a careful statistical analysis of consensus motifs. This method works well on the benchmark data and is general enough to be extended to a scenario where the variation in the consensus motif includes indels (along with mutations). We also present some results on detection of transcription binding factors in human DNA sequences. Availability: The system will be made available at www.research.ibm.com/computationalgenomics.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"44 1","pages":"27-36"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78268989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"etagenome Analysis using Megan","authors":"D. Huson, Alexander F. Auch, Qi Ji, S. Schuster","doi":"10.1142/9781860947995_0004","DOIUrl":"https://doi.org/10.1142/9781860947995_0004","url":null,"abstract":"In metagenomics, the goal is to analyze the genomic content of a sample of organisms collected from a common habitat. One approach is to apply large-scale random shotgun sequencing techniques to obtain a collection of DNA reads from the sample. This data is then compared against databases of known sequences such as NCBI-nr or NCBI-nt, in an attempt to identify the taxonomical content of the sample. We introduce a new software called MEGAN (Meta Genome ANalyzer) that generates species profiles from such sequencing data by assigning reads to taxa of the NCBI taxonomy using a straight-forward assignment algorithm. The approach is illustrated by application to a number of datasets obtained using both sequencing-by-synthesis and Sanger sequencing technology, including metagenomic data from a mammoth bone, a portion of the Sargasso sea data set, and several complete microbial test genomes used for validation proposes.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"73 1","pages":"7-16"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78078645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Distance Between Randomly Constructed Genomes","authors":"W. Xu","doi":"10.1142/9781860947995_0025","DOIUrl":"https://doi.org/10.1142/9781860947995_0025","url":null,"abstract":"In this paper, we study the exact probability distribution of the number of cycles c in the breakpoint graph of two random genomes with n genes or markers and 1 and 2 linear chromosomes, respectively. The genomic distance d between the two genomes is d = n c. In the limit we find that the expectation of d is n 2 1 2 2 1+2 2 1 1 2 ln","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"20 1","pages":"227-236"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85711737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genotype-Based Case-Control Analysis, Violation of Hardy-Weinberg Equilibrium, and Phase Diagrams","authors":"Y. Suh, Wentian Li","doi":"10.1142/9781860947995_0021","DOIUrl":"https://doi.org/10.1142/9781860947995_0021","url":null,"abstract":"We study in detail a particular statistical method in genetic case-control analysis, labeled “genotypebased association”, in which the two test results from assuming dominant and recessive model are combined in one optimal output. This method differs both from the allele-based association which artificially doubles the sample size, and the direct χ test on 3-by-2 contingency table which may overestimate the degree of freedom. We conclude that the comparative advantage (or disadvantage) of the genotype-based test over the allele-based test mainly depends on two parameters, the allele frequency difference δ and the Hardy-Weinberg disequilibrium coefficient difference δǫ. Six different situations, called “phases”, characterized by the two X test statistics in allele-based and genotypebased test, are well separated in the phase diagram parameterized by δ and δǫ. For two major groups of phases, a single parameter θ = tan(δ/δǫ) is able to achieves an almost perfect phase separation. We also applied the analytic result to several types of disease models. It is shown that for dominant and additive models, genotype-based tests are favored over allele-based tests.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"35 1","pages":"185-194"},"PeriodicalIF":0.0,"publicationDate":"2006-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80622850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RECOMP: A Parsimony-Based Method for Detecting Recombination","authors":"Derek A. Ruths, L. Nakhleh","doi":"10.1142/9781860947292_0009","DOIUrl":"https://doi.org/10.1142/9781860947292_0009","url":null,"abstract":"The central role phylogeny plays in biology and its pervasiveness in comparative genomics studies have led researchers to develop a plethora of methods for its accurate reconstruction. Most phylogeny reconstruction methods, though, assume a single tree underlying a given sequence alignment. While a good first approximation in many cases, a tree may not always model the evolutionary history of a set of organisms. When events such as interspecific recombi nation occur, different regions in the alignment may have different underlying trees. Accurate reconstruction of the evolutionary history of a set of sequences requires recombination detection, followed by separate analyses of the nonrecombining regions. Besides aiding accurate phylogenetic analyses, detecting recombination helps in understanding one of the main mechanisms of bacterial genome diversification. In this paper, we introduce RECOMP, an accurate and fast method for detecting recombination events in a sequence alignment. The method slides a fixed-width window across the alignment and determines the presence of recombination events based on a combination of topology and parsimony score differences in neighboring windows. On several synthetic and biological datasets, our method performs much faster than existing tools with accuracy comparable to the best available method.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"24 1","pages":"59-68"},"PeriodicalIF":0.0,"publicationDate":"2005-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74004680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ching-Tai Chen, Hsin-Nan Lin, K. Wu, Ting-Yi Sung, W. Hsu
{"title":"A Knowledge-Based Approach to Protein Local Structure Prediction","authors":"Ching-Tai Chen, Hsin-Nan Lin, K. Wu, Ting-Yi Sung, W. Hsu","doi":"10.1142/9781860947292_0029","DOIUrl":"https://doi.org/10.1142/9781860947292_0029","url":null,"abstract":"Local structure prediction can facilitate ab initio structure prediction, protein threading, and remote homology detection. However, previous approaches to local structure prediction suffer from poor accuracy. In this paper, we propose a knowledge-based prediction method that assigns a measure called the local match rate to each position of an amino acid sequence to estimate the confidence of our approach. To remedy prediction results with low local match rates, we use a neural network prediction method. Then, we have a hybrid prediction method, HYPLOSP (HYbrid method to Protein LOcal Structure Prediction) that combines our knowledge-based method with a neural network method. We test the method on two different structural alphabets and evaluate it by QN, which is similar to Q3 in secondary structure prediction. The experimental results show that our method yields a significant improvement over previous studies.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"66 1","pages":"257-266"},"PeriodicalIF":0.0,"publicationDate":"2005-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87993225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Aßfalg, H. Kriegel, Peer Kröger, Peter Kunath, A. Pryakhin, M. Renz
{"title":"Semi-Supervised Threshold Queries on Pharmacogenomics Time Sequences","authors":"J. Aßfalg, H. Kriegel, Peer Kröger, Peter Kunath, A. Pryakhin, M. Renz","doi":"10.1142/9781860947292_0034","DOIUrl":"https://doi.org/10.1142/9781860947292_0034","url":null,"abstract":"The analysis of time series data is of capital importance for pharmacogenomics since the experimental evaluations are usually based on observations of time dependent reactions or behaviors of organisms. Thus, data mining in time series databases is an important instrument towards understanding the effects of drugs on individuals. However, the complex nature of time series poses a big challenge for effective and efficient data mining. In this paper, we focus on the detection of temporal dependencies between different time series: we introduce the novel analysis concept of threshold queries and its semi-supervised extension which supports the parameter setting by applying training datasets. Basically, threshold queries report those time series exceeding an user-defined query threshold at certain time frames. For semi-supervised threshold queries the corresponding threshold is automatically adjusted to the characteristics of the data set, the training dataset, respectively. In order to support threshold queries efficiently, we present a new efficient access method which uses the fact that only partial information of the time series is required at query time. In an extensive experimental evaluation we demonstrate the performance of our solution and show that semi-supervised threshold queries applied to gene expression data are very worthwhile. Data mining in time series data is a key step within the study of drugs and their impact on living systems, including the discovery, design, usage, modes of action, and metabolism of chemically defined therapeutics and toxic agents. In particular, the analysis of time series data is of great practical importance for pharmacogenomics. Classical time series analysis is based on techniques for forecasting or for identifying patterns (e.g. trend analysis or seasonality). The similarity between time series, e.g. similar movements of time series, plays a key role for the analysis. In this paper, we introduce a novel but very important similarity query type which we call threshold query. Given a time series database DB, a query time series Q, and a query threshold τ , a threshold query TSQ DB(Q, τ ) returns those time series X ∈ DB having the most similar sequence of time intervals in which the time series values are above τ .I n other words, we assume that each time series X ∈ DB ∪{ Q} is transformed into a sequence of disjoint time intervals covering only those values of X that are (strictly) above the threshold τ . Then, a threshold query returns for a given query object Q that object X ∈ DB having the most similar sequence of time intervals. Let us note that the exact values of the time series are not considered, rather we are only interested in whether the time series is above or below a given threshold τ . In other words, the concept of threshold queries enables us to focus only on the duration of certain events indicated by increased time series amplitudes, while the degree of the corresponding amplitudes are i","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"81 1","pages":"307-316"},"PeriodicalIF":0.0,"publicationDate":"2005-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88587852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Complexity of Finding Control Strategies for Boolean Networks","authors":"T. Akutsu, M. Hayashida, W. Ching, M. Ng","doi":"10.1142/9781860947292_0013","DOIUrl":"https://doi.org/10.1142/9781860947292_0013","url":null,"abstract":"This paper considers a problem of finding control strategies for Boolean networks, where Boolean networks have been used as a model of genetic networks. This paper shows that finding a control strategy leading to the desired global state is NP-hard even if there is only one control node in the network. This result justifies existing exponential time algorithms for finding control strategies for probabilistic Boolean networks. On the other hand, this paper shows that the problem can be solved in polynomial time if the network has a tree structure.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"30 1","pages":"99-108"},"PeriodicalIF":0.0,"publicationDate":"2005-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91501188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Whole Genome Optical Mapping","authors":"M. Waterman","doi":"10.1142/9781860947292_0003","DOIUrl":"https://doi.org/10.1142/9781860947292_0003","url":null,"abstract":"An innovative new technology, optical mapping, is used to infer the genome map of the location of short sequence patterns called restriction sites. The technology, developed by David Schwartz, allows the visualization of the maps of randomly located single molecules around a million base pairs in length. The genome map is constructed from overlapping these shorter maps. The mathematical and computational challenges come from modeling the measurement errors and from the process of map assembly.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"61 1","pages":"5"},"PeriodicalIF":0.0,"publicationDate":"2005-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81370929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}