{"title":"Fast and sensitive algorithm for aligning ESTs to human genome","authors":"Jun Ogasawara, S. Morishita","doi":"10.1109/CSB.2002.1039328","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039328","url":null,"abstract":"There is a pressing need to align the growing set of expressed sequence tags (ESTs) to the newly sequenced human genome. The problem is, however, complicated by the exon/intron structure of eucaryotic genes, misread nucleotides in ESTs, and millions of repetitive sequences in genomic sequences. Algorithms that use dynamic programming (DP) have been proposed to solve this, but in reality these algorithms require an enormous amount of processing time. In an effort to improve the computational efficiency of these classical DP algorithms, we develop software that fully utilizes a lookup table to efficiently detect the start- and end-points of an EST within a given DNA sequence and, subsequently, to promptly identify exons and introns. In addition, high sensitivity and accuracy must be achieved by calculating the locations of all splice sites correctly for more ESTs while retaining high computational efficiency. This goal is hard to accomplish in practice, owing to misread nucleotides in ESTs and repetitive sequences in the genome, but we present a couple of heuristics that are effective in settling this issue. Experimental results confirm that our technique improves the overall computation time by orders of magnitude compared with common tools such as sim4 and BLAT, while attaining high sensitivity and accuracy on datasets of clean and documented genes.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"43-53"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039328","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A literature based method for identifying gene-disease connections","authors":"Lada A. Adamic, Dennis M. Wilkinson, B. Huberman, Eytan Adar","doi":"10.1109/CSB.2002.1039334","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039334","url":null,"abstract":"We present a statistical method that can swiftly identify, from the literature, sets of genes known to be associated with given diseases. It offers a comprehensive way to treat alias symbols, a statistical method for computing the relevance of the gene to the query, and a novel way to disambiguate gene symbols from other abbreviations. The method is illustrated by finding genes related to breast cancer.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"109-117"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039334","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"P-quasi complete linkage analysis for gene-expression data","authors":"S. Seno, R. Teramoto, H. Matsuda","doi":"10.1109/CSB.2002.1039365","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039365","url":null,"abstract":"In order to find the function of genes from gene-expression profiles, hierarchical clustering has generally been used, but this method has problems: for example, the dendrogram tends to change depending on the data, so it is easily influenced by experimental noise. To cope with these problems, we propose another type of clustering. We formulate the problem of clustering as a graph-covering problem by connected subgraphs where vertices and edges of the graph denote genes and similarities between genes, respectively. The method is based on the p-quasi complete linkage algorithm for describing clusters. We present the outline of an algorithm for clustering a set of genes into subsets corresponding to p-quasi complete linkage graphs.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"342-"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039365","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human factors in software development","authors":"B. Curtis","doi":"10.1002/0471028959.SOF152","DOIUrl":"https://doi.org/10.1002/0471028959.SOF152","url":null,"abstract":"Since the 1950s, psychologists have studied the behavioral aspects of computer programming. However, it has been difficult to integrate their data with theory because of the mixture of psychological paradigms that have guided their research. This article reviews the research results that have been generated under the five psychological paradigms used most often in exploring programming problems. These five paradigms are (1) individual differences, (2) group behavior, (3) organizational behavior, (4) human factors, and (5) cognitive science. The major theoretical and practical contributions of each area to the theory and practice of software engineering are discussed. Current trends indicate that research guided by the paradigm of cognitive science will be the easiest to integrate with new developments in artificial intelligence and computer science theory. Keywords: human factors; paradigms; group behavior; organizational behavior; cognitive ergonomics; requirement; design aid; cognitive science programming","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2002-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76164175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-level text mining method to extract biological relationships.","authors":"Mathew Palakal, Matthew Stephens, Snehasis Mukhopadhyay, Rajeev Raje, Simon Rhodes","doi":"","DOIUrl":"","url":null,"abstract":"Accurate and computationally efficient approaches in discovering relationships between biological objects from text documents are important for biologists to develop biological models. This paper presents a novel approach to extract relationships between multiple biological objects that are present in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extracting object-object relationships. Hidden Markov Models (HMMs), dictionaries, and N-Gram models are used to set the framework to tackle the complex task of extracting object-object relationships. Experiments were carried out using a corpus of one thousand Medline abstracts. Intermediate results were obtained for the object identification process, synonym discovery, and finally the relationship extraction. For a corpus of one thousand abstracts, 53 relationships were extracted of which 43 were correct, giving a specificity of 81%. The approach is both adaptable and scalable to new problems as opposed to rule-based methods.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 ","pages":"97-108"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25064508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constrained multiple sequence alignment tool development and its application to RNase family alignment.","authors":"Chuan Yi Tang, Chin Lung Lu, Margaret Dah-Tsyr Chang, Yin-Te Tsai, Yuh-Ju Sun, Kun-Mao Chao, Jia-Ming Chang, Yu-Han Chiou, Chia-Mao Wu, Hao-Teng Chang, Wei-I Chou","doi":"","DOIUrl":"","url":null,"abstract":"In this paper, we design an algorithm for computing a constrained multiple sequence alignment (CMSA for short) that guarantees that the generated alignment satisfies the user-specified constraints that some particular residues should be aligned together. If the number of residues needed to be aligned together is a constant α, then the time-complexity of our CMSA algorithm for aligning K sequences is O(αKn^4), where n is the maximum of the lengths of sequences. In addition, we have built such a CMSA software system and carried out several experiments on the RNase sequences, which mainly function in catalyzing the degradation of RNA molecules. The resulting alignments illustrate the practicability of our method.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 ","pages":"127-37"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25064511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of genetic switches with only positive feedback loops.","authors":"Tetsuya Kobayashi, Luonan Chen, Kazuyuki Aihara","doi":"","DOIUrl":"","url":null,"abstract":"We develop a new methodology to design synthetic genetic switch networks with multiple genes and time delays, by using monotone dynamical theory. We show that the networks with only positive feedback loops have no stable oscillation except equilibria whose stability is also independent of the time delays. Such systems have ideal properties for switch networks and can be designed without consideration of time delays, because the systems can be reduced from functional spaces to Euclidean spaces due to the independence from time delays. Specifically, we first prove the basic properties of the genetic networks composed of only positive feedback loops, and then propose a procedure to design the switches, which drastically simplifies analysis of the switches and makes theoretical analysis and design tractable even for large-scale systems. Finally, we demonstrate our theoretical results by designing a biologically plausible synthesized genetic switch with the experimentally well-investigated lacI, tetR, and cI genes.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 ","pages":"151-62"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25064513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated identification of single nucleotide polymorphisms from sequencing data.","authors":"Masazumi Takahashi, Fumihiko Matsuda, Nino Margetic, Mark Lathrop","doi":"","DOIUrl":"","url":null,"abstract":"Single nucleotide polymorphisms (SNPs) provide abundant information about genetic variation. Large scale discovery of high frequency SNPs is being undertaken using various methods. However, the publicly available SNP data are not always accurate, and therefore should be verified. If only a particular gene locus is concerned, locus-specific polymerase chain reaction amplification may be useful. A problem with this method is that the secondary peak has to be measured. We have analyzed trace data from conventional sequencing equipment and found an applicable rule to discern SNPs from noise. We have developed software that integrates this function to automatically identify SNPs. The software works accurately for high quality sequences and also can detect SNPs in low quality sequences. Further, it can determine allele frequency, display this information as a bar graph and assign corresponding nucleotide combinations. It is very useful for identifying de novo SNPs in a DNA fragment of interest.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 ","pages":"87-93"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25235877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new clustering method for microarray data analysis.","authors":"Louxin Zhang, Song Zhu","doi":"","DOIUrl":"","url":null,"abstract":"A novel clustering approach is introduced to overcome missing data and inconsistency of gene expression levels under different conditions in the stage of clustering. It is based on the so-called smooth score, which is defined for measuring the deviation of the expression level of a gene from the average expression level of all the genes involved under a condition. We present an efficient greedy algorithm for finding clusters with smooth score below a threshold after studying its computational complexity. The algorithm was tested intensively on random matrices and a yeast dataset. It was shown to perform well in finding co-regulation patterns in a test with the yeast data.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 ","pages":"268-75"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25064327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualization techniques for genomic data.","authors":"Ann E Loraine, Gregg A Helt","doi":"","DOIUrl":"","url":null,"abstract":"In order to take full advantage of the newly available public human genome sequence data and associated annotations, biologists require visualization tools that can accommodate the high frequency of alternative splicing in human genes and other complexities. In this article, we describe techniques for presenting human genomic sequence data and annotations in an interactive, graphical format, with the aim of providing developers with a guide to what features are most likely to meet biologists' needs. These techniques include: one-dimensional semantic zooming to show sequence data alongside gene structures; moveable, adjustable tiers; visual encoding of translation frame to show how alternative transcript structure affects encoded proteins; and display of protein domains in the context of genomic sequence to show how alternative splicing impacts protein structure and function.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 ","pages":"321-6"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25064332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}