Ravi Vijaya Satya, Amar Mukherjee, Udaykumar Ranga
{"title":"A pattern matching algorithm for codon optimization and CpG motif-engineering in DNA expression vectors.","authors":"Ravi Vijaya Satya, Amar Mukherjee, Udaykumar Ranga","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Codon optimization enhances the efficiency of DNA expression vectors used in DNA vaccination and gene therapy by increasing protein expression. Additionally, certain nucleotide motifs have experimentally been shown to be immuno-stimulatory while certain others immuno-suppressive. In this paper, we present algorithms to locate a given set of immuno-modulatory motifs in the DNA expression vectors corresponding to a given amino acid sequence and maximize or minimize the number and the context of the immuno-modulatory motifs in the DNA expression vectors. The main contribution is to use multiple pattern matching algorithms to synthesize a DNA sequence for a given amino acid sequence and a graph theoretic approach for finding the longest weighted path in a directed graph that will maximize or minimize certain motifs. This is achieved using O(n^2) time, where n is the length of the amino acid sequence. Based on this, we develop a software tool.</p>","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"2 ","pages":"294-305"},"PeriodicalIF":0.0,"publicationDate":"2003-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25834892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A block coding method that leads to significantly lower entropy values for the proteins and coding sections of Haemophilus influenzae.","authors":"G Sampath","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>A simple statistical block code in combination with the LZW-based compression utilities gzip and compress has been found to increase by a significant amount the level of compression possible for the proteins encoded in Haemophilus influenzae, the first fully sequenced genome. The method yields an entropy value of 3.665 bits per symbol (bps), which is 0.657 bps below the maximum of 4.322 bps and an improvement of 0.452 bps over the best known to date of 4.118 bps using Matsumoto, Sadakane, and Imai's lza-CTW algorithm. Calculations based on a compact inverse genetic code show that the genome has a maximum entropy of 1.757 bps for the coding regions, with a possibly lower actual entropy. These results hint at the existence of hitherto unexplored redundancies that do not show up in Markov models and are indicative of more internal structure than suspected in both the protein and the genome.</p>","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"2 ","pages":"287-93"},"PeriodicalIF":0.0,"publicationDate":"2003-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25834891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering time-varying gene expression profiles using scale-space signals.","authors":"Tanveer Syeda-Mahmood","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The functional state of an organism is determined largely by the pattern of expression of its genes. The analysis of gene expression data from gene chips has primarily revolved around clustering and classification of the data using machine learning techniques based on the intensity of expression alone with the time-varying pattern mostly ignored. In this paper, we present a pattern recognition-based approach to capturing similarity by finding salient changes in the time-varying expression patterns of genes. Such changes can give clues about important events, such as gene regulation by cell-cycle phases, or even signal the onset of a disease. Specifically, we observe that dissimilarity between time series is revealed by the sharp twists and bends produced in a higher-dimensional curve formed from the constituent signals. Scale-space analysis is used to detect the sharp twists and turns and their relative strength with respect to the component signals is estimated to form a shape similarity measure between time profiles. A clustering algorithm is presented to cluster gene profiles using the scale-space distance as a similarity metric. Multi-dimensional curves formed from time series within clusters are used as cluster prototypes or indexes to the gene expression database, and are used to retrieve the functionally similar genes to a query gene profile. Extensive comparison of clustering using scale-space distance in comparison to traditional Euclidean distance is presented on the yeast genome database.</p>","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"2 ","pages":"48-56"},"PeriodicalIF":0.0,"publicationDate":"2003-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25834432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Refactoring reusable business components","authors":"C. Neill, Binny S. Gill","doi":"10.1109/MITP.2003.1176488","DOIUrl":"https://doi.org/10.1109/MITP.2003.1176488","url":null,"abstract":"Object evangelists have long heralded software reuse as a bonus for applying object-oriented analysis, design, and programming techniques, but the benefits have been less dramatic than anticipated. Designing reusable software systems is difficult because a complete understanding of the software under consideration is only available toward the project's end. An appropriate alternative, then, is to refactor for reuse, restructure the completed system without modifying or adding to its behavior. We describe a refactoring effort undertaken at a Delaware-Valley-based financial firm. This firm sought to reuse components from a large Web-based system.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"76 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2003-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72874224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated protein NMR resonance assignments.","authors":"Xiang Wan, Dong Xu, Carolyn M Slupsky, Guohui Lin","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>NMR resonance peak assignment is one of the key steps in solving an NMR protein structure. The assignment process links resonance peaks to individual residues of the target protein sequence, providing the prerequisite for establishing intra- and inter-residue spatial relationships between atoms. The assignment process is tedious and time-consuming, which could take many weeks. Though there exist a number of computer programs to assist the assignment process, many NMR labs are still doing the assignments manually to ensure quality. This paper presents (1) a new scoring system for mapping spin systems to residues, (2) an automated adjacency information extraction procedure from NMR spectra, and (3) a very fast assignment algorithm based on our previously proposed greedy filtering method and a maximum matching algorithm to automate the assignment process. The computational tests on 70 instances of (pseudo) experimental NMR data of 14 proteins demonstrate that the new score scheme has much better discerning power with the aid of adjacency information between spin systems simulated across various NMR spectra. Typically, with automated extraction of adjacency information, our method achieves nearly complete assignments for most of the proteins. The experiment shows very promising perspective that the fast automated assignment algorithm together with the new score scheme and automated adjacency extraction may be ready for practical use.</p>","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"2 ","pages":"197-208"},"PeriodicalIF":0.0,"publicationDate":"2003-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25833755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robert Grossman, Pavan Kasturi, Donald Hamelberg, Bing Liu
{"title":"Experimental studies of the Universal Chemical Key (UCK) algorithm on the NCI database of chemical compounds.","authors":"Robert Grossman, Pavan Kasturi, Donald Hamelberg, Bing Liu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We have developed an algorithm called the Universal Chemical Key (UCK) algorithm that constructs a unique key for a molecular structure. The molecular structures are represented as undirected labeled graphs with the atoms representing the vertices of the graph and the bonds representing the edges. The algorithm was tested on 236,917 compounds obtained from the National Cancer Institute (NCI) database of chemical compounds. In this paper we present the algorithm, some examples and the experimental results on the NCI database. On the NCI database, the UCK algorithm provided distinct unique keys for chemicals with different molecular structures.</p>","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"2 ","pages":"244-50"},"PeriodicalIF":0.0,"publicationDate":"2003-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25834886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexandre Tchourbanov, Daniel Quest, Hesham Ali, Mark Pauley, Robert Norgren
{"title":"A new approach for gene annotation using unambiguous sequence joining.","authors":"Alexandre Tchourbanov, Daniel Quest, Hesham Ali, Mark Pauley, Robert Norgren","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The problem addressed by this paper is accurate and automatic gene annotation following precise identification/annotation of exon and intron boundaries of biologically verified nucleotide sequences using the alignment of human genomic DNA to curated mRNA transcripts. We provide a detailed description of a new cDNA/DNA homology gene annotation algorithm that combines the results of BLASTN searches and spliced alignments. Compared to other programs currently in use, annotation quality is significantly increased through the unambiguous junction of genomic DNA sequences. We also address gene annotation with both non-canonic splice sites and short exons. The approach has been tested on the Genie learning subset as well as full-scale human RefSeq, and has demonstrated performance as high as 97%.</p>","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"2 ","pages":"353-62"},"PeriodicalIF":0.0,"publicationDate":"2003-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25834137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liliana Florea, Bjarni Halldórsson, Oliver Kohlbacher, Russell Schwartz, Stephen Hoffman, Sorin Istrail
{"title":"Epitope prediction algorithms for peptide-based vaccine design.","authors":"Liliana Florea, Bjarni Halldórsson, Oliver Kohlbacher, Russell Schwartz, Stephen Hoffman, Sorin Istrail","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Peptide-based vaccines, in which small peptides derived from target proteins (epitopes) are used to provoke an immune reaction, have attracted considerable attention recently as a potential means both of treating infectious diseases and promoting the destruction of cancerous cells by a patient's own immune system. With the availability of large sequence databases and computers fast enough for rapid processing of large numbers of peptides, computer aided design of peptide-based vaccines has emerged as a promising approach to screening among billions of possible immune-active peptides to find those likely to provoke an immune response to a particular cell type. In this paper, we describe the development of three novel classes of methods for the prediction problem. We present a quadratic programming approach that can be trained on quantitative as well as qualitative data. The second method uses linear programming to counteract the fact that our training data contains mostly positive examples. The third class of methods uses sequence profiles obtained by clustering known epitopes to score candidate peptides. By integrating these methods, using a simple voting heuristic, we achieve improved accuracy over the state of the art.</p>","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"2 ","pages":"17-26"},"PeriodicalIF":0.0,"publicationDate":"2003-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"26133913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Safran, Irina Solomon, O. Shmueli, Michal Lapidot, S. Shen-Orr, A. Adato, Uri Ben-Dor, Nir Esterman, Naomi Rosen, Inga Peter, T. Olender, V. Chalifa-Caspi, D. Lancet
{"title":"GeneCards™ 2002: an evolving human gene compendium","authors":"M. Safran, Irina Solomon, O. Shmueli, Michal Lapidot, S. Shen-Orr, A. Adato, Uri Ben-Dor, Nir Esterman, Naomi Rosen, Inga Peter, T. Olender, V. Chalifa-Caspi, D. Lancet","doi":"10.1109/CSB.2002.1039362","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039362","url":null,"abstract":"GeneCards™ (http://bioinfo.weizmann.ac.il/cards/) is an automated, integrated database of human genes, genomic maps, proteins, and diseases, with software that retrieves, consolidates, searches, and displays human genome information. Over the past few years, the system has consistently added new features including sequence accessions, genomic locations, cDNA assemblies, orthologies, medical information, 3D protein structures, SNP summaries, and gene expression. In parallel, its infrastructure is being upgraded to use object-oriented Perl to produce, display, and search data that is formatted in Extensible Markup Language (XML, http://www.w3.org/XML), providing a basis for schema-driven display code and context-specific searches.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"339-"},"PeriodicalIF":0.0,"publicationDate":"2002-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039362","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Monte Carlo methods for physical mapping of chromosomes","authors":"S. Bhandarkar, J. Arnold","doi":"10.1109/CSB.2002.1039330","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039330","url":null,"abstract":"Reconstructing a physical map of a chromosome from a genomic library presents a central computational problem in genetics. Physical map reconstruction in the presence of errors is a problem of high computational complexity. Parallel Monte Carlo methods for a maximum likelihood estimation-based approach to physical map reconstruction are presented. The estimation procedure entails gradient descent search for determining the optimal spacings between probes for a given probe ordering. The optimal probe ordering is determined using a simulated Monte Carlo algorithm. A two-tier parallelization strategy is proposed wherein the gradient descent search is parallelized at the lower level and the simulated Monte Carlo algorithm is simultaneously parallelized at the higher level. Implementation and experimental results on a network of shared-memory symmetric multiprocessors (SMP) are presented.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"64-75"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039330","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62213822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}