{"title":"Phenotype characterisation using integrated gene transcript, protein and metabolite profiling.","authors":"Matej Oresic, Clary B Clish, Eugene J Davidov, Elwin Verheij, Jack Vogels, Louis M Havekes, Eric Neumann, Aram Adourian, Stephen Naylor, Jan van der Greef, Thomas Plasterer","doi":"10.2165/00822942-200403040-00002","DOIUrl":"https://doi.org/10.2165/00822942-200403040-00002","url":null,"abstract":"<p><p>Multifactorial diseases present a significant challenge for functional genomics. Owing to their multiple compartmental effects and complex biomolecular activities, such diseases cannot be adequately characterised by changes in single components, nor can pathophysiological changes be understood by observing gene transcripts alone. Instead, a pattern of subtle changes is observed in multifactorial diseases across multiple tissues and organs with complex associations between corresponding gene, protein and metabolite levels. This article presents methods for exploratory and integrative analysis of pathophysiological changes at the biomolecular level. In particular, novel approaches are introduced for the following challenges: (i) data processing and analysis methods for proteomic and metabolomic data obtained by electrospray ionisation (ESI) liquid chromatography-tandem mass spectrometry (LC/MS); (ii) association analysis of integrated gene, protein and metabolite patterns that are most descriptive of pathophysiological changes; and (iii) interpretation of results obtained from association analyses in the context of known biological processes. These novel approaches are illustrated with the apolipoprotein E3-Leiden transgenic mouse model, a commonly used model of atherosclerosis. 
We seek to gain insight into the early responses of disease onset and progression by determining and identifying--well in advance of pathogenic manifestations of disease--the sets of gene transcripts, proteins and metabolites, along with their putative relationships in the transgenic model and associated wild-type cohort. Our results corroborate previous findings and extend predictions for three processes in atherosclerosis: aberrant lipid metabolism, inflammation, and tissue development and maintenance.</p>","PeriodicalId":87049,"journal":{"name":"Applied bioinformatics","volume":"3 4","pages":"205-17"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2165/00822942-200403040-00002","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25118637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representation of mutation pressure and selection pressure by PAM matrices.","authors":"Aleksandra Nowicka, Pawel Mackiewicz, Malgorzata Dudkiewicz, Dorota Mackiewicz, Maria Kowalczuk, Joanna Banaszak, Stanislaw Cebrat, Miroslaw R Dudek","doi":"10.2165/00822942-200403010-00005","DOIUrl":"https://doi.org/10.2165/00822942-200403010-00005","url":null,"abstract":"<p><p>This paper analyses the relationship between the mutation data matrix 1PAM/PET91, representing the effect of both mutation and selection pressures exerted on 16130 homologous proteins of different organisms, and a mutation probability matrix (1PAM/MPM) representing the effect of pure mutation pressure on protein coding of the Borrelia burgdorferi genome. The 1PAM/MPM matrix was derived with the help of computer simulations, which used empirical nucleotide substitution rates found for the B. burgdorferi genome. Here, it is shown that the frequency of amino acid occurrence is strongly related to their effective survival time. We found that the shorter the turnover time of an amino acid under pure mutation pressure, the lower its fraction in the proteins coded by the genome and the more protected by selection pressure is its position in proteins. 
Results of analyses suggest that during evolution the mutational pressure has been optimised to some extent to the selection requirements.</p>","PeriodicalId":87049,"journal":{"name":"Applied bioinformatics","volume":"3 1","pages":"31-9"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2165/00822942-200403010-00005","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25739565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genomic conflict settled in favour of the species rather than the gene at extreme GC percentage values.","authors":"Shang-Jung Lee, James R Mortimer, Donald R Forsdyke","doi":"10.2165/00822942-200403040-00003","DOIUrl":"https://doi.org/10.2165/00822942-200403040-00003","url":null,"abstract":"<p><p>Wada and colleagues have shown that, whether prokaryotic or eukaryotic, each gene has a \"homostabilising propensity\" to adopt a relatively uniform GC percentage (GC%). Accordingly, each gene can be viewed as a \"microisochore\" occupying a discrete GC% niche of relatively uniform base composition amongst its fellow genes. Although first, second and third codon positions usually differ in GC%, each position tends to maintain a uniform, gene-specific GC% value. Thus, within a genome, genic GC% values can cover a wide range. This is most evident at third codon positions, which are least constrained by amino acid encoding needs. In 1991, Wada and colleagues further noted that, within a phylogenetic group, genomic GC% values can also cover a wide range. This is again most evident at third codon positions. Thus, the dispersion of GC% values among genes within a genome matches the dispersion of GC% values among genomes within a phylogenetic group. Wada described the context-independence of plots of different codon position GC% values against total GC% as a \"universal\" characteristic. Several studies relate this to recombination. We have confirmed that third codon positions usually relate more to the genes that contain them than to the species. However, in genomes with extreme GC% values (low or high), third codon positions tend to maintain a constant GC%, thus relating more to the species than to the genes that contain them. Genes in an extreme-GC% genome collectively span a smaller GC% range, and mainly rely on first and second codon positions for differentiation as \"microisochores\". 
Our results are consistent with the view that differences in GC% serve to recombinationally isolate both genome sectors (facilitating gene duplication) and genomes (facilitating genome duplication, e.g. speciation). In intermediate-GC% genomes, conflict between the needs of the species and the needs of individual genes within that species is minimal. However, in extreme-GC% genomes there is a conflict, which is settled in favour of the species (i.e. group selection) rather than in favour of the gene (genic selection).</p>","PeriodicalId":87049,"journal":{"name":"Applied bioinformatics","volume":"3 4","pages":"219-28"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2165/00822942-200403040-00003","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25118638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Microarray data analysis: a hierarchical T-test to handle heteroscedasticity.","authors":"Renée X de Menezes, Judith M Boer, Hans C van Houwelingen","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The analysis of differential gene expression in microarray experiments requires the development of adequate statistical tools. This article describes a simple statistical method for detecting differential expression between two conditions with a low number of replicates. When comparing two group means using a traditional t-test, gene-specific variance estimates are unstable and can lead to wrong conclusions. We construct a likelihood ratio test while modelling these variances hierarchically across all genes, and express it as a t-test statistic. By borrowing information across genes we can take advantage of their large numbers, and still yield a gene-specific test statistic. We show that this hierarchical t-test is more powerful than its traditional version and generates fewer false positives in a simulation study, especially with small sample sizes. This approach can be extended to cases where there are more than two groups.</p>","PeriodicalId":87049,"journal":{"name":"Applied bioinformatics","volume":"3 4","pages":"229-35"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25118639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MSAT: a multiple sequence alignment tool based on TOPS.","authors":"Te Ren, Mallika Veeramalai, Aik Choon Tan, David Gilbert","doi":"10.2165/00822942-200403020-00009","DOIUrl":"https://doi.org/10.2165/00822942-200403020-00009","url":null,"abstract":"<p><p>This article describes the development of a new method for multiple sequence alignment based on fold-level protein structure alignments, which provides an improvement in accuracy compared with the most commonly used sequence-only-based techniques. This method integrates the widely used, progressive multiple sequence alignment approach ClustalW with the Topology of Protein Structure (TOPS) topology-based alignment algorithm. The TOPS approach produces a structural alignment for the input protein set by using a topology-based pattern discovery program, providing a set of matched sequence regions that can be used to guide a sequence alignment using ClustalW. The resulting alignments are more reliable than a sequence-only alignment, as determined by 20-fold cross-validation with a set of 106 protein examples from the CATH database, distributed in seven superfold families. The method is particularly effective for sets of proteins that have similar structures at the fold level but low sequence identity. The aim of this research is to contribute towards bridging the gap between protein sequence and structure analysis, in the hope that this can be used to assist the understanding of the relationship between sequence, structure and function. 
The tool is available at http://balabio.dcs.gla.ac.uk/msat/.</p>","PeriodicalId":87049,"journal":{"name":"Applied bioinformatics","volume":"3 2-3","pages":"149-58"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2165/00822942-200403020-00009","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"24941802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges and opportunities for biological language modelling in biomedical high-throughput genomic and proteomic informatics.","authors":"James Lyons-Weiler","doi":"10.2165/00822942-200403020-00001","DOIUrl":"https://doi.org/10.2165/00822942-200403020-00001","url":null,"abstract":"","PeriodicalId":87049,"journal":{"name":"Applied bioinformatics","volume":"3 2-3","pages":"77-80"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2165/00822942-200403020-00001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"24943059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Case study: data management strategies in an integrated pathway tool.","authors":"Sabine Bernauer, David Croft, Paul Gardina, Eric Minch, Manuel de Rinaldis, Ivayla Vatcheva","doi":"10.2165/00822942-200403010-00008","DOIUrl":"https://doi.org/10.2165/00822942-200403010-00008","url":null,"abstract":"<p><p>This paper describes the development strategies for an integrated tool to support scientists in the creative exploration of data relating to biochemical pathways. The multiple user groups, diverse functionalities, and many types and sources of data demanded a flexible yet coherent approach. This paper summarises the software requirements and the implied modules and functions, and focuses on the design decisions relevant to the representation, management and flow of data. Finally, several case studies in the use of the software are described and evaluated, and recommendations are made for future work.</p>","PeriodicalId":87049,"journal":{"name":"Applied bioinformatics","volume":"3 1","pages":"63-75"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2165/00822942-200403010-00008","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25732258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A probabilistic method to correlate ion pairs with protein thermostability.","authors":"Shir-Ly Huang, Li-Cheng Wu, Hsien-Da Huang, Han-Kuen Liang, Ming-Tat Ko, Jorng-Tzong Horng","doi":"10.2165/00822942-200403010-00004","DOIUrl":"https://doi.org/10.2165/00822942-200403010-00004","url":null,"abstract":"<p><p>Recent developments in research on the stability of proteins - specifically, comparisons of the ion pairs of homologous structures - show that ion pairs potentially contribute to the thermostability of proteins. This study proposes a probabilistic Bayesian statistical method to efficiently predict the thermostability of proteins based on the properties of ion pairs. The experimental results suggest that the numbers, types and bond lengths of ion pairs can be used to predict with high accuracy (up to 80%) the thermostability of functionally similar proteins. The predictions have high precision (99%), especially for hyperthermophilic proteins. Results for proteins with differing functions also indicate that the number of ion pairs is related to the thermostability of proteins, and that predictions of thermostability can also be made for proteins with different functions.</p>","PeriodicalId":87049,"journal":{"name":"Applied bioinformatics","volume":"3 1","pages":"21-9"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2165/00822942-200403010-00004","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25739564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"caGEDA: a web application for the integrated analysis of global gene expression patterns in cancer.","authors":"Satish Patel, James Lyons-Weiler","doi":"10.2165/00822942-200403010-00007","DOIUrl":"https://doi.org/10.2165/00822942-200403010-00007","url":null,"abstract":"<p><p>The explosion of microarray data from pilot studies, basic research and large-scale clinical trials requires the development of integrative computational tools that can not only analyse gene expression patterns but also evaluate the methods of analysis adopted and then provide a boost to post-analysis translational interpretation of those patterns. We have developed a web application called caGEDA (cancer gene expression data analyzer) that can: (1) upload gene expression profiles from cDNA or oligonucleotide microarrays; (2) conduct a diverse range of serial linear normalisations; (3) identify differentially expressed genes using a variety of tests - either threshold or permutation tests; (4) produce tables of literature references to papers reporting that specific genes (identified by accession numbers) are up- or down-regulated in specific cancers; (5) estimate the error of sample class prediction using the significant gene set for features; (6) perform low-bias and accurate validated learning using three computational validation techniques (leave-one-out validation, k-fold validation, random re-sampling validation); and (7) validate a classifier with a randomly selected or user-defined validation set. Significant genes are reported in a table of links to entries in the following databases: Locus Link, Genome View, UCSC, Ensembl, UniGene, dbSNP, AmiGO and OMIM. caGEDA is seamlessly integrated via embedded forms with UCSD's (University of California at San Diego) 2HAPI server (for medical subject heading (MeSH) term exploration) and EZ-Retrieve (to identify common transcription factors located upstream of sets of genes that exhibit similar modes of differential expression). 
caGEDA offers a variety of previously described and novel tests for differentially expressed genes, most notably the permutation percentile separability test, which is most appropriate for identifying genes that are significantly differentially expressed in a subset of patients. caGEDA, which is open source and free to academic users, will soon be greatly enhanced by operating with the components of the National Cancer Institute's new cancer bioinformatics grid (caBIG).</p>","PeriodicalId":87049,"journal":{"name":"Applied bioinformatics","volume":"3 1","pages":"49-62"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2165/00822942-200403010-00007","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25732257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural networks for protein classification.","authors":"Wagner Rodrigo Weinert, Heitor Silvério Lopes","doi":"10.2165/00822942-200403010-00006","DOIUrl":"https://doi.org/10.2165/00822942-200403010-00006","url":null,"abstract":"<p><p>This paper describes a biomolecular classification methodology based on multilayer perceptron neural networks. The system developed is used to classify enzymes found in the Protein Data Bank. The primary goal of classification, here, is to infer the function of an (unknown) enzyme by analysing its structural similarity to a given family of enzymes. A new codification scheme was devised to convert the primary structure of enzymes into a real-valued vector. The system was tested with a different number of neural networks, training set sizes and training epochs. For all experiments, the proposed system achieved a higher accuracy rate when compared with profile hidden Markov models. Results demonstrated the robustness of this approach and the possibility of implementing fast and efficient biomolecular classification using neural networks.</p>","PeriodicalId":87049,"journal":{"name":"Applied bioinformatics","volume":"3 1","pages":"41-8"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2165/00822942-200403010-00006","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25739566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}