Applied bioinformatics: latest articles, page 10

Natively disordered proteins: functions and predictions.

Applied bioinformatics Pub Date : 2004-01-01 DOI: 10.2165/00822942-200403020-00005

Pedro Romero, Zoran Obradovic, A Keith Dunker

Abstract: Proteins can exist in at least three forms: the ordered form (solid-like), the partially folded form (collapsed, molten globule-like or liquid-like) and the extended form (extended, random coil-like or gas-like). The protein trinity hypothesis has two components: (i) a given native protein can be in any one of the three forms, depending on the sequence and the environment; and (ii) function can arise from any one of the three forms or from transitions between them. In this study, bioinformatics and data mining were used to investigate intrinsic disorder in proteins and develop neural network-based predictors of natural disordered regions (PONDR) that can discriminate between ordered and disordered residues with up to 84% accuracy. Predictions of intrinsic disorder indicate that the three kingdoms follow the disorder ranking eubacteria < archaebacteria << eukaryotes, with approximately half of eukaryotic proteins predicted to contain substantial regions of intrinsic disorder. Many of the known disordered regions are involved in signalling, regulation or control. Involvement of highly flexible or disordered regions in signalling is logical: a flexible sensor more readily undergoes conformational change in response to environmental perturbations than does a rigid one. Thus, the increased disorder in the eukaryotes is likely the direct result of an increased need for signalling and regulation in nucleated organisms. PONDR can also be used to detect molecular recognition elements that are disordered in the unbound state and become structured when bound to a biologically meaningful partner. Application of disorder predictions to cell-signalling, cancer-associated and control protein databases supports the widespread occurrence of protein disorder in these processes.

Volume 3, issue 2-3, pages 105-13. Publication type: Journal Article.

Citations: 0

A sequence alignment-independent method for protein classification.

Applied bioinformatics Pub Date : 2004-01-01 DOI: 10.2165/00822942-200403020-00008

John K Vries, Rajan Munshi, Dror Tobi, Judith Klein-Seetharaman, Panayiotis V Benos, Ivet Bahar

Abstract: Annotation of the rapidly accumulating body of sequence data relies heavily on the detection of remote homologues and functional motifs in protein families. The most popular methods rely on sequence alignment. These include programs that use a scoring matrix to compare the probability of a potential alignment with random chance and programs that use curated multiple alignments to train profile hidden Markov models (HMMs). Related approaches depend on bootstrapping multiple alignments from a single sequence. However, alignment-based programs have limitations. They make the assumption that contiguity is conserved between homologous segments, which may not be true in genetic recombination or horizontal transfer. Alignments also become ambiguous when sequence similarity drops below 40%. This has kindled interest in classification methods that do not rely on alignment. An approach to classification without alignment based on the distribution of contiguous sequences of four amino acids (4-grams) was developed. Interest in 4-grams stemmed from the observation that almost all theoretically possible 4-grams (20^4) occur in natural sequences and the majority of 4-grams are uniformly distributed. This implies that the probability of finding identical 4-grams by random chance in unrelated sequences is low. A Bayesian probabilistic model was developed to test this hypothesis. For each protein family in Pfam-A and PIR-PSD, a feature vector called a probe was constructed from the set of 4-grams that best characterised the family. In rigorous jackknife tests, unknown sequences from Pfam-A and PIR-PSD were compared with the probes for each family. A classification result was deemed a true positive if the probe match with the highest probability was in first place in a rank-ordered list. This was achieved in 70% of cases. Analysis of false positives suggested that the precision might approach 85% if selected families were clustered into subsets. Case studies indicated that the 4-grams in common between an unknown and the best matching probe correlated with functional motifs from PRINTS. The results showed that remote homologues and functional motifs could be identified from an analysis of 4-gram patterns.

Volume 3, issue 2-3, pages 137-48. Publication type: Journal Article.
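The 4-gram idea described in this abstract can be illustrated with a minimal sketch. It only counts contiguous four-residue windows and finds the ones two sequences share; it is not the authors' Bayesian probe model. The function names `four_grams` and `shared_four_grams` and the toy sequences are assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Number of theoretically possible 4-grams: 20 to the power 4.
N_POSSIBLE_4GRAMS = len(AMINO_ACIDS) ** 4


def four_grams(sequence: str) -> Counter:
    """Count every contiguous 4-residue window in a protein sequence.

    Sequences shorter than four residues yield an empty Counter.
    """
    return Counter(sequence[i:i + 4] for i in range(len(sequence) - 3))


def shared_four_grams(seq_a: str, seq_b: str) -> set:
    """Return the set of 4-grams that occur in both sequences (hypothetical helper)."""
    return set(four_grams(seq_a)) & set(four_grams(seq_b))
```

Because the space of possible 4-grams is large relative to the length of a typical protein, a shared 4-gram between two unrelated sequences is unlikely by chance, which is the intuition the abstract relies on.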

Citations: 23

Gene structure prediction using an orthologous gene of known exon-intron structure.

Applied bioinformatics Pub Date : 2004-01-01 DOI: 10.2165/00822942-200403020-00002

Stephanie Seneff, Chao Wang, Christopher B Burge

Abstract: Given the availability of complete genome sequences from related organisms, sequence conservation can provide important clues for predicting gene structure. In particular, one should be able to leverage information about known genes in one species to help determine the structures of related genes in another. Such an approach is appealing in that high-quality gene prediction can be achieved for newly sequenced species, such as mouse and puffer fish, using the extensive knowledge that has been accumulated about human genes. This article reports a novel approach to predicting the exon-intron structures of mouse genes by incorporating constraints from orthologous human genes using techniques that have previously been exploited in speech and natural language processing applications. The approach uses a context-free grammar to parse a training corpus of annotated human genes. A statistical training procedure produces a weighted recursive transition network (RTN) intended to capture the general features of a mammalian gene. This RTN is expanded into a finite state transducer (FST) and composed with an FST capturing the specific features of the human orthologue. This model includes a trigram language model on the amino acid sequence as well as exon length constraints. A final stage uses the free software package ClustalW to align the top n candidates in the search space. For a set of 98 orthologous human-mouse pairs, we achieved 96% sensitivity and 97% specificity at the exon level on the mouse genes, given only knowledge gleaned from the annotated human genome.

Volume 3, issue 2-3, pages 81-90. Publication type: Journal Article.
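The abstract does not define its exon-level metrics. The sketch below assumes the convention commonly used in gene-finding evaluation: a predicted exon counts as correct only when both boundaries match an annotated exon exactly, sensitivity is the fraction of annotated exons recovered, and specificity is the fraction of predicted exons that are correct. The function name `exon_level_accuracy` and the coordinates are hypothetical.

```python
def exon_level_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) at the exon level.

    Each exon is a (start, end) tuple. An exon is correct only if both
    coordinates match exactly (assumed convention, see lead-in).
    """
    predicted_set = set(predicted)
    annotated_set = set(annotated)
    correct = predicted_set & annotated_set

    # Fraction of annotated exons that were predicted exactly.
    sensitivity = len(correct) / len(annotated_set) if annotated_set else 0.0
    # Fraction of predicted exons that are exactly right.
    specificity = len(correct) / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Toy example: two of four annotated exons are recovered exactly,
# and two of three predictions are correct.
toy_predicted = [(1, 100), (200, 300), (400, 450)]
toy_annotated = [(1, 100), (200, 300), (400, 500), (600, 700)]
toy_sn, toy_sp = exon_level_accuracy(toy_predicted, toy_annotated)
```

Note that under this convention a prediction that overlaps an exon but misses one boundary, such as `(400, 450)` against `(400, 500)`, earns no credit.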

Citations: 7

KMD: an open-source port of the ArrayExpress microarray database.

Applied bioinformatics Pub Date : 2004-01-01 DOI: 10.2165/00822942-200403040-00008

Jean-Pierre Mainguy, Grant Macdonnell, Stefan Bund, David L Wild

Citations: 1

Biosphere: the interoperation of web services in microarray cluster analysis.

Applied bioinformatics Pub Date : 2004-01-01 DOI: 10.2165/00822942-200403040-00007

Kei-Hoi Cheung, Remko de Knikker, Youjun Guo, Guoneng Zhong, Janet Hager, Kevin Y Yip, Albert K H Kwan, Peter Li, David W Cheung

Abstract: The growing use of DNA microarrays in biomedical research has led to the proliferation of analysis tools. These software programs address different aspects of analysis (e.g. normalisation and clustering within and across individual arrays) as well as extended analysis methods (e.g. clustering, annotation and mining of multiple datasets). Therefore, microarray data analysis typically requires the interoperability of multiple software programs involving different analysis types and methods. Such interoperation is often hampered by the heterogeneity inherent in the software tools (which may function by implementing different interfaces and using different programming languages). To address this problem, we employed the simple object access protocol (SOAP)-based web service approach that provides a uniform programmatic interface to these heterogeneous software components. To demonstrate this approach in the microarray context, we created a web server application, Biosphere, which interoperates a number of web services that are geographically widely distributed. These web services include a clustering web service, which is a suite of different clustering algorithms for analysing microarray data; XEMBL, developed at the European Bioinformatics Institute (EBI) for retrieving EMBL Nucleotide Sequence Database sequence data; and three gene annotation web services: GetGO, GetHAPI and GetUMLS. GetGO allows retrieval of Gene Ontology (GO) annotation, and the other two web services retrieve annotation from the biomedical literature that is indexed based on the Medical Subject Headings (MeSH) terms. With these web services, Biosphere allows the users to do the following: (i) cluster gene expression data using seven different algorithms; (ii) visualise the clustering results that are grouped statistically in colour; and (iii) retrieve sequence, annotation and citation data for the genes of interest.

Availability: Biosphere and its web services described in Web Service Description Language (WSDL) can be accessed at http://rook.cecid.hku.hk:8280/BiosphereServer.

Volume 3, issue 4, pages 253-6. Publication type: Journal Article.

Citations: 25

Beyond average protein secondary structure content prediction using FTIR spectroscopy.

Applied bioinformatics Pub Date : 2004-01-01 DOI: 10.2165/00822942-200403010-00003

Joachim A Hering, Peter R Innocent, Parvez I Haris

Abstract: This paper demonstrates that secondary structure information beyond purely protein secondary structure content can be predicted from FTIR (Fourier transform infrared spectroscopy) spectra of proteins with a high degree of accuracy. Both neural networks and adaptive neuro-fuzzy inference systems (ANFISs) were employed to predict helix/sheet segment information. The best results were achieved using ANFISs with fuzzy subtractive clustering based on normalised, compressed amide I data with an average SEP (standard error of prediction, root mean of squared errors) of 1.51. Predictions for average helix/sheet length based merely on the amide I band maximum position in combination with the full-width at half-height resulted in a comparable average SEP of 1.62. This suggests the importance of information on the position and width of the amide I band maximum for the prediction of helix/sheet segment information. Finally, the most promising pattern recognition approaches found in this study were applied to a protein with an as yet unknown x-ray structure: native α1-antichymotrypsin (α1-ACT).

Volume 3, issue 1, pages 9-20. Publication type: Journal Article.
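The abstract glosses SEP as the "root mean of squared errors". Taking that gloss at face value, the quantity can be sketched as below. Other definitions of SEP exist (for example, with a bias correction or an n - 1 denominator), and the abstract does not say which was used, so this is an assumption; the function name is hypothetical and the numbers are toy values, not data from the paper.

```python
import math


def standard_error_of_prediction(predicted, observed):
    """Root mean of squared errors between predicted and observed values.

    Assumes the plain definition given in the abstract's parenthetical:
    sqrt(mean((predicted - observed)^2)).
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values only: every prediction is off by 3 units, so the SEP is 3.
toy_sep = standard_error_of_prediction([0.0, 0.0], [3.0, 3.0])
```

Because errors are squared before averaging, a few large misses raise the SEP more than many small ones.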

Citations: 4

Inferring property selection pressure from positional residue conservation.

Applied bioinformatics Pub Date : 2004-01-01 DOI: 10.2165/00822942-200403020-00011

Rose Hoberman, Judith Klein-Seetharaman, Roni Rosenfeld

Citations: 6

Five hierarchical levels of sequence-structure correlation in proteins.

Applied bioinformatics Pub Date : 2004-01-01 DOI: 10.2165/00822942-200403020-00004

Christopher Bystroff, Yu Shao, Xin Yuan

Abstract: This article reviews recent work towards modelling protein folding pathways using a bioinformatics approach. Statistical models have been developed for sequence-structure correlations in proteins at five levels of structural complexity: (i) short motifs; (ii) extended motifs; (iii) nonlocal pairs of motifs; (iv) 3-dimensional arrangements of multiple motifs; and (v) global structural homology. We review statistical models, including sequence profiles, hidden Markov models (HMMs) and interaction potentials, for the first four levels of structural detail. The I-sites (folding Initiation sites) Library models short local structure motifs. Each succeeding level has a statistical model, as follows: HMMSTR (HMM for STRucture) is an HMM for extended motifs; HMMSTR-CM (Contact Maps) is a model for pairwise interactions between motifs; and SCALI-HMM (HMMs for Structural Core ALIgnments) is a set of HMMs for the spatial arrangements of motifs. The parallels between the statistical models and theoretical models for folding pathways are discussed in this article; however, global sequence models are not discussed because they have been extensively reviewed elsewhere. The data used and algorithms presented in this article are available at http://www.bioinfo.rpi.edu/~bystrc/ (click on "servers" or "downloads") or by request to bystrc@rpi.edu.

Volume 3, issue 2-3, pages 97-104. Publication type: Journal Article.

Citations: 5

Extracting hydrogen-bond signature patterns from protein structure data.

Applied bioinformatics Pub Date : 2004-01-01 DOI: 10.2165/00822942-200403020-00007

Tejasvini Prasad, Tamilselvi Subramanian, Sridhar Hariharaputran, H S Chaitra, Nagasuma Chandra

Abstract: Classification of protein sequences and structures into families is a fundamental task in biology, and it is often used as a basis for designing experiments for gaining further knowledge. Some relationships between proteins are detected by the similarities in their sequences, and many more by the similarities in their structures. Despite this, there are a number of examples of functionally similar molecules without any recognisable sequence or structure similarities, and there are also a number of protein molecules that share common structural scaffolds but exhibit different functions. Newer methods of comparing molecules are required in order to detect similarities and dissimilarities in protein molecules. In this article, it is proposed that the precise 3-dimensional disposition of key residues in a protein molecule is what matters for its function, or what conveys the "meaning" for a biological system, but not what means it uses to achieve this. The concept of comparing two molecules through their intramolecular interaction networks is explored, since these networks dictate the disposition of amino acids in a protein structure. First, signature patterns, or fingerprints, of interaction networks in pre-classified protein structural families are computed using an approach to find structural equivalences and consensus hydrogen bonds. Five examples from different structural classes are illustrated. These patterns are then used to search the entire Protein Data Bank, an approach through which new, unexpected similarities have been found. The potential for finding relationships through this approach is highlighted. The use of hydrogen-bond fingerprints as a new metric for measuring similarities in protein structures is also described.

Volume 3, issue 2-3, pages 125-35. Publication type: Journal Article.

Citations: 3

Reusing microarrays within closely related species: experimental validation through phylogenetic inference.

Applied bioinformatics Pub Date : 2004-01-01 DOI: 10.2165/00822942-200403020-00003

Deepika Jagan, Gautam B Singh

Citations: 0