Computational systems bioinformatics. Computational Systems Bioinformatics Conference最新文献_第5页

Voting algorithms for the motif finding problem. 基序查找问题的投票算法。

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2008-01-01 DOI: 10.1142/9781848162648_0004

Xiaowen Liu, Bin Ma, Lusheng Wang

{"title":"Voting algorithms for the motif finding problem.","authors":"Xiaowen Liu, Bin Ma, Lusheng Wang","doi":"10.1142/9781848162648_0004","DOIUrl":"https://doi.org/10.1142/9781848162648_0004","url":null,"abstract":"UNLABELLED Finding motifs in many sequences is an important problem in computational biology, especially in identification of regulatory motifs in DNA sequences. Let c be a motif sequence. Given a set of sequences, each is planted with a mutated version of c at an unknown position, the motif finding problem is to find these planted motifs and the original c. In this paper, we study the VM model of the planted motif problem, which is proposed by Pevzner and Sze. We give a simple Selecting One Voting algorithm and a more powerful Selecting k Voting algorithm. When the length of motif and the number of input sequences are large enough, we prove that the two algorithms can find the unknown motif consensus with high probability. In the proof, we show why a large number of input sequences is so important for finding motifs, which is believed by most researchers. Experimental results on simulated data also support the claim. Selecting k Voting algorithm is powerful, but computational intensive. To speed up the algorithm, we propose a progressive filtering algorithm, which improves the running time significantly and has good accuracy in finding motifs. Our experimental results show that Selecting k Voting algorithm with progressive filtering performs very well in practice and it outperforms some best known algorithms. AVAILABILITY The software is available upon request.","PeriodicalId":72665,"journal":{"name":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","volume":"7 1","pages":"37-47"},"PeriodicalIF":0.0,"publicationDate":"2008-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64001431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Efficient haplotype inference from pedigrees with missing data using linear systems with disjoint-set data structures. 利用线性系统与不相邻集合数据结构，从有缺失数据的血统中高效推断单倍型。

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2008-01-01

Xin Li, Jing Li

{"title":"Efficient haplotype inference from pedigrees with missing data using linear systems with disjoint-set data structures.","authors":"Xin Li, Jing Li","doi":"","DOIUrl":"","url":null,"abstract":"We study the haplotype inference problem from pedigree data under the zero recombination assumption, which is well supported by real data for tightly linked markers (i.e., single nucleotide polymorphisms (SNPs)) over a relatively large chromosome segment. We solve the problem in a rigorous mathematical manner by formulating genotype constraints as a linear system of inheritance variables. We then utilize disjoint-set structures to encode connectivity information among individuals, to detect constraints from genotypes, and to check consistency of constraints. On a tree pedigree without missing data, our algorithm can output a general solution as well as the number of total specific solutions in a nearly linear time O (mn x alpha(n)), where m is the number of loci, n is the number of individuals and alpha is the inverse Ackermann function, which is a further improvement over existing ones. We also extend the idea to looped pedigrees and pedigrees with missing data by considering existing (partial) constraints on inheritance variables. The algorithm has been implemented in C++ and will be incorporated into our PedPhase package. Experimental results show that it can correctly identify all 0-recombinant solutions with great efficiency. Comparisons with other two popular algorithms show that the proposed algorithm achieves 10 to 10(5)-fold improvements over a variety of parameter settings. The experimental study also provides empirical evidences on the complexity bounds suggested by theoretical analysis.","PeriodicalId":72665,"journal":{"name":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","volume":"7 ","pages":"297-308"},"PeriodicalIF":0.0,"publicationDate":"2008-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3326667/pdf/nihms231595.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28336040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A probabilistic coding based quantum genetic algorithm for multiple sequence alignment. 基于概率编码的多序列比对量子遗传算法。

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2008-01-01

Hongwei Huo, Qiaoluan Xie, Xubang Shen, Vojislav Stojkovic

引用次数: 0

An ORFome assembly approach to metagenomics sequences analysis. ORFome组装方法用于宏基因组序列分析。

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2008-01-01

Yuzhen Ye, Haixu Tang

{"title":"An ORFome assembly approach to metagenomics sequences analysis.","authors":"Yuzhen Ye, Haixu Tang","doi":"","DOIUrl":"","url":null,"abstract":"Metagenomics is an emerging methodology for the direct genomic analysis of a mixed community of uncultured microorganisms. The current analyses of metagenomics data largely rely on the computational tools originally designed for microbial genomics projects. The challenge of assembling metagenomic sequences arises mainly from the short reads and the high species complexity of the community. Alternatively, individual (short) reads will be searched directly against databases of known genes (or proteins) to identify homologous sequences. The latter approach may have low sensitivity and specificity in identifying homologous sequences, which may further bias the subsequent diversity analysis. In this paper, we present a novel approach to metagenomic data analysis, called Metagenomic ORFome Assembly (MetaORFA). The whole computational framework consists of three steps. Each read from a metagenomics project will first be annotated with putative open reading frames (ORFs) that likely encode proteins. Next, the predicted ORFs are assembled into a collection of peptides using an EULER assembly method. Finally, the assembled peptides (i.e., ORFome) are used for database searching of homologs and subsequent diversity analysis. We applied MetaORFA approach to several metagenomics datasets with low coverage short reads. The results show that MetaORFA can produce long peptides even when the sequence coverage of reads is extremely low. Hence, the ORFome assembly significantly increased the sensitivity of homology searching, and may potentially improve the diversity analysis of the metagenomic data. This improvement is especially useful for the metagenomic projects when the genome assembly does not work because of the low sequence coverage.","PeriodicalId":72665,"journal":{"name":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","volume":"7 ","pages":"3-13"},"PeriodicalIF":0.0,"publicationDate":"2008-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28411958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Hausdorff-based NOE assignment algorithm using protein backbone determined from residual dipolar couplings and rotamer patterns. 基于hausdorff的NOE分配算法，利用剩余偶极偶联和旋转体模式确定蛋白质骨架。

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2008-01-01

Jianyang Zeng, Chittaranjan Tripathy, Pei Zhou, Bruce R Donald

{"title":"A Hausdorff-based NOE assignment algorithm using protein backbone determined from residual dipolar couplings and rotamer patterns.","authors":"Jianyang Zeng, Chittaranjan Tripathy, Pei Zhou, Bruce R Donald","doi":"","DOIUrl":"","url":null,"abstract":"High-throughput structure determination based on solution Nuclear Magnetic Resonance (NMR) spectroscopy plays an important role in structural genomics. One of the main bottlenecks in NMR structure determination is the interpretation of NMR data to obtain a sufficient number of accurate distance restraints by assigning nuclear Overhauser effect (NOE) spectral peaks to pairs of protons. The difficulty in automated NOE assignment mainly lies in the ambiguities arising both from the resonance degeneracy of chemical shifts and from the uncertainty due to experimental errors in NOE peak positions. In this paper we present a novel NOE assignment algorithm, called HAusdorff-based NOE Assignment (HANA), that starts with a high-resolution protein backbone computed using only two residual dipolar couplings (RDCs) per residue, employs a Hausdorff-based pattern matching technique to deduce similarity between experimental and back-computed NOE spectra for each rotamer from a statistically diverse library, and drives the selection of optimal position-specific rotamers for filtering ambiguous NOE assignments. Our algorithm runs in time O(tn3 + tn log t), where t is the maximum number of rotamers per residue and n is the size of the protein. Application of our algorithm on biological NMR data for three proteins, namely, human ubiquitin, the zinc finger domain of the human DNA Y-polymerase Eta (pol eta) and the human Set2-Rpb1 interacting domain (hSRI) demonstrates that our algorithm overcomes spectral noise to achieve more than 90% assignment accuracy. Additionally, the final structures calculated using our automated NOE assignments have backbone RMSD < 1.7 A and all-heavy-atom RMSD < 2.5 A from reference structures that were determined either by X-ray crystallography or traditional NMR approaches. These results show that our NOE assignment algorithm can be successfully applied to protein NMR spectra to obtain high-quality structures.","PeriodicalId":72665,"journal":{"name":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","volume":"7 ","pages":"169-81"},"PeriodicalIF":0.0,"publicationDate":"2008-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28337242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

On the accurate construction of consensus genetic maps. 论共识遗传图谱的准确构建。

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2008-01-01 DOI: 10.1142/9781848162648_0025

Yonghui Wu, T. Close, S. Lonardi

引用次数: 64

Proceedings of Computational Systems Bioinformatics 2008. August 26-29, 2008. Palo Alto, California, USA. 计算系统生物信息学学报2008。2008年8月26日至29日。美国加州帕洛阿尔托。

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2008-01-01

引用次数: 0

Improving homology models for protein-ligand binding sites. 改进蛋白质-配体结合位点的同源性模型。

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2008-01-01

Chris Kauffman, Huzefa Rangwala, George Karypis

引用次数: 0

Designing secondary structure profiles for fast ncRNA identification. 设计用于ncRNA快速鉴定的二级结构谱。

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2008-01-01 DOI: 10.1142/9781848162648_0013

Yanni Sun, J. Buhler

{"title":"Designing secondary structure profiles for fast ncRNA identification.","authors":"Yanni Sun, J. Buhler","doi":"10.1142/9781848162648_0013","DOIUrl":"https://doi.org/10.1142/9781848162648_0013","url":null,"abstract":"Detecting non-coding RNAs (ncRNAs) in genomic DNA is an important part of annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for search. This cost can be reduced by using a filter to exclude sequence that is unlikely to contain the ncRNA of interest, applying the CM only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect nearly all ncRNA instances while excluding most irrelevant sequences remains challenging. This work proposes a systematic procedure to convert a CM for an ncRNA family to a secondary structure profile (SSP), which augments a conservation profile with secondary structure information but can still be efficiently scanned against long sequences. We use dynamic programming to estimate an SSP's sensitivity and FP rate, yielding an efficient, fully automated filter design algorithm. Our experiments demonstrate that designed SSP filters can achieve significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families, including those with and without strong sequence conservation. For highly structured ncRNA families, including secondary structure conservation yields better performance than using primary sequence conservation alone.","PeriodicalId":72665,"journal":{"name":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","volume":"7 1","pages":"145-56"},"PeriodicalIF":0.0,"publicationDate":"2008-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64003418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Combining sequence and structural profiles for protein solvent accessibility prediction. 结合序列和结构特征预测蛋白质溶剂可及性。

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2008-01-01 DOI: 10.1142/9781848162648_0017

R. Bondugula, Dong Xu

引用次数: 8