Proceedings. IEEE Computational Systems Bioinformatics Conference: Latest Articles

Shannon information in complete genomes.
Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-08-16 DOI: 10.1109/CSB.2004.153
Chang-Heng Chang, L. Hsieh, T. Chen, Hong-Da Chen, L. Luo, Hoong-Chien Lee
{"title":"Shannon information in complete genomes.","authors":"Chang-Heng Chang, L. Hsieh, T. Chen, Hong-Da Chen, L. Luo, Hoong-Chien Lee","doi":"10.1109/CSB.2004.153","DOIUrl":"https://doi.org/10.1109/CSB.2004.153","url":null,"abstract":"Shannon information in the genomes of all completely sequenced prokaryotes and eukaryotes is measured in word lengths of two to ten letters. It is found that, in a scale-dependent way, the Shannon information in complete genomes is much greater than that in matching random sequences, thousands of times greater in the case of short words. Furthermore, with the exception of the 14 chromosomes of Plasmodium falciparum, the Shannon information in all available complete genomes belongs to a universality class given by an extremely simple formula. The data are consistent with a model for genome growth composed of two main ingredients: random segmental duplications that increase the Shannon information in a scale-independent way, and random point mutations that preferentially reduce the larger-scale Shannon information. The inference drawn from the present study is that the large-scale and coarse-grained growth of genomes was selectively neutral, and this suggests an independent corroboration of Kimura's neutral theory of evolution.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":"1 1","pages":"20-30"},"PeriodicalIF":0.0,"publicationDate":"2004-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62215018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
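The abstract above measures Shannon information over words of two to ten letters but does not state the exact formula. As a rough illustration only, the sketch below counts overlapping k-letter words and computes the Shannon entropy of their empirical frequencies. The function names and the "deficit" convention (maximum possible entropy minus observed entropy) are assumptions made for this example, not the authors' definition.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-letter words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the empirical word-frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_deficit(seq, k, alphabet_size=4):
    """Assumed convention: maximum entropy for k-letter words minus observed entropy."""
    max_entropy = k * math.log2(alphabet_size)
    return max_entropy - shannon_entropy(kmer_counts(seq, k))


if __name__ == "__main__":
    toy = "ACGTACGTTTTTACGT"  # toy sequence, not real genome data
    for k in range(2, 5):
        print(k, information_deficit(toy, k))
```

On a real genome the word counts would come from the full sequence, and the comparison with "matching random sequences" would use a shuffled or synthetic sequence of the same length and base composition.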
Inverse Protein Folding in 2D HP Model (Extended Abstract)
Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-08-16 DOI: 10.1109/CSB.2004.1332444
Arvind Gupta, Ján Manuch, L. Stacho
{"title":"Inverse Protein Folding in 2D HP Model (Extended Abstract)","authors":"Arvind Gupta, Ján Manuch, L. Stacho","doi":"10.1109/CSB.2004.1332444","DOIUrl":"https://doi.org/10.1109/CSB.2004.1332444","url":null,"abstract":"The inverse protein folding problem is that of designing an amino acid sequence which has a particular native protein fold. This problem arises in drug design where a particular structure is necessary to ensure proper protein-protein interactions. In this paper we show that in the 2D HP model of Dill it is possible to solve this problem for a broad class of structures. These structures can be used to closely approximate any given structure. One of the most important properties of a good protein is its stability, the aptitude not to fold simultaneously into other structures. We show that for a number of basic structures, our sequences have a unique fold.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":"1 1","pages":"311-8"},"PeriodicalIF":0.0,"publicationDate":"2004-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2004.1332444","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62215002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Shannon information in complete genomes.
Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332413
Chang-Heng Chang, Li-Ching Hsieh, Ta-Yuan Chen, Hong-Da Chen, Liaofu Luo, Hoong-Chien Lee
{"title":"Shannon information in complete genomes.","authors":"Chang-Heng Chang, Li-Ching Hsieh, Ta-Yuan Chen, Hong-Da Chen, Liaofu Luo, Hoong-Chien Lee","doi":"10.1109/csb.2004.1332413","DOIUrl":"https://doi.org/10.1109/csb.2004.1332413","url":null,"abstract":"Shannon information in the genomes of all completely sequenced prokaryotes and eukaryotes is measured in word lengths of two to ten letters. It is found that, in a scale-dependent way, the Shannon information in complete genomes is much greater than that in matching random sequences, thousands of times greater in the case of short words. Furthermore, with the exception of the 14 chromosomes of Plasmodium falciparum, the Shannon information in all available complete genomes belongs to a universality class given by an extremely simple formula. The data are consistent with a model for genome growth composed of two main ingredients: random segmental duplications that increase the Shannon information in a scale-independent way, and random point mutations that preferentially reduce the larger-scale Shannon information. The inference drawn from the present study is that the large-scale and coarse-grained growth of genomes was selectively neutral, and this suggests an independent corroboration of Kimura's neutral theory of evolution.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"20-30"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2004.1332413","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Algorithms for association study design using a generalized model of haplotype conservation.
Russell Schwartz
{"title":"Algorithms for association study design using a generalized model of haplotype conservation.","authors":"Russell Schwartz","doi":"","DOIUrl":"","url":null,"abstract":"There is considerable interest in computational methods to assist in the use of genetic polymorphism data for locating disease-related genes. Haplotypes, contiguous sets of correlated variants, may provide a means of reducing the difficulty of the data analysis problems involved. The field to date has been dominated by methods based on the \"haplotype block\" hypothesis, which assumes discrete population-wide boundaries between conserved genetic segments, but there is strong reason to believe that haplotype blocks do not fully capture true haplotype conservation patterns. In this paper, we address the computational challenges of using a more flexible, block-free representation of haplotype structure called the \"haplotype motif\" model for downstream analysis problems. We develop algorithms for htSNP selection and missing data inference using this more generalized model of sequence conservation. Application to a dataset from the literature demonstrates the practical value of these block-free methods.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"90-7"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SPIDER: software for protein identification from sequence tags with de novo sequencing error.
Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332434
Yonghua Han, Bin Ma, Kaizhong Zhang
{"title":"SPIDER: software for protein identification from sequence tags with de novo sequencing error.","authors":"Yonghua Han, Bin Ma, Kaizhong Zhang","doi":"10.1109/csb.2004.1332434","DOIUrl":"https://doi.org/10.1109/csb.2004.1332434","url":null,"abstract":"For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then used to match, accounting for amino acid mutations, the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"206-15"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2004.1332434","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
MinPD: distance-based phylogenetic analysis and recombination detection of serially-sampled HIV quasispecies.
Patricia Buendia, Giri Narasimhan
{"title":"MinPD: distance-based phylogenetic analysis and recombination detection of serially-sampled HIV quasispecies.","authors":"Patricia Buendia, Giri Narasimhan","doi":"","DOIUrl":"","url":null,"abstract":"A new computational method to study within-host viral evolution is explored to better understand the evolution and pathogenesis of viruses. Traditional phylogenetic tree methods are better suited to study relationships between contemporaneous species, which appear as leaves of a phylogenetic tree. However, viral sequences are often sampled serially from a single host. Consequently, data may be available at the leaves as well as the internal nodes of a phylogenetic tree. Recombination may further complicate the analysis. Such relationships are not easily expressed by traditional phylogenetic methods. We propose a new algorithm, called MinPD, based on minimum pairwise distances. Our algorithm uses multiple distance matrices and correlation rules to output a MinPD tree or network. We test our algorithm using extensive simulations and apply it to a set of HIV sequence data isolated from one patient over a period of ten years. The proposed visualization of the phylogenetic tree/network further enhances the benefits of our methods.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"110-9"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3195421/pdf/nihms326150.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.
Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332452
Ying Liu, Brian J Ciliax, Karin Borges, Venu Dasigi, Ashwin Ram, Shamkant B Navathe, Ray Dingledine
{"title":"Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.","authors":"Ying Liu, Brian J Ciliax, Karin Borges, Venu Dasigi, Ashwin Ram, Shamkant B Navathe, Ray Dingledine","doi":"10.1109/csb.2004.1332452","DOIUrl":"https://doi.org/10.1109/csb.2004.1332452","url":null,"abstract":"One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"394-404"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2004.1332452","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25830003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
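The abstract above compares keyword weighting by normalized z-score and by TFIDF without giving the formulas used. The following minimal sketch shows one common TFIDF variant (relative term frequency times the natural log of inverse document frequency). Treating each gene's pooled keyword list as one "document" is an assumption made for illustration, and the gene keywords shown are invented toy data.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Weight = (count / doc length) * ln(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


if __name__ == "__main__":
    # Hypothetical keyword lists for two genes (toy data).
    gene_keywords = [
        ["kinase", "receptor"],
        ["kinase", "channel"],
    ]
    for w in tfidf(gene_keywords):
        print(w)
```

A term that appears in every document gets weight zero under this variant, which is why shared background vocabulary drops out of the feature vectors used for clustering.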
Calculation, visualization, and manipulation of MASTs (Maximum Agreement Subtrees).
Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332453
Shiming Dong, Eileen Kraemer
{"title":"Calculation, visualization, and manipulation of MASTs (Maximum Agreement Subtrees).","authors":"Shiming Dong, Eileen Kraemer","doi":"10.1109/csb.2004.1332453","DOIUrl":"https://doi.org/10.1109/csb.2004.1332453","url":null,"abstract":"Phylogenetic trees are used to represent the evolutionary history of a set of species. Comparison of multiple phylogenetic trees can help researchers find the common classification of a tree group, compare tree construction inferences or obtain distances between trees. We present TreeAnalyzer, a freely available package for phylogenetic tree comparison. A MAST (Maximum Agreement Subtree) algorithm is implemented to compare the trees. Additional features of this software include tree comparison, visualization, manipulation, labeling, and printing. Availability: http://www.cs.uga.edu/~eileen/TreeAnalyzer.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"405-14"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2004.1332453","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25830004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Minimum entropy clustering and applications to gene expression analysis.
Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332427
Haifeng Li, Keshu Zhang, Tao Jiang
{"title":"Minimum entropy clustering and applications to gene expression analysis.","authors":"Haifeng Li, Keshu Zhang, Tao Jiang","doi":"10.1109/csb.2004.1332427","DOIUrl":"https://doi.org/10.1109/csb.2004.1332427","url":null,"abstract":"Clustering is a common methodology for analyzing gene expression data. In this paper, we present a new clustering algorithm from an information-theoretic point of view. First, we propose the minimum entropy (measured on a posteriori probabilities) criterion, which is the conditional entropy of clusters given the observations. Fano's inequality indicates that it could be a good criterion for clustering. We generalize the criterion by replacing Shannon's entropy with Havrda-Charvat's structural alpha-entropy. Interestingly, the minimum entropy criterion based on structural alpha-entropy is equal to the probability error of the nearest neighbor method when alpha = 2. This is further evidence that the proposed criterion is good for clustering. With a non-parametric approach for estimating a posteriori probabilities, an efficient iterative algorithm is then established to minimize the entropy. The experimental results show that the clustering algorithm performs significantly better than k-means/medians, hierarchical clustering, SOM, and EM in terms of adjusted Rand index. In particular, our algorithm performs very well even when the correct number of clusters is unknown. In addition, most clustering algorithms produce poor partitions in the presence of outliers, while our method can correctly reveal the structure of data and effectively identify outliers simultaneously.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"142-51"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2004.1332427","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.
Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332415
Yinhe Cao, Wen-Wen Tung, J B Gao
{"title":"Recurrence time statistics: versatile tools for genomic DNA sequence analysis.","authors":"Yinhe Cao, Wen-Wen Tung, J B Gao","doi":"10.1109/csb.2004.1332415","DOIUrl":"https://doi.org/10.1109/csb.2004.1332415","url":null,"abstract":"With the completion of the human genome and those of a few model organisms, and with the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools capable of easily identifying structures in, and extracting features from, DNA sequences. Some of the more important structures in a DNA sequence are repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are identified or redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence-time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features in a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out whole-genome analysis on a PC.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"40-51"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2004.1332415","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
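The abstract above describes a recurrence-time method for finding periodicity and repeats but does not define the statistic. As a hedged illustration, the sketch below takes the recurrence time of a k-letter word to be the distance between its successive occurrences and pools those distances into a histogram. This definition and the function names are assumptions for this example; the paper's actual statistic and its codon index may differ.

```python
from collections import Counter, defaultdict


def recurrence_times(seq, k):
    """Map each k-letter word to the gaps between its successive occurrences."""
    last_seen = {}
    gaps = defaultdict(list)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            gaps[word].append(i - last_seen[word])
        last_seen[word] = i  # remember the most recent start position
    return gaps


def recurrence_histogram(seq, k):
    """Pool the gaps of all words into a single histogram {gap: count}."""
    return Counter(
        gap
        for word_gaps in recurrence_times(seq, k).values()
        for gap in word_gaps
    )


if __name__ == "__main__":
    toy = "ACGACGACG"  # toy sequence with an exact period of 3
    print(dict(recurrence_histogram(toy, 3)))
```

In this toy sequence every recurrence gap equals 3, so a period-3 repeat shows up as a single peak in the histogram. Each position is visited once, so the pass is linear in the sequence length for fixed k.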