Proceedings. IEEE Computational Systems Bioinformatics Conference: Latest Articles, Page 5

Shannon information in complete genomes.

Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-08-16 DOI: 10.1109/CSB.2004.153

Chang-Heng Chang, L. Hsieh, T. Chen, Hong-Da Chen, L. Luo, Hoong-Chien Lee

Citations: 2

Inverse Protein Folding in 2D HP Model (Extended Abstract)

Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-08-16 DOI: 10.1109/CSB.2004.1332444

Arvind Gupta, Ján Manuch, L. Stacho

Citations: 6

Shannon information in complete genomes.

Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332413

Chang-Heng Chang, Li-Ching Hsieh, Ta-Yuan Chen, Hong-Da Chen, Liaofu Luo, Hoong-Chien Lee

{"title":"Shannon information in complete genomes.","authors":"Chang-Heng Chang, Li-Ching Hsieh, Ta-Yuan Chen, Hong-Da Chen, Liaofu Luo, Hoong-Chien Lee","doi":"10.1109/csb.2004.1332413","DOIUrl":"https://doi.org/10.1109/csb.2004.1332413","url":null,"abstract":"Shannon information in the genomes of all completely sequenced prokaryotes and eukaryotes is measured in word lengths of two to ten letters. It is found that in a scale-dependent way, the Shannon information in complete genomes is much greater than that in matching random sequences - thousands of times greater in the case of short words. Furthermore, with the exception of the 14 chromosomes of Plasmodium falciparum, the Shannon information in all available complete genomes belongs to a universality class given by an extremely simple formula. The data are consistent with a model for genome growth composed of two main ingredients: random segmental duplications that increase the Shannon information in a scale-independent way, and random point mutations that preferentially reduce the larger-scale Shannon information. The inference drawn from the present study is that the large-scale and coarse-grained growth of genomes was selectively neutral and this suggests an independent corroboration of Kimura's neutral theory of evolution.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"20-30"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2004.1332413","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
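
The abstract of this entry describes measuring Shannon information over words of two to ten letters and comparing genomes with random sequences. The sketch below illustrates one generic way to do such a measurement: it computes the entropy of the overlapping k-letter word distribution and reports its gap from the maximum possible entropy. The function names and the "gap" definition are assumptions made for illustration and may not match the exact quantity defined in the paper.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gap(seq, k, alphabet_size=4):
    """Maximum possible k-word entropy minus the observed entropy, in bits.

    Assumption: this gap is used here as a stand-in for "Shannon information";
    the paper may normalize or define it differently.
    """
    h_max = k * math.log2(alphabet_size)
    return h_max - shannon_entropy(kmer_counts(seq, k))
```

Under this definition, a sequence whose k-letter words are all equally frequent has a gap of zero, and the gap grows as the word distribution becomes more uneven. Short sequences undersample long words, so values for large k are only meaningful on genome-scale input.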

Algorithms for association study design using a generalized model of haplotype conservation.

Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01

Russell Schwartz

Citations: 0

SPIDER: software for protein identification from sequence tags with de novo sequencing error.

Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332434

Yonghua Han, Bin Ma, Kaizhong Zhang

Citations: 30

MinPD: distance-based phylogenetic analysis and recombination detection of serially-sampled HIV quasispecies.

Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01

Patricia Buendia, Giri Narasimhan

Citations: 0

Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.

Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332452

Ying Liu, Brian J Ciliax, Karin Borges, Venu Dasigi, Ashwin Ram, Shamkant B Navathe, Ray Dingledine

{"title":"Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.","authors":"Ying Liu, Brian J Ciliax, Karin Borges, Venu Dasigi, Ashwin Ram, Shamkant B Navathe, Ray Dingledine","doi":"10.1109/csb.2004.1332452","DOIUrl":"https://doi.org/10.1109/csb.2004.1332452","url":null,"abstract":"One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"394-404"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2004.1332452","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25830003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
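
The abstract of this entry compares two keyword weighting schemes, one of which is term frequency-inverse document frequency (TFIDF). As a rough illustration of that scheme only, the sketch below computes a textbook TF-IDF weight for each term in a small collection. The function name `tfidf`, the relative-frequency form of the term frequency, the natural logarithm, and the idea of treating each gene's pooled keywords as one "document" are all assumptions; the paper's exact weighting, stop list, and stemming steps are not reproduced here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. The weight is
    (term count / document length) * ln(number of docs / docs containing term),
    a common textbook variant chosen for illustration.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights
```

With this variant, a term that occurs in every document gets weight zero, so only terms that distinguish one gene's keyword list from the others contribute to the feature vectors used for clustering.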

Calculation, visualization, and manipulation of MASTs (Maximum Agreement Subtrees).

Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332453

Shiming Dong, Eileen Kraemer

Citations: 0

Minimum entropy clustering and applications to gene expression analysis.

Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332427

Haifeng Li, Keshu Zhang, Tao Jiang

{"title":"Minimum entropy clustering and applications to gene expression analysis.","authors":"Haifeng Li, Keshu Zhang, Tao Jiang","doi":"10.1109/csb.2004.1332427","DOIUrl":"https://doi.org/10.1109/csb.2004.1332427","url":null,"abstract":"Clustering is a common methodology for analyzing gene expression data. In this paper, we present a new clustering algorithm from an information-theoretic point of view. First, we propose the minimum entropy (measured on a posteriori probabilities) criterion, which is the conditional entropy of clusters given the observations. Fano's inequality indicates that it could be a good criterion for clustering. We generalize the criterion by replacing Shannon's entropy with Havrda-Charvat's structural alpha-entropy. Interestingly, the minimum entropy criterion based on structural alpha-entropy is equal to the probability error of the nearest neighbor method when alpha = 2. This is further evidence that the proposed criterion is good for clustering. With a non-parametric approach for estimating a posteriori probabilities, an efficient iterative algorithm is then established to minimize the entropy. The experimental results show that the clustering algorithm performs significantly better than k-means/medians, hierarchical clustering, SOM, and EM in terms of adjusted Rand index. Particularly, our algorithm performs very well even when the correct number of clusters is unknown. In addition, most clustering algorithms produce poor partitions in the presence of outliers while our method can correctly reveal the structure of data and effectively identify outliers simultaneously.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"142-51"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2004.1332427","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
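
The abstract of this entry bases its criterion on the conditional entropy of clusters given the observations, generalized from Shannon's entropy to the Havrda-Charvat structural alpha-entropy. The sketch below only evaluates such a criterion for a given table of a posteriori cluster probabilities; it does not implement the paper's estimation or iterative minimization steps. The function names are hypothetical, and the normalization `(2^(1 - alpha) - 1)^(-1)` is the commonly quoted form of the structural alpha-entropy, which should be checked against the paper.

```python
import math


def shannon_entropy(p):
    """Shannon entropy, in bits, of a probability vector p."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def havrda_charvat_entropy(p, alpha):
    """Structural alpha-entropy of p; falls back to Shannon entropy at alpha = 1."""
    if alpha == 1:
        return shannon_entropy(p)
    return (sum(x ** alpha for x in p) - 1.0) / (2.0 ** (1.0 - alpha) - 1.0)


def mean_posterior_entropy(posteriors, alpha=1):
    """Average entropy of the per-observation cluster posteriors.

    posteriors is a list of probability vectors, one per observation.
    A minimum entropy criterion would prefer assignments that lower this value.
    """
    return sum(havrda_charvat_entropy(p, alpha) for p in posteriors) / len(posteriors)
```

For example, an observation assigned to one cluster with certainty contributes zero, while one split evenly between two clusters contributes 1 under both alpha = 1 and alpha = 2 with this normalization.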

Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

Proceedings. IEEE Computational Systems Bioinformatics Conference Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332415

Yinhe Cao, Wen-Wen Tung, J B Gao

{"title":"Recurrence time statistics: versatile tools for genomic DNA sequence analysis.","authors":"Yinhe Cao, Wen-Wen Tung, J B Gao","doi":"10.1109/csb.2004.1332415","DOIUrl":"https://doi.org/10.1109/csb.2004.1332415","url":null,"abstract":"With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"40-51"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2004.1332415","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 10
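
The abstract of this entry describes a recurrence time based method for finding periodicity and repeats in DNA. As a simplified illustration of the general idea, the sketch below records, for every word of length k, the distance between consecutive occurrences and builds a histogram of those distances. The function name `recurrence_times` and this particular definition of recurrence time are assumptions; the paper's statistics and its codon index are not reproduced here.

```python
from collections import Counter


def recurrence_times(seq, k):
    """Histogram of distances between consecutive occurrences of each k-letter word."""
    last_seen = {}  # word -> start position of its most recent occurrence
    hist = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            hist[i - last_seen[word]] += 1
        last_seen[word] = i
    return hist
```

On a perfectly periodic input such as `"ATGATGATG"` with `k = 3`, every recorded distance is 3, so a sharp peak in the histogram points to a repeat period. The single pass over the sequence keeps the cost linear in its length for fixed k.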