Improving operon prediction in E. coli

2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05). Pub Date: 2005-08-08. DOI: 10.1109/CSBW.2005.76

P. Dam, V. Olman, Ying Xu

Citations: 2

Abstract

In bacteria, genes that work in the same pathway or interact with each other are often organized into operons. Prediction accuracy for operon/boundary gene pairs is currently fairly good in Escherichia coli. However, success in classifying a gene pair as a boundary pair or an operon pair does not automatically translate into high accuracy in predicting operon boundaries. We found that several operon prediction programs are often less accurate when the intergenic region of a gene pair is between 40 and 250 base pairs long. In our approach, multiple features of the intergenic region, gene length, and available microarray data in E. coli were used to improve the accuracy of operon prediction programs in general, and for gene pairs in the above intergenic range in particular. These features were scored according to a log-likelihood formula, and the results suggest a gain of up to 8% in accuracy for gene pairs with an intergenic distance of 40-250 base pairs. For other regions, the newly added features also give a moderate improvement in prediction accuracy. Furthermore, the accuracy of transcript boundary prediction is also improved compared with methods that use intergenic distance and functional annotation alone. We are currently fine-tuning our program to predict all operons in E. coli and applying this method to predict operons in other organisms.
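
The abstract does not give the log-likelihood formula itself, so the following is only a minimal sketch of one common way such scoring can work. It assumes a naive-Bayes-style sum of per-feature log-likelihood ratios, log(P(bin | operon pair) / P(bin | boundary pair)), estimated from binned training values. The function names, the bin edges, and every number in the toy training sets are hypothetical and invented for illustration; none of them come from the paper.

```python
import bisect
import math


def llr_table(operon_values, boundary_values, edges, pseudocount=1.0):
    """Per-bin log-likelihood ratios log(P(bin|operon) / P(bin|boundary)).

    `edges` are sorted bin boundaries; a value v falls in bin
    bisect_right(edges, v). The pseudocount keeps empty bins finite.
    """
    n_bins = len(edges) + 1
    op = [pseudocount] * n_bins
    bd = [pseudocount] * n_bins
    for v in operon_values:
        op[bisect.bisect_right(edges, v)] += 1
    for v in boundary_values:
        bd[bisect.bisect_right(edges, v)] += 1
    op_total, bd_total = sum(op), sum(bd)
    return [math.log((o / op_total) / (b / bd_total)) for o, b in zip(op, bd)]


def score_pair(features, tables, edges_by_feature):
    """Sum the log-likelihood ratios of all features for one gene pair.

    A positive total favors "same operon"; a negative total favors "boundary".
    """
    total = 0.0
    for name, value in features.items():
        bin_idx = bisect.bisect_right(edges_by_feature[name], value)
        total += tables[name][bin_idx]
    return total


# Hypothetical bin edges: the 40 and 250 bp cut-offs mirror the range named
# in the abstract; the correlation cut-offs are arbitrary.
EDGES = {"distance": [40, 250], "correlation": [0.0, 0.5]}

# Toy training values (made up, NOT data from the paper).
TRAIN = {
    "distance": ([5, 12, 30, 60, 90], [80, 150, 200, 300, 400]),
    "correlation": ([0.9, 0.7, 0.6, 0.2, 0.8], [-0.3, 0.1, 0.4, 0.6, -0.1]),
}

tables = {
    name: llr_table(op_vals, bd_vals, EDGES[name])
    for name, (op_vals, bd_vals) in TRAIN.items()
}

# Two gene pairs with the same mid-range intergenic distance (100 bp) but
# different microarray expression correlation.
co_expressed = score_pair({"distance": 100, "correlation": 0.8}, tables, EDGES)
anti_expressed = score_pair({"distance": 100, "correlation": -0.2}, tables, EDGES)
print(f"co-expressed pair:   {co_expressed:+.3f}")
print(f"anti-expressed pair: {anti_expressed:+.3f}")
```

In this toy setup, distance alone is ambiguous in the 40-250 bp bin, and the additional expression feature is what tips the total score one way or the other. That is the kind of effect the abstract attributes to the added features, though the paper's actual features and weights may differ.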
