Authors: Yaping Fang, Jun Li
Journal: International Journal of Computational Biology and Drug Design, pp. 157-69 (2013; Epub 2013/2/21)
DOI: 10.1504/IJCBDD.2013.052197 (https://doi.org/10.1504/IJCBDD.2013.052197)
Genomic law guided gene prediction in fungi and metazoans.
Computational prediction of protein-coding genes is a fundamental step in genome annotation. However, accurately predicting eukaryotic genes in silico remains a challenge. By surveying model genomes, we found that the Spearman's rank correlation coefficient between the number of experimentally verified genes and genome size was 0.96 for all eukaryotes except plants, indicating that the relationship between genome size and the number of coding genes can be expressed with a monotonic function. Regression analysis showed that the total number of protein-coding genes followed a logarithmic function of genome size. We integrated this equation into ab initio gene prediction software to guide gene prediction by constraining the total number of predicted genes. We evaluated the software on three eukaryotic genomes. Results showed that >90% of false positive predictions were removed while >80% of true positives were retained, resulting in much higher specificity.
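The workflow described in the abstract (rank correlation, logarithmic regression, then capping the number of predictions) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives neither the fitted coefficients nor the filtering rule, so the sketch assumes a model of the form N = a * ln(G) + b and assumes that predictions are kept in order of decreasing score until the predicted gene count is reached. All function names and the (gene_id, score) tuple layout are hypothetical.

```python
import math


def rank(values):
    """Return 1-based ranks of values (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return ranks


def spearman(x, y):
    """Spearman's rank correlation via the no-ties formula."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def fit_log(genome_sizes, gene_counts):
    """Least-squares fit of gene_count = a * ln(genome_size) + b.

    The functional form is an assumption; the abstract only says the
    relationship is logarithmic.
    """
    xs = [math.log(s) for s in genome_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(gene_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, gene_counts))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def constrain_predictions(predictions, genome_size, a, b):
    """Keep only the top-scoring predictions, up to the fitted gene count.

    predictions: list of (gene_id, score) tuples (hypothetical layout).
    Ranking by score is an assumed filtering rule, not the paper's.
    """
    budget = max(0, round(a * math.log(genome_size) + b))
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    return ranked[:budget]
```

In practice, `fit_log` would be run once on curated (genome size, verified gene count) pairs, and `constrain_predictions` would then be applied to the raw output of a gene finder for a new genome. Plant genomes, which the abstract excludes from the correlation, would need separate treatment.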