Title: Predicting essential genes in prokaryotic genomes using a linear method: ZUPLS
Authors: Kai Song, Tuopong Tong, Fang Wu
Journal: Integrative biology : quantitative biosciences from nano to macro, pp. 460-9
Publication type: Journal Article
Published: 2014-04-01 (Epub 2014-03-07)
Impact factor: 1.4
DOI: 10.1039/c3ib40241j (https://doi.org/10.1039/c3ib40241j)
Citations: 21
Abstract
ZUPLS, an effective linear method, was developed to improve the accuracy and speed of essential gene identification in prokaryotes. ZUPLS uses only the Z-curve and other sequence-based features, which can be calculated readily from DNA and amino acid sequences. It therefore requires no well-studied biological network knowledge, which significantly simplifies essential gene identification, especially for newly sequenced species. ZUPLS also selects the necessary features automatically by embedding the uninformative variable elimination tool into the partial least squares classifier, and it needs no optimized modelling parameters. To test its performance, ZUPLS was used to predict the essential genes of 12 remotely related prokaryotes. With E. coli genes as the training samples, the cross-organism predictions yielded AUC (Area Under the Curve) scores between 0.8042 and 0.9319; with B. subtilis genes as the training samples, the AUC scores were between 0.8111 and 0.9371. As a further test, ZUPLS was compared with the best available results of existing approaches. The AUC score improved by 0.13 in predicting B. subtilis essential genes using E. coli genes, and by 0.10 in predicting E. coli essential genes using P. aeruginosa genes. The average accuracy for M. pulmonis, using M. genitalium and M. pulmonis genes, improved by 14.7%. The combined feature extraction and selection power of ZUPLS enables it to give reliable predictions of essential genes for both Gram-positive and Gram-negative organisms and for both rich and poor culture media.
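To illustrate how sequence-based features of this kind can be computed directly from DNA, the sketch below derives nine phase-specific Z-curve components from a coding sequence. It follows the standard Z-curve definition (purine/pyrimidine, amino/keto, and weak/strong hydrogen-bond contrasts at each codon position). The abstract does not specify the exact feature set used by ZUPLS, so the function name `z_curve_features` and the choice of nine components are assumptions made here for illustration only.

```python
from collections import Counter


def z_curve_features(seq: str) -> list[float]:
    """Return 9 Z-curve components: (x, y, z) for each of the 3 codon positions.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    seq = seq.upper()
    features: list[float] = []
    for phase in range(3):
        # Bases occupying this codon position (1st, 2nd, or 3rd).
        counts = Counter(seq[phase::3])
        total = sum(counts[b] for b in "ACGT")  # ignore ambiguous bases such as N
        if total == 0:
            features.extend([0.0, 0.0, 0.0])
            continue
        a, c, g, t = (counts[b] / total for b in "ACGT")
        features.extend([
            (a + g) - (c + t),  # x: purine vs. pyrimidine
            (a + c) - (g + t),  # y: amino vs. keto
            (a + t) - (g + c),  # z: weak vs. strong hydrogen bonding
        ])
    return features


if __name__ == "__main__":
    print(z_curve_features("ATGAAA"))
```

Each component lies in the range [-1, 1], so vectors from genes of different lengths are directly comparable and could, in principle, be passed to a linear classifier such as partial least squares.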