Technical comment to "Database verification studies of SWISS-PROT and GenBank" by Karp et al

R. Apweiler, P. Kersey, Vivien L. Junker, A. Bairoch
Journal of bioinformatics. Journal Article. Published 2001-06-01. DOI: 10.1093/BIOINFORMATICS/17.6.533 (https://doi.org/10.1093/BIOINFORMATICS/17.6.533)
Citations: 8

Abstract

In their paper “Database verification studies of SWISS-PROT and GenBank” Karp et al. (2001) conclude: (1) “SWISS-PROT is more incomplete than we expected...”; (2) “Even if we combine SWISS-PROT and TrEMBL, some sequences from the full genomes are missing from the combined dataset”; (3) “In many cases, translated GenBank genes do not exactly match the corresponding SWISS-PROT sequences, ...”; and (4) “...that SWISS-PROT does not identify a significant number of experimentally characterized proteins”. These results, and the approach used to arrive at these results, are in our opinion somewhat misleading. Herein, we only focus on four major points.

First, there has never been a claim that SWISS-PROT is comprehensive. Thus, it is surprising that Karp et al. found that “SWISS-PROT is more incomplete than we expected...”. To make sequences available as quickly as possible without diluting the quality of SWISS-PROT, the supplemental database TrEMBL was introduced in 1996 and contains the translation of all coding sequences (CDS) in the DDBJ/EMBL/GenBank nucleotide sequence database, except those already included in SWISS-PROT. Snapshots of the SWISS-PROT, TrEMBL and TrEMBLnew databases are released weekly, synchronised with the DDBJ/EMBL/GenBank nucleotide sequence database and provide comprehensive coverage (ftp://ftp.ebi.ac.uk/pub/databases/sp tr nrdb/). The weekly comprehensive SWISS-PROT/TrEMBL nonredundant database (SPTR) has been widely publicised on the EBI and ExPASy web-servers and in various publications (e.g. Apweiler, 2000).

Second, the authors’ assertions that “Even if we combine SWISS-PROT and TrEMBL, some sequences from the full genomes are missing from the combined dataset.” and “SWISS-PROT curators apparently chose not to replace existing SWISS-PROT sequences with sequences from complete-genome projects” are rather inaccurate. Karp et al. tried to establish corresponding sets of SWISS-PROT/TrEMBL proteins and