Differences in GenBank and RefSeq annotations may affect genomics data interpretation for Pseudomonas putida KT2440.

IF 3.1 | Zone 2 (Biology) | JCR Q2 MICROBIOLOGY

mSphere Pub Date : 2025-10-02 DOI:10.1128/msphere.00391-25

Guilherme Marcelino Viana de Siqueira, Thomas Eng, Aindrila Mukhopadhyay, María-Eugenia Guazzaroni

Journal: mSphere. Article number: e0039125. Publication type: Journal Article.

Citations: 0

Abstract

Annotations of genomic features are cornerstone data that support routine workflows in conventional omics analyses in Pseudomonas putida KT2440 and other organisms. The GenBank and the RefSeq versions of the annotated KT2440 genome are two popular resources widely cited in the literature; yet, they originate from distinct prediction pipelines and possess potentially different biological information that is often overlooked. In this study, we systematically compared the features present in these resources and found that approximately 16% of the total of KT2440 open reading frames (ORFs) show differences in their predicted genomic positions across GenBank and RefSeq, despite sharing equivalent locus tag codes. Furthermore, we show that these discrepancies can affect the results of high-throughput analyses by processing a collection of RNAseq expression data sets utilizing both annotations. Our findings provide a comprehensive overview of the current state of available resources for genomics research in P. putida KT2440 and highlight a rarely addressed yet widespread potential pitfall in the literature on this organism, with possible implications for other prokaryotes.

IMPORTANCE: Genome annotation databases often rely on different statistical models for their function predictions and inherently carry biases propagated into studies using them. This work provides a quantitative assessment of two popular annotation resources for the model bacterium Pseudomonas putida KT2440 and their influence on data interpretation. As large-scale omics data sets are commonly used to inform experimental decisions, our results aim to promote awareness of the caveats associated with these computational resources and foster reproducibility and transparency in P. putida research.
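
The kind of comparison the abstract describes (features that share a locus tag but differ in predicted genomic position) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Feature` type, the `compare_annotations` function, the `TAG_…` identifiers, and all coordinates are hypothetical and chosen only for illustration. It assumes each annotation has already been parsed into a mapping from locus tag to (start, end, strand).

```python
from typing import Dict, List, NamedTuple, Tuple


class Feature(NamedTuple):
    """Hypothetical minimal record for one annotated ORF."""
    start: int
    end: int
    strand: str


def compare_annotations(
    a: Dict[str, Feature], b: Dict[str, Feature]
) -> Tuple[List[str], float]:
    """Return the shared locus tags whose coordinates differ, and the
    fraction of shared tags that they represent."""
    shared = sorted(set(a) & set(b))  # tags present in both annotations
    differing = [tag for tag in shared if a[tag] != b[tag]]
    fraction = len(differing) / len(shared) if shared else 0.0
    return differing, fraction


# Toy data (made up): two annotations of the same genome.
genbank = {
    "TAG_0001": Feature(100, 400, "+"),
    "TAG_0002": Feature(500, 900, "-"),
    "TAG_0003": Feature(1000, 1600, "+"),
    "TAG_0004": Feature(1700, 2000, "+"),  # only in the first annotation
}
refseq = {
    "TAG_0001": Feature(100, 400, "+"),
    "TAG_0002": Feature(500, 930, "-"),    # same tag, different boundary
    "TAG_0003": Feature(1000, 1600, "+"),
    "TAG_0005": Feature(2100, 2500, "-"),  # only in the second annotation
}

differing, fraction = compare_annotations(genbank, refseq)
print(differing, round(fraction, 3))
```

In this toy case, one of the three shared tags has different coordinates. Tags present in only one annotation are ignored here; a fuller comparison would probably report them separately, since they also affect which reads are counted per gene in an RNAseq analysis.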




Source journal

mSphere Immunology and Microbiology-Microbiology

CiteScore

8.50

Self-citation rate

2.10%

Articles published

192

Review time

11 weeks

About the journal: mSphere™ is a multi-disciplinary open-access journal that will focus on rapid publication of fundamental contributions to our understanding of microbiology. Its scope will reflect the immense range of fields within the microbial sciences, creating new opportunities for researchers to share findings that are transforming our understanding of human health and disease, ecosystems, neuroscience, agriculture, energy production, climate change, evolution, biogeochemical cycling, and food and drug production. Submissions will be encouraged of all high-quality work that makes fundamental contributions to our understanding of microbiology. mSphere™ will provide streamlined decisions, while carrying on ASM's tradition for rigorous peer review.