217 closed Salmonella reference genomes using PacBio sequencing.

IF 2.5 Q3 GENETICS & HEREDITY

BMC genomic data Pub Date : 2025-02-28 DOI:10.1186/s12863-025-01304-7

Yan Luo, Jae Hee Jang, Maria Balkey, Maria Hoffmann

BMC genomic data, vol. 26(1), p. 15. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11871702/pdf/

Citations: 0

Abstract


217 closed Salmonella reference genomes using PacBio sequencing.

Objectives: Whole genome sequencing (WGS) is widely used in food safety for the detection, investigation, and control of foodborne bacterial pathogens. However, the WGS data in most public databases, such as those of the National Center for Biotechnology Information (NCBI), consist primarily of Illumina short reads, which lack important information about repetitive regions, structural variations, and mobile genetic elements, as well as the genomic location of certain important genes such as antimicrobial resistance (AMR) genes and virulence genes. To address this limitation, we have contributed 217 closed circular Salmonella enterica genomes, generated using PacBio sequencing, to the NCBI Pathogen Detection (PD) database and GenBank. This dataset improves the accuracy of the genome representations in the database.

Data description: High-quality complete reference genomes generated from PacBio long reads can provide essential details that are not available in draft genomes from short reads. A complete reference genome allows more accurate data analysis and lets researchers connect genome variations to known genes, regulatory elements, and other genomic features. The 217 complete genomes come from 78 different Salmonella serovars, and each genome represents either a distinct SNP cluster within the NCBI PD database or a unique strain, so the addition significantly enriches the diversity of the reference genome database.


Source journal

BMC genomic data

CiteScore

4.90

Self-citation rate

0.00%

Number of publications