Natalia Zajac, Ioannis S. Vlachos, Sija Sajibu, Lennart Opitz, Shuoshuo Wang, Sridar V. Chittur, Christopher E. Mason, Kevin L. Knudtson, John M. Ashton, Hubert Rehrauer, Catharine Aquino
Title: The impact of PCR duplication on RNAseq data generated using NovaSeq 6000, NovaSeq X, AVITI, and G4 sequencers
Journal: Genome Biology
Published: 2025-05-28
DOI: 10.1186/s13059-025-03613-7 (https://doi.org/10.1186/s13059-025-03613-7)
Abstract
Transcriptome sequencing (RNA-seq) is a powerful technology for gene expression profiling. Selecting optimal parameters for cDNA library generation is crucial for acquiring high-quality data. In this study, we investigate how the amount of input RNA and the number of PCR cycles used for sample amplification affect the rate of PCR duplication and, in consequence, RNA-seq data quality. For broader applicability, we sequenced the libraries on four short-read sequencing platforms: Illumina NovaSeq 6000, Illumina NovaSeq X, Element Biosciences AVITI, and Singular Genomics G4. The native Illumina libraries were converted for sequencing on AVITI and G4 to assess the effect of library conversion, which involves additional PCR cycles. We find that the rate of PCR duplicates depends on the combined effect of the amount of RNA input material and the number of PCR cycles used for amplification. For input amounts lower than 125 ng, 34–96% of reads were discarded via deduplication, with the percentage increasing with lower input amount and decreasing with increasing PCR cycles. The reduced read diversity for low input amounts leads to fewer genes detected and increased noise in expression counts. Data generated on each of the four sequencing platforms show similar effects of starting material amount and PCR cycle number on PCR duplication, a similar number of detected genes, and comparable gene expression profiles.
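The percentage of reads discarded via deduplication can be illustrated with a minimal sketch. The abstract does not describe how duplicates were identified, so the read key used here (chromosome, position, strand, and a molecular barcode) and the function name `duplication_rate` are assumptions made only for illustration, not the authors' pipeline.

```python
from collections import Counter
from typing import Hashable, Iterable


def duplication_rate(read_keys: Iterable[Hashable]) -> float:
    """Return the fraction of reads that deduplication would discard.

    Reads sharing the same key are treated as copies of one original
    molecule; one read per key is kept and the rest are discarded.
    """
    counts = Counter(read_keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    unique = len(counts)
    return (total - unique) / total


# Hypothetical reads keyed by (chromosome, position, strand, barcode).
reads = [
    ("chr1", 100, "+", "ACGT"),
    ("chr1", 100, "+", "ACGT"),  # duplicate of the first read
    ("chr1", 100, "+", "ACGT"),  # duplicate of the first read
    ("chr1", 100, "+", "TTGA"),  # same position, different barcode
    ("chr2", 500, "-", "ACGT"),
]

print(f"{duplication_rate(reads):.0%}")  # 2 of 5 reads discarded -> 40%
```

Under this definition, a library with low read diversity yields few distinct keys relative to the number of reads sequenced, so the discarded fraction rises.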