A. Nederbragt, T. Rounge, Kyrre Kausrud, K. Jakobsen
Next Generation, Sequencing & Applications, pp. 1-12, published 2010-01-05. DOI: 10.1155/2010/782465
Identification and Quantification of Genomic Repeats and Sample Contamination in Assemblies of 454 Pyrosequencing Reads
Contigs assembled from 454 reads of bacterial genomes show a range of read depths, and some contigs have a depth far higher than expected. For reference genome sequence datasets, contig-specific read depth is highly correlated with the number of copies of that sequence in the genome. We developed a series of applied statistical analyses which suggest that copy number can be reliably estimated from the read depth distribution in de novo genome assemblies. We determined the read depths of contigs in de novo cyanobacterial genome assemblies and identified several high-read-depth contigs. These contigs were shown to contain mainly genes known to be present in multiple copies in bacterial genomes. For these assemblies, a correlation between read depth and copy number was demonstrated experimentally using real-time PCR. We present copy number estimates obtained with the statistical analysis developed in this work. Per-contig read depth analysis of assemblies built from 454 reads therefore enables de novo detection of genomic repeats and estimation of their copy number.
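A minimal Python sketch of the depth-to-copy-number idea, assuming that most contigs are single-copy so that the median contig depth approximates the single-copy depth. The function name `estimate_copy_numbers` and the median-and-round rule are illustrative assumptions, not the statistical analysis the authors developed.

```python
from statistics import median


def estimate_copy_numbers(depths: dict[str, float]) -> dict[str, int]:
    """Hypothetical sketch: estimate per-contig copy number from read depth.

    Assumption: most contigs are single-copy, so the median contig depth
    is taken as the single-copy (baseline) depth.
    """
    baseline = median(depths.values())
    if baseline <= 0:
        raise ValueError("median contig depth must be positive")
    # Copy number ~ contig depth / single-copy depth, rounded to the
    # nearest whole copy and never reported below one copy.
    return {name: max(1, round(d / baseline)) for name, d in depths.items()}


# Example with made-up depths: c4 and c5 look like collapsed repeats.
print(estimate_copy_numbers({"c1": 30.0, "c2": 28.5, "c3": 31.0, "c4": 92.0, "c5": 60.5}))
```

Using the median rather than the mean keeps a few very deep repeat contigs from inflating the baseline; a real analysis would also need to account for contig length and depth variance.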
Additionally, our analysis efficiently identified contigs originating from sample contamination, so that they could be removed from the assembly.
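A similarly hedged sketch of a depth-based contamination screen, assuming contaminant DNA is present at lower abundance than the target genome and therefore yields contigs with markedly lower read depth. The function `flag_low_depth_contigs` and its default cutoff of 0.3 times the median depth are hypothetical; the abstract does not state how contaminant contigs were identified.

```python
from statistics import median


def flag_low_depth_contigs(depths: dict[str, float], min_fraction: float = 0.3) -> list[str]:
    """Hypothetical screen: return contigs whose depth falls below
    min_fraction of the median contig depth.

    Assumption: contamination shows up as contigs with much lower read
    depth than the (mostly single-copy) target genome contigs.
    """
    baseline = median(depths.values())
    cutoff = min_fraction * baseline
    return sorted(name for name, d in depths.items() if d < cutoff)


# Example with made-up depths: x1 and x2 sit far below the genome baseline.
print(flag_low_depth_contigs({"c1": 30.0, "c2": 28.5, "c3": 31.0, "c4": 92.0, "x1": 4.2, "x2": 6.0}))
```

Flagged contigs would still need independent checks (for example, sequence similarity searches) before removal, since genuine low-coverage regions can also fall below a fixed cutoff.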