Pre-processing of paleogenomes: mitigating reference bias and postmortem damage in ancient genome data
Dilek Koptekin, Etka Yapar, Kıvılcım Başak Vural, Ekin Sağlıcan, N. Ezgi Altınışık, Anna-Sapfo Malaspinas, Can Alkan, Mehmet Somel
Genome Biology, published 2025-01-09. DOI: 10.1186/s13059-024-03462-w (https://doi.org/10.1186/s13059-024-03462-w)
Abstract
We investigate alternative strategies against reference bias and postmortem damage (PMD) in low-coverage paleogenomes. Compared to alignment to the linear reference genome, we show that masking known polymorphic sites and graph alignment both effectively remove reference bias, but only when starting from raw read files. We next study approaches to overcome postmortem damage: trimming, rescaling, and our newly developed algorithm, bamRefine (github.com/etkayapar/bamRefine and zenodo.org/records/14234666), which masks reads only at positions possibly affected by PMD. We propose graph alignment coupled with bamRefine as a simple strategy to minimize data loss and bias, and urge the community to publish FASTQ files.
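The idea of masking reads only at positions possibly affected by PMD can be illustrated with a minimal Python sketch. This is not bamRefine's actual implementation. It assumes a double-stranded library (so that, in reference orientation, deamination appears as C-to-T near the left end of a read and G-to-A near the right end), an ungapped alignment, and a known list of biallelic SNPs. The function name `mask_pmd_sites`, the `snps` dictionary layout, and the `lookup` window size are hypothetical names introduced for illustration only.

```python
# Illustrative sketch only; NOT the bamRefine implementation.
# Assumptions (hypothetical): double-stranded library, ungapped alignment,
# read bases given in reference orientation, SNPs supplied as a dict.

CT = frozenset("CT")  # transition that 5'-end deamination can mimic
GA = frozenset("GA")  # transition that 3'-end deamination can mimic


def mask_pmd_sites(seq, ref_start, snps, lookup=10):
    """Replace bases with 'N' where PMD could be mistaken for a SNP allele.

    seq       -- read bases in reference orientation
    ref_start -- 0-based reference coordinate of seq[0]
    snps      -- dict: reference position -> frozenset of the two alleles
    lookup    -- number of bases from each read end treated as PMD-prone
    """
    bases = list(seq)
    n = len(bases)
    for i in range(n):
        alleles = snps.get(ref_start + i)
        if alleles is None:
            continue  # not a known polymorphic site; leave the base alone
        near_left = i < lookup
        near_right = i >= n - lookup
        # Mask only when the SNP type matches the damage expected at that end.
        if (near_left and alleles == CT) or (near_right and alleles == GA):
            bases[i] = "N"
    return "".join(bases)


if __name__ == "__main__":
    snps = {
        101: CT,               # C/T SNP near the left end: masked
        105: CT,               # C/T SNP in the read interior: kept
        109: CT,               # C/T SNP near the right end: kept
        110: GA,               # G/A SNP near the right end: masked
        111: frozenset("AC"),  # transversion: never masked
    }
    print(mask_pmd_sites("ACGTACGTACGT", 100, snps, lookup=3))  # -> ANGTACGTACNT
```

Unlike trimming, which discards every base near the read ends, a scheme of this kind leaves all other positions available for genotyping, which is the sense in which such masking can limit data loss.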
Genome Biology
Subject area: Biochemistry, Genetics and Molecular Biology (Genetics)
CiteScore: 21.00
Self-citation rate: 3.30%
Articles published: 241
Review time: 2 months
About the journal:
Genome Biology stands as a premier platform for exceptional research across all domains of biology and biomedicine, explored through a genomic and post-genomic lens.
With an impact factor of 12.3 (2022), the journal ranks 3rd among research journals in the Genetics and Heredity category and 2nd among research journals in the Biotechnology and Applied Microbiology category, according to Thomson Reuters. Notably, Genome Biology is the highest-ranked open-access journal in this category.
Our dedicated team of highly trained in-house Editors collaborates closely with our Editorial Board of international experts, ensuring the journal remains at the forefront of scientific advances and community standards. Regular engagement with researchers at conferences and institute visits underscores our commitment to staying abreast of the latest developments in the field.