PeakMatcher

R. J. Nowling, C. R. Beal, Scott J. Emrich, S. Behura, M. Halfon, M. Duman-Scheel
{"title":"PeakMatcher","authors":"R. J. Nowling, C. R. Beal, Scott J. Emrich, S. Behura, M. Halfon, M. Duman-Scheel","doi":"10.1145/3388440.3414907","DOIUrl":null,"url":null,"abstract":"When reference genome assemblies are updated, the peaks from DNA enrichment assays such as ChIP-Seq and FAIRE-Seq need to be called again using the new genome assembly. PeakMatcher is an open-source package that aids in validation by matching peaks across two genome assemblies using the alignment of reads or within the same genome. PeakMatcher calculates recall and precision while also outputting lists of peak-to-peak matches. PeakMatcher uses read alignments to match peaks across genome assemblies. PeakMatcher finds all read aligned to one genome that overlap with a given list of peaks. PeakMatcher uses the read names to locate where those reads are aligned against a second genome. Lastly, all peaks called against the second genome that overlap with the aligned reads are found and output. PeakMatcher groups uses the peak-read-peak relationships to discover 1-to-1, 1-to-many, and many-to-many relationships. Overlap queries are performed with interval trees for maximum efficiency. We evaluated PeakMatcher on two data sets. The first data set was FAIRE-Seq (Formaldehyde-Assisted Isolation of Regulatory Elements Sequencing) of DNA isolated embyros of the mosquito Aedes aegypti [2, 4]. We implemented a peak calling pipeline and validated it on the older (highly fragmented) AaegL3 assembly [5]. PeakMatcher matched 92.9% (precision) of the 121,594 previously-called peaks from [2, 4] with 89.4% (recall) of the 124,959 peaks called with our new pipeline. Next, we applied the peak-calling pipeline to call FAIRE peaks using the newer, chromosome-complete AaegL5 assembly [3]. PeakMatcher found matches for 14 of the 16 experimentally-validated AaegL3 FAIRE peaks from [2, 4]. We validated the matches by comparing nearby genes across the genomes. Nearby genes were consistent for 11 of the 14 peaks; inconsistencies for at least two of the remaining peaks were clearly attributable to differences in assemblies. When applied to all of the peaks, Peak-Matcher matched 78.8% (precision) of the 124,959 AaegL3 peaks with 76.7% (recall) of the 128,307 AaegL5 peaks. The second data set was STARR-Seq (Self-Transcribing Active Regulatory Region Sequencing) of Drosophila melanogaster DNA in S2 culture cells [1]. We called STARR peaks against two versions (dm3 and r5.53) of the D. melanogaster genome [6]. PeakMatcher matched 77.4% (precision) of the 4,195 dm3 peaks with 94.8% (recall) of the 3,114 r5.53 peaks. PeakMatcher and associated documentation are available on GitHub (https://github.com/rnowling/peak-matcher) under the open-source Apache Software License v2. PeakMatcher was written in Python 3 using the intervaltree library.","PeriodicalId":411338,"journal":{"name":"Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3388440.3414907","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

When reference genome assemblies are updated, the peaks from DNA enrichment assays such as ChIP-Seq and FAIRE-Seq need to be called again using the new genome assembly. PeakMatcher is an open-source package that aids in validation by matching peaks across two genome assemblies using the alignment of reads or within the same genome. PeakMatcher calculates recall and precision while also outputting lists of peak-to-peak matches. PeakMatcher uses read alignments to match peaks across genome assemblies. PeakMatcher finds all read aligned to one genome that overlap with a given list of peaks. PeakMatcher uses the read names to locate where those reads are aligned against a second genome. Lastly, all peaks called against the second genome that overlap with the aligned reads are found and output. PeakMatcher groups uses the peak-read-peak relationships to discover 1-to-1, 1-to-many, and many-to-many relationships. Overlap queries are performed with interval trees for maximum efficiency. We evaluated PeakMatcher on two data sets. The first data set was FAIRE-Seq (Formaldehyde-Assisted Isolation of Regulatory Elements Sequencing) of DNA isolated embyros of the mosquito Aedes aegypti [2, 4]. We implemented a peak calling pipeline and validated it on the older (highly fragmented) AaegL3 assembly [5]. PeakMatcher matched 92.9% (precision) of the 121,594 previously-called peaks from [2, 4] with 89.4% (recall) of the 124,959 peaks called with our new pipeline. Next, we applied the peak-calling pipeline to call FAIRE peaks using the newer, chromosome-complete AaegL5 assembly [3]. PeakMatcher found matches for 14 of the 16 experimentally-validated AaegL3 FAIRE peaks from [2, 4]. We validated the matches by comparing nearby genes across the genomes. Nearby genes were consistent for 11 of the 14 peaks; inconsistencies for at least two of the remaining peaks were clearly attributable to differences in assemblies. When applied to all of the peaks, Peak-Matcher matched 78.8% (precision) of the 124,959 AaegL3 peaks with 76.7% (recall) of the 128,307 AaegL5 peaks. The second data set was STARR-Seq (Self-Transcribing Active Regulatory Region Sequencing) of Drosophila melanogaster DNA in S2 culture cells [1]. We called STARR peaks against two versions (dm3 and r5.53) of the D. melanogaster genome [6]. PeakMatcher matched 77.4% (precision) of the 4,195 dm3 peaks with 94.8% (recall) of the 3,114 r5.53 peaks. PeakMatcher and associated documentation are available on GitHub (https://github.com/rnowling/peak-matcher) under the open-source Apache Software License v2. PeakMatcher was written in Python 3 using the intervaltree library.
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信