使用 fibertools 进行 DNA-m6A 调用和综合长读数表观遗传学和基因分析。

IF 6.2 2区生物学 Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research Pub Date : 2024-11-20 DOI:10.1101/gr.279095.124

Anupama Jha, Stephanie C Bohaczuk, Yizi Mao, Jane Ranchalis, Benjamin J Mallory, Alan T Min, Morgan O Hamm, Elliott Swanson, Danilo Dubocanin, Connor Finkbeiner, Tony Li, Dale Whittington, William Stafford Noble, Andrew B Stergachis, Mitchell R Vollger

{"title":"使用 fibertools 进行 DNA-m6A 调用和综合长读数表观遗传学和基因分析。","authors":"Anupama Jha, Stephanie C Bohaczuk, Yizi Mao, Jane Ranchalis, Benjamin J Mallory, Alan T Min, Morgan O Hamm, Elliott Swanson, Danilo Dubocanin, Connor Finkbeiner, Tony Li, Dale Whittington, William Stafford Noble, Andrew B Stergachis, Mitchell R Vollger","doi":"10.1101/gr.279095.124","DOIUrl":null,"url":null,"abstract":"Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation and the identification of exogenously placed DNA N 6 -methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using single-molecule sequencing, as well as coprocessing single-molecule genetic and epigenetic architectures, is limited by computational demands and a lack of supporting tools. Here, we introduce fibertools, a state-of-the-art toolkit that features a semisupervised convolutional neural network for fast and accurate identification of m6A-marked bases using Pacific Biosciences (PacBio) single-molecule long-read sequencing, as well as the coprocessing of long-read genetic and epigenetic data produced using either the PacBio or Oxford Nanopore Technologies (ONT) sequencing platforms. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along >20 kb long DNA molecules with an ∼1000-fold improvement in speed. In addition, we demonstrate that fibertools can readily integrate genetic and epigenetic data at single-molecule resolution, including the seamless conversion between molecular and reference coordinate systems, allowing for accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"1976-1986"},"PeriodicalIF":6.2000,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11610455/pdf/","citationCount":"0","resultStr":"{\"title\":\"DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools.\",\"authors\":\"Anupama Jha, Stephanie C Bohaczuk, Yizi Mao, Jane Ranchalis, Benjamin J Mallory, Alan T Min, Morgan O Hamm, Elliott Swanson, Danilo Dubocanin, Connor Finkbeiner, Tony Li, Dale Whittington, William Stafford Noble, Andrew B Stergachis, Mitchell R Vollger\",\"doi\":\"10.1101/gr.279095.124\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation and the identification of exogenously placed DNA N 6 -methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using single-molecule sequencing, as well as coprocessing single-molecule genetic and epigenetic architectures, is limited by computational demands and a lack of supporting tools. Here, we introduce fibertools, a state-of-the-art toolkit that features a semisupervised convolutional neural network for fast and accurate identification of m6A-marked bases using Pacific Biosciences (PacBio) single-molecule long-read sequencing, as well as the coprocessing of long-read genetic and epigenetic data produced using either the PacBio or Oxford Nanopore Technologies (ONT) sequencing platforms. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along >20 kb long DNA molecules with an ∼1000-fold improvement in speed. In addition, we demonstrate that fibertools can readily integrate genetic and epigenetic data at single-molecule resolution, including the seamless conversion between molecular and reference coordinate systems, allowing for accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.\",\"PeriodicalId\":12678,\"journal\":{\"name\":\"Genome research\",\"volume\":\" \",\"pages\":\"1976-1986\"},\"PeriodicalIF\":6.2000,\"publicationDate\":\"2024-11-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11610455/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Genome research\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1101/gr.279095.124\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMISTRY & MOLECULAR BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genome research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1101/gr.279095.124","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

长线DNA测序最近已成为以单分子和单核苷酸分辨率研究遗传和表观遗传结构的有力工具。长读表观遗传学研究既包括直接鉴定原生胞嘧啶甲基化，也包括鉴定外源 DNA N6-甲基腺嘌呤（DNA-m6A）。然而，利用单分子测序检测DNA-m6A修饰，以及共同处理单分子遗传和表观遗传结构，都受到计算需求和支持工具缺乏的限制。在这里，我们介绍了最先进的工具包 fibertools，它采用半监督卷积神经网络，利用 PacBio 单分子长读数测序技术快速准确地识别 m6A 标记碱基，并对利用 PacBio 或 Oxford Nanopore 测序平台产生的长读数遗传和表观遗传数据进行协同处理。我们展示了对长度大于 20 千碱基的 DNA 分子进行精确的 DNA-m6A 鉴定（精确度和召回率大于 90%），速度提高了约 1000 倍。此外，我们还证明了 fibertools 能以单分子分辨率轻松整合遗传和表观遗传数据，包括分子坐标系和参考坐标系之间的无缝转换，从而能在结构和体细胞可变的基因组区域内对长读数据进行准确的遗传和表观遗传分析。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools.

Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation and the identification of exogenously placed DNA N ⁶ -methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using single-molecule sequencing, as well as coprocessing single-molecule genetic and epigenetic architectures, is limited by computational demands and a lack of supporting tools. Here, we introduce fibertools, a state-of-the-art toolkit that features a semisupervised convolutional neural network for fast and accurate identification of m6A-marked bases using Pacific Biosciences (PacBio) single-molecule long-read sequencing, as well as the coprocessing of long-read genetic and epigenetic data produced using either the PacBio or Oxford Nanopore Technologies (ONT) sequencing platforms. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along >20 kb long DNA molecules with an ∼1000-fold improvement in speed. In addition, we demonstrate that fibertools can readily integrate genetic and epigenetic data at single-molecule resolution, including the seamless conversion between molecular and reference coordinate systems, allowing for accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Genome research 生物-生化与分子生物学

CiteScore

12.40

自引率

1.40%

发文量

140

审稿时长

6 months

期刊介绍： Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine. Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies. New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal''s web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.