Aryana-bs: context-aware alignment of bisulfite-sequencing reads.

IF 3.3, Zone 3 (Biology), JCR Q2 (Biochemical Research Methods)

BMC Bioinformatics Pub Date : 2025-07-21 DOI:10.1186/s12859-025-06182-5

Hassan Nikaein, Ali Sharifi-Zarchi, Afsoon Afzal, Saeedeh Ezzati, Farzane Rasti, Hamidreza Chitsaz, Govindarajan Kunde-Ramamoorthy


Citations: 0

Abstract


Background: DNA methylation is essential to many biological processes, including imprinting, development, and inflammation, and is implicated in numerous disorders, such as cancer. Bisulfite sequencing (BS) is the gold standard for measuring DNA methylation at single-base resolution: it converts unmethylated cytosines to thymines (via uracil) while leaving methylated cytosines intact. This C-to-T conversion poses a well-known challenge for conventional short-read aligners, which treat the conversions as substitutions. Many aligners that rely on seed sequences fail when C-to-T conversions occur frequently over short distances, which reduces alignment accuracy. Two alignment strategies are well established for this problem: three-letter alignment and wildcard alignment. Three-letter alignment converts all cytosines to thymines, so it discards meaningful information. Wildcard alignment, on the other hand, is biased: it does not treat reads from unmethylated and methylated regions equally, which leads to artifacts and inaccuracies in methylation level estimates. This work introduces ARYANA-BS, a novel BS aligner that departs from conventional DNA aligners by integrating BS-specific base alterations directly into its alignment engine. Leveraging known DNA methylation patterns across different genomic contexts, ARYANA-BS constructs five indexes from the reference genome, aligns each read to all of them, and selects the alignment with the minimum penalty. To further refine alignment accuracy, an optional Expectation-Maximization (EM) step incorporates methylation probability information into the choice of the optimal index for each read. The approach aims to improve BS read alignment accuracy by accommodating the complexity of DNA methylation patterns across diverse genomic contexts.

Results: Experimental evaluations on both simulated and real data show that ARYANA-BS achieves state-of-the-art accuracy while maintaining competitive speed and memory efficiency.

Conclusions: ARYANA-BS significantly improves alignment accuracy for bisulfite sequencing data by effectively integrating DNA methylation-specific alterations and genomic context. It outperforms existing methods, such as BSMAP, bwa-meth, Bismark, BSBolt, and abismal, particularly in robustness against genomic biases and alignment of longer, higher-error reads, demonstrating suitability for cancer research and cell-free DNA studies. While the Expectation-Maximization (EM) algorithm provides only modest initial improvements, it establishes a valuable framework for future refinement and potential enhancements in sensitive applications.
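The two established strategies named in the Background, three-letter alignment and wildcard alignment, can be illustrated with a toy sketch. This is not ARYANA-BS code: the two loci, the zero-mismatch hit rule, and all function names below are assumptions made for illustration, and only the forward strand is modelled.

```python
from typing import Callable, Dict, FrozenSet, List

# Two toy reference loci (hypothetical sequences, chosen for illustration).
# Locus B is what locus A looks like once both of its cytosines are converted.
LOCI: Dict[str, str] = {"A": "ACGTCGA", "B": "ATGTTGA"}


def bisulfite_convert(seq: str, methylated: FrozenSet[int] = frozenset()) -> str:
    """Forward-strand model: every unmethylated C is read as T."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def three_letter_mismatches(read: str, ref: str) -> int:
    """Three-letter scoring: collapse C into T in both read and reference."""
    collapsed_read = read.replace("C", "T")
    collapsed_ref = ref.replace("C", "T")
    return sum(r != g for r, g in zip(collapsed_read, collapsed_ref))


def wildcard_mismatches(read: str, ref: str) -> int:
    """Wildcard scoring: a reference C accepts a read C or T;
    a reference T does not accept a read C."""
    return sum(
        not (r == g or (g == "C" and r == "T"))
        for r, g in zip(read, ref)
    )


def perfect_hits(read: str, scorer: Callable[[str, str], int]) -> List[str]:
    """Names of the loci that the read matches with zero mismatches."""
    return [name for name, ref in LOCI.items() if scorer(read, ref) == 0]


# Both reads originate from locus A; only the methylation state differs.
methylated_read = bisulfite_convert(LOCI["A"], frozenset({1, 4}))
unmethylated_read = bisulfite_convert(LOCI["A"])

for label, read in [("methylated", methylated_read),
                    ("unmethylated", unmethylated_read)]:
    print(label, read,
          "wildcard:", perfect_hits(read, wildcard_mismatches),
          "three-letter:", perfect_hits(read, three_letter_mismatches))
```

In this toy case, wildcard scoring places the methylated read uniquely at locus A, while the unmethylated read matches both loci equally well. A pipeline that discards ambiguous reads would therefore keep only the methylated one, which is the kind of bias the abstract describes. Three-letter scoring treats both reads identically, but at the cost of making both ambiguous, since the collapsed loci are indistinguishable.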


Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months

About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series, which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.