NanoMGT: Marker gene typing of low complexity mono-species metagenomic samples using noisy long reads.

Impact factor: 1.3 | JCR Q3 | Biochemical Research Methods

Biology Methods and Protocols. Pub Date: 2024-08-06. eCollection Date: 2024-01-01. DOI: 10.1093/biomethods/bpae057

Malte B Hallgren, Philip T L C Clausen, Frank M Aarestrup



Abstract

Rapid advancements in sequencing technologies have led to significant progress in microbial genomics, yet challenges persist in accurately identifying microbial strain diversity in metagenomic samples, especially when working with noisy long-read data from platforms like Oxford Nanopore Technologies (ONT). In this article, we introduce NanoMGT, a tool designed to enhance marker gene typing in low-complexity mono-species samples, leveraging the unique properties of long reads. NanoMGT excels in its ability to accurately identify mutations amidst high error rates, ensuring the reliable detection of multiple strain-specific marker genes. Our tool implements a novel scoring system that rewards mutations co-occurring across different reads and penalizes densely grouped, likely erroneous variants, thereby achieving a good balance between sensitivity and precision. A comparative evaluation of NanoMGT, using a simulated multi-strain sample of seven bacterial species, demonstrated superior performance relative to existing tools and the advantages of using a threshold-based filtering approach to calling minority variants in ONT's sequencing data. NanoMGT's potential as a post-binning tool in metagenomic pipelines is particularly notable, enabling researchers to more accurately determine specific alleles and understand strain diversity in microbial communities. Our findings have significant implications for clinical diagnostics, environmental microbiology, and the broader field of genomics. The findings offer a reliable and efficient approach to marker gene typing in complex metagenomic samples.
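The scoring idea described in the abstract (reward mutations that co-occur across different reads, penalize densely grouped variants, then apply a frequency threshold) can be illustrated with a small sketch. This is not NanoMGT's implementation: the function name, the input representation, and every parameter (`min_freq`, `cooccur_reward`, `proximity_penalty`, `window`, `min_shared_reads`) are hypothetical choices made only to show the general shape of such a filter.

```python
from collections import Counter
from itertools import combinations


def call_minority_variants(reads, depth, min_freq=0.10, cooccur_reward=0.01,
                           proximity_penalty=0.02, window=5, min_shared_reads=2):
    """Toy threshold-based minority variant caller (illustrative only).

    reads: list of sets of (position, base) mismatches against a marker allele,
           one set per aligned read (an assumed, simplified input format).
    depth: number of reads covering the allele (assumed uniform here).
    """
    # How many reads support each candidate variant.
    support = Counter(v for read in reads for v in read)

    # How many reads carry each pair of candidate variants together.
    pair_support = Counter()
    for read in reads:
        for a, b in combinations(sorted(read), 2):
            pair_support[(a, b)] += 1

    called = []
    for variant, count in support.items():
        pos = variant[0]
        # Reward: partners seen together with this variant on several reads.
        partners = sum(1 for pair, n in pair_support.items()
                       if variant in pair and n >= min_shared_reads)
        # Penalty: other candidates at nearby positions (dense clusters).
        neighbours = sum(1 for other in support
                         if other[0] != pos and abs(other[0] - pos) <= window)
        # Per-variant threshold: raised by clustering, lowered by co-occurrence.
        threshold = max(0.0, min_freq
                        + proximity_penalty * neighbours
                        - cooccur_reward * partners)
        if count / depth >= threshold:
            called.append(variant)
    return sorted(called)


# Three reads share two well-separated mutations (a plausible minority strain);
# one read carries a tight cluster of three mismatches (a likely error burst).
example_reads = (
    [{(100, "A"), (250, "T")}] * 3
    + [{(400, "C"), (401, "G"), (402, "T")}]
    + [set()] * 16
)
print(call_minority_variants(example_reads, depth=20))
# [(100, 'A'), (250, 'T')]
```

In this toy input the two co-occurring mutations pass a slightly lowered threshold, while the three clustered single-read mismatches face a raised threshold and are filtered out. The actual weights and scoring rules used by the tool are described in the paper itself.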
