Assessing accuracy and specificity of faecal source library for microbial source-tracking, using SourceTracker as case study.

Bioinformatics Advances, 5(1): vbaf103 (IF 2.8; JCR Q2, Mathematical & Computational Biology)
Pub Date: 2025-04-29; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf103
Timothy J Y Lim, Yussi M Palacios Delgado, Anna Lintern, David T McCarthy, Rebekah Henry
Abstract

Motivation: Understanding the quality of the source library prior to undertaking library-dependent microbial source-tracking (MST) is an essential, but often overlooked, primary analysis step.

Results: We propose an assessment approach to validate the quality of amplicon-derived faecal source libraries. The approach was demonstrated on a faecal source library of 16S rRNA paired-end amplicon sequences obtained from various animal types in Victoria, Australia. First, a leave-one-out (LOO) analysis assessed the accuracy of the source category groupings by counting the samples assigned to a source category (i.e. animal type) other than their own. After a quality control procedure to decide whether to retain, remove, or group the incorrectly assigned samples, we assessed whether the sample size for each source type was sufficient to characterize its source fingerprint. The LOO analysis showed that 15.5% of samples were incorrectly assigned, with high error rates for birds and wallabies in our source library. Increasing the sample size improved source identification accuracy, but accuracy eventually plateaued in a source-specific manner. These findings highlight the importance of thoroughly assessing the quality and limitations of a source library before library-dependent MST applications.
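
The leave-one-out step can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not call SourceTracker: as a stand-in, each held-out sample is assigned to the source whose mean profile (built from the remaining samples) is nearest in Bray-Curtis dissimilarity. The function names, the toy count table, and the "cow"/"bird" labels are all hypothetical and chosen only for illustration.

```python
import numpy as np

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return np.abs(a - b).sum() / (a + b).sum()

def leave_one_out(counts, labels):
    """Hold each library sample out in turn and re-assign it to a source.

    counts: (n_samples, n_features) array of ASV/OTU counts.
    labels: source category of each sample.
    Returns (predicted labels, fraction of samples incorrectly assigned).
    Stand-in classifier: nearest source centroid, NOT SourceTracker.
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    # Relative abundances, so sequencing depth does not dominate the distance.
    rel = counts / counts.sum(axis=1, keepdims=True)
    predicted = []
    for i in range(len(labels)):
        keep = np.arange(len(labels)) != i  # library without sample i
        best_source, best_dist = None, np.inf
        for source in np.unique(labels[keep]):
            # Source fingerprint = mean profile of its remaining samples.
            centroid = rel[keep & (labels == source)].mean(axis=0)
            dist = bray_curtis(rel[i], centroid)
            if dist < best_dist:
                best_source, best_dist = source, dist
        predicted.append(best_source)
    predicted = np.array(predicted)
    return predicted, float(np.mean(predicted != labels))

# Hypothetical toy library: the last sample is labelled "cow" but has a
# bird-like profile, so LOO should flag it as incorrectly assigned.
toy_counts = [[90, 10, 0], [80, 20, 0], [5, 5, 90], [0, 10, 90], [10, 0, 90]]
toy_labels = ["cow", "cow", "bird", "bird", "cow"]
predicted, error_rate = leave_one_out(toy_counts, toy_labels)
print(list(zip(toy_labels, predicted)), error_rate)
```

In this toy library only the fifth sample is reassigned, giving an error rate of 1 in 5. Samples flagged this way are the ones a quality control step would then retain, remove, or group.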

Availability and implementation: QIIME2 is available via https://qiime2.org/; SourceTracker v2.0.1 is available via https://github.com/caporaso-lab/sourcetracker2; Pipeline for LOO is available via https://github.com/MonashOWL/Bioinformatics-IlluminaMGI/tree/main/16S/LOO; Pipeline for sample size assessment is available via https://github.com/MonashOWL/Bioinformatics-IlluminaMGI/tree/main/16S/Source%20variability.
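
The sample-size assessment described in the Results can be sketched in the same spirit. The code below is an assumption-laden illustration, not the linked pipeline: it builds each source fingerprint from n randomly chosen samples, classifies the remaining samples with a nearest-centroid stand-in for SourceTracker, and repeats this over several random draws so that per-source accuracy can be compared across n. The synthetic Dirichlet data, the function names, and the parameter defaults are hypothetical.

```python
import numpy as np

def nearest_source(profile, fingerprints):
    """Return the source whose fingerprint is nearest in Bray-Curtis distance."""
    def dist(a, b):
        return np.abs(a - b).sum() / (a + b).sum()
    return min(fingerprints, key=lambda s: dist(profile, fingerprints[s]))

def accuracy_by_sample_size(rel, labels, sizes, n_repeats=20, seed=0):
    """Mean per-source accuracy when fingerprints are built from n samples.

    rel: (n_samples, n_features) relative abundances.
    labels: source category of each sample.
    Returns {n: {source: mean accuracy over n_repeats random draws}}.
    """
    rng = np.random.default_rng(seed)
    rel, labels = np.asarray(rel, dtype=float), np.asarray(labels)
    sources = np.unique(labels)
    smallest = min(int(np.sum(labels == s)) for s in sources)
    if max(sizes) >= smallest:
        raise ValueError("each n must leave at least one held-out sample per source")
    results = {}
    for n in sizes:
        hits = {s: [] for s in sources}
        for _ in range(n_repeats):
            # Draw n training samples per source without replacement.
            train = np.zeros(len(labels), dtype=bool)
            for s in sources:
                idx = np.flatnonzero(labels == s)
                train[rng.choice(idx, size=n, replace=False)] = True
            fingerprints = {s: rel[train & (labels == s)].mean(axis=0)
                            for s in sources}
            # Score the samples that were not used to build the fingerprints.
            for s in sources:
                held_out = np.flatnonzero(~train & (labels == s))
                correct = [nearest_source(rel[i], fingerprints) == s
                           for i in held_out]
                hits[s].append(np.mean(correct))
        results[n] = {str(s): float(np.mean(hits[s])) for s in sources}
    return results

# Hypothetical synthetic library: two sources with distinct mean profiles.
demo_rng = np.random.default_rng(1)
demo_rel = np.vstack([demo_rng.dirichlet([8, 2, 1, 1], size=20),
                      demo_rng.dirichlet([1, 1, 2, 8], size=20)])
demo_labels = ["cow"] * 20 + ["bird"] * 20
curve = accuracy_by_sample_size(demo_rel, demo_labels, sizes=[1, 2, 5, 10])
for n, per_source in curve.items():
    print(n, per_source)
```

Plotting accuracy against n for each source would show where, if anywhere, adding samples stops improving accuracy for that source.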
