Timothy J Y Lim, Yussi M Palacios Delgado, Anna Lintern, David T McCarthy, Rebekah Henry
Bioinformatics Advances, 5(1): vbaf103. Published 29 April 2025. DOI: 10.1093/bioadv/vbaf103
Assessing accuracy and specificity of faecal source library for microbial source-tracking, using SourceTracker as case study.
Motivation: Understanding the quality of the source library prior to undertaking library-dependent microbial source-tracking (MST) is an essential, but often overlooked, primary analysis step.
Results: We propose an approach for validating the quality of amplicon-derived faecal source libraries and demonstrate it on a library of 16S rRNA paired-end amplicon sequences obtained from various animal types in Victoria, Australia. First, a leave-one-out (LOO) analysis assessed the accuracy of the source category groupings by counting the samples incorrectly assigned to a different source category (i.e. animal type). After a quality-control step to decide whether each incorrectly assigned sample should be retained, removed or regrouped, we assessed whether the sample size for each source type was sufficient to characterize its source fingerprint. The LOO analysis showed that 15.5% of samples were incorrectly assigned, with high error rates for the bird and wallaby samples in our source library. Increasing the sample size improved source identification accuracy, although accuracy eventually plateaued, and the plateau differed between source types. These findings highlight the importance of thoroughly assessing the quality and limitations of a source library before applying it in library-dependent MST.
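To make the structure of the two checks concrete, here is a minimal sketch in Python. It is not the authors' pipeline (that is in the linked MonashOWL repository), and it does not reproduce SourceTracker, which estimates source proportions with a Bayesian Gibbs sampler. Instead it deliberately swaps in a much simpler nearest-centroid classifier: each source "fingerprint" is assumed to be the mean relative-abundance profile of its samples, and a sample is assigned to the category with the nearest fingerprint by Euclidean distance. The function names (`leave_one_out`, `error_rates`, `accuracy_vs_size`), the distance measure and the toy counts are all hypothetical and chosen for illustration only.

```python
import math
import random
from collections import Counter, defaultdict

# ASSUMPTION: a sample is (sample_id, source_category, raw_feature_counts), with
# the same feature order (e.g. ASVs) in every sample. Nearest-centroid assignment
# is a stand-in for SourceTracker, not a reimplementation of it.


def normalise(counts):
    """Convert raw feature counts to relative abundances."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def centroid(profiles):
    """Mean relative-abundance profile, used here as a hypothetical source fingerprint."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def distance(a, b):
    """Euclidean distance between two relative-abundance profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(profile, library):
    """Assign a profile to the category whose fingerprint is nearest."""
    fingerprints = {cat: centroid(ps) for cat, ps in library.items() if ps}
    return min(fingerprints, key=lambda cat: distance(profile, fingerprints[cat]))


def leave_one_out(samples):
    """Hold out each sample in turn, rebuild fingerprints from the remaining
    samples and reassign it. Returns misassigned (sample_id, true, predicted)."""
    profiles = [(sid, cat, normalise(c)) for sid, cat, c in samples]
    errors = []
    for i, (sid, cat, prof) in enumerate(profiles):
        library = defaultdict(list)
        for j, (_, other_cat, other_prof) in enumerate(profiles):
            if j != i:
                library[other_cat].append(other_prof)
        predicted = assign(prof, library)
        if predicted != cat:
            errors.append((sid, cat, predicted))
    return errors


def error_rates(samples, errors):
    """Fraction of each category's samples that the LOO analysis misassigned."""
    totals = Counter(cat for _, cat, _ in samples)
    wrong = Counter(cat for _, cat, _ in errors)
    return {cat: wrong[cat] / totals[cat] for cat in totals}


def accuracy_vs_size(samples, target, sizes, reps=20, seed=0):
    """Accuracy on held-out `target` samples when only n of them build the
    target fingerprint; all other categories keep every sample."""
    rng = random.Random(seed)
    target_profiles = [normalise(c) for _, cat, c in samples if cat == target]
    others = defaultdict(list)
    for _, cat, c in samples:
        if cat != target:
            others[cat].append(normalise(c))
    curve = {}
    for n in sizes:
        if not 1 <= n < len(target_profiles):
            continue  # need at least one training and one held-out sample
        hits = trials = 0
        for _ in range(reps):
            shuffled = rng.sample(target_profiles, len(target_profiles))
            library = dict(others)
            library[target] = shuffled[:n]
            for prof in shuffled[n:]:
                hits += assign(prof, library) == target
                trials += 1
        curve[n] = hits / trials
    return curve


# Hypothetical toy library with three features; counts are invented.
toy = [
    ("c1", "cattle", [90, 5, 5]), ("c2", "cattle", [85, 10, 5]),
    ("c3", "cattle", [80, 10, 10]),
    ("d1", "duck", [5, 90, 5]), ("d2", "duck", [10, 80, 10]),
    ("d3", "duck", [5, 85, 10]),
    ("w1", "wallaby", [50, 45, 5]),  # lone sample: absent from its own LOO library
]
errors = leave_one_out(toy)
rates = error_rates(toy, errors)
curve = accuracy_vs_size(toy, "cattle", sizes=[1, 2])
print(errors)  # [('w1', 'wallaby', 'cattle')]
print(rates)   # {'cattle': 0.0, 'duck': 0.0, 'wallaby': 1.0}
print(curve)   # {1: 1.0, 2: 1.0}
```

The lone wallaby sample shows why a category represented by very few samples fares badly under LOO: once it is held out, nothing remains to represent its category, so it is necessarily misassigned. With a real library, the per-source curves from a function like `accuracy_vs_size` would be compared across source types to see where accuracy stops improving.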
Availability and implementation: QIIME2 is available via https://qiime2.org/; SourceTracker v2.0.1 is available via https://github.com/caporaso-lab/sourcetracker2; the LOO pipeline is available via https://github.com/MonashOWL/Bioinformatics-IlluminaMGI/tree/main/16S/LOO; the sample-size assessment pipeline is available via https://github.com/MonashOWL/Bioinformatics-IlluminaMGI/tree/main/16S/Source%20variability.