Computational Study Protocol: Leveraging Synthetic Data to Validate a Benchmark Study for Differential Abundance Tests for 16S Microbiome Sequencing Data.

Q2 Pharmacology, Toxicology and Pharmaceutics
F1000Research Pub Date : 2025-01-02 eCollection Date: 2024-01-01 DOI:10.12688/f1000research.155230.2
Eva Kohnert, Clemens Kreutz
{"title":"Computational Study Protocol: Leveraging Synthetic Data to Validate a Benchmark Study for Differential Abundance Tests for 16S Microbiome Sequencing Data.","authors":"Eva Kohnert, Clemens Kreutz","doi":"10.12688/f1000research.155230.2","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Synthetic data's utility in benchmark studies depends on its ability to closely mimic real-world conditions and reproduce results obtained from experimental data. Building on Nearing et al.'s study (1), who assessed 14 differential abundance tests using 38 experimental 16S rRNA datasets in a case-control design, we are generating synthetic datasets that mimic the experimental data to verify their findings. We will employ statistical tests to rigorously assess the similarity between synthetic and experimental data and to validate the conclusions on the performance of these tests drawn by Nearing et al. (1). This protocol adheres to the SPIRIT guidelines, demonstrating how established reporting frameworks can support robust, transparent, and unbiased study planning.</p><p><strong>Methods: </strong>We replicate Nearing et al.'s (1) methodology, incorporating synthetic data simulated using two distinct tools, mirroring the 38 experimental datasets. Equivalence tests will be conducted on a non-redundant subset of 46 data characteristics comparing synthetic and experimental data, complemented by principal component analysis for overall similarity assessment. The 14 differential abundance tests will be applied to synthetic and experimental datasets, evaluating the consistency of significant feature identification and the number of significant features per tool. Correlation analysis and multiple regression will explore how differences between synthetic and experimental data characteristics may affect the results.</p><p><strong>Conclusions: </strong>Synthetic data enables the validation of findings through controlled experiments. We assess how well synthetic data replicates experimental data, try to validate previous findings with the most recent versions of the DA methods and delineate the strengths and limitations of synthetic data in benchmark studies. Moreover, to our knowledge this is the first computational benchmark study to systematically incorporate synthetic data for validating differential abundance methods while strictly adhering to a pre-specified study protocol following SPIRIT guidelines, contributing to transparency, reproducibility, and unbiased research.</p>","PeriodicalId":12260,"journal":{"name":"F1000Research","volume":"13 ","pages":"1180"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11757917/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"F1000Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.12688/f1000research.155230.2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"Pharmacology, Toxicology and Pharmaceutics","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Synthetic data's utility in benchmark studies depends on its ability to closely mimic real-world conditions and reproduce results obtained from experimental data. Building on Nearing et al.'s study (1), who assessed 14 differential abundance tests using 38 experimental 16S rRNA datasets in a case-control design, we are generating synthetic datasets that mimic the experimental data to verify their findings. We will employ statistical tests to rigorously assess the similarity between synthetic and experimental data and to validate the conclusions on the performance of these tests drawn by Nearing et al. (1). This protocol adheres to the SPIRIT guidelines, demonstrating how established reporting frameworks can support robust, transparent, and unbiased study planning.

Methods: We replicate Nearing et al.'s (1) methodology, incorporating synthetic data simulated using two distinct tools, mirroring the 38 experimental datasets. Equivalence tests will be conducted on a non-redundant subset of 46 data characteristics comparing synthetic and experimental data, complemented by principal component analysis for overall similarity assessment. The 14 differential abundance tests will be applied to synthetic and experimental datasets, evaluating the consistency of significant feature identification and the number of significant features per tool. Correlation analysis and multiple regression will explore how differences between synthetic and experimental data characteristics may affect the results.

Conclusions: Synthetic data enables the validation of findings through controlled experiments. We assess how well synthetic data replicates experimental data, try to validate previous findings with the most recent versions of the DA methods and delineate the strengths and limitations of synthetic data in benchmark studies. Moreover, to our knowledge this is the first computational benchmark study to systematically incorporate synthetic data for validating differential abundance methods while strictly adhering to a pre-specified study protocol following SPIRIT guidelines, contributing to transparency, reproducibility, and unbiased research.

求助全文
约1分钟内获得全文 求助全文
来源期刊
F1000Research
F1000Research Pharmacology, Toxicology and Pharmaceutics-Pharmacology, Toxicology and Pharmaceutics (all)
CiteScore
5.00
自引率
0.00%
发文量
1646
审稿时长
1 weeks
期刊介绍: F1000Research publishes articles and other research outputs reporting basic scientific, scholarly, translational and clinical research across the physical and life sciences, engineering, medicine, social sciences and humanities. F1000Research is a scholarly publication platform set up for the scientific, scholarly and medical research community; each article has at least one author who is a qualified researcher, scholar or clinician actively working in their speciality and who has made a key contribution to the article. Articles must be original (not duplications). All research is suitable irrespective of the perceived level of interest or novelty; we welcome confirmatory and negative results, as well as null studies. F1000Research publishes different type of research, including clinical trials, systematic reviews, software tools, method articles, and many others. Reviews and Opinion articles providing a balanced and comprehensive overview of the latest discoveries in a particular field, or presenting a personal perspective on recent developments, are also welcome. See the full list of article types we accept for more information.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信