CroCoDeEL: accurate control-free detection of cross-sample contamination in metagenomic data
Lindsay Goulet, Florian Plaza Oñate, Alexandre Famechon, Benoît Quinquis, Eugeni Belda, Edi Prifti, Emmanuelle Le Chatelier, Guillaume Gautreau
Nature Communications. Published 2026-05-09. DOI: 10.1038/s41467-026-72637-9 (https://doi.org/10.1038/s41467-026-72637-9)
Abstract
Metagenomic sequencing provides insights into microbial communities, but it can be compromised by technical biases, including cross-sample contamination. This phenomenon arises when microbial content is inadvertently exchanged among concurrently processed samples, distorting microbial profiles and compromising the reliability of metagenomic data and downstream analyses. Existing detection methods rely on negative controls, which are insufficiently used and do not detect cross-contamination within non-control samples. Meanwhile, strain-level bioinformatics approaches do not distinguish contamination from natural strain sharing and lack sensitivity. To fill this gap, we introduce CroCoDeEL, a decision-support tool for detecting and quantifying cross-sample contamination. Leveraging linear modeling and a pre-trained supervised model, CroCoDeEL identifies specific contamination patterns in species abundance profiles. It requires no negative controls or prior knowledge of sample processing positions, offering improved accuracy and versatility. Benchmarks across three public datasets demonstrate that CroCoDeEL can detect contaminated samples and identify their contamination sources, even at low rates (<0.1%), provided sufficient sequencing depth. Application of CroCoDeEL to several existing studies reveals previously undetected contamination.
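The abstract states that CroCoDeEL uses linear modeling on species abundance profiles but gives no algorithmic detail. The following is only a toy sketch of why a linear pattern can arise, not CroCoDeEL's implementation. It assumes a simple mixing model in which a fraction `rate` of a source sample is blended into a target sample, so species introduced solely by contamination appear in the target at `rate` times their source abundance (a slope-one line in log space). The function names `mix` and `estimate_rate`, the `n_lowest` heuristic, and the species labels are all hypothetical.

```python
import math
from statistics import median


def mix(target, source, rate):
    """Toy contamination model (an assumption): blend a fraction `rate` of source into target."""
    species = set(target) | set(source)
    return {
        s: (1 - rate) * target.get(s, 0.0) + rate * source.get(s, 0.0)
        for s in species
    }


def estimate_rate(observed, source, n_lowest=3):
    """Estimate the contamination rate with a slope-one fit in log10 space.

    Under the toy model, observed >= rate * source for every source species,
    with equality for species that were absent from the clean target.
    The intercept is therefore taken as the median of the n_lowest
    log10(observed / source) values (a heuristic chosen for this sketch).
    """
    log_ratios = sorted(
        math.log10(observed[s] / source[s])
        for s in source
        if source[s] > 0 and observed.get(s, 0.0) > 0
    )
    if len(log_ratios) < n_lowest:
        return None  # too few shared species to fit anything
    intercept = median(log_ratios[:n_lowest])
    return 10 ** intercept


# Hypothetical relative abundance profiles (each sums to 1).
source = {"sp_a": 0.40, "sp_b": 0.30, "sp_c": 0.20, "sp_d": 0.10}
target = {"sp_d": 0.50, "sp_e": 0.50}

observed = mix(target, source, rate=0.001)  # 0.1% contamination
rate = estimate_rate(observed, source)
print(f"estimated contamination rate: {rate:.4%}")
```

In this toy case, `sp_a`, `sp_b`, and `sp_c` exist in the target only because of contamination, so their log ratios coincide and recover the simulated rate, while the naturally shared `sp_d` sits far above the line. Real data add sequencing noise and detection limits, which is presumably why the abstract conditions low-rate detection on sufficient sequencing depth.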
About the journal:
Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.