Anticlustering for sample allocation to minimize batch effects.

Impact factor: 4.5 | JCR: Q1 (Biochemical Research Methods)
Martin Papenberg, Cheng Wang, Maïgane Diop, Syed Hassan Bukhari, Boris Oskotsky, Brittany R Davidson, Kim Chi Vo, Binya Liu, Juan C Irwin, Alexis J Combes, Brice Gaudilliere, Jingjing Li, David K Stevenson, Gunnar W Klau, Linda C Giudice, Marina Sirota, Tomiko T Oskotsky
Journal: Cell Reports Methods, 5(8), 101137
Published: 2025-08-18
Publication type: Journal Article
DOI: https://doi.org/10.1016/j.crmeth.2025.101137
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12461633/pdf/

Abstract

High-throughput sequencing enables efficient processing of DNA and RNA samples in batches, but batch effects can obscure true biological signal. We propose using anticlustering as an automated method to assign samples to balanced batches, minimizing covariate imbalance and supporting user-defined constraints such as batch size, number of batches, and "must-link" assignments. In simulations, anticlustering outperforms existing methods in assigning balanced batches. We illustrate its utility using a real-life example from the University of California, San Francisco (UCSF)-Stanford Endometriosis Center for Discovery, Innovation, Training and Community Engagement (ENACT) Center, where multiple samples per individual required processing within the same batch to avoid confounding. The Two-Phase Must-Link (2PML) anticlustering algorithm realized the must-link restrictions while balancing disease stage, menstrual cycle phase, case vs. control, and clinical site. All methods are accessible via the free, open-source R package anticlust, with a companion RShiny web app for visualization and interactive batch assignment.
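
The idea of balancing covariates across batches while keeping must-link samples together can be illustrated with a minimal Python sketch. This is not the 2PML algorithm and not the anticlust API: the function names (`imbalance`, `assign_batches`), the objective (spread of batch means per covariate), and the toy data are all assumptions made for illustration. The sketch first merges samples that share a must-link label into indivisible units, then swaps equal-sized units between batches whenever the swap reduces imbalance.

```python
import random
from collections import defaultdict


def imbalance(features, batches, k):
    """Sum over covariates of the squared spread of batch means (lower is better).

    Assumed objective for illustration only; anticlust offers other objectives.
    """
    n_cov = len(features[0])
    sums = [[0.0] * n_cov for _ in range(k)]
    counts = [0] * k
    for row, b in zip(features, batches):
        counts[b] += 1
        for j, value in enumerate(row):
            sums[b][j] += value
    total = 0.0
    for j in range(n_cov):
        means = [sums[b][j] / counts[b] for b in range(k)]
        grand = sum(means) / k
        total += sum((m - grand) ** 2 for m in means)
    return total


def assign_batches(features, must_link, k, seed=0):
    """Assign samples to k batches; samples with equal must_link labels stay together."""
    rng = random.Random(seed)

    # Step 1: merge samples that share a must-link label into indivisible units.
    grouped = defaultdict(list)
    for i, label in enumerate(must_link):
        grouped[label].append(i)
    units = list(grouped.values())
    if len(units) < k:
        raise ValueError("need at least as many must-link units as batches")

    # Initial assignment: largest units first, each into the currently smallest batch.
    rng.shuffle(units)
    units.sort(key=len, reverse=True)
    batches = [0] * len(features)
    sizes = [0] * k
    unit_batch = []
    for unit in units:
        b = sizes.index(min(sizes))
        unit_batch.append(b)
        sizes[b] += len(unit)
        for i in unit:
            batches[i] = b

    def swap(u, v):
        # Exchange the batches of two units; equal unit sizes keep batch sizes fixed.
        unit_batch[u], unit_batch[v] = unit_batch[v], unit_batch[u]
        for i in units[u]:
            batches[i] = unit_batch[u]
        for i in units[v]:
            batches[i] = unit_batch[v]

    # Step 2: keep swapping equal-sized units across batches while imbalance drops.
    best = imbalance(features, batches, k)
    improved = True
    while improved:
        improved = False
        for u in range(len(units)):
            for v in range(u + 1, len(units)):
                if unit_batch[u] == unit_batch[v] or len(units[u]) != len(units[v]):
                    continue
                swap(u, v)
                score = imbalance(features, batches, k)
                if score < best - 1e-12:
                    best = score
                    improved = True
                else:
                    swap(u, v)  # undo a swap that did not help
    return batches


# Toy data (hypothetical): six individuals with two samples each.
# Columns: case indicator (1 = case, 0 = control), clinical site indicator.
features = [
    [1, 0], [1, 0],  # individual A
    [1, 1], [1, 1],  # individual B
    [1, 0], [1, 0],  # individual C
    [0, 1], [0, 1],  # individual D
    [0, 0], [0, 0],  # individual E
    [0, 1], [0, 1],  # individual F
]
must_link = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F", "F"]

batches = assign_batches(features, must_link, k=3)
print(batches)
print(imbalance(features, batches, 3))
```

Categorical covariates such as disease stage or menstrual cycle phase would need to be one-hot encoded before being passed to a sketch like this one. For real study designs, the anticlust R package described in the abstract is the appropriate tool.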

Source journal
Cell Reports Methods
Subject areas: Chemistry (General); Biochemistry, Genetics and Molecular Biology (General); Immunology and Microbiology (General)
CiteScore: 3.80
Self-citation rate: 0.00%
Articles published: 0
Review time: 111 days