{"title":"Evaluating Methods for the Prediction of Cell Type-Specific Enhancers in the Mammalian Cortex.","authors":"Nelson J Johansen, Niklas Kempynck, Nathan R Zemke, Saroja Somasundaram, Seppe De Winter, Marcus Hooper, Deepanjali Dwivedi, Ruchi Lohia, Fabien Wehbe, Bocheng Li, Darina Abaffyová, Ethan J Armand, Julie De Man, Eren Can Eksi, Nikolai Hecker, Gert Hulselmans, Vasilis Konstantakos, David Mauduit, John K Mich, Gabriele Partel, Tanya L Daigle, Boaz P Levi, Kai Zhang, Yoshiaki Tanaka, Jesse Gillis, Jonathan T Ting, Yoav Ben-Simon, Jeremy Miller, Joseph R Ecker, Bing Ren, Stein Aerts, Ed S Lein, Bosiljka Tasic, Trygve E Bakken","doi":"10.1101/2024.08.21.609075","DOIUrl":"10.1101/2024.08.21.609075","url":null,"abstract":"<p><p>Identifying cell type-specific enhancers in the brain is critical to building genetic tools for investigating the mammalian brain. Computational methods for functional enhancer prediction have been proposed and validated in the fruit fly but not yet in the mammalian brain. We organized the 'Brain Initiative Cell Census Network (BICCN) Challenge: Predicting Functional Cell Type-Specific Enhancers from Cross-Species Multi-Omics' to assess machine learning and feature-based methods designed to nominate enhancer DNA sequences to target cell types in the mouse cortex. Methods were evaluated based on <i>in vivo</i> validation data from hundreds of cortical cell type-specific enhancers that were previously packaged into individual AAV vectors and retro-orbitally injected into mice. We find that open chromatin was a key predictor of functional enhancers, and sequence models improved prediction of non-functional enhancers that can be deprioritized rather than pursued for <i>in vivo</i> testing. Sequence models also identified cell type-specific transcription factor codes that can guide designs of <i>in silico</i> enhancers. 
This community challenge establishes a benchmark for enhancer prioritization algorithms and reveals computational approaches and molecular information that are crucial for identifying functional enhancers in mammalian cortical cell types. The results of this challenge bring us closer to understanding the complex gene regulatory landscape of the mammalian cortex and to designing more efficient genetic tools to target cortical cell types.</p>","PeriodicalId":519960,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11370467/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142127998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MuMu: a sample multiplexing protocol for droplet-based simultaneous single nuclei RNA- and ATAC-seq systems.","authors":"Zhen Li, Tarik F Haydar","doi":"10.1101/2024.11.27.625728","DOIUrl":"10.1101/2024.11.27.625728","url":null,"abstract":"<p><p>Sample multiplexing is a common approach to reduce experimental cost and technical batch effect. Here, we present a protocol that for the first time allows the pooling of single nuclei from multiple biological samples prior to performing simultaneous single nuclei RNA-seq and ATAC-seq, which we term <i>Mu</i>ltiplexed <i>Mu</i>ltiome (MuMu). We describe steps for assembling the custom Tn5 transposome, performing the transposition reaction, nuclei pooling, sequencing library preparation, and sequencing data pre-processing. This protocol will greatly reduce the cost of sn-Multiome.</p>","PeriodicalId":519960,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623611/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142804405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Systematic pre-annotation explains the \"dark matter\" in LC-MS metabolomics.","authors":"Yuanye Chi, Joshua M Mitchell, Shujian Zheng, Shuzhao Li","doi":"10.1101/2025.02.04.636472","DOIUrl":"10.1101/2025.02.04.636472","url":null,"abstract":"<p><p>The majority of features in global metabolomics from high-resolution mass spectrometry are typically not identified, referred to as the \"dark matter\". Are these features real compounds or junk? Understanding this problem is critical to the annotation and interpretation of metabolomics data and future development of the field. Recent debates also brought attention to in-source fragments, which appear to be prevalent in spectral databases. We report here a systematic analysis of 61 representative public datasets from LC-MS metabolomics, the most common data type in biomedical studies. The results indicate that in-source fragments account for less than 10% of features in LC-MS metabolomics. Khipu-based pre-annotation shows that the majority of abundant features have identifiable ion patterns. This suggests that the \"dark matter\" in LC-MS metabolomics is explainable in an abundance-dependent manner; most features are from real compounds; the number of compounds is much smaller than that of features; most compounds are yet to be identified.</p>","PeriodicalId":519960,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11838597/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143461487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Run-length compressed metagenomic read classification with SMEM-finding and tagging.","authors":"Lore Depuydt, Omar Y Ahmed, Jan Fostier, Ben Langmead, Travis Gagie","doi":"10.1101/2025.02.25.640119","DOIUrl":"10.1101/2025.02.25.640119","url":null,"abstract":"<p><p>Metagenomic read classification is a fundamental task in computational biology, yet it remains challenging due to the scale, diversity, and complexity of sequencing datasets. We propose a novel, run-length compressed index based on the move structure that enables efficient multi-class metagenomic classification in <i>O</i>(<i>r</i>) space, where <i>r</i> is the number of character runs in the BWT of the reference text. Our method identifies all super-maximal exact matches (SMEMs) of length at least <i>L</i> between a read and the reference dataset and associates each SMEM with one class identifier using a sampled tag array. A consensus algorithm then compacts these SMEMs with their class identifier into a single classification per read. We are the first to perform run-length compressed read classification based on full SMEMs instead of semi-SMEMs. We evaluate our approach on both long and short reads in two conceptually distinct datasets: a large bacterial pan-genome with few metagenomic classes and a smaller 16S rRNA gene database spanning thousands of genera or classes. Our method consistently outperforms SPUMONI 2 in accuracy and runtime while maintaining the same asymptotic memory complexity of <i>O</i>(<i>r</i>). Compared to Cliffy, we demonstrate better memory efficiency while achieving superior accuracy on the simpler dataset and comparable performance on the more complex one. Overall, our implementation carefully balances accuracy, runtime, and memory usage, offering a versatile solution for metagenomic classification across diverse datasets. 
The open-source C++11 implementation is available at https://github.com/biointec/tagger under the AGPL-3.0 license.</p>","PeriodicalId":519960,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11888359/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143589492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proximal tubule cells contribute to the thin descending limb of the loop of Henle during mouse kidney development.","authors":"Eunah Chung, Fariba Nosrati, Mike Adam, Andrew Potter, Mohammed Sayed, Christopher Ahn, Benjamin D Humphreys, Hee-Woong Lim, Yueh-Chiang Hu, S Steven Potter, Joo-Seop Park","doi":"10.1101/2025.01.14.633065","DOIUrl":"10.1101/2025.01.14.633065","url":null,"abstract":"<p><strong>Background: </strong>The thin descending limb of the loop of Henle is crucial for urine concentration, as it facilitates passive water reabsorption. Despite its importance, little is known about how this nephron segment forms during kidney development.</p><p><strong>Methods: </strong>We assembled a large single-cell RNA sequencing (scRNA-seq) dataset by integrating multiple datasets of non-mutant developing mouse kidneys to identify developing thin descending limb cells. To test whether those cells originate from proximal tubule cells, we generated a proximal tubule-specific Cre line, <i>Slc34a1eGFPCre</i>, and conducted lineage tracing. Additionally, given that the transcription factor Hnf4a directly binds to the <i>Aqp1</i> gene, we examined whether the loss of Hnf4a affects <i>Aqp1</i> expression in thin descending limb cells.</p><p><strong>Results: </strong>From our scRNA-seq dataset, we identified a small cluster of cells distinct from both the proximal tubule and the thick ascending limb of the loop of Henle. Those cells exhibited high expression of thin descending limb marker genes, including <i>Aqp1</i> and <i>Bst1</i>. Notably, a subset of proximal tubule cells also expressed thin descending limb marker genes, suggesting that proximal tubule cells may give rise to thin descending limb cells. Using lineage tracing with the <i>Slc34a1eGFPCre</i> line, we found that at least a subset of thin descending limb cells are descendants of proximal tubule cells. 
Furthermore, the loss of Hnf4a, a transcription factor essential for mature proximal tubule cell formation, disrupted proper <i>Aqp1</i> expression in thin descending limb cells, providing additional evidence of a developmental link between proximal tubule cells and thin descending limb cells.</p><p><strong>Conclusion: </strong>Our findings shed new light on the developmental origin of thin descending limb cells and highlight the importance of Hnf4a in regulating their formation.</p><p><strong>Key points: </strong>A reference single-cell RNA-seq dataset of the developing mouse kidney was assembled and used to identify the thin descending limb of the loop of Henle. Lineage analysis of proximal tubules in the mouse kidney shows that proximal tubule cells give rise to the thin descending limb of the loop of Henle. Deletion of Hnf4a disrupts the expression of <i>Aqp1</i> in the thin descending limb of the loop of Henle, highlighting a developmental link between proximal tubules and the loop of Henle.</p>","PeriodicalId":519960,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11761803/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143049527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chemoproteomic Profiling of PKA Substrates with Kinase-catalyzed Crosslinking and Immunoprecipitation (K-CLIP).","authors":"H J Bremer, M K H Pflum","doi":"10.1101/2025.03.23.644825","DOIUrl":"https://doi.org/10.1101/2025.03.23.644825","url":null,"abstract":"<p><p>Phosphorylation is a highly regulated protein post-translational modification catalyzed by kinases. Kinases and phosphorylated proteins are key players in a myriad of cellular events, including cell signaling. When cell signaling networks are improperly regulated by kinases, various pathologies can arise, such as cancers and neurodegenerative disease. With critical roles in normal and disease biology, kinase-substrate interactions must be thoroughly characterized. Previously, the chemoproteomic method, kinase-catalyzed crosslinking and immunoprecipitation (K-CLIP), was developed to identify the kinases of a phosphoprotein substrate of interest. Here, K-CLIP was modified to profile the substrates of a kinase of interest. Specifically, the substrate profile of cAMP-dependent protein kinase (PKA) was studied with K-CLIP using a new ATP analog, ATP-alkyne aryl azide. Kinase-focused K-CLIP discovered SMC3 as a PKA substrate. 
With versatility for any kinase or phosphoprotein substrate of interest, K-CLIP will expand our understanding of kinase-mediated cell biology in healthy and diseased states.</p>","PeriodicalId":519960,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11957104/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143756951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aromatic acid metabolism in <i>Methylobacterium extorquens</i> reveals interplay between methylotrophic and heterotrophic pathways.","authors":"Alekhya M Govindaraju, Norma Cecilia Martinez-Gomez","doi":"10.1101/2025.03.22.644763","DOIUrl":"https://doi.org/10.1101/2025.03.22.644763","url":null,"abstract":"<p><p>Efforts towards microbial conversion of lignin to value-added products face many challenges because lignin's methoxylated aromatic monomers release toxic C<sub>1</sub> byproducts such as formaldehyde. The ability to grow on methoxylated aromatic acids (e.g., vanillic acid) has recently been identified in certain clades of methylotrophs, bacteria characterized by their unique ability to tolerate and metabolize high concentrations of formaldehyde. Here, we use a phyllosphere methylotroph isolate, <i>Methylobacterium extorquens</i> SLI 505, as a model to identify the fate of formaldehyde during methylotrophic growth on vanillic acid. <i>M. extorquens</i> SLI 505 displays concentration-dependent growth phenotypes on vanillic acid without concomitant formaldehyde accumulation. We conclude that <i>M. extorquens</i> SLI 505 overcomes potential metabolic bottlenecks from simultaneous assimilation of multicarbon and C<sub>1</sub> intermediates by allocating formaldehyde towards dissimilation and assimilating the ring carbons of vanillic acid heterotrophically. We correlate this strategy with maximization of bioenergetic yields and demonstrate that formaldehyde dissimilation for energy generation rather than formaldehyde detoxification is advantageous for growth on aromatic acids. <i>M. extorquens</i> SLI 505 also exhibits catabolite repression during growth on methanol and low concentrations of vanillic acid, but no diauxie during growth on methanol and high concentrations of vanillic acid. Results from this study outline metabolic strategies employed by <i>M. 
extorquens</i> SLI 505 for growth on a complex single substrate that generates both C<sub>1</sub> and multicarbon intermediates and emphasize the robustness of <i>M. extorquens</i> for biotechnological applications in lignin valorization.</p>","PeriodicalId":519960,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11957125/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143756997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accessible, realistic genome simulation with selection using stdpopsim.","authors":"Graham Gower, Nathaniel S Pope, Murillo F Rodrigues, Silas Tittes, Linh N Tran, Ornob Alam, Maria Izabel A Cavassim, Peter D Fields, Benjamin C Haller, Xin Huang, Ben Jeffrey, Kevin Korfmann, Christopher C Kyriazis, Jiseon Min, Inés Rebollo, Clara T Rehmann, Scott T Small, Chris C R Smith, Georgia Tsambos, Yan Wong, Yu Zhang, Christian D Huber, Gregor Gorjanc, Aaron P Ragsdale, Ilan Gronau, Ryan N Gutenkunst, Jerome Kelleher, Kirk E Lohmueller, Daniel R Schrider, Peter L Ralph, Andrew D Kern","doi":"10.1101/2025.03.23.644823","DOIUrl":"https://doi.org/10.1101/2025.03.23.644823","url":null,"abstract":"<p><p>Selection is a fundamental evolutionary force that shapes patterns of genetic variation across species. However, simulations incorporating realistic selection along heterogeneous genomes in complex demographic histories are challenging, limiting our ability to benchmark statistical methods aimed at detecting selection and to explore theoretical predictions. stdpopsim is a community-maintained simulation library that already provides an extensive catalog of species-specific population genetic models. Here we present a major extension to the stdpopsim framework that enables simulation of various modes of selection, including background selection, selective sweeps, and arbitrary distributions of fitness effects (DFE) acting on annotated subsets of the genome (for instance, exons). This extension maintains stdpopsim's core principles of reproducibility and accessibility while adding support for species-specific genomic annotations and published DFE estimates. We demonstrate the utility of this framework by benchmarking methods for demographic inference, DFE estimation, and selective sweep detection across several species and scenarios. 
Our results demonstrate the robustness of demographic inference methods to selection on linked sites, reveal the sensitivity of DFE-inference methods to model assumptions, and show how genomic features, like recombination rate and functional sequence density, influence power to detect selective sweeps. This extension to stdpopsim provides a powerful new resource for the population genetics community to explore the interplay between selection and other evolutionary forces in a reproducible, low-barrier framework.</p>","PeriodicalId":519960,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11957135/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143757390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OpenSpliceAI: An efficient, modular implementation of SpliceAI enabling easy retraining on non-human species.","authors":"Kuan-Hao Chao, Alan Mao, Anqi Liu, Steven L Salzberg, Mihaela Pertea","doi":"10.1101/2025.03.20.644351","DOIUrl":"https://doi.org/10.1101/2025.03.20.644351","url":null,"abstract":"<p><p>The SpliceAI deep learning system is currently one of the most accurate methods for identifying splicing signals directly from DNA sequences. However, its utility is limited by its reliance on older software frameworks and human-centric training data. Here we introduce OpenSpliceAI, a trainable, open-source version of SpliceAI implemented in PyTorch to address these challenges. OpenSpliceAI supports both training from scratch and transfer learning, enabling seamless re-training on species-specific datasets and mitigating human-centric biases. Our experiments show that it achieves faster processing speeds and lower memory usage than the original SpliceAI code, allowing large-scale analyses of extensive genomic regions on a single GPU. Additionally, OpenSpliceAI's flexible architecture makes for easier integration with established machine learning ecosystems, simplifying the development of custom splicing models for different species and applications. We demonstrate that OpenSpliceAI's output is highly concordant with SpliceAI. 
<i>In silico</i> mutagenesis (ISM) analyses confirm that both models rely on similar sequence features, and calibration experiments demonstrate similar score probability estimates.</p>","PeriodicalId":519960,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11957165/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143757338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CellBouncer, A Unified Toolkit for Single-Cell Demultiplexing and Ambient RNA Analysis, Reveals Hominid Mitochondrial Incompatibilities.","authors":"Nathan K Schaefer, Bryan J Pavlovic, Alex A Pollen","doi":"10.1101/2025.03.23.644821","DOIUrl":"https://doi.org/10.1101/2025.03.23.644821","url":null,"abstract":"<p><p>Pooled processing, in which cells from multiple sources are cultured or captured together, is an increasingly popular strategy for droplet-based single cell sequencing studies. This design allows efficient scaling of experiments, isolation of cell-intrinsic differences, and mitigation of batch effects. We present CellBouncer, a computational toolkit for demultiplexing and analyzing single-cell sequencing data from pooled experiments. We demonstrate that CellBouncer can separate and quantify multi-species and multi-individual cell mixtures, identify unknown mitochondrial haplotypes in cells, assign treatments from lipid-conjugated barcodes or CRISPR sgRNAs, and infer pool composition, outperforming existing methods. We also introduce methods to quantify ambient RNA contamination per cell, infer individual donors' contributions to the ambient RNA pool, and determine a consensus doublet rate harmonized across data types. 
Applying these tools to tetraploid composite cells, we identify a competitive advantage of human over chimpanzee mitochondria across 10 cell fusion lines and provide evidence for inter-mitochondrial incompatibility and mito-nuclear incompatibility between species.</p>","PeriodicalId":519960,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11957168/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143756566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}