Bioinformatics (Oxford, England): Latest Articles

NLSDeconv: an efficient cell-type deconvolution method for spatial transcriptomics data.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btae747
Yunlu Chen, Feng Ruan, Ji-Ping Wang
Summary: Spatial transcriptomics (ST) allows gene expression profiling within intact tissue samples but lacks single-cell resolution. This necessitates computational deconvolution methods to estimate the contributions of distinct cell types. This article introduces NLSDeconv, a novel cell-type deconvolution method based on non-negative least squares, along with an accompanying Python package. Benchmarking against 18 existing deconvolution methods on various ST datasets demonstrates NLSDeconv's competitive statistical performance and superior computational efficiency.
Availability and implementation: NLSDeconv is freely available at https://github.com/tinachentc/NLSDeconv as a Python package.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11696698/pdf/
Citations: 0
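The NLSDeconv abstract above names non-negative least squares as the basis of the method. As a rough illustration of that general idea only (not the package's actual solver or API), the sketch below estimates cell-type proportions for a single spot by minimizing ||S b - y||^2 subject to b >= 0 with projected gradient descent. The signature matrix `S`, the spot vector `y`, and the function name are made-up toy assumptions.

```python
def nnls_projected_gradient(S, y, n_iter=500):
    """Minimize 0.5 * ||S b - y||^2 subject to b >= 0 (toy solver)."""
    n_genes, n_types = len(S), len(S[0])
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so its reciprocal is a safe gradient step size.
    step = 1.0 / sum(v * v for row in S for v in row)
    b = [0.0] * n_types
    for _ in range(n_iter):
        resid = [sum(S[g][k] * b[k] for k in range(n_types)) - y[g]
                 for g in range(n_genes)]
        grad = [sum(S[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step, then projection onto the non-negative orthant.
        b = [max(0.0, b[k] - step * grad[k]) for k in range(n_types)]
    return b


# Hypothetical reference signatures: rows are genes, columns are cell types.
S = [[10.0, 1.0], [1.0, 8.0], [5.0, 5.0], [0.0, 2.0]]
true_mix = [0.3, 0.7]
# A noiseless toy spot built as a 30% / 70% mixture of the two cell types.
y = [sum(s * m for s, m in zip(row, true_mix)) for row in S]

weights = nnls_projected_gradient(S, y)
proportions = [w / sum(weights) for w in weights]
print(proportions)
```

Normalizing the non-negative weights so they sum to one is a common way to read them as proportions; whether NLSDeconv does exactly this is not stated in the abstract.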
QuICSeedR: an R package for analyzing fluorophore-assisted seed amplification assay data.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btae752
Manci Li, Damani N Bryant, Sarah Gresch, Marissa S Milstein, Peter R Christenson, Stuart S Lichtenberg, Peter A Larsen, Sang-Hyun Oh
Motivation: Fluorophore-assisted seed amplification assays (F-SAAs), such as real-time quaking-induced conversion (RT-QuIC) and fluorophore-assisted protein misfolding cyclic amplification (F-PMCA), have become indispensable tools for studying protein misfolding in neurodegenerative diseases. However, analyzing data generated by these techniques often requires complex and time-consuming manual processes. In addition, the lack of standardization in F-SAA data analysis presents a significant challenge to the interpretation and reproducibility of F-SAA results across different laboratories and studies. There is a need for automated, standardized analysis tools that can efficiently process F-SAA data while ensuring consistency and reliability across different research settings.
Results: Here, we present QuICSeedR (pronounced as "quick seeder"), an R package that addresses these challenges by providing a comprehensive toolkit for the automated processing, analysis, and visualization of F-SAA data. Importantly, QuICSeedR also establishes the foundation for building an F-SAA data management and analysis framework, enabling more consistent and comparable results across different research groups.
Availability and implementation: QuICSeedR is freely available at: https://CRAN.R-project.org/package=QuICSeedR. Data and code used in this manuscript are provided in Supplementary Materials.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11742141/pdf/
Citations: 0
BetaAlign: a deep learning approach for multiple sequence alignment.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btaf009
Edo Dotan, Elya Wygoda, Noa Ecker, Michael Alburquerque, Oren Avram, Yonatan Belinkov, Tal Pupko
Motivation: Multiple sequence alignments (MSAs) are extensively used in biology, from phylogenetic reconstruction to structure and function prediction. Here, we suggest an out-of-the-box approach for the inference of MSAs, which relies on algorithms developed for processing natural languages. We show that our artificial intelligence (AI)-based methodology can be trained to align sequences by processing alignments that are generated via simulations, and thus different aligners can be easily generated for datasets with specific evolutionary dynamics attributes. We expect that natural language processing (NLP) solutions will replace or augment classic solutions for computing alignments and, more generally, for challenging inference tasks in phylogenomics.
Results: The MSA problem is a fundamental pillar in bioinformatics, comparative genomics, and phylogenetics. Here, we characterize and improve BetaAlign, the first deep learning aligner, which substantially deviates from conventional algorithms of alignment computation. BetaAlign draws on NLP techniques and trains transformers to map a set of unaligned biological sequences to an MSA. We show that our approach is highly accurate, comparable to and sometimes better than state-of-the-art alignment tools. We characterize the performance of BetaAlign and the effect of various aspects on accuracy; for example, the size of the training data, the effect of different transformer architectures, and the effect of learning on a subspace of indel-model parameters (subspace learning). We also introduce a new technique that leads to improved performance compared to our previous approach. Our findings further uncover the potential of NLP-based methods for sequence alignment, highlighting that AI-based algorithms can substantially challenge classic approaches in phylogenomics and bioinformatics.
Availability and implementation: Datasets used in this work are available on HuggingFace (Wolf et al. Transformers: state-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. p.38-45. 2020) at: https://huggingface.co/dotan1111. Source code is available at: https://github.com/idotan286/SimulateAlignments.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758787/pdf/
Citations: 0
PHIStruct: improving phage-host interaction prediction at low sequence similarity settings using structure-aware protein embeddings.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btaf016
Mark Edward M Gonzales, Jennifer C Ureta, Anish M S Shrestha
Motivation: Recent computational approaches for predicting phage-host interaction have explored the use of sequence-only protein language models to produce embeddings of phage proteins without manual feature engineering. However, these embeddings do not directly capture protein structure information and structure-informed signals related to host specificity.
Results: We present PHIStruct, a multilayer perceptron that takes in structure-aware embeddings of receptor-binding proteins, generated via the structure-aware protein language model SaProt, and then predicts the host from among the ESKAPEE genera. Compared against recent tools, PHIStruct exhibits the best balance of precision and recall, with the highest and most stable F1 score across a wide range of confidence thresholds and sequence similarity settings. The margin in performance is most pronounced when the sequence similarity between the training and test sets drops below 40%, wherein, at a relatively high-confidence threshold of above 50%, PHIStruct presents a 7%-9% increase in class-averaged F1 over machine learning tools that do not directly incorporate structure information, as well as a 5%-6% increase over BLASTp.
Availability and implementation: The data and source code for our experiments and analyses are available at https://github.com/bioinfodlsu/PHIStruct.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783280/pdf/
Citations: 0
Funmap: integrating high-dimensional functional annotations to improve fine-mapping.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btaf017
Yuekai Li, Jiashun Xiao, Jingsi Ming, Yicheng Zeng, Mingxuan Cai
Motivation: Fine-mapping aims to prioritize causal variants underlying complex traits by accounting for the linkage disequilibrium at genome-wide association study risk loci. The expanding resources of functional annotations serve as auxiliary evidence to improve the power of fine-mapping. However, existing fine-mapping methods tend to generate many false positive results when integrating a large number of annotations.
Results: In this study, we propose a unified method to integrate high-dimensional functional annotations with fine-mapping (Funmap). Funmap can effectively improve the power of fine-mapping by borrowing information from hundreds of functional annotations. Meanwhile, it relates the annotation to the causal probability with a random effects model that avoids the over-fitting issue, thereby producing a well-controlled false positive rate. Paired with a fast algorithm, Funmap enables scalable integration of a large number of annotations to facilitate prioritizing multiple causal single nucleotide polymorphisms. Our comprehensive simulations across a wide range of annotation relevance settings demonstrate that Funmap is the only method that produces a well-calibrated false discovery rate under the setting of high-dimensional annotations while achieving better or comparable power gains compared to existing methods. By integrating genome-wide association studies of 4 lipid traits with 187 functional annotations, Funmap consistently identified more variants that can be replicated in an independent cohort, achieving a 15.5%-26.2% improvement over the runner-up in terms of replication rate.
Availability and implementation: The Funmap software and all analysis code are available at https://github.com/LeeHITsz/Funmap.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11769679/pdf/
Citations: 0
A hypercubic Mk model framework for capturing reversibility in disease, cancer, and evolutionary accumulation modelling.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btae737
Iain G Johnston, Ramon Diaz-Uriarte
Motivation: Accumulation models, where a system progressively acquires binary features over time, are common in the study of cancer progression, evolutionary biology, and other fields. Many approaches have been developed to infer the accumulation pathways by which features (e.g. mutations) are acquired over time. However, most of these approaches do not support reversibility: the loss of a feature once it has been acquired (e.g. the clearing of a mutation from a tumor or population).
Results: Here, we demonstrate how the well-established Mk model from evolutionary biology, embedded on a hypercubic transition graph, can be used to infer the dynamics of accumulation processes, including the possibility of reversible transitions, from data which may be uncertain and cross-sectional, longitudinal, or phylogenetically/phylogenomically embedded. Positive and negative interactions between arbitrary sets of features (not limited to pairwise interactions) are supported. We demonstrate this approach with synthetic datasets and real data on bacterial drug resistance and cancer progression. While this implementation is limited in the number of features that can be considered, we discuss how this limitation may be relaxed to deal with larger systems.
Availability and implementation: The code implementing this setup in R is freely available at https://github.com/StochasticBiology/hypermk.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11681934/pdf/
Citations: 0
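The hypercubic transition graph mentioned in the abstract above can be pictured with a small enumeration. The sketch below is only an illustration of the graph structure (it is not taken from the authors' R code): each state is a bitmask of acquired binary features, and transitions connect states that differ in exactly one feature. The function name and the `reversible` flag are hypothetical.

```python
def hypercube_transitions(n_features, reversible=True):
    """List directed edges (from_state, to_state) between states that
    differ in exactly one binary feature. States are integer bitmasks."""
    edges = []
    for state in range(2 ** n_features):
        for i in range(n_features):
            bit = 1 << i
            if state & bit == 0:
                # Feature i is absent: acquiring it is always allowed.
                edges.append((state, state | bit))
            elif reversible:
                # Feature i is present: losing it is the reverse transition.
                edges.append((state, state ^ bit))
    return edges


forward_only = hypercube_transitions(3, reversible=False)
with_losses = hypercube_transitions(3, reversible=True)
print(len(forward_only), len(with_losses))
```

With L features there are 2^L states, so the number of edges grows as L * 2^(L-1) in each direction, which is consistent with the abstract's remark that the number of features is limited.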
MOLGENIS Armadillo: a lightweight server for federated analysis using DataSHIELD.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btae726
Tim Cadman, Mariska K Slofstra, Marije A van der Geest, Demetris Avraam, Tom R P Bishop, Tommy de Boer, Liesbeth Duijts, Sido Haakma, Eleanor Hyde, Vincent Jaddoe, Tarik Karramass, Fleur Kelpin, Yannick Marcon, Angela Pinot de Moira, Dick Postma, Clemens Tolboom, Ruben L Veenstra, Stuart Wheater, Marieke Welten, Rebecca C Wilson, Erik Zwart, Morris Swertz
Summary: Extensive human health data from cohort studies, national registries, and biobanks can reveal lifecourse risk factors impacting health. Combining these sources offers increased statistical power, rare outcome detection, replication of findings, and extended study periods. Traditionally, this required data transfer to a central location or separate partner analyses with pooled summary statistics, posing ethical, legal, and time constraints. Federated analysis, which involves remote data analysis without sharing individual-level data, is a promising alternative. One promising solution is DataSHIELD (https://datashield.org/), an open-source R-based implementation. To enable federated analysis, data owners need a user-friendly way to install the federated infrastructure and manage users and data. Here, we present MOLGENIS Armadillo: a lightweight server for federated analysis solutions such as DataSHIELD.
Availability and implementation: Armadillo is implemented as a collection of three packages freely available under the open source licence LGPLv3: two R packages downloadable from the Comprehensive R Archive Network (CRAN) ("MolgenisArmadillo" and "DSMolgenisArmadillo") and one Java application ("ArmadilloService") as jar and docker images via GitHub (https://github.com/molgenis/molgenis-service-armadillo).
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11734753/pdf/
Citations: 0
FlexLMM: a Nextflow linear mixed model framework for GWAS.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btaf021
Saul Pierotti, Tomas Fitzgerald, Ewan Birney
Summary: Linear mixed models (LMMs) are a commonly used statistical approach in genome-wide association studies when population structure is present. However, naive permutations of the phenotype to empirically estimate the null distribution of a statistic of interest are not appropriate in the presence of population structure or covariates. This is because the samples are not exchangeable with each other under the null hypothesis, and because permuting the phenotypes breaks the relationship between them and any covariates. For this reason, we developed FlexLMM, a Nextflow pipeline that can perform appropriate permutations in LMMs while allowing for flexibility in the definition of the exact statistical model to be used. FlexLMM can set a significance threshold via permutations, thanks to a two-step process where the population structure is first regressed out, and only then are the permutations performed on the uncorrelated residuals. We envision this pipeline will be particularly useful for researchers working on multi-parental crosses among inbred lines of model organisms or farm animals and plants.
Availability and implementation: The source code and documentation for FlexLMM are available at https://github.com/birneylab/flexlmm.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783306/pdf/
Citations: 0
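The two-step idea in the FlexLMM abstract above (regress out structure first, then permute residuals) can be illustrated with a deliberately simplified toy. The sketch below is not the FlexLMM pipeline and does not fit a linear mixed model: as a stated simplification, it regresses a single covariate out of the phenotype with ordinary least squares, permutes the residuals, and takes a quantile of the per-permutation maximum statistic as the threshold. All data, function names, and the choice of absolute correlation as the statistic are assumptions made for illustration.

```python
import random


def ols_residuals(y, x):
    """Residuals of y after regressing out an intercept and one covariate x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def abs_corr(a, b):
    """Absolute Pearson correlation, used here as a toy association statistic."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return abs(cov) / (va * vb) ** 0.5


def permutation_threshold(residuals, genotypes, n_perm=200, alpha=0.05, seed=0):
    """Upper-alpha quantile of the max statistic over permuted residuals."""
    rng = random.Random(seed)
    max_stats = []
    for _ in range(n_perm):
        shuffled = residuals[:]
        rng.shuffle(shuffled)  # residuals are treated as exchangeable
        max_stats.append(max(abs_corr(shuffled, g) for g in genotypes))
    max_stats.sort()
    return max_stats[min(n_perm - 1, int((1 - alpha) * n_perm))]


# Toy data: 30 samples, one covariate, four made-up genotype vectors (0/1/2).
n = 30
covariate = [float(i % 5) for i in range(n)]
noise = random.Random(1)
phenotype = [2.0 * c + noise.gauss(0.0, 1.0) for c in covariate]
genotypes = [[float((i // (j + 1)) % 3) for i in range(n)] for j in range(4)]

residuals = ols_residuals(phenotype, covariate)
threshold = permutation_threshold(residuals, genotypes)
print(threshold)
```

Permuting the raw phenotype instead of the residuals would break its link to the covariate, which is the problem the abstract describes; in a real LMM setting the residuals would additionally need to be decorrelated with respect to the relatedness structure.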
The scalable variant call representation: enabling genetic analysis beyond one million genomes.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btae746
Timothy Poterba, Christopher Vittal, Daniel King, Daniel Goldstein, Jacqueline I Goldstein, Patrick Schultz, Konrad J Karczewski, Cotton Seed, Benjamin M Neale
Motivation: The Variant Call Format (VCF) is widely used in genome sequencing but scales poorly. For instance, we estimate a 150 000 genome VCF would occupy 900 TiB, making it costly and complicated to produce, analyze, and store. The issue stems from VCF's requirement to densely represent both reference genotypes and allele-indexed arrays. These requirements lead to unnecessary data duplication and, ultimately, very large files.
Results: To address these challenges, we introduce the Scalable Variant Call Representation (SVCR). This representation reduces file sizes by ensuring they scale linearly with samples. SVCR's linear scaling relies on two techniques, both necessary for linearity: local allele indices and reference blocks, which were first introduced by the Genomic Variant Call Format. SVCR is also lossless and mergeable, allowing for N + 1 and N + K incremental joint-calling. We present two implementations of SVCR: SVCR-VCF, which encodes SVCR in VCF format, and VDS, which uses Hail's native format. Our experiments confirm the linear scalability of SVCR-VCF and VDS, in contrast to the super-linear growth seen with standard VCF files. We also discuss the VDS Combiner, a scalable, open-source tool for producing a VDS from GVCFs, and unique features of VDS which enable rapid data analysis. SVCR, and VDS in particular, ensure the scientific community can generate, analyze, and disseminate genetics datasets with millions of samples.
Availability and implementation: https://github.com/hail-is/hail/.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11745898/pdf/
Citations: 0
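To make the "local allele indices" idea from the abstract above concrete, here is a minimal sketch under stated assumptions. It is not Hail's implementation or file format: the function names are hypothetical, and always keeping the reference allele (index 0) in the local list is an assumption of this toy. The point it illustrates is that a sample's genotype can be re-indexed against only the alleles that sample carries, so per-sample allele-indexed arrays no longer grow with the total number of alleles at the site.

```python
def to_local(genotype):
    """Re-index a genotype against only the alleles this sample carries.

    `genotype` holds site-wide ("global") allele indices, e.g. (0, 7).
    Returns the sample's local allele list and the genotype expressed as
    indices into that list.
    """
    local_alleles = sorted(set(genotype) | {0})  # toy choice: always keep ref
    local_gt = tuple(local_alleles.index(a) for a in genotype)
    return local_alleles, local_gt


def to_global(local_alleles, local_gt):
    """Invert to_local, showing the re-indexing loses no information."""
    return tuple(local_alleles[i] for i in local_gt)


def n_diploid_genotypes(n_alleles):
    """Number of unordered diploid genotypes, i.e. the length of a
    per-genotype array (such as genotype likelihoods) for n_alleles alleles."""
    return n_alleles * (n_alleles + 1) // 2


site_allele_count = 100          # hypothetical highly multi-allelic site
sample_gt = (0, 7)               # this sample carries ref and allele 7 only
local_alleles, local_gt = to_local(sample_gt)

dense_len = n_diploid_genotypes(site_allele_count)
local_len = n_diploid_genotypes(len(local_alleles))
print(local_alleles, local_gt, dense_len, local_len)
```

Because the site-wide allele count tends to rise as more samples are added, dense per-genotype arrays grow for every sample at once, which is one way to read the super-linear growth the abstract attributes to standard VCF.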
VarNMF: non-negative probabilistic factorization with source variation.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btae758
Ela Fallik, Nir Friedman
Motivation: Non-negative matrix factorization (NMF) is a powerful tool often applied to genomic data to identify non-negative latent components that constitute linearly mixed samples. It is useful when the observed signal combines contributions from multiple sources, such as cell types in bulk measurements of heterogeneous tissue. NMF accounts for two types of variation between samples: disparities in the proportions of sources and observation noise. However, in many settings, there is also a non-trivial variation between samples in the contribution of each source to the mixed data. This variation cannot be accurately modeled using the NMF framework.
Results: We present VarNMF, a probabilistic extension of NMF that explicitly models this variation in source values. We show that by modeling sources as non-negative distributions, we can recover source variation directly from mixed samples without observing any of the sources directly. We apply VarNMF to a cell-free ChIP-seq dataset of two cancer cohorts and a healthy cohort, demonstrating that VarNMF provides a better estimation of the data distribution. Moreover, VarNMF extracts cancer-associated source distributions that decouple the tumor characteristics from the amount of tumor contribution, and identify patient-specific disease behaviors. This decomposition highlights the inter-tumor variability that is obscured in the mixed samples.
Availability and implementation: Code is available at https://github.com/Nir-Friedman-Lab/VarNMF.
Citations: 0