Bioinformatics (Oxford, England): Latest Articles

ViraLM: Empowering Virus Discovery through the Genome Foundation Model.
Bioinformatics (Oxford, England) Pub Date : 2024-11-23 DOI: 10.1093/bioinformatics/btae704
Cheng Peng, Jiayu Shang, Jiaojiao Guan, Donglin Wang, Yanni Sun
Motivation: Viruses, with their ubiquitous presence and high diversity, play pivotal roles in ecological systems and public health. Accurate identification of viruses in various ecosystems is essential for comprehending their variety and assessing their ecological influence. Metagenomic sequencing has become a major strategy to survey the viruses in various ecosystems. However, accurate and comprehensive virus detection in metagenomic data remains difficult. Limited reference sequences prevent alignment-based methods from identifying novel viruses. Machine learning-based tools are more promising in novel virus detection but often miss short viral contigs, which are abundant in typical metagenomic data. The inconsistency in virus search results produced by available tools further highlights the urgent need for a more robust tool for virus identification.
Results: In this work, we develop ViraLM for identifying novel viral contigs in metagenomic data. By employing the latest genome foundation model as the backbone and training on a rigorously constructed dataset, the model is able to distinguish viruses from other organisms based on the learned genomic characteristics. We thoroughly tested ViraLM on multiple datasets and the experimental results show that ViraLM outperforms available tools in different scenarios. In particular, ViraLM improves the F1-score on short contigs by 22%.
Availability: The source code of ViraLM is available via: https://github.com/ChengPENG-wolf/ViraLM.
Citations: 0
RUCova: Removal of Unwanted Covariance in mass cytometry data.
Bioinformatics (Oxford, England) Pub Date : 2024-11-23 DOI: 10.1093/bioinformatics/btae669
Rosario Astaburuaga-García, Thomas Sell, Samet Mutlu, Anja Sieber, Kirsten Lauber, Nils Blüthgen
Motivation: High dimensional single-cell mass cytometry data are confounded by unwanted covariance due to variations in cell size and staining efficiency, making analysis and interpretation challenging.
Results: We present RUCova, a novel method designed to address confounding factors in mass cytometry data. RUCova removes unwanted covariance from measured markers by applying multivariate linear regression based on Surrogates of sources of Unwanted Covariance (SUCs) and principal component analysis (PCA). We exemplify the use of RUCova and show that it effectively removes unwanted covariance while preserving genuine biological signals. Our results demonstrate the efficacy of RUCova in elucidating complex data patterns, facilitating the identification of activated signalling pathways, and improving the classification of important cell populations such as apoptotic cells. By providing a robust framework for data normalization and interpretation, RUCova enhances the accuracy and reliability of mass cytometry analyses, contributing to advances in our understanding of cellular biology and disease mechanisms.
Availability and implementation: The R package is available on https://github.com/molsysbio/RUCova. Detailed documentation, data, and the code required to reproduce the results are available on https://doi.org/10.5281/zenodo.10913464.
Supplementary information: Available at Bioinformatics online (PDF).
Citations: 0
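The RUCova abstract above describes removing unwanted covariance by regressing measured markers on surrogate variables. As a rough illustration of that general idea only, the following Python sketch fits an ordinary least-squares model per marker and subtracts the surrogate-driven part. It is not the RUCova R package or its API: the function name `remove_unwanted_covariance`, the array layout, and the choice to keep the intercept are all assumptions made for this example, and the PCA step mentioned in the abstract is omitted.

```python
import numpy as np


def remove_unwanted_covariance(markers, surrogates):
    """Subtract the part of each marker that is linearly explained by the surrogates.

    Hypothetical helper for illustration, not part of RUCova.
    markers:    array of shape (n_cells, n_markers)
    surrogates: array of shape (n_cells, n_surrogates)
    """
    markers = np.asarray(markers, dtype=float)
    surrogates = np.asarray(surrogates, dtype=float)
    # Design matrix: an intercept column followed by the surrogate columns.
    design = np.column_stack([np.ones(len(surrogates)), surrogates])
    # One least-squares fit per marker column (multivariate linear regression).
    coef, *_ = np.linalg.lstsq(design, markers, rcond=None)
    # Remove only the surrogate-driven component; the intercept stays in the data.
    return markers - design[:, 1:] @ coef[1:]
```

After this correction the cleaned markers are uncorrelated with the surrogates by construction, which is the property ordinary least-squares residuals provide; whether genuine biological signal is preserved depends on how the surrogates are chosen.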
CVR-BBI: An Open-Source VR Platform for Multi-User Collaborative Brain to Brain Interfaces.
Bioinformatics (Oxford, England) Pub Date : 2024-11-22 DOI: 10.1093/bioinformatics/btae676
Di Liu, Yina Wei
Summary: As brain imaging and neurofeedback technologies advance, the brain-to-brain interface (BBI) has emerged as an innovative field, enabling in-depth exploration of cross-brain information exchange and enhancing our understanding of collaborative intelligence. However, no open-source virtual reality (VR) platform currently supports the rapid and efficient configuration of multi-user, collaborative BBIs. To address this gap, we introduce the Collaborative Virtual Reality Brain-to-Brain Interface (CVR-BBI), an open-source platform consisting of a client and server. The CVR-BBI client enables users to participate in collaborative experiments, collect electroencephalogram (EEG) data and manage interactive multisensory stimuli within the VR environment. Meanwhile, the CVR-BBI server manages multi-user collaboration paradigms and performs real-time analysis of the EEG data. We evaluated the CVR-BBI platform using the SSVEP paradigm and observed that collaborative decoding outperformed individual decoding, validating the platform's effectiveness in collaborative settings. The CVR-BBI offers a pioneering platform that facilitates the development of innovative BBI applications within collaborative VR environments, thereby enhancing the understanding of brain collaboration and cognition.
Availability and implementation: CVR-BBI is released as an open-source platform, with its source code being available at https://github.com/DILIU1/CVR-BBI.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
FungiFun3: Systemic gene set enrichment analysis for fungal species.
Bioinformatics (Oxford, England) Pub Date : 2024-11-22 DOI: 10.1093/bioinformatics/btae620
Albert Garcia Lopez, Daniela Albrecht-Eckardt, Gianni Panagiotou, Sascha Schäuble
Summary: The ever-growing amount of genome-wide omics data has paved the way for solving life science problems in a data-driven manner. Among others, enrichment analysis is part of the standard analysis arsenal to determine systemic signals in any given transcriptomic or proteomic data. Only a part of the members of the fungal kingdom, however, can be analyzed via public web applications, despite the global rise of fungal pathogens and their increasing resistance to antimycotics. We present FungiFun3, a major update of our user-friendly gene set enrichment web application dedicated to fungi. FungiFun3 was rebuilt from scratch to support a modern and easy-to-use web interface and supports more than four times as many fungal strains (n = 1,287 in total) as its predecessor. In addition, it also allows ranked gene set enrichment analysis at the genomic scale. FungiFun3 thus serves as a starting hub for identifying molecular signals in omics data sets related to a vast number of available fungal strains, including human fungal pathogens of the WHO's priority list and far beyond.
Availability and implementation: FungiFun3, including sample data and FAQ, is freely available at https://fungifun3.hki-jena.de/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Expert-guided protein Language Models enable accurate and blazingly fast fitness prediction.
Bioinformatics (Oxford, England) Pub Date : 2024-11-22 DOI: 10.1093/bioinformatics/btae621
Céline Marquet, Julius Schlensok, Marina Abakarova, Burkhard Rost, Elodie Laine
Motivation: Exhaustive experimental annotation of the effect of all known protein variants remains daunting and expensive, stressing the need for scalable effect predictions. We introduce VespaG, a blazingly fast missense amino acid variant effect predictor, leveraging protein Language Model (pLM) embeddings as input to a minimal deep learning model.
Results: To overcome the sparsity of experimental training data, we created a dataset of 39 million single amino acid variants from the human proteome, applying the multiple sequence alignment-based effect predictor GEMME as a pseudo standard-of-truth. This setup increases interpretability compared to the baseline pLM and is easily retrainable with novel or updated pLMs. Assessed against the ProteinGym benchmark (217 multiplex assays of variant effect, MAVEs, with 2.5 million variants), VespaG achieved a mean Spearman correlation of 0.48 ± 0.02, matching top-performing methods evaluated on the same data. VespaG has the advantage of being orders of magnitude faster, predicting all mutational landscapes of all proteins in proteomes such as Homo sapiens or Drosophila melanogaster in under 30 minutes on a consumer laptop (12-core CPU, 16 GB RAM).
Availability: VespaG is available freely at https://github.com/jschlensok/vespag. The associated training data and predictions are available at https://doi.org/10.5281/zenodo.11085958.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
FastTENET: an accelerated TENET algorithm based on manycore computing in Python.
Bioinformatics (Oxford, England) Pub Date : 2024-11-21 DOI: 10.1093/bioinformatics/btae699
Rakbin Sung, Hyeonkyu Kim, Junil Kim, Daewon Lee
Summary: TENET reconstructs gene regulatory networks from single-cell RNA sequencing (scRNAseq) data using the transfer entropy, and works successfully on a variety of scRNAseq data. However, TENET is limited by its long computation time for large datasets. To address this limitation, we propose FastTENET, an array-computing version of the TENET algorithm optimized for acceleration on manycore processors such as GPUs. FastTENET counts the unique patterns of joint events to compute the transfer entropy based on array computing. Compared to TENET, FastTENET achieves up to 973× performance improvement.
Availability and implementation: FastTENET is available on GitHub at https://github.com/cxinsys/fasttenet.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
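The FastTENET abstract above mentions counting the unique patterns of joint events to compute the transfer entropy with array operations. The Python sketch below shows one way such a count-based estimate can be written with NumPy on the CPU. It is an illustration under stated assumptions, not FastTENET's code or API: the function name `transfer_entropy` is hypothetical, the inputs are assumed to be already discretized integer sequences, and the history length is fixed to 1.

```python
import numpy as np


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    Hypothetical helper for illustration; inputs are discrete integer sequences.
    """
    source = np.asarray(source)
    target = np.asarray(target)
    # Each row is one joint event: (target[t+1], target[t], source[t]).
    events = np.stack([target[1:], target[:-1], source[:-1]], axis=1)
    patterns, counts = np.unique(events, axis=0, return_counts=True)
    p_joint = counts / counts.sum()

    def marginal(cols):
        # Probability of each pattern's projection onto the given columns.
        _, inverse = np.unique(patterns[:, cols], axis=0, return_inverse=True)
        inverse = inverse.ravel()
        return np.bincount(inverse, weights=p_joint)[inverse]

    p_next_prev = marginal([0, 1])  # p(target[t+1], target[t])
    p_prev_src = marginal([1, 2])   # p(target[t], source[t])
    p_prev = marginal([1])          # p(target[t])
    # TE = sum p(x', y, z) * log2( p(x', y, z) p(y) / (p(x', y) p(y, z)) )
    ratio = p_joint * p_prev / (p_next_prev * p_prev_src)
    return float(np.sum(p_joint * np.log2(ratio)))
```

For a target that simply copies the source with a one-step delay, the estimate in the source-to-target direction is close to the entropy of the source, while the reverse direction is close to zero. Because every step is a whole-array operation, the same structure could in principle be moved to a GPU array library, which is the kind of acceleration the abstract refers to.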
Improved prediction of post-translational modification crosstalk within proteins using DeepPCT.
Bioinformatics (Oxford, England) Pub Date : 2024-11-21 DOI: 10.1093/bioinformatics/btae675
Yu-Xiang Huang, Rong Liu
Motivation: Post-translational modification (PTM) crosstalk events play critical roles in biological processes. Several machine learning methods have been developed to identify PTM crosstalk within proteins, but the accuracy is still far from satisfactory. Recent breakthroughs in deep learning and protein structure prediction could provide a potential solution to this issue.
Results: We proposed DeepPCT, a deep learning algorithm to identify PTM crosstalk using AlphaFold2-based structures. In this algorithm, one deep learning classifier was constructed for sequence-based prediction by combining the residue and residue pair embeddings with cross-attention techniques, while the other classifier was established for structure-based prediction by integrating the structural embedding and a graph neural network. Meanwhile, a machine learning classifier was developed using novel structural descriptors and a random forest model to complement the structural deep learning classifier. By integrating the three classifiers, DeepPCT outperformed existing algorithms in different evaluation scenarios and showed better generalizability on new data owing to its lower distance dependency.
Availability: Datasets, codes, and models of DeepPCT are freely accessible at https://github.com/hzau-liulab/DeepPCT/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
OneSC: A computational platform for recapitulating cell state transitions.
Bioinformatics (Oxford, England) Pub Date : 2024-11-21 DOI: 10.1093/bioinformatics/btae703
Da Peng, Patrick Cahan
Motivation: Computational modelling of cell state transitions is of great interest in developmental biology, cancer biology and cell fate engineering because it enables performing perturbation experiments in silico more rapidly and cheaply than could be achieved in a lab. Recent advancements in single-cell RNA sequencing (scRNA-seq) allow the capture of high-resolution snapshots of cell states as they transition along temporal trajectories. Using these high-throughput datasets, we can train computational models to generate in silico 'synthetic' cells that faithfully mimic the temporal trajectories.
Results: Here we present OneSC, a platform that can simulate cell state transitions using systems of stochastic differential equations governed by a regulatory network of core transcription factors (TFs). Unlike many current network inference methods, OneSC prioritizes generating a Boolean network that produces faithful cell state transitions and terminal cell states that mimic real biological systems. Applying OneSC to real data, we inferred a core TF network using a mouse myeloid progenitor scRNA-seq dataset and showed that the dynamical simulations of that network generate synthetic single-cell expression profiles that faithfully recapitulate the four myeloid differentiation trajectories going into differentiated cell states (erythrocytes, megakaryocytes, granulocytes and monocytes). Finally, through the in silico perturbations of the mouse myeloid progenitor core network, we showed that OneSC can accurately predict cell fate decision biases of TF perturbations that closely match previous experimental observations.
Availability: OneSC is implemented as a Python package on GitHub (https://github.com/CahanLab/oneSC) and on Zenodo (https://zenodo.org/records/14052421).
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Accurate and Transferable Drug-Target Interaction Prediction with DrugLAMP.
Bioinformatics (Oxford, England) Pub Date : 2024-11-21 DOI: 10.1093/bioinformatics/btae693
Zhengchao Luo, Wei Wu, Qichen Sun, Jinzhuo Wang
Motivation: Accurate prediction of drug-target interactions (DTIs), especially for novel targets or drugs, is crucial for accelerating drug discovery. Recent advances in pretrained language models (PLMs) and multi-modal learning present new opportunities to enhance DTI prediction by leveraging vast unlabeled molecular data and integrating complementary information from multiple modalities.
Results: We introduce DrugLAMP (PLM-Assisted Multi-modal Prediction), a PLM-based multi-modal framework for accurate and transferable DTI prediction. DrugLAMP integrates molecular graph and protein sequence features extracted by PLMs and traditional feature extractors. We introduce two novel multi-modal fusion modules: (1) Pocket-guided Co-Attention (PGCA), which uses protein pocket information to guide the attention mechanism on drug features, and (2) Paired Multi-modal Attention (PMMA), which enables effective cross-modal interactions between drug and protein features. These modules work together to enhance the model's ability to capture complex drug-protein interactions. Moreover, the Contrastive Compound-Protein Pre-training (2C2P) module enhances the model's generalization to real-world scenarios by aligning features across modalities and conditions. Comprehensive experiments demonstrate DrugLAMP's state-of-the-art performance on both standard benchmarks and challenging settings simulating real-world drug discovery, where test drugs/targets are unseen during training. Visualizations of attention maps and application to predict cryptic pockets and drug side effects further showcase DrugLAMP's strong interpretability and generalizability. Ablation studies confirm the contributions of the proposed modules.
Availability: Source code and datasets are freely available at https://github.com/Lzcstan/DrugLAMP. All data originate from public sources.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Sparse Neighbor Joining: rapid phylogenetic inference using a sparse distance matrix.
Bioinformatics (Oxford, England) Pub Date : 2024-11-21 DOI: 10.1093/bioinformatics/btae701
Semih Kurt, Alexandre Bouchard-Côté, Jens Lagergren
Motivation: Phylogenetic reconstruction is a fundamental problem in computational biology. The Neighbor Joining (NJ) algorithm offers an efficient distance-based solution to this problem, which often serves as the foundation for more advanced statistical methods. Despite prior efforts to enhance the speed of NJ, the computation of the n^2 entries of the distance matrix, where n is the number of phylogenetic tree leaves, continues to pose a limitation in scaling NJ to larger datasets.
Results: In this work, we propose a new algorithm which does not require computing a dense distance matrix. Instead, it dynamically determines a sparse set of at most O(n log n) distance matrix entries to be computed in its basic version, and up to O(n log^2 n) entries in an enhanced version. We show by experiments that this approach reduces the execution time of NJ for large datasets, with a trade-off in accuracy.
Availability and implementation: Sparse Neighbor Joining is implemented in Python and freely available at https://github.com/kurtsemih/SNJ.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
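The Sparse Neighbor Joining abstract above points to the dense n-by-n distance matrix as the bottleneck of classical NJ. To make that dependence concrete, the Python sketch below implements only the pair-selection step of classical (dense) Neighbor Joining, which reads every matrix entry through the row sums. It is not the Sparse Neighbor Joining algorithm and does not use the SNJ repository's API; the function name `nj_pair_to_join` is a hypothetical name chosen for this example.

```python
import numpy as np


def nj_pair_to_join(dist):
    """Return (i, j, q) for the pair minimizing the classical NJ Q-criterion.

    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    Hypothetical helper for illustration; dist is a symmetric (n, n) matrix.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Row sums touch every entry, which is why dense NJ needs the full matrix.
    row_sums = dist.sum(axis=1)
    q = (n - 2) * dist - row_sums[:, None] - row_sums[None, :]
    # A taxon cannot be joined with itself.
    np.fill_diagonal(q, np.inf)
    i, j = np.unravel_index(np.argmin(q), q.shape)
    return int(i), int(j), float(q[i, j])
```

A full NJ run repeats this selection after merging the chosen pair into a new node and updating the distances, so the dense matrix is needed from the first iteration onward. An approach that computes only a sparse subset of entries, as the abstract describes, has to replace this selection step with one that works from partial information.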