{"title":"kMetaShot: a fast and reliable taxonomy classifier for metagenome-assembled genomes.","authors":"Giuseppe Defazio, Marco Antonio Tangaro, Graziano Pesole, Bruno Fosso","doi":"10.1093/bib/bbae680","DOIUrl":"10.1093/bib/bbae680","url":null,"abstract":"<p><p>The advent of high-throughput sequencing (HTS) technologies unlocked the complexity of the microbial world through the development of metagenomics, which now provides an unprecedented and comprehensive overview of its taxonomic and functional contribution in a huge variety of macro- and micro-ecosystems. In particular, shotgun metagenomics allows the reconstruction of microbial genomes, through the assembly of reads into MAGs (metagenome-assembled genomes). In fact, MAGs represent an information-rich proxy for inferring the taxonomic composition and the functional contribution of microbiomes, even if the relevant analytical approaches are not trivial and still improvable. In this regard, tools like CAMITAX and GTDBtk have implemented complex approaches, relying on marker gene identification and sequence alignments, requiring a large processing time. With the aim of deploying an effective tool for fast and reliable MAG taxonomic classification, we present here kMetaShot, a taxonomy classifier based on k-mer/minimizer counting. We benchmarked kMetaShot against CAMITAX and GTDBtk by using both in silico and real mock communities and demonstrated how, while implementing a fast and concise algorithm, it outperforms the other tools in terms of classification accuracy. Additionally, kMetaShot is an easy-to-install and easy-to-use bioinformatic tool that is also suitable for researchers with few command-line skills. 
It is available and documented at https://github.com/gdefazio/kMetaShot.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695915/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142920868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying cancer prognosis genes through causal learning.","authors":"Siwei Wu, Chaoyi Yin, Yuezhu Wang, Huiyan Sun","doi":"10.1093/bib/bbae721","DOIUrl":"10.1093/bib/bbae721","url":null,"abstract":"<p><p>Accurate identification of causal genes for cancer prognosis is critical for estimating disease progression and guiding treatment interventions. In this study, we propose CPCG (Cancer Prognosis's Causal Gene), a two-stage framework identifying gene sets causally associated with patient prognosis across diverse cancer types using transcriptomic data. Initially, an ensemble approach models gene expression's impact on survival with parametric and semiparametric hazard models. Subsequently, an iterative conditional independence test combined with graph pruning is utilized to infer the causal skeleton, thereby pinpointing prognosis-related genes. Experiments on transcriptomic data from 18 cancer types sourced from The Cancer Genome Atlas Project demonstrate CPCG's effectiveness in predicting prognosis under four evaluation metrics. Validations on 24 additional datasets covering 12 cancer types from the Gene Expression Omnibus and the Chinese Glioma Genome Atlas Project further demonstrate CPCG's robustness and generalizability. CPCG identifies a concise but reliable set of genes, obviating the need for gene combination enumeration for survival time estimation. These genes are also shown to be closely linked to crucial biological processes in cancer. Moreover, CPCG constructs a stable causal skeleton and exhibits insensitivity to the order of data shuffling. Overall, CPCG is a powerful tool for extracting cancer prognostic biomarkers, offering interpretability, generalizability, and robustness. 
CPCG holds promise for facilitating targeted interventions in clinical treatment strategies.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11729728/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142977610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Complex hierarchical structures analysis in single-cell data with Poincaré deep manifold transformation.","authors":"Yongjie Xu, Zelin Zang, Bozhen Hu, Yue Yuan, Cheng Tan, Jun Xia, Stan Z Li","doi":"10.1093/bib/bbae687","DOIUrl":"10.1093/bib/bbae687","url":null,"abstract":"<p><p>Single-cell RNA sequencing (scRNA-seq) offers remarkable insights into cellular development and differentiation by capturing the gene expression profiles of individual cells. The role of dimensionality reduction and visualization in the interpretation of scRNA-seq data has gained wide acceptance. However, current methods face several challenges, including incomplete structure-preserving strategies and high distortion in embeddings, which fail to effectively model complex cell trajectories with multiple branches. To address these issues, we propose the Poincaré deep manifold transformation (PoincaréDMT) method, which maps high-dimensional scRNA-seq data to a hyperbolic Poincaré disk. This approach preserves global structure from a graph Laplacian matrix while achieving local structure correction through a structure module combined with data augmentation. Additionally, PoincaréDMT alleviates batch effects by integrating a batch graph that accounts for batch labels into the low-dimensional embeddings during network training. Furthermore, PoincaréDMT introduces the Shapley additive explanations method based on the trained model to identify the important marker genes in specific clusters and cell differentiation processes. Therefore, PoincaréDMT provides a unified framework for multiple key tasks essential for scRNA-seq analysis, including trajectory inference, pseudotime inference, batch correction, and marker gene selection. 
We validate PoincaréDMT through extensive evaluations on both simulated and real scRNA-seq datasets, demonstrating its superior performance in preserving global and local data structures compared to existing methods.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11757945/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143027967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identify potential drug candidates within a high-quality compound search space.","authors":"Xiaoqing Ru, Shulin Zhao, Quan Zou, Lifeng Xu","doi":"10.1093/bib/bbaf024","DOIUrl":"10.1093/bib/bbaf024","url":null,"abstract":"<p><p>The identification of potential effective drug candidates is a fundamental step in new drug discovery, with profound implications for pharmaceutical research and the healthcare sector. While many computational methods have been developed for such predictions and have yielded promising results, two challenges persist: (i) The cold start problem of new drugs, which increases the difficulty of prediction due to lack of historical data or prior knowledge. (ii) The vastness of the compound search space for potential drug candidates. In this study, we present a promising method that not only enhances the accuracy of identifying potential novel drug candidates but also refines the search space. Drawing inspiration from solutions to the cold start problem in recommender systems, we apply 'learning to rank' techniques to the field of new drug discovery. Furthermore, we propose using three similarity metrics to condense the compound search space into compact yet high-quality spaces, allowing for more efficient screening of potential drug candidates. Experimental results from two widely used datasets demonstrate that our method outperforms other state-of-the-art approaches in the new drug cold-start scenario. Additionally, we have verified that it is feasible to identify potential drug candidates within these high-quality compound search spaces. 
To our knowledge, this study is the first to address the drug cold-start problem in such a confined space, potentially providing valuable insights and guidance for drug screening.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758506/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143032245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deciphering cell states and the cellular ecosystem to improve risk stratification in acute myeloid leukemia.","authors":"Zheyang Zhang, Ronghan Tang, Ming Zhu, Zhijuan Zhu, Jiali Zhu, Hua Li, Mengsha Tong, Nainong Li, Jialiang Huang","doi":"10.1093/bib/bbaf028","DOIUrl":"10.1093/bib/bbaf028","url":null,"abstract":"<p><p>Acute myeloid leukemia (AML) demonstrates significant cellular heterogeneity in both leukemic and immune cells, providing valuable insights into clinical outcomes. Here, we constructed an AML single-cell transcriptome atlas and proposed the sciNMF workflow to systematically dissect underlying cellular heterogeneity. Notably, sciNMF identified 26 leukemic and immune cell states that are linked to clinical variables, mutations, and prognosis. By examining the co-existence patterns among these cell states, we highlighted a unique AML cellular ecosystem (ACE) that signifies an aberrant tumor milieu and poor survival, which is confirmed by public RNA-seq cohorts. We further developed the ACE signature (ACEsig), comprising 12 genes, which accurately predicts AML prognosis and outperforms existing signatures. When applied to cytogenetically normal AML or intensively treated patients, the ACEsig continues to demonstrate strong performance. 
Our results demonstrate that large-scale systematic characterization of cellular heterogeneity has the potential to enhance our understanding of AML heterogeneity and contribute to a more precise risk stratification strategy.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11770069/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143045571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting transcriptional changes induced by molecules with MiTCP.","authors":"Kaiyuan Yang, Jiabei Cheng, Shenghao Cao, Xiaoyong Pan, Hong-Bin Shen, Ye Yuan","doi":"10.1093/bib/bbaf006","DOIUrl":"10.1093/bib/bbaf006","url":null,"abstract":"<p><p>Studying the changes in cellular transcriptional profiles induced by small molecules can significantly advance our understanding of cellular state alterations and response mechanisms under chemical perturbations, which plays a crucial role in drug discovery and screening processes. Considering that experimental measurements need substantial time and cost, we developed a deep learning-based method called Molecule-induced Transcriptional Change Predictor (MiTCP) to predict changes in transcriptional profiles (CTPs) of 978 landmark genes induced by molecules. MiTCP utilizes graph neural network-based approaches to simultaneously model molecular structure representation and gene co-expression relationships, and integrates them for CTP prediction. After training on the L1000 dataset, MiTCP achieves an average Pearson correlation coefficient (PCC) of 0.482 on the test set and an average PCC of 0.801 for predicting the top 50 differentially expressed genes, which outperforms other existing methods. 
Furthermore, we used MiTCP to predict CTPs of three cancer drugs, palbociclib, irinotecan and goserelin, and performed gene enrichment analysis on the top differentially expressed genes and found that the enriched pathways and Gene Ontology terms are highly relevant to the corresponding diseases, which reveals the potential of MiTCP in drug development.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11756340/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143022250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepPFP: a multi-task-aware architecture for protein function prediction.","authors":"Han Wang, Zilin Ren, Jinghong Sun, Yongbing Chen, Xiaochen Bo, JiGuo Xue, Jingyang Gao, Ming Ni","doi":"10.1093/bib/bbae579","DOIUrl":"10.1093/bib/bbae579","url":null,"abstract":"<p><p>Deriving protein function from protein sequences poses a significant challenge due to the intricate relationship between sequence and function. Deep learning has made remarkable strides in predicting sequence-function relationships. However, models tailored for specific tasks or protein types encounter difficulties when using transfer learning across domains. This is attributed to the fact that protein function relies heavily on structural characteristics rather than mere sequence information. Consequently, there is a pressing need for a model capable of capturing shared features among diverse sequence-function mapping tasks to address the generalization issue. In this study, we explore the potential of Model-Agnostic Meta-Learning combined with a protein language model called Evolutionary Scale Modeling to tackle this challenge. Our approach involves training the architecture on five out-domain deep mutational scanning (DMS) datasets and evaluating its performance across four key dimensions. Our findings demonstrate that the proposed architecture exhibits satisfactory performance in terms of generalization and employs an effective few-shot learning strategy. Specifically, compared to the best results, the Pearson's correlation coefficient (PCC) in the final stage increased by ~0.31%. Furthermore, we leverage the trained architecture to predict binding affinity scores of the DMS dataset of SARS-CoV-2 using transfer learning. Notably, training on a subset of the Ube4b dataset with 500 samples resulted in a notable improvement of 0.11 in the PCC. 
These results underscore the potential of our conceptual architecture as a promising methodology for multi-task protein function prediction.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11794456/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143188336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CSGDN: contrastive signed graph diffusion network for predicting crop gene-phenotype associations.","authors":"Yiru Pan, Xingyu Ji, Jiaqi You, Lu Li, Zhenping Liu, Xianlong Zhang, Zeyu Zhang, Maojun Wang","doi":"10.1093/bib/bbaf062","DOIUrl":"https://doi.org/10.1093/bib/bbaf062","url":null,"abstract":"<p><p>Positive and negative association prediction between gene and phenotype helps to illustrate the underlying mechanism of complex traits in organisms. The transcription and regulation activity of specific genes will be adjusted accordingly in different cell types, developmental timepoints, and physiological states. There are the following two problems in obtaining the positive/negative associations between gene and phenotype: (1) high-throughput DNA/RNA sequencing and phenotyping are expensive and time-consuming due to the need to process large sample sizes; (2) experiments introduce both random and systematic errors, and, meanwhile, calculations or predictions using software or models may produce noise. To address these two issues, we propose a Contrastive Signed Graph Diffusion Network, CSGDN, to learn robust node representations with fewer training samples to achieve higher link prediction accuracy. CSGDN uses a signed graph diffusion method to uncover the underlying regulatory associations between genes and phenotypes. Then, stochastic perturbation strategies are used to create two views for both original and diffusive graphs. Lastly, a multiview contrastive learning paradigm loss is designed to unify the node representations learned from the two views to resist interference and reduce noise. We perform experiments to validate the performance of CSGDN in three crop datasets: Gossypium hirsutum, Brassica napus, and Triticum turgidum. The results show that the proposed model outperforms state-of-the-art methods by up to 9.28% AUC for the prediction of link sign in the G. hirsutum dataset. 
The source code of our model is available at https://github.com/Erican-Ji/CSGDN.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143456952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"iDOMO: identification of drug combinations via multi-set operations for treating diseases.","authors":"Xianxiao Zhou, Ling Wu, Minghui Wang, Guojun Wu, Bin Zhang","doi":"10.1093/bib/bbaf054","DOIUrl":"https://doi.org/10.1093/bib/bbaf054","url":null,"abstract":"<p><p>Combination therapy has become increasingly important for treating complex diseases which often involve multiple pathways and targets. However, experimental screening of drug combinations is costly and time-consuming. The availability of large-scale transcriptomic datasets (e.g. CMap and LINCS) from in vitro drug treatment experiments makes it possible to computationally predict drug combinations with synergistic effects. Towards this end, we developed a computational approach, termed Identification of Drug Combinations via Multi-Set Operations (iDOMO), to predict drug synergy based on multi-set operations of drug and disease gene signatures. iDOMO quantifies the synergistic effect of a pair of drugs by taking into account the combination's beneficial and detrimental effects on treating a disease. We evaluated iDOMO in a DREAM Challenge dataset with matched pre- and post-treatment gene expression data and cell viability information. We further evaluated the performance of iDOMO by concordance index and Spearman correlation on predicting the Highest Single Agent (HSA) synergy scores for the four most common cancer types in two large-scale drug combination databases, showing that iDOMO significantly outperformed two existing popular drug combination approaches including the Therapeutic Score and the SynergySeq Orthogonality Score. Application of iDOMO to triple-negative breast cancer (TNBC) identified drug pairs with potential synergistic effects, with the combination of trifluridine and monobenzone being the most synergistic. Our in vitro experiments confirmed that the top predicted drug combination exerted a significant synergistic effect in inhibiting TNBC cell growth. 
In summary, iDOMO is an effective method for the in silico screening of synergistic drug combinations and will be a valuable tool for the development of novel therapeutics for complex diseases.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143457043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TriTan: an efficient triple nonnegative matrix factorization method for integrative analysis of single-cell multiomics data.","authors":"Xin Ma, Lijing Lin, Qian Zhao, Mudassar Iqbal","doi":"10.1093/bib/bbae615","DOIUrl":"10.1093/bib/bbae615","url":null,"abstract":"<p><p>Single-cell multiomics have opened up tremendous opportunities for understanding gene regulatory networks underlying cell states by simultaneously profiling transcriptomes, epigenomes, and proteomes of the same cell. However, existing computational methods for integrative analysis of these high-dimensional multiomics data are either computationally expensive or limited in interpretation. These limitations pose challenges in the implementation of these methods in large-scale studies and hinder a more in-depth understanding of the underlying regulatory mechanisms. Here, we propose TriTan (Triple inTegrative fast non-negative matrix factorization), an efficient joint factorization method for single-cell multiomics data. TriTan implements a highly efficient factorization algorithm, greatly improving its computational performance. The three-matrix factorization produced by TriTan helps in clustering cells, identifying signature features for each cell type, and uncovering feature associations across omics, which facilitates the identification of domains of regulatory chromatin and the prediction of cell-type-specific regulatory networks. We applied TriTan to the single-cell multiomics data obtained from different technologies and benchmarked it against the state-of-the-art methods, where it shows highly competitive performance. 
Furthermore, we showed a range of downstream analyses conducted utilizing TriTan outputs, highlighting its capacity to facilitate interpretation in biological discovery.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11586128/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142709258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}