{"title":"Regularly updated benchmark sets for statistically correct evaluations of AlphaFold applications.","authors":"Laszlo Dobson, Gábor E Tusnády, Peter Tompa","doi":"10.1093/bib/bbaf104","DOIUrl":"10.1093/bib/bbaf104","url":null,"abstract":"<p><p>AlphaFold2 changed structural biology by providing high-quality structure predictions for all possible proteins. Since its inception, a plethora of applications were built on AlphaFold2, expediting discoveries in virtually all areas related to protein science. In many cases, however, optimism seems to have made scientists forget about data leakage, a serious issue that needs to be addressed when evaluating machine learning methods. Here we provide a rigorous benchmark set that can be used in a broad range of applications built around AlphaFold2/3.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11894802/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143603126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VITALdb: to select the best viroinformatics tools for a desired virus or application.","authors":"Mira Koul, Shalini Kaushik, Kavya Singh, Deepak Sharma","doi":"10.1093/bib/bbaf084","DOIUrl":"10.1093/bib/bbaf084","url":null,"abstract":"<p><p>The recent pandemics of viral diseases, COVID-19/mpox (humans) and lumpy skin disease (cattle), have kept us glued to viral research. These pandemics along with the recent human metapneumovirus outbreak have exposed the urgency for early diagnosis of viral infections, vaccine development, and discovery of novel antiviral drugs and therapeutics. To support this, there is an armamentarium of virus-specific computational tools that are currently available. VITALdb (VIroinformatics Tools and ALgorithms database) is a resource of ~360 viroinformatics tools encompassing all major viruses (SARS-CoV-2, influenza virus, human immunodeficiency virus, papillomavirus, herpes simplex virus, hepatitis virus, dengue virus, Ebola virus, Zika virus, etc.) and several diverse applications [structural and functional annotation, antiviral peptides development, subspecies characterization, recognition of viral recombination, inhibitors identification, phylogenetic analysis, virus-host prediction, viral metagenomics, detection of mutation(s), primer designing, etc.]. Resources, tools, and other utilities mentioned in this article will not only facilitate further developments in the realm of viroinformatics but also provide a tremendous fillip to translating fundamental knowledge into applied research. Most importantly, VITALdb is an indispensable tool for selecting the best tool(s) to carry out a desired task and hence will prove to be a vital database (VITALdb) for the scientific community. 
Database URL: https://compbio.iitr.ac.in/vitaldb.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11892104/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143596338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scaling up drug combination surface prediction.","authors":"Riikka Huusari, Tianduanyi Wang, Sandor Szedmak, Diogo Dias, Tero Aittokallio, Juho Rousu","doi":"10.1093/bib/bbaf099","DOIUrl":"10.1093/bib/bbaf099","url":null,"abstract":"<p><p>Drug combinations are required to treat advanced cancers and other complex diseases. Compared with monotherapy, combination treatments can enhance efficacy and reduce toxicity by lowering the doses of single drugs, and synergistic combinations are of particular interest. Since drug combination screening experiments are costly and time-consuming, reliable machine learning models are needed for prioritizing potential combinations for further studies. Most of the current machine learning models are based on scalar-valued approaches, which predict individual response values or synergy scores for drug combinations. We take a functional output prediction approach, in which full, continuous dose-response combination surfaces are predicted for each drug combination on the cell lines. We investigate the predictive power of the recently proposed comboKR method, which is based on a powerful input-output kernel regression technique and functional modeling of the response surface. In this work, we develop a scaled-up formulation of the comboKR, which also implements improved modeling choices: we (1) incorporate new modeling choices for the output drug combination response surfaces into the comboKR framework, and (2) propose a projected gradient descent method to solve the challenging pre-image problem that is traditionally solved with simple candidate set approaches. We provide a thorough experimental analysis of comboKR 2.0 with three real-world datasets within various challenging experimental settings, including cases where drugs or cell lines have not been encountered in the training data. 
Our comparison with synergy score prediction methods further highlights the relevance of dose-response prediction approaches, instead of relying on simple scoring methods.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11904408/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143623603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unveiling a novel cancer hallmark by evaluation of neural infiltration in cancer.","authors":"Qi Dong, Yingying Guo, Chen Lv, Lingxue Ren, Bo Chen, Yan Wang, Yang Liu, Mingyue Liu, Kaidong Liu, Nan Zhang, Linzhu Wang, Shaocong Sang, Xin Li, Yang Hui, Haihai Liang, Yunyan Gu","doi":"10.1093/bib/bbaf082","DOIUrl":"10.1093/bib/bbaf082","url":null,"abstract":"<p><p>Cancer cells acquire necessary functional capabilities for malignancy through the influence of the nervous system. We evaluate the extent of neural infiltration within the tumor microenvironment (TME) across multiple cancer types, highlighting its role as a cancer hallmark. We identify cancer-related neural genes using 40 bulk RNA-seq datasets across 10 cancer types, developing a predictive score for cancer-related neural infiltration (C-Neural score). Cancer samples with elevated C-Neural scores exhibit perineural invasion, recurrence, metastasis, higher stage or grade, or poor prognosis. Epithelial cells show the highest C-Neural scores among all cell types in 55 single-cell RNA sequencing datasets. The epithelial cells with high C-Neural scores (epi-highCNs) are characterized by increased copy number variation, reduced cell differentiation, higher epithelial-mesenchymal transition scores, and elevated metabolic levels. Epi-highCNs frequently communicate with Schwann cells via the FN1 signaling pathway. The co-culture experiment indicates that Schwann cells may facilitate cancer progression through upregulation of VDAC1. Moreover, C-Neural scores positively correlate with the infiltration of antitumor immune cells, indicating a potential response to immunotherapy. Melanoma patients with high C-Neural scores may benefit from trametinib. 
These analyses illuminate the extent of neural influence within the TME, suggesting its potential role as a cancer hallmark and offering implications for effective therapeutic strategies against cancer.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11886572/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143572162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GiGs: graph-based integrated Gaussian kernel similarity for virus-drug association prediction.","authors":"Yixuan Jin, Juanjuan Huang, Xu Sun, Yabo Fang, Jiageng Wu, Jianshi Du, Jiwei Jia, Guoqing Wang","doi":"10.1093/bib/bbaf117","DOIUrl":"10.1093/bib/bbaf117","url":null,"abstract":"<p><p>The prediction of virus-drug associations (VDAs) is crucial for drug repositioning, contributing to the identification of latent antiviral drugs. In this study, we developed a graph-based integrated Gaussian kernel similarity (GiGs) method for predicting potential VDAs in drug repositioning. The GiGs model comprises three components: (i) collection of experimentally validated VDA information and calculation of virus sequence, drug chemical structure, and drug side effect similarities; (ii) integration of virus and drug similarities based on the above information and the Gaussian interaction profile kernel (GIPK); and (iii) utilization of similarity-constrained weight graph normalization matrix factorization to predict antiviral drugs. The GiGs model enhances correlation matrix quality through the integration of multiple biological data, improves performance via similarity constraints, and prevents overfitting and predicts missing data more accurately through graph regularization. Extensive experimental results indicated that the GiGs model outperforms five other advanced association prediction methods. 
A case study identified broad-spectrum drugs for treating highly pathogenic human coronavirus infections, with molecular docking experiments confirming the model's accuracy.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11924387/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143668976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cox-Sage: enhancing Cox proportional hazards model with interpretable graph neural networks for cancer prognosis.","authors":"Ruijun Mao, Li Wan, Minghao Zhou, Dongxi Li","doi":"10.1093/bib/bbaf108","DOIUrl":"10.1093/bib/bbaf108","url":null,"abstract":"<p><p>High-throughput sequencing technologies have facilitated a deeper exploration of prognostic biomarkers. While many deep learning (DL) methods primarily focus on feature extraction or employ simplistic fully connected layers within prognostic modules, the interpretability of DL-extracted features can be challenging. To address these challenges, we propose an interpretable cancer prognosis model called Cox-Sage. Specifically, we first propose an algorithm to construct a patient similarity graph from heterogeneous clinical data, and then extract protein-coding genes from the patient's gene expression data to embed them as features into the graph nodes. We utilize multilayer graph convolution to model the proportional hazards pattern and introduce a mathematical method to clearly explain the meaning of our model's parameters. Based on this approach, we propose two metrics for measuring gene importance from different perspectives: mean hazard ratio and reciprocal of the mean hazard ratio. These metrics can be used to discover two types of important genes: genes whose low expression levels are associated with high cancer prognosis risk, and genes whose high expression levels are associated with high cancer prognosis risk. We conducted experiments on seven datasets from TCGA, and our model achieved superior prognostic performance compared with some state-of-the-art methods. As a preliminary study, we performed prognostic biomarker discovery on the LIHC (Liver Hepatocellular Carcinoma) dataset. 
Our code and dataset can be found at https://github.com/beeeginner/Cox-sage.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11894944/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143603906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dissecting genetic regulation of metabolic coordination.","authors":"Emily C Hector, Daiwei Zhang, Leqi Tian, Junning Feng, Xianyong Yin, Tianyi Xu, Markku Laakso, Yun Bai, Jiashun Xiao, Jian Kang, Tianwei Yu","doi":"10.1093/bib/bbaf095","DOIUrl":"10.1093/bib/bbaf095","url":null,"abstract":"<p><p>Understanding genetic regulation of metabolism is critical for gaining insights into the causes of metabolic diseases. Traditional metabolome-based genome-wide association studies (mGWAS) focus on static associations between single nucleotide polymorphisms (SNPs) and metabolite levels, overlooking the changing relationships caused by genotypes within the metabolic network. Notably, some metabolites exhibit changes in correlation patterns with other metabolites under certain physiological conditions while maintaining their overall abundance level. In this manuscript, we develop Metabolic Differential-coordination GWAS (mdGWAS), an innovative framework that detects SNPs associated with the changing correlation patterns between metabolites and metabolic pathways. This approach transcends and complements conventional mean-based analyses by identifying latent regulatory factors that govern the system-level metabolic coordination. Through comprehensive simulation studies, mdGWAS demonstrated robust performance in detecting SNP-metabolite-metabolite associations. 
Applying mdGWAS to genotyping and mass spectrometry (MS)-based metabolomics data of the METabolic Syndrome In Men (METSIM) Study revealed novel SNPs and genes potentially involved in the regulation of the coordination between metabolic pathways.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11894804/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143603909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble learning based on matrix completion improves microbe-disease association prediction.","authors":"Hailin Chen, Kuan Chen","doi":"10.1093/bib/bbaf075","DOIUrl":"10.1093/bib/bbaf075","url":null,"abstract":"<p><p>Microbes have a profound impact on human health. Identifying disease-associated microbes would provide helpful guidance for drug development and disease treatment. Despite an enormous experimental effort, only a limited number of disease-associated microbes have been identified. Accurate computational approaches are needed to predict potential microbe-disease associations for biomedical screening. In this study, we present an ensemble learning framework entitled SABMDA to improve microbe-disease association inference. We first integrate multiple sources of information from both microbes and diseases, and develop two matrix completion algorithms to predict microbe-disease associations successively. Ablation tests show that combining the two matrix completion algorithms achieves better prediction performance. Moreover, comprehensive experiments, including cross-validations and an independent test, demonstrate that SABMDA significantly outperforms seven recent baseline methods. Finally, we apply SABMDA to three diseases to predict their associated microbes, and results show SABMDA's remarkable prediction ability in real situations.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11879468/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143555425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive review and benchmark of differential analysis tools for Hi-C data.","authors":"Elise Jorge, Sylvain Foissac, Pierre Neuvial, Matthias Zytnicki, Nathalie Vialaneix","doi":"10.1093/bib/bbaf074","DOIUrl":"10.1093/bib/bbaf074","url":null,"abstract":"<p><strong>Motivation: </strong>The 3D organization of the genome plays a crucial role in various biological processes. Hi-C technology is widely used to investigate chromosome structures by quantifying 3D proximity between genomic regions. While numerous computational tools exist for detecting differences in Hi-C data between conditions, a comprehensive review and benchmark comparing their effectiveness is lacking.</p><p><strong>Results: </strong>This study offers a comprehensive review and benchmark of 10 generic tools for differential analysis of Hi-C matrices at the interaction count level. The benchmark assesses the statistical methods, usability, and performance (in terms of precision and power) of these tools, using both real and simulated Hi-C data. 
Results reveal a striking variability in performance among the tools, highlighting the substantial impact of preprocessing filters and the difficulty all tools encounter in effectively controlling the false discovery rate across varying resolutions and chromosome sizes.</p><p><strong>Availability: </strong>The complete benchmark is available at https://forgemia.inra.fr/scales/replication-chrocodiff using processed data deposited at https://doi.org/10.57745/LR0W9R.</p><p><strong>Contact: </strong>nathalie.vialaneix@inrae.fr.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11879411/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143555446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized phenotyping of complex morphological traits: enhancing discovery of common and rare genetic variants.","authors":"Meng Yuan, Seppe Goovaerts, Myoung K Lee, Jay Devine, Stephen Richmond, Susan Walsh, Mark D Shriver, John R Shaffer, Mary L Marazita, Hilde Peeters, Seth M Weinberg, Peter Claes","doi":"10.1093/bib/bbaf090","DOIUrl":"10.1093/bib/bbaf090","url":null,"abstract":"<p><p>Genotype-phenotype (G-P) analyses for complex morphological traits typically utilize simple, predetermined anatomical measures or features derived via unsupervised dimension reduction techniques (e.g. principal component analysis (PCA) or eigen-shapes). Despite the popularity of these approaches, they do not necessarily reveal axes of phenotypic variation that are genetically relevant. Therefore, we introduce a framework to optimize phenotyping for G-P analyses, such as genome-wide association studies (GWAS) of common variants or rare variant association studies (RVAS) of rare variants. Our strategy is two-fold: (i) we construct a multidimensional feature space spanning a wide range of phenotypic variation, and (ii) within this feature space, we use an optimization algorithm to search for directions or feature combinations that are genetically enriched. To test our approach, we examine human facial shape in the context of GWAS and RVAS. In GWAS, we optimize for phenotypes exhibiting high heritability, estimated from either family data or genomic relatedness measured in unrelated individuals. In RVAS, we optimize for the skewness of phenotype distributions, aiming to detect commingled distributions that suggest single or few genomic loci with major effects. We compare our approach with eigen-shapes as baseline in GWAS involving 8246 individuals of European ancestry and in gene-based tests of rare variants with a subset of 1906 individuals. 
After applying linkage disequilibrium score regression to our GWAS results, heritability-enriched phenotypes yielded the highest SNP heritability, followed by eigen-shapes, while commingling-based traits displayed the lowest SNP heritability. Heritability-enriched phenotypes also exhibited higher discovery rates, identifying the same number of independent genomic loci as eigen-shapes with a smaller effective number of traits. For RVAS, commingling-based traits resulted in more genes passing the exome-wide significance threshold than eigen-shapes, while heritability-enriched phenotypes led to only a few associations. Overall, our results demonstrate that optimized phenotyping allows for the extraction of genetically relevant traits that can specifically enhance discovery efforts of common and rare variants, as evidenced by their increased power in facial GWAS and RVAS.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11891655/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143584575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}