Bioinformatics (Oxford, England): latest articles, page 7

MSNGO: multi-species protein function annotation based on 3D protein structure and network propagation.

Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf285

Beibei Wang, Boyue Cui, Shiqu Chen, Xuan Wang, Yadong Wang, Junyi Li

Motivation: In recent years, protein function prediction has broken through the bottleneck of sequence features, significantly improving prediction accuracy using high-precision protein structures predicted by AlphaFold2. While single-species protein function prediction methods have achieved remarkable success, multi-species approaches still face challenges such as difficulties in multi-source data integration and insufficient knowledge transfer between distantly related species. How to integrate large-scale data and provide effective cross-species label propagation for species with sparse protein annotations remains a critical and unresolved challenge. To address this problem, we propose the MSNGO (Multi-species protein Structures and Network to predict GO terms) model, which integrates structural features and network propagation methods. Our validation shows that using structural features can significantly improve the accuracy of multi-species protein function prediction.

Results: We employ graph representation learning techniques to extract amino acid representations from protein structure contact maps and train a structural model using a graph convolution pooling module to derive protein-level structural features. After incorporating the sequence features from ESM-2, we apply a network propagation algorithm to aggregate information and update node representations within a heterogeneous network. The results demonstrate that MSNGO outperforms previous multi-species protein function prediction methods that rely on sequence features and protein-protein networks.

Availability and implementation: https://github.com/blingbell/MSNGO.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12122197/pdf/
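The network propagation step mentioned in the abstract can be illustrated with a small sketch. This is not the MSNGO implementation: the update rule (blend each node's initial features with the mean of its neighbours), the `alpha` weight, and the three-node toy network are all assumptions chosen to show the general idea of passing information from annotated to sparsely annotated proteins.

```python
def propagate(adjacency, features, alpha=0.5, steps=20):
    """Blend each node's initial features with the mean of its neighbours.

    adjacency: dict mapping node -> list of neighbouring nodes (toy network)
    features:  dict mapping node -> list of floats (initial representation)
    alpha:     weight kept on the node's own initial features (assumed value)
    """
    current = {node: list(vec) for node, vec in features.items()}
    for _ in range(steps):
        updated = {}
        for node, initial in features.items():
            neighbours = adjacency.get(node, [])
            if neighbours:
                mean = [
                    sum(current[m][d] for m in neighbours) / len(neighbours)
                    for d in range(len(initial))
                ]
            else:
                # Isolated node: nothing to aggregate, keep its current state.
                mean = current[node]
            updated[node] = [
                alpha * i + (1 - alpha) * m for i, m in zip(initial, mean)
            ]
        current = updated
    return current


# Hypothetical toy network: A is annotated, B is A's neighbour, C is isolated.
adjacency = {"A": ["B"], "B": ["A"], "C": []}
features = {"A": [1.0], "B": [0.0], "C": [0.0]}
scores = propagate(adjacency, features)
print(scores)
```

In this toy case the unannotated neighbour B picks up a non-zero score from A, while the isolated node C receives nothing, which is the behaviour that makes propagation useful for species with few annotations.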

Citations: 0

Charting the structure-sequence landscape of light chain amyloids.

Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf167

Gabriele Orlando, Rodrigo Gallardo, Alicia Colla, Joost Schymkowitz, Frederic Rousseau

Motivation: Light chain amyloidosis is a disease where misfolded antibody light chains (LCs) form toxic amyloid fibrils, leading to organ damage. Although LC overproduction occurs in all cases, only certain individuals develop the disease, suggesting that specific LC sequences and properties drive amyloid formation. This process is complex, involving both protein sequence and environmental factors, but mutations that destabilize the LC fold are linked to amyloid aggregation. Despite the significance of the disease, our understanding of LC fibril formation remains limited due to the lack of extensive data and technical challenges in studying amyloid structures. To address this, a tool is needed to compare unknown LC sequences with known structures and predict which amyloids are likely to adopt new conformations, guiding experimental investigations.

Results: HMMSTUFF addresses this by using a Hidden Markov Model to generate similarity scores between LC sequences and existing PDB templates, eventually modeling the LC amyloid structures similar enough to known templates. HMMSTUFF on one side expands our understanding of LC amyloid fibril conformations, and on the other highlights the gaps in our current knowledge of LC structural space.

Availability and implementation: HMMSTUFF is available as a PyPI package and as source code at https://github.com/grogdrinker/hmmstuff.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12102067/pdf/

Citations: 0

TissueViewer: a web-based multiplexed image viewer.

Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf246

David Gerardus Pieter van IJzendoorn, Magdalena Matusiak, Rob West, Matt van de Rijn

Citations: 0

Foldclass and Merizo-search: scalable structural similarity search for single- and multi-domain proteins using geometric learning.

Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf277

Shaun M Kandathil, Andy M Lau, Daniel W A Buchan, David T Jones

Motivation: The availability of very large numbers of protein structures from accurate computational methods poses new challenges in storing, searching and detecting relationships between these structures. In particular, the new-found abundance of multi-domain structures in the AlphaFold structure database introduces challenges for traditional structure comparison methods.

Results: We address these challenges using a fast, embedding-based structure comparison method called Foldclass which detects structural similarity between protein domains. We demonstrate the accuracy of Foldclass embeddings for homology detection. In combination with a recently developed deep learning-based automatic domain segmentation tool Merizo, we develop Merizo-search, which first segments multi-domain query structures into domains, and then searches a Foldclass embedding database to determine the top matches for each constituent domain. Combining the ability of Merizo to accurately segment complete chains into domains, and Foldclass to embed and detect similar domains, the Merizo-search tool can be used to rapidly detect per-domain similarities for complete chains, taking as little as 2 min to search all 365 million domains from the Encyclopedia of Domains. We anticipate that these tools will enable many analyses using the wealth of predicted structural data now available.

Availability and implementation: Foldclass and Merizo-search are available at https://github.com/psipred/merizo_search. The version used in this publication is archived at https://doi.org/10.5281/zenodo.15120830. Merizo-search is also available on the PSIPRED web server at http://bioinf.cs.ucl.ac.uk/psipred.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12122203/pdf/
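The segment-then-search flow described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the three-dimensional vectors stand in for learned Foldclass embeddings, the domain names are made up, and cosine similarity is an assumed scoring function (the abstract does not say which measure the tool uses). Segmentation by Merizo is not modelled; the query is given as a list of per-domain embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_matches(query, database, k=1):
    """Return the k database entries most similar to one query embedding."""
    scored = [(cosine(query, emb), name) for name, emb in database.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


def search_chain(domain_embeddings, database, k=1):
    """Search each domain of a (pre-segmented) chain independently."""
    return [top_matches(emb, database, k) for emb in domain_embeddings]


# Hypothetical embedding database and a two-domain query chain.
database = {
    "dom_alpha": [1.0, 0.0, 0.0],
    "dom_beta": [0.0, 1.0, 0.0],
    "dom_mixed": [0.7, 0.7, 0.0],
}
query_domains = [[0.9, 0.1, 0.0], [0.1, 0.95, 0.0]]
hits = search_chain(query_domains, database)
print(hits)
```

A linear scan like this is fine for a toy database; searching hundreds of millions of domains in minutes, as the abstract reports, would need batched or indexed vector search rather than a Python loop.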

Citations: 0

Nonparametric IPSS: fast, flexible feature selection with false discovery control.

Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf299

Omar Melikechi, David B Dunson, Jeffrey W Miller

Motivation: Feature selection is a critical task in machine learning and statistics. However, existing feature selection methods either (i) rely on parametric methods such as linear or generalized linear models, (ii) lack theoretical false discovery control, or (iii) identify few true positives.

Results: We introduce a general feature selection method with finite-sample false discovery control based on applying integrated path stability selection (IPSS) to arbitrary feature importance scores. The method is nonparametric whenever the importance scores are nonparametric, and it estimates q-values, which are better suited to high-dimensional data than P-values. We focus on two special cases using importance scores from gradient boosting (IPSSGB) and random forests (IPSSRF). Extensive nonlinear simulations with RNA sequencing data show that both methods accurately control the false discovery rate and detect more true positives than existing methods. Both methods are also efficient, running in under 20 s when there are 500 samples and 5000 features. We apply IPSSGB and IPSSRF to detect microRNAs and genes related to cancer, finding that they yield better predictions with fewer features than existing approaches.

Availability and implementation: All code and data used in this work are available on GitHub (https://github.com/omelikechi/ipss_bioinformatics) and permanently archived on Zenodo (https://doi.org/10.5281/zenodo.15335289). A Python package for implementing IPSS is available on GitHub (https://github.com/omelikechi/ipss) and PyPI (https://pypi.org/project/ipss/). An R implementation of IPSS is also available on GitHub (https://github.com/omelikechi/ipssR).

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12119134/pdf/
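To make the subsampling idea behind stability selection concrete, here is a minimal sketch. It is deliberately not IPSS: it does not integrate over a regularization path, does not compute q-values, and gives no false discovery guarantee. It only shows how selection frequencies can be computed from an arbitrary importance score. The score used here (absolute Pearson correlation), the `top_k` rule, and the synthetic data are assumptions for illustration; the `ipss` package's own API is not used.

```python
import random


def abs_correlation(xs, ys):
    """Toy importance score: absolute Pearson correlation with the response."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / (vx * vy) ** 0.5 if vx and vy else 0.0


def selection_frequencies(X, y, score, n_subsamples=50, top_k=1, seed=0):
    """Fraction of half-size subsamples in which each feature ranks in the top_k."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # random half of the samples
        y_sub = [y[i] for i in idx]
        scores = [score([X[i][j] for i in idx], y_sub) for j in range(p)]
        ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [c / n_subsamples for c in counts]


# Synthetic data: feature 0 determines y, features 1 and 2 are pure noise.
noise = random.Random(1)
y = [float(i) for i in range(40)]
X = [[2.0 * y[i], noise.random(), noise.random()] for i in range(40)]
freqs = selection_frequencies(X, y, abs_correlation)
print(freqs)
```

Swapping `abs_correlation` for importance scores from gradient boosting or random forests would mirror the two special cases named in the abstract, although the actual method adds the path integration and error control that this sketch omits.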

Citations: 0

argNorm: normalization of antibiotic resistance gene annotations to the Antibiotic Resistance Ontology (ARO).

Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf173

Svetlana Ugarcina Perovic, Vedanth Ramji, Hui Chong, Yiqian Duan, Finlay Maguire, Luis Pedro Coelho

Citations: 0

Colora: a Snakemake workflow for complete chromosome-scale de novo genome assembly.

Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf175

Lia Obinu, Timothy Booth, Heleen De Weerd, Urmi Trivedi, Andrea Porceddu

Motivation: De novo assembly creates reference genomes that underpin many modern biodiversity and conservation studies. Large numbers of new genomes are being assembled by labs around the world. To avoid duplication of efforts and variable data quality, we desire a best-practice assembly process, implemented as an automated portable workflow.

Results: Here, we present Colora, a Snakemake workflow that produces chromosome-scale de novo primary or phased genome assemblies complete with organelles using Pacific Biosciences HiFi, Hi-C, and optionally Oxford Nanopore Technologies reads as input. Colora is a user-friendly, versatile, and reproducible pipeline that is ready to use by researchers looking for an automated way to obtain high-quality de novo genome assemblies.

Availability and implementation: The source code of Colora is available on GitHub (https://github.com/LiaOb21/colora) and has been deposited in Zenodo under DOI https://doi.org/10.5281/zenodo.13321576. Colora is also available at the Snakemake Workflow Catalog (https://snakemake.github.io/snakemake-workflow-catalog/?usage=LiaOb21%2Fcolora).

Volume 41, issue 5. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12065627/pdf/

Citations: 0

In silico identification of archaeal DNA-binding proteins.

Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf169

Linus Donvil, Joëlle A J Housmans, Eveline Peeters, Wim Vranken, Gabriele Orlando

Motivation: The rapid advancement of next-generation sequencing technologies has generated an immense volume of genetic data. However, these data are unevenly distributed, with well-studied organisms being disproportionately represented, while other organisms, such as archaea, remain significantly underexplored. The study of archaea is particularly challenging due to the extreme environments they inhabit and the difficulties associated with culturing them in the laboratory. Despite these challenges, archaea likely represent a crucial evolutionary link between eukaryotic and prokaryotic organisms, and their investigation could shed light on the early stages of life on Earth. Yet, a significant portion of archaeal proteins are annotated with limited or inaccurate information. Among the various classes of archaeal proteins, DNA-binding proteins are of particular importance. While they represent a large portion of every known proteome, their identification in archaea is complicated by the substantial evolutionary divergence between archaea and other, better-studied organisms.

Results: To address the challenges of identifying DNA-binding proteins in archaea, we developed Xenusia, a neural network-based tool capable of screening entire archaeal proteomes to identify DNA-binding proteins. Xenusia has proven effective across diverse datasets, including metagenomics data, successfully identifying novel DNA-binding proteins, with experimental validation of its predictions.

Availability and implementation: Xenusia is available as a PyPI package, with source code accessible at https://github.com/grogdrinker/xenusia, and as a Google Colab web server application at xenusia.ipynb.

Volume 41, issue 5. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12065626/pdf/

Citations: 0

Missense variants pathogenicity annotation from homologous proteins.

Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf305

Gabriel Ruiz-Alías, Sergi Soldevila, Xavier Altafaj, Arnau Cordomí, Mireia Olivella

Motivation: High-throughput DNA sequencing has revealed millions of single nucleotide variants (SNVs) in the human genome, with a small fraction linked to disease. The effect of missense variants, which alter the protein sequence, is particularly challenging to interpret due to the scarcity of clinical annotations and experimental information. While using conservation and structural information, current prediction tools still struggle to predict variant pathogenicity. In this study, we explored the pathogenicity of homologous missense variants (variants in equivalent positions across homologous proteins), focusing on proteins involved in autosomal dominant diseases.

Results: Our analysis of 2976 pathogenic and 17 555 non-pathogenic homologous variants demonstrated that pathogenicity can be extrapolated with 95% accuracy within a family, or up to 98% for closer homologs. Remarkably, the evaluation of 27 commonly used mutation predictor methods revealed that they were not fully capturing this biological feature. To facilitate the exploration of homologous variants, we created HomolVar, a web server that computationally predicts the pathogenesis of missense variants using annotations from homologous variants, freely available at https://rarevariants.org/HomolVar. Overall, these findings and the accompanying tool offer a robust method for predicting the pathogenicity of unannotated variants, enhancing genotype-phenotype correlations, and contributing to diagnosing rare genetic disorders.

Availability and implementation: HomolVar is freely available at https://rarevariants.org/HomolVar.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12122210/pdf/
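The notion of "equivalent positions across homologous proteins" can be made concrete with a small sketch that maps residue positions through a pairwise alignment and carries annotations across. This is an assumed, simplified scheme, not HomolVar's algorithm: the two aligned sequences and the labels are invented, and real use would also need to consider which residues are substituted, how close the homologs are, and evidence from more than one family member.

```python
def equivalent_positions(aligned_a, aligned_b):
    """Map 1-based residue positions in protein A to those in protein B.

    Both inputs are rows of the same pairwise alignment, with "-" for gaps.
    Columns where either protein has a gap produce no mapping.
    """
    mapping = {}
    pos_a = pos_b = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a != "-":
            pos_a += 1
        if res_b != "-":
            pos_b += 1
        if res_a != "-" and res_b != "-":
            mapping[pos_a] = pos_b
    return mapping


def transfer_labels(labels_a, mapping):
    """Carry per-position labels from protein A to equivalent positions in B."""
    return {
        mapping[pos]: label for pos, label in labels_a.items() if pos in mapping
    }


# Hypothetical alignment: protein B has one extra residue (column 4).
aligned_a = "MKT-LLV"
aligned_b = "MRTALLV"
labels_a = {2: "non-pathogenic", 5: "pathogenic"}
predicted_b = transfer_labels(labels_a, equivalent_positions(aligned_a, aligned_b))
print(predicted_b)
```

Here position 5 of protein A corresponds to position 6 of protein B because of the inserted residue, so the "pathogenic" label lands on position 6, and position 4 of B (aligned to a gap) receives no prediction.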

Citations: 0

MS1FA: Shiny app for the annotation of redundant features in untargeted metabolomics datasets.

Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf161

Ruibing Shi, Frank Klawonn, Mark Brönstrup, Raimo Franke

Motivation: Untargeted metabolomics, the comprehensive analysis of small molecules in biological systems, has become an invaluable tool for understanding physiology and metabolism. However, the annotation of metabolomic data is often confounded by the presence of redundant features, which can arise from e.g. multimerization, in-source fragments (ISFs), and adducts.

Results: MS1FA uniquely integrates all major annotation approaches for redundant features within a single interactive platform. It combines correlation-based grouping with reliable ISF annotation using MS2 data and operates with MS1 data only, MS2 data only, or both. Additionally, it offers a distinctive method for grouping features based on relational criteria. As the only web-based platform with these capabilities, MS1FA provides easy access and allows users to explore and annotate the feature table interactively, with options to download the results.

Availability and implementation: MS1FA is freely accessible at https://ms1fa.helmholtz-hzi.de. The source code and data are available at https://github.com/RuibingS/MS1FA_RShiny_dashboard and are archived with the DOI 10.5281/zenodo.15118962.

Volume 41, issue 5. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12069231/pdf/

Citations: 0