Bioinformatics Advances: latest articles

MSCI: an open-source Python package for information content assessment of peptide fragmentation spectra.

IF 2.4

Bioinformatics Advances. Pub Date: 2025-05-24. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf125

Zahra Elhamraoui, Eva Borràs, Mathias Wilhelm, Eduard Sabidó

Motivation: In mass spectrometry-based proteomics, the availability of prior knowledge about peptides has improved our ability to assign fragmentation spectra to specific peptide sequences. However, some peptides exhibit similar analytical values and fragmentation patterns, which makes them nearly indistinguishable with current data analysis tools.

Results: Here we developed the Mass Spectrometry Content Information (MSCI) Python package to tackle the challenges of peptide identification in mass spectrometry-based proteomics, particularly regarding indistinguishable peptides. MSCI provides a comprehensive toolset that streamlines the workflow from data import to spectral analysis, enabling researchers to evaluate fragmentation similarity scores among peptide sequences and pinpoint indistinguishable peptide pairs in a given proteome.

Availability and implementation: MSCI is implemented in Python and released under the permissive MIT license. The source code and installers are available on GitHub at https://github.com/proteomicsunitcrg/MSCI.

Volume 5, issue 1, article vbaf125. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204179/pdf/

Citations: 0
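MSCI's own scoring functions are not reproduced here. As a generic illustration of a fragmentation similarity score, the minimal sketch below bins two spectra by m/z, computes their cosine similarity, and flags peptide pairs whose score reaches a threshold as nearly indistinguishable. The (m/z, intensity) peak representation, the 0.02 bin width, the 0.9 threshold and every function name are assumptions, not part of the MSCI API.

```python
import math
from collections import defaultdict

# Assumed representation: a spectrum is a list of (m/z, intensity) peaks.
Spectrum = list[tuple[float, float]]


def bin_spectrum(peaks: Spectrum, bin_width: float = 0.02) -> dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (bin_width is an assumed tolerance)."""
    binned: dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a: Spectrum, b: Spectrum, bin_width: float = 0.02) -> float:
    """Cosine of the angle between two binned intensity vectors (1.0 = same shape)."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def near_indistinguishable(pairs, threshold: float = 0.9):
    """Yield (name_a, name_b, score) for spectrum pairs scoring at or above threshold."""
    for (name_a, spec_a), (name_b, spec_b) in pairs:
        score = cosine_similarity(spec_a, spec_b)
        if score >= threshold:
            yield name_a, name_b, score
```

Cosine similarity is used only because it is a widely used spectrum-comparison score; swapping in another metric would leave the pairwise-flagging structure unchanged.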

jdispatcher-viewers: interactive visualizations of sequence similarity search results and domain predictions.

IF 2.4

Bioinformatics Advances. Pub Date: 2025-05-23. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf122

Fábio Madeira, Joonheung Lee, Nandana Madhusoodanan, Alberto Eusebi, Ania Niewielska, Sarah Butcher

Motivation: Biological visualization is an important technique that helps researchers make sense of complex biological data. Functional prediction and the discovery of novel proteins remain central objectives in biology, as they provide insights into molecular mechanisms with significant applications in health and disease. Visualizing sequence similarity search results and domain predictions is essential for exploring protein function, identifying conserved elements, and drawing meaningful connections between sequences, ultimately accelerating discovery.

Results: The new website for the EMBL-EBI Job Dispatcher bioinformatics tools framework was released in 2023. Along with other improvements and new features, the website has since integrated interactive visualizations designed to further aid researchers and enrich the user experience. Here, we describe jdispatcher-viewers, a library for the interactive visualization of sequence similarity search results from BLAST and FASTA, and of domain predictions and annotations provided by InterPro.

Availability and implementation: The jdispatcher-viewers library and its documentation, which includes a demo webpage, are available from https://github.com/ebi-jdispatcher/jdispatcher-viewers. The interactive visualizations on the result pages of the sequence similarity search tools in Job Dispatcher are implemented with jdispatcher-viewers and are available at https://www.ebi.ac.uk/jdispatcher/sss. The library is distributed under the Apache 2.0 license.

Volume 5, issue 1, article vbaf122. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133269/pdf/

Citations: 0
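jdispatcher-viewers is a web visualization library, and its code is not shown here. To illustrate the kind of data a hit-map view is drawn from, the sketch below parses BLAST+ tabular output (`-outfmt 6`, default 12 columns) and renders each hit's query span as a fixed-width text bar. The `Hit` class and both functions are hypothetical, and the sketch does not assume that Job Dispatcher uses this format internally.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str
    pident: float
    qstart: int
    qend: int
    evalue: float


def parse_outfmt6(lines):
    """Parse BLAST+ tabular (-outfmt 6) lines that use the 12 default columns."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        # Default order: qseqid sseqid pident length mismatch gapopen
        #                qstart qend sstart send evalue bitscore
        hits.append(Hit(subject=cols[1], pident=float(cols[2]),
                        qstart=int(cols[6]), qend=int(cols[7]),
                        evalue=float(cols[10])))
    return hits


def coverage_track(hit: Hit, query_len: int, width: int = 40) -> str:
    """Draw the query region covered by one hit as a text bar of `width` characters."""
    lo, hi = sorted((hit.qstart, hit.qend))
    start = (lo - 1) * width // query_len
    end = max(start + 1, hi * width // query_len)  # always draw at least one cell
    return "." * start + "#" * (end - start) + "." * (width - end)
```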

Phenotype driven data augmentation methods for transcriptomic data.

IF 2.4

Bioinformatics Advances. Pub Date: 2025-05-23. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf124

Nikita Janakarajan, Mara Graziani, María Rodríguez Martínez

Summary: Machine learning methods have seen many successes in biomedical applications. However, working with transcriptomic data on supervised learning tasks is challenging due to its high dimensionality, low patient numbers, and class imbalances. Machine learning models tend to overfit these data and do not generalize well to out-of-distribution samples. Data augmentation strategies help alleviate this by introducing synthetic data points and acting as regularizers. However, existing approaches are either computationally intensive, require population parametric estimates, or generate insufficiently diverse samples. To address these challenges, we introduce two classes of phenotype-driven data augmentation approaches: signature-dependent and signature-independent. The signature-dependent methods assume the existence of distinct gene signatures describing some phenotype; they are simple, non-parametric, and novel data augmentation methods. The signature-independent methods are a modification of the established Gamma-Poisson and Poisson sampling methods for gene expression data. As case studies, we apply our augmentation methods to transcriptomic data of colorectal and breast cancer. Through discriminative and generative experiments with external validation, we show that our methods improve patient stratification by 5-15% over other augmentation methods in their respective cases. The study additionally provides insights into the limited benefits of over-augmenting data.

Availability and implementation: Code for reproducibility is available on GitHub.

Volume 5, issue 1, article vbaf124. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12141816/pdf/

Citations: 0
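The signature-dependent methods are the authors' own and are not reproduced here. For background, the sketch below shows the standard Poisson and Gamma-Poisson resampling of raw counts that the signature-independent methods are described as modifying: each synthetic sample resamples a randomly chosen real sample from the same phenotype class. The per-class parent selection, the dispersion parameterization, and the function name `augment_class` are assumptions.

```python
import numpy as np


def augment_class(X, y, label, n_new, rng, dispersion=None):
    """Create n_new synthetic count profiles for one phenotype class.

    X: (samples, genes) raw count matrix; y: array of class labels.
    Each synthetic sample resamples a randomly chosen real sample of `label`:
      - dispersion is None: counts ~ Poisson(lambda = observed counts)
      - otherwise (Gamma-Poisson): the rate is first drawn from a Gamma
        distribution with mean = observed count and variance
        dispersion * count**2, then counts ~ Poisson(rate).
    """
    members = np.flatnonzero(y == label)
    parents = X[rng.choice(members, size=n_new, replace=True)].astype(float)
    if dispersion is None:
        rates = parents
    else:
        shape = 1.0 / dispersion
        # Gamma(shape, scale = mean / shape) has the requested mean and variance.
        rates = rng.gamma(shape=shape, scale=parents / shape)
    synthetic = rng.poisson(lam=rates)
    return synthetic, np.full(n_new, label, dtype=y.dtype)
```

Drawing the parent per class keeps each synthetic profile tied to a real phenotype label, which is what a supervised stratification task needs.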

RaptGen-UI: an integrated platform for exploring and analyzing the sequence landscape of HT-SELEX experiments.

IF 2.4

Bioinformatics Advances. Pub Date: 2025-05-23. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf120

Ryota Nakano, Natsuki Iwano, Akiko Ichinose, Michiaki Hamada

Summary: RaptGen-UI provides an intuitive graphical user interface for exploring and analyzing the sequence landscape of high-throughput (HT)-SELEX (Systematic Evolution of Ligands by EXponential enrichment) experiments through machine learning-driven visualization with optimization capabilities. The software enables wet-lab researchers to efficiently analyze HT-SELEX datasets and optimize RNA aptamers without extensive computational expertise. Its containerized architecture ensures secure local deployment and supports both high-performance Graphics Processing Unit (GPU) acceleration and CPU-only environments, making it suitable for various research settings.

Availability and implementation: This software is a web-based application that runs locally on the user's PC. The frontend is built with Next.js and Plotly.js in TypeScript, while the backend uses FastAPI and Celery (Python), a PostgreSQL RDBMS, and Redis. Each module is encapsulated in a Docker container and deployed via Docker Compose. The system supports both CUDA GPU and CPU-only environments. Source code and documentation are freely available at https://github.com/hmdlab/RaptGen-UI.

Volume 5, issue 1, article vbaf120. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12245399/pdf/

Citations: 0

nf-core/marsseq: systematic preprocessing pipeline for MARS-seq experiments.

IF 2.4

Bioinformatics Advances. Pub Date: 2025-05-23. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf089

Martin Proks, Jose Alejandro Romero Herrera, Jakub Sedzinski, Joshua M Brickman

Motivation: Single-cell RNA sequencing (scRNA-seq) enables the study of gene regulation at single-cell resolution. Although many scRNA-seq protocols have been established, they vary in technical complexity, sequencing depth and multimodal capabilities, and share limitations in data interpretation due to a lack of standardized preprocessing and consistent data reproducibility. While plate-based techniques such as Massively Parallel Single-Cell RNA Sequencing (MARS-seq2.0) provide reference data on the cells that will be sequenced, the data format limits the possible analyses. Here, we focus on the standardization of MARS-seq analysis and its applicability to RNA velocity.

Results: We have taken the original MARS-seq2.0 pipeline and revised it for implementation in the nf-core framework. By doing so, we have simplified pipeline execution, enabling a streamlined application with increased transparency and scalability. We have incorporated additional checkpoints to verify experimental metadata and extended the pipeline with a custom workflow for RNA velocity estimation. The pipeline is part of the nf-core bioinformatics community and is freely available at https://github.com/nfcore/marsseq, with the data analysis at https://github.com/brickmanlab/proks-et-al-2023.

Availability and implementation: We introduce an updated preprocessing pipeline for MARS-seq experiments that follows state-of-the-art guidelines for scientific software development and adds the ability to infer RNA velocity.

Volume 5, issue 1, article vbaf089. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12117365/pdf/

Citations: 0
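The abstract does not state which velocity model the custom workflow uses. As background only, the sketch below shows the classic steady-state formulation, velocity = u - gamma * s, where u and s are unspliced and spliced abundances and gamma is a per-gene ratio fitted from the data. It is deliberately simplified: gamma is fitted by least squares through the origin on all cells, whereas steady-state tools commonly restrict the fit to extreme-quantile cells. The function name is hypothetical and not part of nf-core/marsseq.

```python
import numpy as np


def steady_state_velocity(unspliced: np.ndarray, spliced: np.ndarray):
    """Simplified per-gene steady-state RNA velocity.

    unspliced, spliced: (cells, genes) normalized abundances.
    gamma_g minimizes sum_c (u_cg - gamma_g * s_cg)**2, i.e. a least-squares
    line through the origin fitted on all cells. Genes with no spliced signal
    get gamma = 0. Returns (gamma, velocity) with velocity = u - gamma * s.
    """
    denom = (spliced ** 2).sum(axis=0)
    numer = (unspliced * spliced).sum(axis=0)
    gamma = np.divide(numer, denom, out=np.zeros(spliced.shape[1]), where=denom > 0)
    velocity = unspliced - gamma * spliced  # gamma broadcasts across cells
    return gamma, velocity
```

A positive velocity for a gene in a cell means more unspliced RNA than the fitted steady state predicts, which is read as the gene being induced.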

Enrichment analysis for spatial and single-cell metabolomics accounting for molecular ambiguity.

IF 2.4

Bioinformatics Advances. Pub Date: 2025-05-21. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf100

Bishoy Wadie, Martijn R Molenaar, Lucas M Vieira, Theodore Alexandrov

Citations: 0

MULAN: multimodal protein language model for sequence and structure encoding.

IF 2.8

Bioinformatics Advances. Pub Date: 2025-05-20. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf117

Daria Frolova, Marina Pak, Anna Litvin, Ilya Sharov, Dmitry Ivankov, Ivan Oseledets

Motivation: Most protein language models (PLMs) produce high-quality representations using only protein sequences. However, incorporating known protein structures is important for many prediction tasks, which has led to increased interest in structure-aware PLMs. Currently, structure-aware PLMs are either trained from scratch or add significant parameter overhead for the structure encoder.

Results: In this study, we propose MULAN, a MULtimodal PLM for both sequence and ANgle-based structure encoding. MULAN combines a pre-trained sequence encoder with a newly introduced, parameter-efficient Structure Adapter; the two are fused and trained together. Across nine downstream tasks, MULAN models of various sizes show a quality improvement over both the sequence-only ESM2 and the structure-aware SaProt. The largest improvements are seen for protein-protein interaction prediction (up to 0.12 in AUROC). Importantly, unlike other models, MULAN offers a cheap increase in the structural awareness of protein representations because it fine-tunes existing PLMs instead of training from scratch. We perform a detailed analysis of the proposed model and demonstrate its awareness of protein structure.

Availability and implementation: The implementation, training data, and model checkpoints are available at https://github.com/DFrolova/MULAN.

Volume 5, issue 1, article vbaf117. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12452268/pdf/

Citations: 0
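MULAN is described as using angle-based structure encoding, but the abstract does not say which angles are used or how they are encoded. As generic background, the sketch below computes a signed torsion angle from four 3D coordinates (for example backbone phi from C(i-1), N, CA, C, or psi from N, CA, C, N(i+1)) and encodes it as (sin, cos), a common way to present periodic angles to a model. All names are hypothetical, and the (sin, cos) encoding is an assumption, not MULAN's documented input.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in degrees (between -180 and 180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)  # unit vector along the central bond
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, (d0 * b1[0], d0 * b1[1], d0 * b1[2]))
    w = _sub(b2, (d2 * b1[0], d2 * b1[1], d2 * b1[2]))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def angle_features(angle_deg: float) -> tuple[float, float]:
    """Encode an angle as (sin, cos) so that -180 and 180 map to the same point."""
    r = math.radians(angle_deg)
    return math.sin(r), math.cos(r)
```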

EzSEA: an interactive web interface for enzyme sequence evolution analysis.

IF 2.4

Bioinformatics Advances. Pub Date: 2025-05-20. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf118

Angela K Jiang, Jerry Zhao, Xiaofang Jiang

Motivation: Enzymes catalyze essential chemical reactions, driving metabolism, immunity, and growth. Understanding their evolution requires identifying the mutations that shaped their functions and substrate interactions. Current methods lack integration of evolutionary history and intuitive visualization tools.

Results: We developed Enzyme Sequence Evolution Analysis (EzSEA), a web interface that identifies putative functionally important mutations by performing the following steps: structural prediction, homology search, multiple sequence alignment and trimming, phylogenetic tree inference, ancestral sequence reconstruction, and enzyme delineation. The EzSEA web application enables intuitive visualization of the results, highlighting key mutations and the phylogenetic tree branches that putatively delineate the enzyme of interest. Finally, we validate EzSEA by recovering previously experimentally verified key mutations in the gut bacterial enzyme bilirubin reductase.

Availability and implementation: EzSEA is freely available on the web at https://jianglabnlm.com/ezsea/.

Volume 5, issue 1, article vbaf118. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12179385/pdf/

Citations: 0
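EzSEA's delineation logic is not detailed in the abstract. The sketch below shows only a generic form of the final comparison step: once ancestral sequences have been reconstructed on an alignment, candidate mutations along a branch are the alignment columns where the parent and child residues differ. The function name and the rule of skipping gapped columns are assumptions.

```python
def branch_mutations(ancestor: str, descendant: str, gap: str = "-"):
    """List substitutions between two aligned sequences, e.g. a reconstructed
    ancestor and its descendant, as (1-based column, ancestral residue, derived residue).

    Columns where either sequence has a gap are skipped, so only substitutions
    (not insertions or deletions) are reported.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must come from the same alignment (equal length)")
    return [
        (col, a, d)
        for col, (a, d) in enumerate(zip(ancestor, descendant), start=1)
        if a != d and gap not in (a, d)
    ]
```

For example, `branch_mutations("MKALV", "MRALI")` returns `[(2, "K", "R"), (5, "V", "I")]`.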

TROPPO: tissue-specific reconstruction and phenotype prediction using omics data.

IF 2.4

Bioinformatics Advances. Pub Date: 2025-05-19. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf113

Alexandre Oliveira, Jorge Ferreira, Vítor Vieira, Bruno Sá, Miguel Rocha

Citations: 0

Benchmarking accelerated next-generation sequencing analysis pipelines.

IF 2.4

Bioinformatics Advances. Pub Date: 2025-05-15. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf085

Pubudu Saneth Samarakoon, Ghislain Fournous, Lars T Hansen, Ashen Wijesiri, Sen Zhao, Rodriguez Alex A, Tarak Nath Nandi, Ravi Madduri, Alexander D Rowe, Gard Thomassen, Eivind Hovig, Sabry Razick

Motivation: Industry-standard central processing unit (CPU)-based next-generation sequencing (NGS) analysis tools have long runtimes, which limits their utility in time-sensitive clinical practice and population-scale research studies. To address this, accelerated NGS platforms such as DRAGEN and Parabricks have been developed, significantly reducing runtimes, from days to hours. However, previous studies have evaluated these accelerated platforms independently, without sufficiently assessing computational resource usage or thoroughly investigating speedup scalability, a gap our study is designed to address.

Results: Corroborating previous studies, accelerated pipelines demonstrated shorter runtimes than CPU-only approaches, with Parabricks-H100 achieving the highest speedups, followed by DRAGEN. In mapping, DRAGEN outperformed Parabricks (L4 and A100) and matched the H100 speedups. In variant calling, Parabricks (A100 and H100) achieved higher speedups than DRAGEN. Moreover, DRAGEN and Parabricks-H100 mapping showed positive trends in the coverage-based scalability analysis, while the other configurations failed to scale effectively. Our profiler analysis provided new insights into the relationships between Parabricks' performance and resource usage patterns, revealing its potential for further improvement. Our findings and cost comparison help researchers select accelerated platforms based on coverage needs, timeframes, and budget, while suggesting optimization strategies.

Availability and implementation: Datasets are described in the 'Data availability' section. Our NGS pipelines are available at https://github.com/NAICNO/accelerated_genomics.

Volume 5, issue 1, article vbaf085. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12092081/pdf/

Citations: 0
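The study's exact speedup and scalability definitions may differ from this sketch, which assumes speedup is CPU-only runtime divided by accelerated runtime and treats a speedup that never decreases as coverage grows as a positive scaling trend. The function names are hypothetical, and any runtimes passed in are placeholders, not results from the paper.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Speedup = baseline (CPU-only) runtime / accelerated runtime."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated runtime must be positive")
    return baseline_seconds / accelerated_seconds


def speedup_by_coverage(runs: dict[int, tuple[float, float]]) -> dict[int, float]:
    """Map coverage (e.g. 30 for 30x) to speedup, given (baseline, accelerated) runtimes."""
    return {cov: speedup(base, acc) for cov, (base, acc) in sorted(runs.items())}


def scales_with_coverage(speedups: dict[int, float]) -> bool:
    """True if speedup never decreases as coverage increases (a simple trend check)."""
    values = [speedups[c] for c in sorted(speedups)]
    return all(later >= earlier for earlier, later in zip(values, values[1:]))
```

For example, `speedup(3600.0, 900.0)` returns `4.0`: a one-hour CPU-only run that finishes in 15 minutes on the accelerated platform.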