BMC Bioinformatics: Latest Articles

Integrating genetic and gene expression data in network-based stratification analysis of cancers.
IF 2.9 · Zone 3 (Biology)
BMC Bioinformatics · Published: 2025-05-13 · DOI: 10.1186/s12859-025-06143-y
Kenny Liou, Ji-Ping Wang
Background: Cancers are complex diseases that have heterogeneous genetic drivers and varying clinical outcomes. A critical area of cancer research is organizing patient cohorts into subtypes and associating those subtypes with clinical and biological outcomes for more effective prognosis and treatment. Large-scale studies have collected a plethora of omics data across multiple tumor types, providing an extensive dataset for stratifying patient cohorts. Network-based stratification (NBS) approaches have been proposed to classify tumors using somatic mutation data. A challenge in cancer stratification is integrating omics data to yield clinically meaningful subtypes. In this study, we investigate a novel extension of the NBS framework that integrates somatic mutation data with RNA sequencing data, and we evaluate the effectiveness of integrated NBS on three cancers: ovarian, bladder, and uterine cancer.
Results: We show that integrated NBS subtypes are more significantly associated with overall survival or histology than single-data-type NBS subtypes. Specifically, integrated NBS subtypes for ovarian and bladder cancer were more significantly associated with patient survival than single-data-type NBS subtypes, even when accounting for covariates. In addition, integrated NBS subtypes for bladder and uterine cancer are more significantly associated with tumor histology than single-data-type NBS subtypes. Integrated NBS networks also reveal highly influential genes that span multiple integrated NBS subtypes, as well as subtype-specific genes. Pathway enrichment analysis of integrated NBS subtypes reveals overarching biological differences between subtypes. These genes and pathways are involved in a heterogeneous set of cell functions, including ubiquitin homeostasis, p53 regulation, cytokine and chemokine signaling, and cell proliferation, emphasizing the importance of identifying not only cancer-specific gene drivers but also subtype-specific tumor drivers.
Conclusions: Our study highlights the value of integrating multi-omics data within the NBS framework to enhance cancer subtyping, with significant implications for personalized prognosis and treatment strategies. These insights contribute to the ongoing advancement of computational subtyping methods aimed at uncovering more targeted and effective therapeutic treatments while facilitating the discovery of cancer driver genes.
BMC Bioinformatics, vol. 26, no. 1, article 126. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12070578/pdf/
Citations: 0
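NBS-style methods generally smooth each patient's sparse mutation profile over a gene interaction network before patients are clustered. The abstract does not spell out the propagation step used in this study, so the following is only a minimal sketch assuming a standard random walk with restart; the toy network, gene names, and the restart setting `alpha=0.7` are illustrative assumptions, not values from the paper.

```python
from typing import Dict, List


def propagate(adj: Dict[str, List[str]], seeds: Dict[str, float],
              alpha: float = 0.7, tol: float = 1e-12,
              max_iter: int = 10_000) -> Dict[str, float]:
    """Random walk with restart of one patient's mutation profile.

    adj: undirected gene network as an adjacency list; every gene must
         have at least one neighbour (hypothetical toy input).
    seeds: initial profile, e.g. 1.0 for each mutated gene.
    alpha: probability of continuing the walk instead of restarting.
    """
    genes = list(adj)
    total = sum(seeds.values())
    # Normalise the starting profile so the propagated scores sum to 1.
    p0 = {g: seeds.get(g, 0.0) / total for g in genes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: with probability 1 - alpha jump back to the seeds.
        nxt = {g: (1 - alpha) * p0[g] for g in genes}
        # Walk term: each gene spreads its score equally over its neighbours.
        for g in genes:
            share = alpha * p[g] / len(adj[g])
            for nb in adj[g]:
                nxt[nb] += share
        if max(abs(nxt[g] - p[g]) for g in genes) < tol:
            return nxt
        p = nxt
    return p


network = {
    "TP53": ["MDM2", "CDKN1A"],
    "MDM2": ["TP53"],
    "CDKN1A": ["TP53", "CDK2"],
    "CDK2": ["CDKN1A"],
}
smoothed = propagate(network, {"MDM2": 1.0})
```

A single mutated gene ends up spreading non-zero scores to its network neighbourhood, decaying with distance. NBS as originally described then clusters the smoothed profiles of all patients with network-regularized non-negative matrix factorization; that step is not shown here.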
PCVR: a pre-trained contextualized visual representation for DNA sequence classification.
IF 2.9 · Zone 3 (Biology)
BMC Bioinformatics · Published: 2025-05-09 · DOI: 10.1186/s12859-025-06136-x
Jiarui Zhou, Hui Wu, Kang Du, Wengang Zhou, Cong-Zhao Zhou, Houqiang Li
Background: The classification of DNA sequences is pivotal in bioinformatics, particularly for genetic information analysis. Traditional alignment-based tools tend to be slow and have low recall. Machine learning methods learn implicit patterns from data using encoding techniques such as k-mer counting and ordinal encoding, which either fail to handle long sequences or sacrifice structural and sequential information. Frequency chaos game representation (FCGR) converts DNA sequences of arbitrary length into fixed-size images, removing the constraint of sequence length while preserving more sequential information than other representations. However, existing works consider only local information, ignoring long-range dependencies and global contextual information within FCGR images.
Results: We propose PCVR, a Pre-trained Contextualized Visual Representation for DNA sequence classification. PCVR encodes FCGR images with a vision transformer into contextualized features containing more global information. To meet the substantial data requirements of training a vision transformer and to learn more robust features, we pre-train the encoder with a masked autoencoder. Pre-trained PCVR performs strongly on three datasets even with unsupervised learning alone. After fine-tuning, PCVR outperforms existing methods at the superkingdom and phylum levels. Additionally, our ablation studies confirm that both the vision transformer encoder and the masked autoencoder pre-training contribute to the performance improvement.
Conclusions: PCVR significantly improves DNA sequence classification accuracy and shows strong potential for new species discovery owing to its effective capture of global information and its robustness. Code for PCVR is available at https://github.com/jiaruizhou/PCVR.
BMC Bioinformatics, vol. 26, no. 1, article 125. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12065381/pdf/
Citations: 0
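The FCGR step described in the PCVR abstract turns a sequence of any length into a fixed 2^k x 2^k grid of k-mer counts, which is what lets an image encoder consume it. A minimal sketch, assuming one common corner assignment (A, C, G, T at the unit-square corners listed in `CORNER`); tools differ on the corner layout and on how ambiguous bases are handled, and this is not taken from the PCVR code.

```python
from typing import List

# One common corner assignment, as (x_bit, y_bit); tools differ on this choice.
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq: str, k: int) -> List[List[int]]:
    """Count the k-mers of seq into a 2^k x 2^k frequency chaos game grid."""
    size = 1 << k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(ch not in CORNER for ch in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        for i, ch in enumerate(kmer):
            bx, by = CORNER[ch]
            # In the chaos game each base halves the distance to its corner,
            # so the last base of the k-mer decides the most significant bit.
            x |= bx << i
            y |= by << i
        grid[y][x] += 1
    return grid


image = fcgr("ACGTACGTNN", k=2)  # 4 x 4 grid whatever the sequence length
```

Because the grid size depends only on `k`, a 10-base read and a 10-megabase contig both map to the same image shape; normalising the counts (for example dividing by the number of valid k-mers) is a further optional step.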
Sensitivity analysis on protein-protein interaction networks through deep graph networks.
IF 2.9 · Zone 3 (Biology)
BMC Bioinformatics · Published: 2025-05-08 · DOI: 10.1186/s12859-025-06140-1
Alessandro Dipalma, Michele Fontanesi, Alessio Micheli, Paolo Milazzo, Marco Podda
Background: Protein-protein interaction networks (PPINs) provide a comprehensive view of the intricate biochemical processes that take place in living organisms. In recent years, the size and information content of PPINs have grown thanks to techniques that allow for the functional association of proteins. However, PPINs are static objects that cannot fully describe the dynamics of protein interactions; these dynamics are usually studied from external sources and can only be added to the PPIN as annotations. In contrast, the time-dependent characteristics of cellular processes are described in biochemical pathways (BPs), which frame complex networks of chemical reactions as dynamical systems. Analyzing BPs with numerical simulations allows different dynamical properties to be studied. Unfortunately, available BPs cover only a small portion of the interactome, and simulations are often hampered by the unavailability of kinetic parameters or by their computational cost. In this study, we explore the possibility of enriching PPINs with dynamical properties computed from BPs. We focus on the global dynamical property of sensitivity, which measures how a change in the concentration of an input molecular species influences the concentration of an output molecular species at the steady state of the dynamical system.
Results: We started by analyzing BPs via ODE simulations, which enabled us to compute the sensitivity associated with multiple pairs of chemical species. The sensitivity information was then injected into a PPIN, using public resources (BioGRID, UniProt) to map entities at the BP level to nodes at the PPIN level. The resulting annotated PPIN, termed the DyPPIN (Dynamics of PPIN) dataset, was used to train a deep graph network (DGN) to predict the sensitivity relationships among PPIN proteins. Our experimental results show that this model can predict these relationships effectively under different use-case scenarios. Furthermore, we show that the PPIN structure (i.e., the way the PPIN is "wired") is essential to infer sensitivity, and that further annotating the PPIN nodes with protein sequence embeddings improves predictive accuracy.
Conclusion: To the best of our knowledge, the model proposed in this study is the first that allows sensitivity analysis to be performed directly on PPINs. Our findings suggest that, despite the high level of abstraction, the structure of the PPIN holds enough information to infer dynamic properties without needing an exact model of the underlying processes. In addition, the designed pipeline is flexible and can be easily integrated into drug design, drug repurposing, and personalized medicine workflows.
BMC Bioinformatics, vol. 26, no. 1, article 124. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12063327/pdf/
Citations: 0
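The sensitivity property in the DyPPIN abstract is defined on the steady state of a pathway ODE model. The abstract does not give the exact formula (absolute versus normalized sensitivity, size of the perturbation), so this is a minimal sketch assuming the absolute derivative dZ*/du, estimated by central finite differences between two simulated steady states. The two-step pathway `toy_pathway` and its rate constants are hypothetical; real BP models are usually stiff and would need an implicit solver rather than explicit Euler.

```python
from typing import Callable, List

Rhs = Callable[[List[float]], List[float]]


def steady_state(rhs: Rhs, y0: List[float], dt: float = 0.01,
                 tol: float = 1e-10, max_steps: int = 1_000_000) -> List[float]:
    """Integrate dy/dt = rhs(y) with explicit Euler until all derivatives vanish."""
    y = list(y0)
    for _ in range(max_steps):
        dy = rhs(y)
        if max(abs(d) for d in dy) < tol:
            return y
        y = [yi + dt * di for yi, di in zip(y, dy)]
    raise RuntimeError("steady state not reached")


def toy_pathway(u: float, k1: float = 1.0, k2: float = 0.5,
                k3: float = 2.0, k4: float = 1.0) -> Rhs:
    """Hypothetical pathway: input u produces Y, Y produces output Z; both decay."""
    def rhs(y: List[float]) -> List[float]:
        Y, Z = y
        return [k1 * u - k2 * Y, k3 * Y - k4 * Z]
    return rhs


def sensitivity(u: float, h: float = 1e-2) -> float:
    """Central finite difference of the steady-state output Z* with respect to input u."""
    z_hi = steady_state(toy_pathway(u + h), [0.0, 0.0])[1]
    z_lo = steady_state(toy_pathway(u - h), [0.0, 0.0])[1]
    return (z_hi - z_lo) / (2 * h)


# Analytically Z* = k1 * k3 * u / (k2 * k4), so dZ*/du = 4 for the defaults.
s = sensitivity(1.0)
```

Repeating this for many input/output species pairs gives the kind of pairwise sensitivity labels that, per the abstract, were mapped onto PPIN nodes and used as DGN training targets.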
Sculpting molecules in text-3D space: a flexible substructure aware framework for text-oriented molecular optimization.
IF 2.9 · Zone 3 (Biology)
BMC Bioinformatics · Published: 2025-05-07 · DOI: 10.1186/s12859-025-06072-w
Kaiwei Zhang, Yange Lin, Guangcheng Wu, Yuxiang Ren, Xuecang Zhang, Bo Wang, Xiao-Yu Zhang, Weitao Du
The integration of deep learning, particularly AI-generated content, with high-quality data derived from ab initio calculations has emerged as a promising avenue for transforming scientific research. However, designing molecular drugs or materials that incorporate multi-modality prior knowledge remains a critical and complex undertaking. Specifically, a practical molecular design must not only meet diversity requirements but also satisfy structural and textual constraints with various symmetries specified by domain experts. In this article, we tackle this inverse design problem by formulating it as a multi-modality guidance optimization task. Our proposed solution is a textual-structure alignment symmetric diffusion framework for molecular optimization tasks, named 3DToMolo. 3DToMolo aims to harmonize diverse modalities, including textual description features and graph structural features, aligning them to produce molecular structures that adhere to the symmetric structural and textual constraints specified by experts in the field. Experiments across three guidance optimization settings show superior hit optimization performance compared with state-of-the-art methods. Moreover, 3DToMolo demonstrates the capability to discover potential novel molecules incorporating specified target substructures without the need for prior knowledge. This work not only holds general significance for the advancement of deep learning methodologies but also paves the way for a transformative shift in molecular design strategies. 3DToMolo creates opportunities for a more nuanced and effective exploration of the vast chemical space, opening new frontiers in the development of molecular entities with tailored properties and functionalities.
BMC Bioinformatics, vol. 26, no. 1, article 123. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12060419/pdf/
Citations: 0
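The 3DToMolo abstract describes a symmetric diffusion framework over 3D molecular structures but does not give its equations. As a generic illustration only, assuming a DDPM-style forward process, the sketch below noises atom coordinates with the closed form x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps and removes the centroid from both the data and the noise, a common way for 3D diffusion models to respect translational symmetry. The linear beta schedule, its endpoints, and the toy coordinates are assumptions, not 3DToMolo's settings.

```python
import math
import random
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def center(coords: List[Coord]) -> List[Coord]:
    """Subtract the centroid so the point cloud has zero centre of mass."""
    n = len(coords)
    cx, cy, cz = (sum(c[d] for c in coords) / n for d in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def alpha_bar(t: int, T: int = 1000, beta_min: float = 1e-4,
              beta_max: float = 0.02) -> float:
    """Cumulative product of (1 - beta_s), s = 1..t, for a linear beta schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_min + (beta_max - beta_min) * (s - 1) / (T - 1)
        prod *= 1.0 - beta
    return prod


def noise_coords(x0: List[Coord], t: int, T: int = 1000,
                 rng: Optional[random.Random] = None) -> List[Coord]:
    """Sample x_t from q(x_t | x_0) using zero-centre-of-mass Gaussian noise."""
    rng = rng or random.Random(0)
    x0 = center(x0)
    eps = center([(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in x0])
    a = alpha_bar(t, T)
    return [tuple(math.sqrt(a) * xi + math.sqrt(1 - a) * ei for xi, ei in zip(p, e))
            for p, e in zip(x0, eps)]


mol = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]  # toy 3-atom geometry
xt = noise_coords(mol, t=500)
```

A denoising network trained to reverse this process, with text and substructure guidance, is where a method like 3DToMolo would add its multi-modal alignment; that part is not sketched.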
py_ped_sim: a flexible forward pedigree and genetic simulator for complex family pedigree analysis.
IF 2.9 · Zone 3 (Biology)
BMC Bioinformatics · Published: 2025-05-07 · DOI: 10.1186/s12859-025-06142-z
Miguel Guardado, Cynthia Perez, Sthen Campana, Berenice Chavez Rojas, Joaquín Magaña, Shalom Jackson, Emily Samperio, Selena Hernandez, Kaela Syas, Ryan D Hernandez, Elena I Zavala, Rori V Rohlfs
Background: Large-scale family pedigrees are commonly used across medical, evolutionary, and forensic genetics. These pedigrees are tools for identifying genetic disorders, tracking evolutionary patterns, and establishing familial relationships via forensic genetic identification. However, there is a lack of software to accurately simulate different pedigree structures together with the genomes of the individuals in a family pedigree. This limits simulation-based evaluations of methods that use pedigrees.
Results: We have developed a Python command-line tool called py_ped_sim that facilitates the simulation of pedigree structures and of the genomes of individuals in a pedigree. py_ped_sim represents pedigrees as directed acyclic graphs, enabling conversion between standard pedigree formats and integration with the forward population genetic simulator SLiM. Notably, py_ped_sim allows the simulation of varying numbers of offspring for a set of parents, with the capacity to shift the distribution of sibship sizes over generations. We additionally add simulations of misattributed paternity events, which offer a way to simulate half-sibling relationships, and simulations that extend the breadth of a family pedigree. We validated the accuracy of both our genome simulator and our pedigree simulator, and we show that we can simulate genomes onto family pedigrees with the expected levels of kinship.
Conclusions: py_ped_sim is a user-friendly, open-source solution for simulating pedigree structures and conducting pedigree genome simulations. It enables medical, forensic, and evolutionary genetics researchers to gain deeper insights into the dynamics of genetic inheritance and relatedness within families.
BMC Bioinformatics, vol. 26, no. 1, article 122. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12060417/pdf/
Citations: 0
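The py_ped_sim abstract validates simulated genomes against "expected levels of kinship" on a pedigree stored as a directed acyclic graph. Those expected values come from the textbook recursive kinship coefficient, sketched below on a parent-to-child DAG. This is an independent illustration, not py_ped_sim's code; the dict-of-parents input format and the individual names are hypothetical, and founders are assumed unrelated and non-inbred.

```python
from functools import lru_cache
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, Optional, Tuple

Parents = Tuple[Optional[str], Optional[str]]  # (father, mother); founders have (None, None)


def kinship_table(pedigree: Dict[str, Parents]) -> Callable[[Optional[str], Optional[str]], float]:
    """Return phi(a, b), the kinship coefficient of two individuals in a pedigree DAG."""
    # Map each individual to its parents (its predecessors in the DAG), so
    # static_order() lists every parent before any of its children.
    graph = {ind: {p for p in parents if p is not None} for ind, parents in pedigree.items()}
    order = {ind: i for i, ind in enumerate(TopologicalSorter(graph).static_order())}

    @lru_cache(maxsize=None)
    def phi(a: Optional[str], b: Optional[str]) -> float:
        if a is None or b is None:
            return 0.0  # unknown parent: treated as an unrelated founder
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + phi(father, mother))  # 1/2 (1 + inbreeding coefficient)
        if order[a] < order[b]:
            a, b = b, a  # recurse on the later individual, which cannot be an ancestor of b
        father, mother = pedigree[a]
        return 0.5 * (phi(father, b) + phi(mother, b))

    return phi


ped = {
    "F1": (None, None), "M": (None, None), "F2": (None, None),
    "A": ("F1", "M"),
    "C": ("F1", "M"),  # full sibling of A
    "B": ("F2", "M"),  # half sibling of A, e.g. after a misattributed-paternity event
}
phi = kinship_table(ped)
```

With this pedigree, full siblings get 0.25, half siblings 0.125, and parent and child 0.25, which are the reference values a simulated-genome relatedness estimate would be compared against.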
Multi-proteins similarity-based sampling to select representative genomes from large databases.
IF 2.9 · Zone 3 (Biology)
BMC Bioinformatics · Published: 2025-05-06 · DOI: 10.1186/s12859-025-06095-3
Rémi-Vinh Coudert, Jean-Philippe Charrier, Frédéric Jauffrit, Jean-Pierre Flandrois, Céline Brochier-Armanet
Background: Genome sequence databases are growing exponentially, but with high redundancy and uneven data quality. For these reasons, selecting representative subsets of genomes is an essential step in almost all studies. However, most current sampling approaches are biased and unable to process large datasets in a reasonable time.
Methods: Here we present MPS-Sampling (Multiple-Protein Similarity-based Sampling), a fast, scalable, and efficient method for selecting reliable and representative samples of genomes from very large datasets. Using families of homologous proteins as input, MPS-Sampling delineates homogeneous groups of genomes through two successive clustering steps. Representative genomes are then selected within these groups according to predefined or user-defined priority criteria.
Results: MPS-Sampling was applied to a dataset of 48 ribosomal protein families from 178,203 bacterial genomes to generate representative genome sets of various sizes, corresponding to samples ranging from 32.17% down to 0.3% of the complete dataset. An in-depth analysis shows that the selected genomes are both taxonomically and phylogenetically representative of the complete dataset, demonstrating the relevance of the approach.
Conclusion: MPS-Sampling provides an efficient, fast, and scalable way to sample large collections of genomes in an acceptable computational time. MPS-Sampling does not rely on taxonomic information and does not require the inference of phylogenetic trees, thus avoiding the biases inherent in these approaches. As such, MPS-Sampling meets the needs of a growing number of users.
BMC Bioinformatics, vol. 26, no. 1, article 121. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12057276/pdf/
Citations: 0
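The MPS-Sampling abstract outlines a cluster-then-select scheme: group genomes by similarity across homologous protein families, then keep one genome per group by a priority criterion. The sketch below shows that general shape under loud assumptions: similarity is the mean per-family identity of pre-aligned sequences, a single greedy leader-clustering pass stands in for the paper's two successive clustering steps, and priority is a hypothetical completeness score. The sequences are invented.

```python
from typing import Callable, Dict, List

Profile = Dict[str, str]  # protein family -> aligned sequence (hypothetical input)


def identity(s: str, t: str) -> float:
    """Fraction of identical positions between two aligned, equal-length sequences."""
    return sum(a == b for a, b in zip(s, t)) / len(s)


def similarity(p: Profile, q: Profile) -> float:
    """Mean identity over the protein families shared by two genomes."""
    shared = p.keys() & q.keys()
    return sum(identity(p[f], q[f]) for f in shared) / len(shared) if shared else 0.0


def greedy_clusters(names: List[str], profiles: Dict[str, Profile],
                    threshold: float) -> List[List[str]]:
    """Leader clustering: a genome joins the first cluster whose leader is similar enough."""
    clusters: List[List[str]] = []
    for g in names:
        for cluster in clusters:
            if similarity(profiles[cluster[0]], profiles[g]) >= threshold:
                cluster.append(g)
                break
        else:
            clusters.append([g])
    return clusters


def representatives(clusters: List[List[str]], priority: Callable[[str], float]) -> List[str]:
    """Keep the highest-priority genome of each cluster."""
    return [max(c, key=priority) for c in clusters]


profiles = {
    "g1": {"uL2": "MKVA", "uS3": "GRKT"},
    "g2": {"uL2": "MKVS", "uS3": "GRKT"},
    "g3": {"uL2": "AQLE", "uS3": "NPEW"},
}
completeness = {"g1": 0.95, "g2": 0.99, "g3": 0.90}
groups = greedy_clusters(sorted(profiles), profiles, threshold=0.8)
reps = representatives(groups, completeness.get)
```

Lowering `threshold` merges more genomes per group and shrinks the sample, which is one way a single knob can produce representative sets of different sizes.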
M-DeepAssembly: enhanced DeepAssembly based on multi-objective multi-domain protein conformation sampling.
IF 2.9 · Zone 3 (Biology)
BMC Bioinformatics · Published: 2025-05-05 · DOI: 10.1186/s12859-025-06131-2
Xinyue Cui, Yuhao Xia, Minghua Hou, Xuanfeng Zhao, Suhui Wang, Guijun Zhang
Background: Association and cooperation among structural domains play an important role in protein function and drug design. Despite remarkable community advances in highly accurate single-domain protein structure prediction using deep learning, predicting multi-domain protein structures remains challenging when the evolutionary signal for a given domain pair is weak or the protein structure is large.
Results: To alleviate these challenges, we propose M-DeepAssembly, a protocol for multi-domain protein structure prediction based on a multi-objective protein conformation sampling algorithm. First, inter-domain interactions and full-length sequence distance features are extracted with DeepAssembly and AlphaFold2, respectively. Second, using these features, we constructed a multi-objective energy model and designed a sampling algorithm that explores and exploits the conformational space to generate ensembles. Finally, the output protein structure was selected from the ensembles using our in-house model quality assessment algorithm. On a test set of 164 multi-domain proteins, the average TM-score of M-DeepAssembly is 15.4% and 2.0% higher than those of AlphaFold2 and DeepAssembly, respectively. Notably, the ensembles also contain more accurate models that achieve improvements of 20.3% and 6.4% over the two baseline methods, although these models were not selected. Furthermore, compared with the AlphaFold2 predictions for CASP15 multi-domain targets, M-DeepAssembly shows certain performance advantages.
Conclusions: M-DeepAssembly provides a distinctive multi-domain protein assembly algorithm that can, to some extent, alleviate the current challenges of weak evolutionary signals and large structures by forming diverse ensembles with a multi-objective protein conformation sampling algorithm. The proposed method contributes to exploring the functions of multi-domain proteins and, in particular, provides new insights into targets with multiple conformational states.
BMC Bioinformatics, vol. 26, no. 1, article 120. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12054043/pdf/
Citations: 0
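"Multi-objective" sampling in the M-DeepAssembly abstract means conformations are scored by several energy terms at once rather than a single weighted sum. The abstract does not give the energy terms or the selection rule, so this sketch only illustrates the core idea, assuming lower-is-better objectives: keep the conformations that no other conformation dominates (the Pareto front). The objective names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Names of the conformations not dominated by any other conformation."""
    return [name for name, s in scores.items()
            if not any(dominates(other, s)
                       for other_name, other in scores.items() if other_name != name)]


# Hypothetical (inter-domain energy, distance-restraint energy) per sampled conformation.
energies = {"c1": (1.0, 5.0), "c2": (2.0, 2.0), "c3": (3.0, 3.0), "c4": (5.0, 1.0)}
front = pareto_front(energies)
```

Here `c3` is dropped because `c2` is better on both terms, while `c1`, `c2`, and `c4` each trade one objective against the other; a separate quality-assessment step, as the abstract describes, would then pick a single output model from such a diverse set.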
PathNetDRP: a novel biomarker discovery framework using pathway and protein-protein interaction networks for immune checkpoint inhibitor response prediction.
IF 2.9 · Zone 3 (Biology)
BMC Bioinformatics · Published: 2025-05-05 · DOI: 10.1186/s12859-025-06125-0
Dohee Lee, Jaegyoon Ahn, Jonghwan Choi
Background: Predicting immune checkpoint inhibitor (ICI) response remains a significant challenge in cancer immunotherapy. Many existing approaches rely on differential gene expression analysis or predefined immune signatures, which may fail to capture the complex regulatory mechanisms underlying immune response. Network-based models attempt to integrate biological interactions, but they often lack a quantitative framework to assess how individual genes contribute within pathways, limiting the specificity and interpretability of biomarkers. Given these limitations, we developed PathNetDRP, a framework that integrates biological pathways, protein-protein interaction networks, and machine learning to identify functionally relevant biomarkers for ICI response prediction.
Results: We introduce PathNetDRP, a novel biomarker discovery approach that applies the PageRank algorithm to prioritize ICI-associated genes, maps them to relevant biological pathways, and calculates PathNetGene scores to quantify their contribution to immune response. Unlike conventional methods that focus solely on gene expression differences, PathNetDRP systematically incorporates biological context to improve biomarker selection. Validation across multiple independent cancer cohorts showed that PathNetDRP achieved strong predictive performance, with the cross-validated area under the receiver operating characteristic curve increasing from 0.780 to 0.940. Beyond improving predictive accuracy, PathNetDRP also provided insights into key immune-related pathways, reinforcing its potential for identifying clinically relevant biomarkers.
Conclusion: The biomarkers identified by PathNetDRP demonstrated robust predictive performance across cross-validation and independent validation datasets, suggesting their potential utility in clinical applications. Furthermore, enrichment analysis highlighted key immune-related pathways, providing a deeper understanding of their role in regulating ICI response. While these findings underscore the promise of PathNetDRP, future work will explore the integration of additional predictive features, such as tumor mutational burden and microsatellite instability, to further refine its applicability.
BMC Bioinformatics, vol. 26, no. 1, article 119. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12051301/pdf/
Citations: 0
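PathNetDRP's first step, per its abstract, is ranking genes with PageRank on an interaction network. The abstract does not say which network, damping factor, or personalization (for example, seeding the walk at ICI target genes) is used, so the following is a minimal power-iteration PageRank with an optional personalization vector; the toy graph and the seeding in the usage are hypothetical. The PathNetGene scoring and pathway mapping are not reproduced.

```python
from typing import Dict, List, Optional


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             personalization: Optional[Dict[str, float]] = None,
             tol: float = 1e-12, max_iter: int = 1000) -> Dict[str, float]:
    """Power-iteration PageRank on a directed graph given as out-link lists.

    Every link target must also be a key of `graph`. Nodes without out-links
    ("dangling" nodes) redistribute their score according to the
    personalization vector, which defaults to uniform.
    """
    nodes = list(graph)
    if personalization is None:
        v = {u: 1.0 / len(nodes) for u in nodes}
    else:
        total = sum(personalization.values())
        v = {u: personalization.get(u, 0.0) / total for u in nodes}
    r = dict(v)
    for _ in range(max_iter):
        dangling = sum(r[u] for u in nodes if not graph[u])
        # Teleport mass and dangling mass both follow the personalization vector.
        nxt = {u: ((1 - damping) + damping * dangling) * v[u] for u in nodes}
        for u in nodes:
            for w in graph[u]:
                nxt[w] += damping * r[u] / len(graph[u])
        if sum(abs(nxt[u] - r[u]) for u in nodes) < tol:
            return nxt
        r = nxt
    return r


# Hypothetical directed gene-interaction toy network.
toy = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["B"]}
scores = pagerank(toy)
seeded = pagerank(toy, personalization={"D": 1.0})
```

Gene `B`, which receives links from three other genes, ranks highest; passing a personalization vector biases the ranking toward the neighbourhood of chosen seed genes.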
Fast noisy long read alignment with multi-level parallelism.
IF 2.9 · Zone 3 (Biology)
BMC Bioinformatics · Published: 2025-05-02 · DOI: 10.1186/s12859-025-06129-w
Zeyu Xia, Canqun Yang, Chenchen Peng, Yifei Guo, Yufei Guo, Tao Tang, Yingbo Cui
Background: The advent of Single Molecule Real-Time (SMRT) sequencing has overcome many limitations of second-generation sequencing, such as limited read lengths and PCR amplification biases. However, longer reads increase data volume exponentially, and high error rates make many existing alignment tools inapplicable. Additionally, the performance bottleneck of a single CPU restricts the effectiveness of alignment algorithms for SMRT sequencing.
Results: To address these challenges, we introduce ParaHAT, a parallel alignment algorithm for noisy long reads. ParaHAT exploits vector-level, thread-level, process-level, and heterogeneous parallelism. We redesign the layout of the dynamic programming matrices to eliminate data dependencies in base-level alignment, enabling effective vectorization. We further increase computational speed through heterogeneous parallel technology and implement the algorithm for multi-node computing using MPI, overcoming the computational limits of a single node.
Conclusions: Performance evaluations show that ParaHAT achieved a 10.03x speedup in base-level alignment, with a parallel acceleration ratio of 94.61 and a weak scalability metric of 98.98% on 128 nodes.
BMC Bioinformatics, vol. 26, no. 1, article 118. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12049014/pdf/
Citations: 0
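The ParaHAT abstract says the dynamic programming matrices were re-laid out to remove data dependencies so that base-level alignment can be vectorized, but it does not describe the layout. A common way to achieve this is anti-diagonal (wavefront) ordering: every cell with i + j = d depends only on diagonals d - 1 and d - 2, so all cells of one diagonal can be computed in parallel. The sketch assumes that technique and uses plain unit-cost edit distance in place of ParaHAT's unstated scoring scheme; being scalar Python, it shows the dependency structure, not the speedup.

```python
def edit_distance_antidiagonal(a: str, b: str) -> int:
    """Unit-cost edit distance filled in anti-diagonal (wavefront) order."""
    n, m = len(a), len(b)
    # A full matrix is kept for clarity; a SIMD implementation would typically
    # keep only the last two anti-diagonals in contiguous buffers.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(n + m + 1):
        # Every cell with i + j == d reads only diagonals d - 1 and d - 2, so this
        # inner loop has no loop-carried dependency and could be vectorised.
        for i in range(max(0, d - m), min(n, d) + 1):
            j = d - i
            if i == 0:
                D[i][j] = j
            elif j == 0:
                D[i][j] = i
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                D[i][j] = min(D[i - 1][j] + 1,         # deletion (diagonal d - 1)
                              D[i][j - 1] + 1,         # insertion (diagonal d - 1)
                              D[i - 1][j - 1] + cost)  # match/mismatch (diagonal d - 2)
    return D[n][m]
```

In a row-major fill each cell needs its left neighbour from the same row, which serialises the inner loop; walking anti-diagonals instead trades that for a loop whose iterations are independent.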
M3S-GRPred: a novel ensemble learning approach for the interpretable prediction of glucocorticoid receptor antagonists using a multi-step stacking strategy.
IF 2.9 · Zone 3 (Biology)
BMC Bioinformatics · Published: 2025-04-30 · DOI: 10.1186/s12859-025-06132-1
Nalini Schaduangrat, Hathaichanok Chuntakaruk, Thanyada Rungrotmongkol, Pakpoom Mookdarsanit, Watshara Shoombuatong
Accelerating drug discovery for glucocorticoid receptor (GR)-related disorders, including through innovative machine learning (ML)-based approaches, holds promise for advancing therapeutic development, optimizing treatment efficacy, and mitigating adverse effects. While experimental methods can accurately identify GR antagonists, they are often not cost-effective for large-scale drug discovery. Thus, computational approaches that leverage SMILES information for precise in silico identification of GR antagonists are crucial, enabling efficient and scalable drug discovery. Here, we develop a new ensemble learning approach using a multi-step stacking strategy (M3S), termed M3S-GRPred, aimed at rapidly and accurately discovering novel GR antagonists. To the best of our knowledge, M3S-GRPred is the first SMILES-based predictor designed to identify GR antagonists without the use of 3D structural information. In M3S-GRPred, we first constructed different balanced subsets using an under-sampling approach. Using these balanced subsets, we explored and evaluated heterogeneous base classifiers trained with a variety of SMILES-based feature descriptors coupled with popular ML algorithms. Finally, M3S-GRPred was constructed by integrating the probabilistic features of base classifiers chosen by a two-step feature selection technique. Our comparative experiments demonstrate that M3S-GRPred can precisely identify GR antagonists and effectively handle the imbalanced dataset. Compared with traditional ML classifiers, M3S-GRPred attained superior performance on both the training and independent test datasets. Additionally, M3S-GRPred was applied to screen FDA-approved drugs for potential GR antagonists, which were confirmed through molecular docking followed by detailed molecular dynamics (MD) simulation studies, with a view to drug repurposing for Cushing's syndrome. We anticipate that M3S-GRPred will serve as an efficient screening tool for discovering novel GR antagonists from vast libraries of unknown compounds in a cost-effective manner.
BMC Bioinformatics, vol. 26, no. 1, article 117. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12044944/pdf/
Citations: 0
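The M3S-GRPred abstract names two ingredients: balanced training subsets built by under-sampling, and a meta-level built from base classifiers' predicted probabilities. The sketch below illustrates both under stated assumptions: subsets are formed by splitting a shuffled majority class into disjoint minority-sized chunks (the paper may draw subsets differently), and base classifiers are stand-in callables returning a probability. The molecule identifiers are hypothetical, and the two-step feature selection is not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balanced_subsets(minority: Sequence[T], majority: Sequence[T],
                     seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Under-sample the majority class into disjoint chunks the size of the minority class.

    Each subset pairs the full minority class with one chunk; leftover
    majority samples that do not fill a whole chunk are dropped.
    """
    rng = random.Random(seed)
    pool = list(majority)
    rng.shuffle(pool)
    k = len(minority)
    return [(list(minority), pool[i:i + k]) for i in range(0, len(pool) - k + 1, k)]


def meta_features(x: T, base_models: Sequence[Callable[[T], float]]) -> List[float]:
    """Stack the predicted probabilities of several base classifiers into one vector."""
    return [model(x) for model in base_models]


antagonists = ["mol_a1", "mol_a2", "mol_a3"]          # hypothetical minority class
non_antagonists = [f"mol_n{i}" for i in range(10)]     # hypothetical majority class
subsets = balanced_subsets(antagonists, non_antagonists)
```

One base classifier would be trained per balanced subset (and per descriptor/algorithm pair), and `meta_features` shows how their probabilities become the input vector of the final stacked model.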