BMC Bioinformatics: Latest Articles

PtWAVE: a high-sensitive deconvolution software of sequencing trace for the detection of large indels in genome editing.
IF 2.9, Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2025-04-29 DOI: 10.1186/s12859-025-06139-8
Kazuki Nakamae, Saya Ide, Nagaki Ohnuki, Yoshiko Nakagawa, Keisuke Okuhara, Hidemasa Bono
Abstract
Background: Tracking of Insertions and DEletions (TIDE) analysis, which computationally deconvolves capillary sequencing data derived from the DNA of bulk or clonal cell populations to estimate the efficiency of targeted mutagenesis by programmable nucleases, has played a significant role in the field of genome editing. However, the detection range covered by conventional TIDE analysis is limited. Detecting larger deletions and insertions (indels) derived from genome editing requires extending the deconvolution range, but doing so introduces uncertainty into the deconvolution process. Moreover, the accuracy and sensitivity of TIDE analysis tools for large deletions (> 50 bp) remain poorly understood.
Results: In this study, we introduced a new software called PtWAVE that can detect a wide range of indel sizes, up to 200 bp. PtWAVE also offers options for variable selection and fitting algorithms to prevent uncertainties in the model. We evaluated the performance of PtWAVE by using in vitro capillary sequencing data that mimicked DNA sequencing, including large deletions. Furthermore, we confirmed that PtWAVE can stably analyze trace sequencing data derived from actual genome-edited samples.
Conclusions: PtWAVE demonstrated superior accuracy and sensitivity compared to the existing TIDE analysis tools for DNA samples, including large deletions. PtWAVE can accelerate genome editing applications in organisms and cell types in which large deletions often occur when programmable nucleases are applied.
BMC Bioinformatics 26(1):114. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12039204/pdf/
Citations: 0
PPI-Graphomer: enhanced protein-protein affinity prediction using pretrained and graph transformer models.
IF 2.9, Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2025-04-29 DOI: 10.1186/s12859-025-06123-2
Jun Xie, Youli Zhang, Ziyang Wang, Xiaocheng Jin, Xiaoli Lu, Shengxiang Ge, Xiaoping Min
Abstract
Protein-protein interactions (PPIs) refer to the phenomenon of protein binding through various types of bonds to execute biological functions. These interactions are critical for understanding biological mechanisms and drug research. The protein binding interface is a critical region involved in these interactions, particularly the hotspot residues on it, which play a key role. Current deep learning methods trained on large-scale data can characterize proteins to a certain extent, but they often struggle to adequately capture information about protein binding interfaces. To address this limitation, we propose the PPI-Graphomer module, which integrates pretrained features from large-scale language models and inverse folding models. This approach enhances the characterization of protein binding interfaces by defining edge relationships and interface masks on the basis of molecular interaction information. Our model outperforms existing methods across multiple benchmark datasets and demonstrates strong generalization capabilities.
BMC Bioinformatics 26(1):116. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042501/pdf/
Citations: 0
A framework for predictive modeling of microbiome multi-omics data: latent interacting variable-effects (LIVE) modeling.
IF 2.9, Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2025-04-29 DOI: 10.1186/s12859-025-06134-z
Javier Munoz Briones, Douglas K Brubaker
Abstract
Background: The number and size of multi-omics datasets with paired measurements of the host and microbiome are rapidly increasing with the advance of sequencing technologies. As it becomes routine to generate these datasets, computational methods to aid in their interpretation become increasingly important. Here, we present a framework for integration of microbiome multi-omics data: Latent Interacting Variable Effects (LIVE) modeling. LIVE integrates multi-omics data using single-omic latent variables (LV) organized in a structured meta-model to determine the combinations of features most predictive of a phenotype or condition.
Results: We developed a supervised version of LIVE leveraging sparse Partial Least Squares Discriminant Analysis (sPLS-DA) LVs, and an unsupervised version leveraging sparse Principal Component Analysis (sPCA) principal components, both of which can incorporate covariate awareness. LIVE performance was tested on publicly available metagenomic and metabolomics datasets from Crohn's Disease (CD) and Ulcerative Colitis (UC) status patients in the PRISM and LLDeep cohorts, and benchmarked against existing gut microbiome multi-omics approaches and vaginal microbiome datasets, achieving consistent and comparable performances. In addition to these benchmarking efforts, we present a detailed analysis and interpretation of both versions of LIVE using the PRISM and LLDeep cohorts. LIVE reduced the number of feature interactions from the original datasets for CD and UC from millions to fewer than 20,000 while conditioning the disease-predictive power of gut microbes, metabolites, and enzymes on clinical variables.
Conclusions: LIVE makes a distinct, complementary contribution to current methods to integrate microbiome data and offers key advantages over existing approaches in the interpretable integration of multi-omics data with clinical variables to predict disease outcomes and identify microbiome mechanisms of disease.
BMC Bioinformatics 26(1):115. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042529/pdf/
Citations: 0
HPC-T-Assembly: a pipeline for de novo transcriptome assembly of large multi-specie datasets.
IF 2.9, Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2025-04-28 DOI: 10.1186/s12859-025-06121-4
Franco Liberati, Taiel Maximiliano Pose Marino, Paolo Bottoni, Daniele Canestrelli, Tiziana Castrignanò
Abstract
Background: Recent years have seen a substantial increase in RNA-seq data production, with this technique becoming the primary approach for gene expression studies across a wide range of non-model organisms. The majority of these organisms lack a well-annotated reference genome to serve as a basis for studying differentially expressed genes (DEGs). As a cost-effective alternative to using a reference genome, RNA-seq raw reads are assembled to produce what is referred to as a 'de novo transcriptome', which serves as a reference for subsequent DEG analysis. This assembly step is a computationally expensive part of conventional DEG analysis pipelines for non-model organisms. Furthermore, the complexity of de novo transcriptome assembly workflows makes it challenging for researchers to implement best-practice techniques and the most recent software versions, particularly when applied to various organisms of interest.
Results: To address computational challenges in transcriptomic analyses of non-model organisms, we present HPC-T-Assembly, a tool for de novo transcriptome assembly from RNA-seq data on high-performance computing (HPC) infrastructures. It is designed for straightforward setup via a Web-oriented interface, allowing analysis configuration for several species. Once configuration data is provided, the entire parallel computing software for assembly is automatically generated and can be launched on a supercomputer with a simple command line. Intermediate and final outputs of the assembly pipeline include additional post-processing steps, such as assembly quality control, ORF prediction, and transcript count matrix construction.
Conclusion: HPC-T-Assembly allows users, through a user-friendly Web-oriented interface, to configure a run for simultaneous assemblies of RNA-seq data from multiple species. The parallel pipeline, launched on HPC infrastructures, significantly reduces computational load and execution times, enabling large-scale transcriptomic and meta-transcriptomics analysis projects.
BMC Bioinformatics 26(1):113. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12039220/pdf/
Citations: 0
FastQTLmapping: an ultra-fast and memory efficient package for mQTL-like analysis.
IF 2.9, Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2025-04-27 DOI: 10.1186/s12859-025-06130-3
Xingjian Gao, Jiarui Li, Xinxuan Liu, Qianqian Peng, Han Jing, Sibte Hadi, Andrew E Teschendorff, Sijia Wang, Fan Liu
Abstract
Background: FastQTLmapping addresses the need for an ultra-fast and memory-efficient solver capable of handling exhaustive multiple regression analysis with a vast number of dependent and explanatory variables, including covariates. This challenge is especially pronounced in methylation quantitative trait loci (mQTL)-like analysis, which typically involves high-dimensional genetic and epigenetic data.
Results: FastQTLmapping is a precompiled C++ software solution accelerated by Intel MKL and GSL, freely available at https://github.com/Fun-Gene/fastQTLmapping . Compared to state-of-the-art methods (MatrixEQTL, FastQTL, and TensorQTL), fastQTLmapping demonstrated an order of magnitude speed improvement, coupled with a marked reduction in peak memory usage. In a large dataset consisting of 3500 individuals, 8 million SNPs, 0.8 million CpGs, and 20 covariates, fastQTLmapping completed the entire mQTL analysis in 4.5 h with only 13.1 GB peak memory usage.
Conclusions: FastQTLmapping effectively expedites comprehensive mQTL analyses by providing a robust and generic approach that accommodates large-scale genomic datasets with covariates. This solution has the potential to streamline mQTL-like studies and inform future method development for efficient computational genomics.
BMC Bioinformatics 26(1):112. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12036243/pdf/
Citations: 0
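To make "exhaustive multiple regression with covariates" concrete, the sketch below fits one regression per (SNP, CpG) pair on toy data. It is not fastQTLmapping's code or interface: the function names and the dictionary-based inputs are hypothetical, and it uses plain Python rather than MKL or GSL. It relies on a standard shortcut (the Frisch-Waugh-Lovell theorem): the intercept and covariates are projected out of every SNP and every CpG once, after which each pairwise slope is a single ratio of dot products. It assumes no SNP is constant or fully explained by the covariates.

```python
from itertools import product


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(columns, tol=1e-12):
    """Gram-Schmidt on a list of column vectors; near-dependent columns are dropped."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return basis


def residualize(y, basis):
    """Remove from y its projection onto the span of an orthonormal basis."""
    r = list(y)
    for q in basis:
        c = dot(q, r)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def exhaustive_slopes(snps, cpgs, covariates):
    """Slope of each CpG on each SNP, adjusted for an intercept and the covariates.

    snps, cpgs: dicts mapping a (hypothetical) identifier to a list of per-sample values.
    covariates: list of per-sample value lists.
    """
    n = len(next(iter(snps.values())))
    basis = orthonormal_basis([[1.0] * n] + [list(c) for c in covariates])
    snp_res = {name: residualize(x, basis) for name, x in snps.items()}
    cpg_res = {name: residualize(y, basis) for name, y in cpgs.items()}
    slopes = {}
    for (s, rx), (c, ry) in product(snp_res.items(), cpg_res.items()):
        # Partial regression coefficient of the CpG on the SNP.
        slopes[(s, c)] = dot(rx, ry) / dot(rx, rx)
    return slopes
```

The residualization is done once per variable rather than once per pair, which is the main reason this style of computation scales to many pairs; a production tool would express the final loop as a matrix product.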
metaTP: a meta-transcriptome data analysis pipeline with integrated automated workflows.
IF 2.9, Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2025-04-26 DOI: 10.1186/s12859-025-06137-w
Limuxuan He, Quan Zou, Yansu Wang
Abstract
Background: The accessibility of sequencing technologies has enabled meta-transcriptomic studies to provide a deeper understanding of microbial ecology at the transcriptional level. Analyzing omics data involves multiple steps that require the use of various bioinformatics tools. With the increasing availability of public microbiome datasets, conducting meta-analyses can reveal new insights into microbiome activity. However, the reproducibility of data is often compromised due to variations in processing methods for sample omics data. Therefore, it is essential to develop efficient analytical workflows that ensure repeatability, reproducibility, and the traceability of results in microbiome research.
Results: We developed metaTP, a pipeline that integrates bioinformatics tools for analyzing meta-transcriptomic data comprehensively. The pipeline includes quality control, non-coding RNA removal, transcript expression quantification, differential gene expression analysis, functional annotation, and co-expression network analysis. To quantify mRNA expression, we rely on reference indexes built using protein-coding sequences, which help overcome the limitations of database analysis. Additionally, metaTP provides a function for calculating the topological properties of gene co-expression networks, offering an intuitive explanation for correlated gene sets in high-dimensional datasets. The use of metaTP is anticipated to support researchers in addressing microbiota-related biological inquiries and improving the accessibility and interpretation of microbiota RNA-Seq data.
Conclusions: We have created a conda package to integrate the tools into our pipeline, making it a flexible and versatile tool for handling meta-transcriptomic sequencing data. The metaTP pipeline is freely available at: https://github.com/nanbei45/metaTP .
BMC Bioinformatics 26(1):111. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12034179/pdf/
Citations: 0
HPOseq: a deep ensemble model for predicting the protein-phenotype relationships based on protein sequences.
IF 2.9, Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2025-04-22 DOI: 10.1186/s12859-025-06122-3
Kai Zhao, Zhuocheng Ji, Linlin Zhang, Na Quan, Yuheng Li, Guanglei Yu, Xuehua Bi
Abstract
Background: Understanding the relationships between proteins and specific disease phenotypes contributes to the early detection of diseases and advances the development of personalized medicine. The acquisition of a large amount of proteomics data has facilitated this process. To improve discovery efficiency and reduce the time and financial costs associated with biological experiments, various computational methods have been developed and have yielded promising results. However, the lack of rich and reliable protein-related information still presents challenges in this process.
Results: In this paper, we propose an ensemble prediction model, named HPOseq, which predicts human protein-phenotype relationships based only on sequence information. HPOseq establishes two base models. One directly extracts internal information from amino acid sequences as protein features to predict the associated phenotypes. The other builds a protein-protein network based on sequence similarity, extracting information between proteins for phenotype prediction. Ultimately, an ensemble module integrates the predictions from both base models to produce the final prediction.
Conclusion: The results of 5-fold cross-validation reveal that HPOseq outperforms seven baseline methods for predicting protein-phenotype relationships. Moreover, we conduct case studies from the perspectives of phenotype annotation and protein analysis to verify the practical significance of HPOseq.
BMC Bioinformatics 26(1):110. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12013097/pdf/
Citations: 0
metacp: a versatile software package for combining dependent or independent p-values.
IF 2.9, Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2025-04-19 DOI: 10.1186/s12859-025-06126-z
Evgenia K Nikolitsa, Panagiota I Kontou, Pantelis G Bagos
Abstract
Background: We present metacp, an open-source software package that implements an abundance of statistical methods for combining both independent p-values, with methods such as Fisher's, Stouffer's and Edgington's, and dependent p-values, with methods such as Brown's method and the Cauchy Combination Test.
Results: The tool is available in Python and STATA, it is very fast, and it is easy to use, requiring only minimal input. It offers a useful resource for combining both independent and dependent p-values, responding to diverse analytical needs for practitioners performing meta-analyses and bioinformaticians developing tools for a variety of applications. Depending on the input data, it can be used for gene-based testing, for analysis of multiple traits in GWAS, or for combining diverse multi-omics data such as those of a TWAS, a colocalization or an RNA-seq study.
Conclusions: Compared to other similar packages (like poolr or metap), metacp implements the largest collection of statistical methods for this problem, offering users the flexibility to choose from a wide variety of approaches. Being available both as a standalone Python tool and as a STATA command, metacp is accessible to a broad and diverse audience, including practitioners conducting meta-analyses across various fields and bioinformaticians developing new tools where p-value combination is a crucial component.
BMC Bioinformatics 26(1):109. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008841/pdf/
Citations: 0
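Three of the combination methods named in the metacp abstract are short enough to sketch directly. The code below is not metacp's code or API: the function names are hypothetical, only the Python standard library is used, all p-values are weighted equally, and every input is assumed to lie strictly between 0 and 1. Fisher's and Stouffer's methods assume independent p-values; the Cauchy Combination Test is the one designed to tolerate dependence.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k degrees of freedom.

    For even degrees of freedom the chi-square tail has a closed form:
    P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def stouffer_combine(pvalues):
    """Stouffer's method: sum the z-scores and rescale by sqrt(k) (one-sided)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)


def cauchy_combine(pvalues):
    """Cauchy Combination Test with equal weights."""
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvalues) / len(pvalues)
    return 0.5 - math.atan(t) / math.pi
```

A useful sanity check for all three functions is that combining a single p-value returns that p-value unchanged.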
Correction: MethylSeqLogo: DNA methylation smart sequence logos.
IF 2.9, Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2025-04-18 DOI: 10.1186/s12859-025-06124-1
Fei-Man Hsu, Paul Horton
BMC Bioinformatics 25(Suppl 2):394. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008853/pdf/
Citations: 0
GRLGRN: graph representation-based learning to infer gene regulatory networks from single-cell RNA-seq data.
IF 2.9, Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2025-04-18 DOI: 10.1186/s12859-025-06116-1
Kai Wang, Yulong Li, Fei Liu, Xiaoli Luan, Xinglong Wang, Jingwen Zhou
Abstract
Background: A gene regulatory network (GRN) is a graph-level representation that describes the regulatory relationships between transcription factors and target genes in cells. The reconstruction of GRNs can help investigate cellular dynamics, drug design, and metabolic systems, and the rapid development of single-cell RNA sequencing (scRNA-seq) technology provides important opportunities while posing significant challenges for reconstructing GRNs. A number of methods for inferring GRNs have been proposed in recent years based on traditional machine learning and deep learning algorithms. However, inferring the GRN from scRNA-seq data remains challenging owing to cellular heterogeneity, measurement noise, and data dropout.
Results: In this study, we propose a deep learning model called graph representational learning GRN (GRLGRN) to infer the latent regulatory dependencies between genes based on a prior GRN and single-cell gene expression profiles. GRLGRN uses a graph transformer network to extract implicit links from the prior GRN, and encodes the features of genes by using both an adjacency matrix of implicit links and a gene expression profile matrix. Moreover, it uses attention mechanisms to improve feature extraction, and feeds the refined gene embeddings into an output module to infer gene regulatory relationships. To evaluate the performance of GRLGRN, we compared it with prevalent models and performed ablation experiments on seven cell-line datasets with three ground-truth networks. The results showed that GRLGRN achieved the best predictions in AUROC and AUPRC on 78.6% and 80.9% of the datasets, respectively, and achieved an average improvement of 7.3% in AUROC and 30.7% in AUPRC. We also discuss the interpretation of the model and visualize the inferred networks.
Conclusions: The experimental results and case studies illustrate the considerable performance of GRLGRN in predicting gene interactions and provide interpretability for the prediction tasks, such as identifying hub genes in the network and uncovering implicit links.
BMC Bioinformatics 26(1):108. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008888/pdf/
Citations: 0
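The GRLGRN abstract reports results as AUROC and AUPRC over predicted regulatory edges. As a reminder of what those two numbers measure, here is a minimal standard-library sketch; it is not taken from GRLGRN, the function names are hypothetical, and AUPRC is approximated by average precision with ties in the scores broken by input order. Each candidate edge is assumed to carry a label (1 if it is in the ground-truth network, 0 otherwise) and a predicted score.

```python
def auroc(labels, scores):
    """Probability that a random true edge outscores a random false edge (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values observed at the rank of each true edge."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits
```

Because true regulatory edges are usually a small fraction of all candidate gene pairs, AUPRC tends to be the more demanding of the two metrics, which is one reason benchmark papers report both.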