{"title":"PtWAVE: a high-sensitive deconvolution software of sequencing trace for the detection of large indels in genome editing.","authors":"Kazuki Nakamae, Saya Ide, Nagaki Ohnuki, Yoshiko Nakagawa, Keisuke Okuhara, Hidemasa Bono","doi":"10.1186/s12859-025-06139-8","DOIUrl":"https://doi.org/10.1186/s12859-025-06139-8","url":null,"abstract":"<p><strong>Background: </strong>Tracking of Insertions and DEletions (TIDE) analysis, which computationally deconvolves capillary sequencing data derived from the DNA of bulk or clonal cell populations to estimate the efficiency of targeted mutagenesis by programmable nucleases, has played a significant role in the field of genome editing. However, the detection range covered by conventional TIDE analysis is limited. Range extension for deconvolution is required to detect larger deletions and insertions (indels) derived from genome editing in TIDE analysis. However, extending the deconvolution range introduces uncertainty into the deconvolution process. Moreover, the accuracy and sensitivity of TIDE analysis tools for large deletions (> 50 bp) remain poorly understood.</p><p><strong>Results: </strong>In this study, we introduced a new software called PtWAVE that can detect a wide range of indel sizes, up to 200 bp. PtWAVE also offers options for variable selection and fitting algorithms to prevent uncertainties in the model. We evaluated the performance of PtWAVE by using in vitro capillary sequencing data that mimicked DNA sequencing, including large deletions. Furthermore, we confirmed that PtWAVE can stably analyze trace sequencing data derived from actual genome-edited samples.</p><p><strong>Conclusions: </strong>PtWAVE demonstrated superior accuracy and sensitivity compared to the existing TIDE analysis tools for DNA samples, including large deletions. 
PtWAVE can accelerate genome editing applications in organisms and cell types in which large deletions often occur when programmable nucleases are applied.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"114"},"PeriodicalIF":2.9,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12039204/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143962890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PPI-Graphomer: enhanced protein-protein affinity prediction using pretrained and graph transformer models.","authors":"Jun Xie, Youli Zhang, Ziyang Wang, Xiaocheng Jin, Xiaoli Lu, Shengxiang Ge, Xiaoping Min","doi":"10.1186/s12859-025-06123-2","DOIUrl":"https://doi.org/10.1186/s12859-025-06123-2","url":null,"abstract":"<p>Protein-protein interactions (PPIs) refer to the phenomenon of protein binding through various types of bonds to execute biological functions. These interactions are critical for understanding biological mechanisms and drug research. Among these, the protein binding interface is a critical region involved in protein-protein interactions, particularly the hotspot residues on it that play a key role in protein interactions. Current deep learning methods trained on large-scale data can characterize proteins to a certain extent, but they often struggle to adequately capture information about protein binding interfaces. To address this limitation, we propose the PPI-Graphomer module, which integrates pretrained features from large-scale language models and inverse folding models. This approach enhances the characterization of protein binding interfaces by defining edge relationships and interface masks on the basis of molecular interaction information.
Our model outperforms existing methods across multiple benchmark datasets and demonstrates strong generalization capabilities.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"116"},"PeriodicalIF":2.9,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042501/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143961583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for predictive modeling of microbiome multi-omics data: latent interacting variable-effects (LIVE) modeling.","authors":"Javier Munoz Briones, Douglas K Brubaker","doi":"10.1186/s12859-025-06134-z","DOIUrl":"https://doi.org/10.1186/s12859-025-06134-z","url":null,"abstract":"<p><strong>Background: </strong>The number and size of multi-omics datasets with paired measurements of the host and microbiome are rapidly increasing with the advance of sequencing technologies. As it becomes routine to generate these datasets, computational methods to aid in their interpretation become increasingly important. Here, we present a framework for integration of microbiome multi-omics data: Latent Interacting Variable Effects (LIVE) modeling. LIVE integrates multi-omics data using single-omic latent variables (LV) organized in a structured meta-model to determine the combinations of features most predictive of a phenotype or condition.</p><p><strong>Results: </strong>We developed a supervised version of LIVE leveraging sparse Partial Least Squares Discriminant Analysis (sPLS-DA) LVs, and an unsupervised version leveraging sparse Principal Component Analysis (sPCA) principal components, both of which can incorporate covariate awareness. LIVE performance was tested on publicly available metagenomic and metabolomics datasets from Crohn's Disease (CD) and Ulcerative Colitis (UC) patients in the PRISM and LLDeep cohorts, and benchmarked against existing gut microbiome multi-omics approaches and vaginal microbiome datasets, achieving consistent and comparable performance. In addition to these benchmarking efforts, we present a detailed analysis and interpretation of both versions of LIVE using the PRISM and LLDeep cohorts.
LIVE reduced the number of feature interactions from the original datasets for CD and UC from millions to fewer than 20,000 while conditioning the disease-predictive power of gut microbes, metabolites, and enzymes on clinical variables.</p><p><strong>Conclusions: </strong>LIVE makes a distinct, complementary contribution to current methods to integrate microbiome data and offers key advantages over existing approaches in the interpretable integration of multi-omics data with clinical variables to predict disease outcomes and identify microbiome mechanisms of disease.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"115"},"PeriodicalIF":2.9,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042529/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143953228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPC-T-Assembly: a pipeline for de novo transcriptome assembly of large multi-specie datasets.","authors":"Franco Liberati, Taiel Maximiliano Pose Marino, Paolo Bottoni, Daniele Canestrelli, Tiziana Castrignanò","doi":"10.1186/s12859-025-06121-4","DOIUrl":"https://doi.org/10.1186/s12859-025-06121-4","url":null,"abstract":"<p><strong>Background: </strong>Recent years have seen a substantial increase in RNA-seq data production, with this technique becoming the primary approach for gene expression studies across a wide range of non-model organisms. The majority of these organisms lack a well-annotated reference genome to serve as a basis for studying differentially expressed genes (DEGs). As an alternative cost-effective protocol to using a reference genome, the assembly of RNA-seq raw reads is performed to produce what is referred to as a 'de novo transcriptome,' serving as a reference for subsequent DEGs' analysis. This assembly step for conventional DEGs analysis pipelines for non-model organisms is a computationally expensive task. Furthermore, the complexity of the de novo transcriptome assembly workflows poses a challenge for researchers in implementing best-practice techniques and the most recent software versions, particularly when applied to various organisms of interest.</p><p><strong>Results: </strong>To address computational challenges in transcriptomic analyses of non-model organisms, we present HPC-T-Assembly, a tool for de novo transcriptome assembly from RNA-seq data on high-performance computing (HPC) infrastructures. It is designed for straightforward setup via a Web-oriented interface, allowing analysis configuration for several species. Once configuration data is provided, the entire parallel computing software for assembly is automatically generated and can be launched on a supercomputer with a simple command line. 
Intermediate and final outputs of the assembly pipeline include additional post-processing steps, such as assembly quality control, ORF prediction, and transcript count matrix construction.</p><p><strong>Conclusion: </strong>HPC-T-Assembly allows users, through a user-friendly Web-oriented interface, to configure a run for simultaneous assemblies of RNA-seq data from multiple species. The parallel pipeline, launched on HPC infrastructures, significantly reduces computational load and execution times, enabling large-scale transcriptomic and meta-transcriptomics analysis projects.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"113"},"PeriodicalIF":2.9,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12039220/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143974207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FastQTLmapping: an ultra-fast and memory efficient package for mQTL-like analysis.","authors":"Xingjian Gao, Jiarui Li, Xinxuan Liu, Qianqian Peng, Han Jing, Sibte Hadi, Andrew E Teschendorff, Sijia Wang, Fan Liu","doi":"10.1186/s12859-025-06130-3","DOIUrl":"https://doi.org/10.1186/s12859-025-06130-3","url":null,"abstract":"<p><strong>Background: </strong>FastQTLmapping addresses the need for an ultra-fast and memory-efficient solver capable of handling exhaustive multiple regression analysis with a vast number of dependent and explanatory variables, including covariates. This challenge is especially pronounced in methylation quantitative trait loci (mQTL)-like analysis, which typically involves high-dimensional genetic and epigenetic data.</p><p><strong>Results: </strong>FastQTLmapping is a precompiled C++ software solution accelerated by Intel MKL and GSL, freely available at https://github.com/Fun-Gene/fastQTLmapping . Compared to state-of-the-art methods (MatrixEQTL, FastQTL, and TensorQTL), fastQTLmapping demonstrated an order of magnitude speed improvement, coupled with a marked reduction in peak memory usage. In a large dataset consisting of 3500 individuals, 8 million SNPs, 0.8 million CpGs, and 20 covariates, fastQTLmapping completed the entire mQTL analysis in 4.5 h with only 13.1 GB peak memory usage.</p><p><strong>Conclusions: </strong>FastQTLmapping effectively expedites comprehensive mQTL analyses by providing a robust and generic approach that accommodates large-scale genomic datasets with covariates. 
This solution has the potential to streamline mQTL-like studies and inform future method development for efficient computational genomics.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"112"},"PeriodicalIF":2.9,"publicationDate":"2025-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12036243/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143953063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"metaTP: a meta-transcriptome data analysis pipeline with integrated automated workflows.","authors":"Limuxuan He, Quan Zou, Yansu Wang","doi":"10.1186/s12859-025-06137-w","DOIUrl":"https://doi.org/10.1186/s12859-025-06137-w","url":null,"abstract":"<p><strong>Background: </strong>The accessibility of sequencing technologies has enabled meta-transcriptomic studies to provide a deeper understanding of microbial ecology at the transcriptional level. Analyzing omics data involves multiple steps that require the use of various bioinformatics tools. With the increasing availability of public microbiome datasets, conducting meta-analyses can reveal new insights into microbiome activity. However, the reproducibility of data is often compromised due to variations in processing methods for sample omics data. Therefore, it is essential to develop efficient analytical workflows that ensure repeatability, reproducibility, and the traceability of results in microbiome research.</p><p><strong>Results: </strong>We developed metaTP, a pipeline that integrates bioinformatics tools for analyzing meta-transcriptomic data comprehensively. The pipeline includes quality control, non-coding RNA removal, transcript expression quantification, differential gene expression analysis, functional annotation, and co-expression network analysis. To quantify mRNA expression, we rely on reference indexes built using protein-coding sequences, which help overcome the limitations of database analysis. Additionally, metaTP provides a function for calculating the topological properties of gene co-expression networks, offering an intuitive explanation for correlated gene sets in high-dimensional datasets. 
The use of metaTP is anticipated to support researchers in addressing microbiota-related biological inquiries and improving the accessibility and interpretation of microbiota RNA-Seq data.</p><p><strong>Conclusions: </strong>We have created a conda package to integrate the tools into our pipeline, making it a flexible and versatile tool for handling meta-transcriptomic sequencing data. The metaTP pipeline is freely available at: https://github.com/nanbei45/metaTP .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"111"},"PeriodicalIF":2.9,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12034179/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143965784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPOseq: a deep ensemble model for predicting the protein-phenotype relationships based on protein sequences.","authors":"Kai Zhao, Zhuocheng Ji, Linlin Zhang, Na Quan, Yuheng Li, Guanglei Yu, Xuehua Bi","doi":"10.1186/s12859-025-06122-3","DOIUrl":"https://doi.org/10.1186/s12859-025-06122-3","url":null,"abstract":"<p><strong>Background: </strong>Understanding the relationships between proteins and specific disease phenotypes contributes to the early detection of diseases and advances the development of personalized medicine. The acquisition of a large amount of proteomics data has facilitated this process. To improve discovery efficiency and reduce the time and financial costs associated with biological experiments, various computational methods have yielded promising results. However, the lack of rich and reliable protein-related information still presents challenges in this process.</p><p><strong>Results: </strong>In this paper, we propose an ensemble prediction model, named HPOseq, which predicts human protein-phenotype relationships based only on sequence information. HPOseq establishes two base models to achieve objectives. One directly extracts internal information from amino acid sequences as protein features to predict the associated phenotypes. The other builds a protein-protein network based on sequence similarity, extracting information between proteins for phenotype prediction. Ultimately, an ensemble module is employed to integrate the predictions from both base models, resulting in the final prediction.</p><p><strong>Conclusion: </strong>The results of 5-fold cross-validation reveal that HPOseq outperforms seven baseline methods for predicting protein-phenotype relationships. 
Moreover, we conduct case studies from the perspectives of phenotype annotation and protein analysis to verify the practical significance of HPOseq.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"110"},"PeriodicalIF":2.9,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12013097/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143962799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"metacp: a versatile software package for combining dependent or independent p-values.","authors":"Evgenia K Nikolitsa, Panagiota I Kontou, Pantelis G Bagos","doi":"10.1186/s12859-025-06126-z","DOIUrl":"https://doi.org/10.1186/s12859-025-06126-z","url":null,"abstract":"<p><strong>Background: </strong>We present metacp, an open-source software package that implements an abundance of statistical methods for the combination of both independent p-values, with methods such as Fisher's, Stouffer's and Edgington's, and dependent p-values, with methods such as Brown's method and the Cauchy Combination Test.</p><p><strong>Results: </strong>The tool is available in Python and STATA, it is very fast, and it is easy to use, requiring only minimal input. It offers a useful resource for combining both independent and dependent p-values, responding to diverse analytical needs for practitioners performing meta-analyses and bioinformaticians developing tools for a variety of applications. Depending on the input data, it can be used for gene-based testing, for analysis of multiple traits in GWAS, or for combining diverse multi-omics data such as those of a TWAS, a colocalization or an RNA-seq study.</p><p><strong>Conclusions: </strong>Compared to other similar packages (like poolr or metap), metacp implements the largest collection of statistical methods for this problem, offering users the flexibility to choose from a wide variety of approaches.
Being available both as a standalone Python tool and as a STATA command, metacp is accessible to a broad and diverse audience, including practitioners conducting meta-analyses across various fields and bioinformaticians developing new tools where p-value combination is a crucial component.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"109"},"PeriodicalIF":2.9,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008841/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143961035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GRLGRN: graph representation-based learning to infer gene regulatory networks from single-cell RNA-seq data.","authors":"Kai Wang, Yulong Li, Fei Liu, Xiaoli Luan, Xinglong Wang, Jingwen Zhou","doi":"10.1186/s12859-025-06116-1","DOIUrl":"https://doi.org/10.1186/s12859-025-06116-1","url":null,"abstract":"<p><strong>Background: </strong>A gene regulatory network (GRN) is a graph-level representation that describes the regulatory relationships between transcription factors and target genes in cells. The reconstruction of GRNs can help investigate cellular dynamics, drug design, and metabolic systems, and the rapid development of single-cell RNA sequencing (scRNA-seq) technology provides important opportunities while posing significant challenges for reconstructing GRNs. A number of methods for inferring GRNs have been proposed in recent years based on traditional machine learning and deep learning algorithms. However, inferring the GRN from scRNA-seq data remains challenging owing to cellular heterogeneity, measurement noise, and data dropout.</p><p><strong>Results: </strong>In this study, we propose a deep learning model called graph representational learning GRN (GRLGRN) to infer the latent regulatory dependencies between genes based on a prior GRN and data on the profiles of single-cell gene expressions. GRLGRN uses a graph transformer network to extract implicit links from the prior GRN, and encodes the features of genes by using both an adjacency matrix of implicit links and a matrix of the profile of gene expression. Moreover, it uses attention mechanisms to improve feature extraction, and feeds the refined gene embeddings into an output module to infer gene regulatory relationships. To evaluate the performance of GRLGRN, we compared it with prevalent models and performed ablation experiments on seven cell-line datasets with three ground-truth networks. 
The results showed that GRLGRN achieved the best predictions in AUROC and AUPRC on 78.6% and 80.9% of the datasets, and achieved an average improvement of 7.3% in AUROC and 30.7% in AUPRC. An interpretability analysis and network visualization were also conducted.</p><p><strong>Conclusions: </strong>The experimental results and case studies illustrate the considerable performance of GRLGRN in predicting gene interactions and provide interpretability for the prediction tasks, such as identifying hub genes in the network and uncovering implicit links.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"108"},"PeriodicalIF":2.9,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008888/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143958320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}