{"title":"M-DeepAssembly: enhanced DeepAssembly based on multi-objective multi-domain protein conformation sampling.","authors":"Xinyue Cui, Yuhao Xia, Minghua Hou, Xuanfeng Zhao, Suhui Wang, Guijun Zhang","doi":"10.1186/s12859-025-06131-2","DOIUrl":"https://doi.org/10.1186/s12859-025-06131-2","url":null,"abstract":"<p><strong>Background: </strong>Association and cooperation among structural domains play an important role in protein function and drug design. Despite remarkable advancements in highly accurate single-domain protein structure prediction through the collaborative efforts of the community using deep learning, challenges still exist in predicting multi-domain protein structures when the evolutionary signal for a given domain pair is weak or the protein structure is large.</p><p><strong>Results: </strong>To alleviate the above challenges, we proposed M-DeepAssembly, a protocol based on a multi-objective protein conformation sampling algorithm for multi-domain protein structure prediction. Firstly, the inter-domain interactions and full-length sequence distance features are extracted through DeepAssembly and AlphaFold2, respectively. Secondly, subject to these features, we constructed a multi-objective energy model and designed a sampling algorithm for exploring and exploiting conformational space to generate ensembles. Finally, the output protein structure was selected from the ensembles using our in-house developed model quality assessment algorithm. On the test set of 164 multi-domain proteins, the results show that the average TM-score of M-DeepAssembly is 15.4% and 2.0% higher than AlphaFold2 and DeepAssembly, respectively. It is worth noting that there are models with higher accuracy in ensembles, achieving an improvement of 20.3% and 6.4% relative to the two baseline methods, although these models were not selected. 
Furthermore, when compared to the prediction results of AlphaFold2 for CASP15 multi-domain targets, M-DeepAssembly demonstrates certain performance advantages.</p><p><strong>Conclusions: </strong>M-DeepAssembly provides a distinctive multi-domain protein assembly algorithm, which can alleviate the current challenges of weak evolutionary signals and large structures to some extent by forming diverse ensembles using a multi-objective protein conformation sampling algorithm. The proposed method contributes to exploring the functions of multi-domain proteins, especially providing new insights into targets with multiple conformational states.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"120"},"PeriodicalIF":2.9,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12054043/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143964644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PathNetDRP: a novel biomarker discovery framework using pathway and protein-protein interaction networks for immune checkpoint inhibitor response prediction.","authors":"Dohee Lee, Jaegyoon Ahn, Jonghwan Choi","doi":"10.1186/s12859-025-06125-0","DOIUrl":"https://doi.org/10.1186/s12859-025-06125-0","url":null,"abstract":"<p><strong>Background: </strong>Predicting immune checkpoint inhibitor (ICI) response remains a significant challenge in cancer immunotherapy. Many existing approaches rely on differential gene expression analysis or predefined immune signatures, which may fail to capture the complex regulatory mechanisms underlying immune response. Network-based models attempt to integrate biological interactions, but they often lack a quantitative framework to assess how individual genes contribute within pathways, limiting the specificity and interpretability of biomarkers. Given these limitations, we developed PathNetDRP, a framework that integrates biological pathways, protein-protein interaction networks, and machine learning to identify functionally relevant biomarkers for ICI response prediction.</p><p><strong>Results: </strong>We introduce PathNetDRP, a novel biomarker discovery approach that applies the PageRank algorithm to prioritize ICI-associated genes, maps them to relevant biological pathways, and calculates PathNetGene scores to quantify their contribution to immune response. Unlike conventional methods that focus solely on gene expression differences, PathNetDRP systematically incorporates biological context to improve biomarker selection. Validation across multiple independent cancer cohorts showed that PathNetDRP achieved strong predictive performance, with the cross-validated area under the receiver operating characteristic curve increasing from 0.780 to 0.940. 
Interestingly, PathNetDRP did not merely improve predictive accuracy; it also provided insights into key immune-related pathways, reinforcing its potential for identifying clinically relevant biomarkers.</p><p><strong>Conclusion: </strong>The biomarkers identified by PathNetDRP demonstrated robust predictive performance across cross-validation and independent validation datasets, suggesting their potential utility in clinical applications. Furthermore, enrichment analysis highlighted key immune-related pathways, providing a deeper understanding of their role in ICI response regulation. While these findings underscore the promise of PathNetDRP, future work will explore the integration of additional predictive features, such as tumor mutational burden and microsatellite instability, to further refine its applicability.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"119"},"PeriodicalIF":2.9,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12051301/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143956916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast noisy long read alignment with multi-level parallelism.","authors":"Zeyu Xia, Canqun Yang, Chenchen Peng, Yifei Guo, Yufei Guo, Tao Tang, Yingbo Cui","doi":"10.1186/s12859-025-06129-w","DOIUrl":"https://doi.org/10.1186/s12859-025-06129-w","url":null,"abstract":"<p><strong>Background: </strong>The advent of Single Molecule Real-Time (SMRT) sequencing has overcome many limitations of second-generation sequencing, such as limited read lengths and PCR amplification biases. However, longer reads increase data volume exponentially and high error rates make many existing alignment tools inapplicable. Additionally, a single CPU's performance bottleneck restricts the effectiveness of alignment algorithms for SMRT sequencing.</p><p><strong>Results: </strong>To address these challenges, we introduce ParaHAT, a parallel alignment algorithm for noisy long reads. ParaHAT utilizes vector-level, thread-level, process-level, and heterogeneous parallelism. We redesign the dynamic programming matrix layouts to eliminate data dependency in the base-level alignment, enabling effective vectorization. 
We further enhance computational speed through heterogeneous parallel technology and implement the algorithm for multi-node computing using MPI, overcoming the computational limits of a single node.</p><p><strong>Conclusions: </strong>Performance evaluations show that ParaHAT achieved a 10.03x speedup in base-level alignment, with a parallel acceleration ratio and weak scalability metric of 94.61 and 98.98% on 128 nodes, respectively.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"118"},"PeriodicalIF":2.9,"publicationDate":"2025-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12049014/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143962795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"M3S-GRPred: a novel ensemble learning approach for the interpretable prediction of glucocorticoid receptor antagonists using a multi-step stacking strategy.","authors":"Nalini Schaduangrat, Hathaichanok Chuntakaruk, Thanyada Rungrotmongkol, Pakpoom Mookdarsanit, Watshara Shoombuatong","doi":"10.1186/s12859-025-06132-1","DOIUrl":"https://doi.org/10.1186/s12859-025-06132-1","url":null,"abstract":"<p><p>Accelerating drug discovery for glucocorticoid receptor (GR)-related disorders, including innovative machine learning (ML)-based approaches, holds promise in advancing therapeutic development, optimizing treatment efficacy, and mitigating adverse effects. While experimental methods can accurately identify GR antagonists, they are often not cost-effective for large-scale drug discovery. Thus, computational approaches leveraging SMILES information for precise in silico identification of GR antagonists are crucial, enabling efficient and scalable drug discovery. Here, we develop a new ensemble learning approach using a multi-step stacking strategy (M3S), termed M3S-GRPred, aimed at rapidly and accurately discovering novel GR antagonists. To the best of our knowledge, M3S-GRPred is the first SMILES-based predictor designed to identify GR antagonists without the use of 3D structural information. In M3S-GRPred, we first constructed different balanced subsets using an under-sampling approach. Using these balanced subsets, we explored and evaluated heterogeneous base-classifiers trained with a variety of SMILES-based feature descriptors coupled with popular ML algorithms. Finally, M3S-GRPred was constructed by integrating probabilistic features from the selected base-classifiers derived from a two-step feature selection technique. Our comparative experiments demonstrate that M3S-GRPred can precisely identify GR antagonists and effectively address the imbalanced dataset. 
Compared to traditional ML classifiers, M3S-GRPred attained superior performance in terms of both the training and independent test datasets. Additionally, M3S-GRPred was applied to identify potential GR antagonists among FDA-approved drugs confirmed through molecular docking, followed by detailed MD simulation studies for drug repurposing in Cushing's syndrome. We anticipate that M3S-GRPred will serve as an efficient screening tool for discovering novel GR antagonists from vast libraries of unknown compounds in a cost-effective manner.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"117"},"PeriodicalIF":2.9,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12044944/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143958228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PtWAVE: a high-sensitive deconvolution software of sequencing trace for the detection of large indels in genome editing.","authors":"Kazuki Nakamae, Saya Ide, Nagaki Ohnuki, Yoshiko Nakagawa, Keisuke Okuhara, Hidemasa Bono","doi":"10.1186/s12859-025-06139-8","DOIUrl":"https://doi.org/10.1186/s12859-025-06139-8","url":null,"abstract":"<p><strong>Background: </strong>Tracking of Insertions and DEletions (TIDE) analysis, which computationally deconvolves capillary sequencing data derived from the DNA of bulk or clonal cell populations to estimate the efficiency of targeted mutagenesis by programmable nucleases, has played a significant role in the field of genome editing. However, the detection range covered by conventional TIDE analysis is limited. Range extension for deconvolution is required to detect larger deletions and insertions (indels) derived from genome editing in TIDE analysis. However, extending the deconvolution range introduces uncertainty into the deconvolution process. Moreover, the accuracy and sensitivity of TIDE analysis tools for large deletions (> 50 bp) remain poorly understood.</p><p><strong>Results: </strong>In this study, we introduced a new software called PtWAVE that can detect a wide range of indel sizes, up to 200 bp. PtWAVE also offers options for variable selection and fitting algorithms to prevent uncertainties in the model. We evaluated the performance of PtWAVE by using in vitro capillary sequencing data that mimicked DNA sequencing, including large deletions. Furthermore, we confirmed that PtWAVE can stably analyze trace sequencing data derived from actual genome-edited samples.</p><p><strong>Conclusions: </strong>PtWAVE demonstrated superior accuracy and sensitivity compared to the existing TIDE analysis tools for DNA samples, including large deletions. 
PtWAVE can accelerate genome editing applications in organisms and cell types in which large deletions often occur when programmable nucleases are applied.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"114"},"PeriodicalIF":2.9,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12039204/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143962890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PPI-Graphomer: enhanced protein-protein affinity prediction using pretrained and graph transformer models.","authors":"Jun Xie, Youli Zhang, Ziyang Wang, Xiaocheng Jin, Xiaoli Lu, Shengxiang Ge, Xiaoping Min","doi":"10.1186/s12859-025-06123-2","DOIUrl":"https://doi.org/10.1186/s12859-025-06123-2","url":null,"abstract":"<p><p>Protein-protein interactions (PPIs) refer to the phenomenon of protein binding through various types of bonds to execute biological functions. These interactions are critical for understanding biological mechanisms and drug research. Among these, the protein binding interface is a critical region involved in protein-protein interactions, particularly the hotspot residues on it that play a key role in protein interactions. Current deep learning methods trained on large-scale data can characterize proteins to a certain extent, but they often struggle to adequately capture information about protein binding interfaces. To address this limitation, we propose the PPI-Graphomer module, which integrates pretrained features from large-scale language models and inverse folding models. This approach enhances the characterization of protein binding interfaces by defining edge relationships and interface masks on the basis of molecular interaction information. 
Our model outperforms existing methods across multiple benchmark datasets and demonstrates strong generalization capabilities.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"116"},"PeriodicalIF":2.9,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042501/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143961583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for predictive modeling of microbiome multi-omics data: latent interacting variable-effects (LIVE) modeling.","authors":"Javier Munoz Briones, Douglas K Brubaker","doi":"10.1186/s12859-025-06134-z","DOIUrl":"https://doi.org/10.1186/s12859-025-06134-z","url":null,"abstract":"<p><strong>Background: </strong>The number and size of multi-omics datasets with paired measurements of the host and microbiome is rapidly increasing with the advance of sequencing technologies. As it becomes routine to generate these datasets, computational methods to aid in their interpretation become increasingly important. Here, we present a framework for integration of microbiome multi-omics data: Latent Interacting Variable Effects (LIVE) modeling. LIVE integrates multi-omics data using single-omic latent variables (LV) organized in a structured meta-model to determine the combinations of features most predictive of a phenotype or condition.</p><p><strong>Results: </strong>We developed a supervised version of LIVE leveraging sparse Partial Least Squares Discriminant Analysis (sPLS-DA) LVs, and an unsupervised version leveraging sparse Principal Component Analysis (sPCA) principal components, both of which can incorporate covariate awareness. LIVE performance was tested on publicly available metagenomic and metabolomics data sets from Crohn's Disease (CD) and Ulcerative Colitis (UC) status patients in the PRISM and LLDeep cohorts, and benchmarked against existing gut microbiome multi-omics approaches and vaginal microbiome datasets, achieving consistent and comparable performances. In addition to these benchmarking efforts, we present a detailed analysis and interpretation of both versions of LIVE using the PRISM and LLDeep cohorts. 
LIVE reduced the number of feature interactions from the original datasets for CD and UC from millions to less than 20,000 while conditioning the disease-predictive power of gut microbes, metabolites, and enzymes on clinical variables.</p><p><strong>Conclusions: </strong>LIVE makes a distinct, complementary contribution to current methods to integrate microbiome data and offers key advantages over existing approaches in the interpretable integration of multi-omics data with clinical variables to predict disease outcomes and identify microbiome mechanisms of disease.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"115"},"PeriodicalIF":2.9,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042529/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143953228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPC-T-Assembly: a pipeline for de novo transcriptome assembly of large multi-specie datasets.","authors":"Franco Liberati, Taiel Maximiliano Pose Marino, Paolo Bottoni, Daniele Canestrelli, Tiziana Castrignanò","doi":"10.1186/s12859-025-06121-4","DOIUrl":"https://doi.org/10.1186/s12859-025-06121-4","url":null,"abstract":"<p><strong>Background: </strong>Recent years have seen a substantial increase in RNA-seq data production, with this technique becoming the primary approach for gene expression studies across a wide range of non-model organisms. The majority of these organisms lack a well-annotated reference genome to serve as a basis for studying differentially expressed genes (DEGs). As an alternative cost-effective protocol to using a reference genome, the assembly of RNA-seq raw reads is performed to produce what is referred to as a 'de novo transcriptome,' serving as a reference for subsequent DEGs' analysis. This assembly step for conventional DEGs analysis pipelines for non-model organisms is a computationally expensive task. Furthermore, the complexity of the de novo transcriptome assembly workflows poses a challenge for researchers in implementing best-practice techniques and the most recent software versions, particularly when applied to various organisms of interest.</p><p><strong>Results: </strong>To address computational challenges in transcriptomic analyses of non-model organisms, we present HPC-T-Assembly, a tool for de novo transcriptome assembly from RNA-seq data on high-performance computing (HPC) infrastructures. It is designed for straightforward setup via a Web-oriented interface, allowing analysis configuration for several species. Once configuration data is provided, the entire parallel computing software for assembly is automatically generated and can be launched on a supercomputer with a simple command line. 
Intermediate and final outputs of the assembly pipeline include additional post-processing steps, such as assembly quality control, ORF prediction, and transcript count matrix construction.</p><p><strong>Conclusion: </strong>HPC-T-Assembly allows users, through a user-friendly Web-oriented interface, to configure a run for simultaneous assemblies of RNA-seq data from multiple species. The parallel pipeline, launched on HPC infrastructures, significantly reduces computational load and execution times, enabling large-scale transcriptomic and meta-transcriptomics analysis projects.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"113"},"PeriodicalIF":2.9,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12039220/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143974207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FastQTLmapping: an ultra-fast and memory efficient package for mQTL-like analysis.","authors":"Xingjian Gao, Jiarui Li, Xinxuan Liu, Qianqian Peng, Han Jing, Sibte Hadi, Andrew E Teschendorff, Sijia Wang, Fan Liu","doi":"10.1186/s12859-025-06130-3","DOIUrl":"https://doi.org/10.1186/s12859-025-06130-3","url":null,"abstract":"<p><strong>Background: </strong>FastQTLmapping addresses the need for an ultra-fast and memory-efficient solver capable of handling exhaustive multiple regression analysis with a vast number of dependent and explanatory variables, including covariates. This challenge is especially pronounced in methylation quantitative trait loci (mQTL)-like analysis, which typically involves high-dimensional genetic and epigenetic data.</p><p><strong>Results: </strong>FastQTLmapping is a precompiled C++ software solution accelerated by Intel MKL and GSL, freely available at https://github.com/Fun-Gene/fastQTLmapping . Compared to state-of-the-art methods (MatrixEQTL, FastQTL, and TensorQTL), fastQTLmapping demonstrated an order of magnitude speed improvement, coupled with a marked reduction in peak memory usage. In a large dataset consisting of 3500 individuals, 8 million SNPs, 0.8 million CpGs, and 20 covariates, fastQTLmapping completed the entire mQTL analysis in 4.5 h with only 13.1 GB peak memory usage.</p><p><strong>Conclusions: </strong>FastQTLmapping effectively expedites comprehensive mQTL analyses by providing a robust and generic approach that accommodates large-scale genomic datasets with covariates. 
This solution has the potential to streamline mQTL-like studies and inform future method development for efficient computational genomics.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"112"},"PeriodicalIF":2.9,"publicationDate":"2025-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12036243/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143953063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"metaTP: a meta-transcriptome data analysis pipeline with integrated automated workflows.","authors":"Limuxuan He, Quan Zou, Yansu Wang","doi":"10.1186/s12859-025-06137-w","DOIUrl":"https://doi.org/10.1186/s12859-025-06137-w","url":null,"abstract":"<p><strong>Background: </strong>The accessibility of sequencing technologies has enabled meta-transcriptomic studies to provide a deeper understanding of microbial ecology at the transcriptional level. Analyzing omics data involves multiple steps that require the use of various bioinformatics tools. With the increasing availability of public microbiome datasets, conducting meta-analyses can reveal new insights into microbiome activity. However, the reproducibility of data is often compromised due to variations in processing methods for sample omics data. Therefore, it is essential to develop efficient analytical workflows that ensure repeatability, reproducibility, and the traceability of results in microbiome research.</p><p><strong>Results: </strong>We developed metaTP, a pipeline that integrates bioinformatics tools for analyzing meta-transcriptomic data comprehensively. The pipeline includes quality control, non-coding RNA removal, transcript expression quantification, differential gene expression analysis, functional annotation, and co-expression network analysis. To quantify mRNA expression, we rely on reference indexes built using protein-coding sequences, which help overcome the limitations of database analysis. Additionally, metaTP provides a function for calculating the topological properties of gene co-expression networks, offering an intuitive explanation for correlated gene sets in high-dimensional datasets. 
The use of metaTP is anticipated to support researchers in addressing microbiota-related biological inquiries and improving the accessibility and interpretation of microbiota RNA-Seq data.</p><p><strong>Conclusions: </strong>We have created a conda package to integrate the tools into our pipeline, making it a flexible and versatile tool for handling meta-transcriptomic sequencing data. The metaTP pipeline is freely available at: https://github.com/nanbei45/metaTP .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"111"},"PeriodicalIF":2.9,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12034179/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143965784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}