BMC Bioinformatics: Latest Articles

FastQTLmapping: an ultra-fast and memory efficient package for mQTL-like analysis.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-04-27 DOI: 10.1186/s12859-025-06130-3
Xingjian Gao, Jiarui Li, Xinxuan Liu, Qianqian Peng, Han Jing, Sibte Hadi, Andrew E Teschendorff, Sijia Wang, Fan Liu
Abstract
Background: FastQTLmapping addresses the need for an ultra-fast and memory-efficient solver capable of handling exhaustive multiple regression analysis with a vast number of dependent and explanatory variables, including covariates. This challenge is especially pronounced in methylation quantitative trait loci (mQTL)-like analysis, which typically involves high-dimensional genetic and epigenetic data.
Results: FastQTLmapping is a precompiled C++ software solution accelerated by Intel MKL and GSL, freely available at https://github.com/Fun-Gene/fastQTLmapping. Compared to state-of-the-art methods (MatrixEQTL, FastQTL, and TensorQTL), fastQTLmapping demonstrated an order of magnitude speed improvement, coupled with a marked reduction in peak memory usage. In a large dataset consisting of 3500 individuals, 8 million SNPs, 0.8 million CpGs, and 20 covariates, fastQTLmapping completed the entire mQTL analysis in 4.5 h with only 13.1 GB peak memory usage.
Conclusions: FastQTLmapping effectively expedites comprehensive mQTL analyses by providing a robust and generic approach that accommodates large-scale genomic datasets with covariates. This solution has the potential to streamline mQTL-like studies and inform future method development for efficient computational genomics.
BMC Bioinformatics 26(1):112. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12036243/pdf/
Citations: 0
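To put the reported FastQTLmapping benchmark in perspective, the short calculation below estimates the implied throughput. It is a rough sketch that assumes every SNP-CpG pair was tested once (the abstract says "exhaustive" but does not give the exact test count); the function name `throughput` is purely illustrative.

```python
def throughput(n_snps, n_cpgs, hours):
    """Return (number of pairwise tests, tests per second).

    Assumption: one regression per SNP-CpG pair, i.e. an exhaustive scan.
    """
    n_tests = n_snps * n_cpgs
    return n_tests, n_tests / (hours * 3600.0)


# Figures quoted in the abstract: 8 million SNPs, 0.8 million CpGs, 4.5 h.
n_tests, per_second = throughput(8_000_000, 800_000, 4.5)
print(f"{n_tests:.2e} tests, about {per_second:.2e} tests per second")
```

Under that assumption the run corresponds to roughly 6.4 × 10^12 regressions, each adjusted for 20 covariates, which is why both speed and peak memory matter at this scale.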
metaTP: a meta-transcriptome data analysis pipeline with integrated automated workflows.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-04-26 DOI: 10.1186/s12859-025-06137-w
Limuxuan He, Quan Zou, Yansu Wang
Abstract
Background: The accessibility of sequencing technologies has enabled meta-transcriptomic studies to provide a deeper understanding of microbial ecology at the transcriptional level. Analyzing omics data involves multiple steps that require the use of various bioinformatics tools. With the increasing availability of public microbiome datasets, conducting meta-analyses can reveal new insights into microbiome activity. However, the reproducibility of data is often compromised due to variations in processing methods for sample omics data. Therefore, it is essential to develop efficient analytical workflows that ensure repeatability, reproducibility, and the traceability of results in microbiome research.
Results: We developed metaTP, a pipeline that integrates bioinformatics tools for analyzing meta-transcriptomic data comprehensively. The pipeline includes quality control, non-coding RNA removal, transcript expression quantification, differential gene expression analysis, functional annotation, and co-expression network analysis. To quantify mRNA expression, we rely on reference indexes built using protein-coding sequences, which help overcome the limitations of database analysis. Additionally, metaTP provides a function for calculating the topological properties of gene co-expression networks, offering an intuitive explanation for correlated gene sets in high-dimensional datasets. The use of metaTP is anticipated to support researchers in addressing microbiota-related biological inquiries and improving the accessibility and interpretation of microbiota RNA-Seq data.
Conclusions: We have created a conda package to integrate the tools into our pipeline, making it a flexible and versatile tool for handling meta-transcriptomic sequencing data. The metaTP pipeline is freely available at https://github.com/nanbei45/metaTP.
BMC Bioinformatics 26(1):111. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12034179/pdf/
Citations: 0
HPOseq: a deep ensemble model for predicting the protein-phenotype relationships based on protein sequences.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-04-22 DOI: 10.1186/s12859-025-06122-3
Kai Zhao, Zhuocheng Ji, Linlin Zhang, Na Quan, Yuheng Li, Guanglei Yu, Xuehua Bi
Abstract
Background: Understanding the relationships between proteins and specific disease phenotypes contributes to the early detection of diseases and advances the development of personalized medicine. The acquisition of a large amount of proteomics data has facilitated this process. To improve discovery efficiency and reduce the time and financial costs associated with biological experiments, various computational methods have yielded promising results. However, the lack of rich and reliable protein-related information still presents challenges in this process.
Results: In this paper, we propose an ensemble prediction model, named HPOseq, which predicts human protein-phenotype relationships based only on sequence information. HPOseq establishes two base models to achieve this objective. One directly extracts internal information from amino acid sequences as protein features to predict the associated phenotypes. The other builds a protein-protein network based on sequence similarity, extracting information between proteins for phenotype prediction. Ultimately, an ensemble module integrates the predictions from both base models to produce the final prediction.
Conclusion: The results of 5-fold cross-validation reveal that HPOseq outperforms seven baseline methods for predicting protein-phenotype relationships. Moreover, we conduct case studies from the perspectives of phenotype annotation and protein analysis to verify the practical significance of HPOseq.
BMC Bioinformatics 26(1):110. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12013097/pdf/
Citations: 0
metacp: a versatile software package for combining dependent or independent p-values.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-04-19 DOI: 10.1186/s12859-025-06126-z
Evgenia K Nikolitsa, Panagiota I Kontou, Pantelis G Bagos
Abstract
Background: We present metacp, an open-source software package that implements a wide range of statistical methods for combining both independent p-values (with methods such as Fisher's, Stouffer's and Edgington's) and dependent p-values (with methods such as Brown's method and the Cauchy Combination Test).
Results: The tool is available in Python and STATA, is very fast, and is easy to use, requiring only minimal input. It offers a useful resource for combining both independent and dependent p-values, responding to diverse analytical needs for practitioners performing meta-analyses and bioinformaticians developing tools for a variety of applications. Depending on the input data, it can be used for gene-based testing, for analysis of multiple traits in GWAS, or for combining diverse multi-omics data such as those of a TWAS, a colocalization or an RNA-seq study.
Conclusions: Compared to other similar packages (like poolr or metap), metacp implements the largest collection of statistical methods for this problem, offering users the flexibility to choose from a wide variety of approaches. Being available both as a standalone Python tool and as a STATA command, metacp is accessible to a broad and diverse audience, including practitioners conducting meta-analyses across various fields and bioinformaticians developing new tools where p-value combination is a crucial component.
BMC Bioinformatics 26(1):109. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008841/pdf/
Citations: 0
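Three of the methods named in the metacp abstract are simple enough to sketch with the Python standard library. The code below is an independent, textbook-style illustration, not metacp's code or API: the function names `fisher`, `stouffer` and `cauchy_combination` are hypothetical, equal weights are assumed, and all p-values are assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under H0."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function for an even number of df (2k):
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer(pvalues):
    """Stouffer's method: the sum of z-scores over sqrt(k) is N(0, 1) under H0."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)


def cauchy_combination(pvalues):
    """Cauchy Combination Test with equal weights (usable for dependent tests)."""
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvalues) / len(pvalues)
    return 0.5 - math.atan(t) / math.pi


pvals = [0.01, 0.04, 0.20]
print(fisher(pvals), stouffer(pvals), cauchy_combination(pvals))
```

Fisher's and Stouffer's methods assume independent tests, whereas the Cauchy statistic has a tail that is largely insensitive to correlation between the tests, which is why the abstract groups it with the methods for dependent p-values.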
Correction: MethylSeqLogo: DNA methylation smart sequence logos.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-04-18 DOI: 10.1186/s12859-025-06124-1
Fei-Man Hsu, Paul Horton
Abstract: none.
BMC Bioinformatics 25(Suppl 2):394. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008853/pdf/
Citations: 0
GRLGRN: graph representation-based learning to infer gene regulatory networks from single-cell RNA-seq data.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-04-18 DOI: 10.1186/s12859-025-06116-1
Kai Wang, Yulong Li, Fei Liu, Xiaoli Luan, Xinglong Wang, Jingwen Zhou
Abstract
Background: A gene regulatory network (GRN) is a graph-level representation that describes the regulatory relationships between transcription factors and target genes in cells. The reconstruction of GRNs can help investigate cellular dynamics, drug design, and metabolic systems, and the rapid development of single-cell RNA sequencing (scRNA-seq) technology provides important opportunities while posing significant challenges for reconstructing GRNs. A number of methods for inferring GRNs have been proposed in recent years based on traditional machine learning and deep learning algorithms. However, inferring the GRN from scRNA-seq data remains challenging owing to cellular heterogeneity, measurement noise, and data dropout.
Results: In this study, we propose a deep learning model called graph representational learning GRN (GRLGRN) to infer the latent regulatory dependencies between genes based on a prior GRN and single-cell gene expression profiles. GRLGRN uses a graph transformer network to extract implicit links from the prior GRN, and encodes the features of genes by using both an adjacency matrix of implicit links and a gene expression profile matrix. Moreover, it uses attention mechanisms to improve feature extraction, and feeds the refined gene embeddings into an output module to infer gene regulatory relationships. To evaluate the performance of GRLGRN, we compared it with prevalent models and performed ablation experiments on seven cell-line datasets with three ground-truth networks. The results showed that GRLGRN achieved the best predictions in AUROC and AUPRC on 78.6% and 80.9% of the datasets, respectively, and achieved an average improvement of 7.3% in AUROC and 30.7% in AUPRC. Interpretability analysis and network visualization were also conducted.
Conclusions: The experimental results and case studies illustrate the considerable performance of GRLGRN in predicting gene interactions and provide interpretability for the prediction tasks, such as identifying hub genes in the network and uncovering implicit links.
BMC Bioinformatics 26(1):108. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008888/pdf/
Citations: 0
DeBasher: a flow-based programming bash extension for the implementation of complex and interactive workflows with stateful processes.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-04-16 DOI: 10.1186/s12859-025-06108-1
Daniel Ortiz-Martínez
Abstract
Background: Bioinformatics data analysis faces significant challenges. As data analysis often takes the form of pipelines or workflows, workflow managers (WfMs) have become essential. Data flow programming constitutes the preferred approach in WfMs, enabling parallel processes activated reactively based on input availability. While this paradigm typically follows a linear, acyclic progression, cyclic workflows are sometimes necessary in bioinformatics analyses. These cyclic workflows also present an opportunity to explore workflow interactivity, a feature not widely implemented in existing WfMs.
Results: We propose DeBasher, a tool that adopts the flow-based programming (FBP) paradigm, in which the workflow components are in control of their life cycle and can store state information, allowing the execution of complex workflows that include cycles. DeBasher also incorporates a powerful model of interactivity, where the user can alter the behavior of a running workflow. Additionally, DeBasher allows the user to define triggers so as to initiate the execution of a complete workflow or a part of it. The ability to execute processes with state and in control of their life cycle also has applications in dynamic scheduling tasks. Furthermore, DeBasher presents a series of extra features, including the combination of multiple workflows at runtime through a feature we have called runtime piping, switching to static scheduling to increase scalability, or implementing processes in multiple languages. DeBasher has been successfully used to process 131.7 TB of genomic data by means of a variant calling pipeline.
Conclusions: DeBasher is an FBP Bash extension that can be useful in a wide range of situations, in particular when implementing complex workflows, workflows with interactivity or triggers, or when high scalability is required.
BMC Bioinformatics 26(1):106. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12004750/pdf/
Citations: 0
PRED-LD: efficient imputation of GWAS summary statistics.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-04-16 DOI: 10.1186/s12859-025-06119-y
Georgios A Manios, Aikaterini Michailidi, Panagiota I Kontou, Pantelis G Bagos
Abstract
Background: Genome-wide association studies have identified connections between genetic variations and diseases, but they only examine a small portion of single nucleotide polymorphisms. To enhance genetic findings, researchers suggest imputing genotypes for unmeasured SNPs to improve coverage and statistical power. When this is not possible, summary statistics imputation can be used as an alternative. The available summary statistics imputation tools rely on reference panels, such as the 1000 Genomes Project, to estimate linkage disequilibrium (LD) between variants for accurate imputation. Tools like FAPI and SSIMP use these reference panels in variant call format (VCF) for this purpose, though this process can be time-consuming. A more effective approach for processing reference panels in summary statistics imputation was proposed in RAISS. In this approach, the LD among the variants is precomputed from the reference panel, prior to imputation, thereby reducing computational time.
Results: We present PRED-LD, an imputation method for GWAS summary statistics that aims to enhance the resolution of genetic association analyses. The proposed method uses precomputed linkage disequilibrium statistics from HapMap, Pheno Scanner and TOP-LD to impute summary statistics, given beta coefficients and standard errors. The single-point approach that we describe provides a fast and accurate way to estimate associations for untyped single nucleotide polymorphisms that exhibit high LD. The proposed method is faster than existing tools, provides accurate imputation, and has been implemented in both a web service (https://compgen.dib.uth.gr/PRED-LD/) and a command-line tool (https://github.com/pbagos/PRED-LD), making it a useful resource for the research community.
Conclusions: PRED-LD offers an efficient and accurate method for GWAS summary statistics imputation, providing faster performance, direct result interpretation, and the ability to use multiple reference panels. Also, the online version of PRED-LD simplifies obtaining LD information and performing imputation tasks without downloading reference panels and will be continuously updated to support tools for meta-analysis and fine-mapping in GWAS.
BMC Bioinformatics 26(1):107. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12004831/pdf/
Citations: 0
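The idea behind single-point imputation from LD can be illustrated with the standard result that, for two SNPs with signed LD correlation r, the expected z-score of the untyped SNP is r times the z-score of the typed one. The sketch below shows only that generic relation; it is not claimed to be PRED-LD's exact formula, the function name `impute_z` is hypothetical, and it assumes the effect alleles of the two SNPs are aligned with the sign of r.

```python
def impute_z(beta_typed, se_typed, r):
    """Expected z-score of an untyped SNP given one typed SNP in LD.

    beta_typed, se_typed: effect size and standard error of the typed SNP.
    r: signed LD correlation between the two SNPs (from a reference panel).
    Returns (imputed z-score, r squared as a rough reliability measure).
    """
    if se_typed <= 0:
        raise ValueError("standard error must be positive")
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be a correlation in [-1, 1]")
    z_typed = beta_typed / se_typed
    return r * z_typed, r * r


# Hypothetical numbers, for illustration only.
z_imputed, reliability = impute_z(beta_typed=0.12, se_typed=0.03, r=0.9)
print(z_imputed, reliability)
```

Because the imputed statistic shrinks toward zero as |r| decreases, the approach is only informative for untyped SNPs in high LD with a typed one, which matches the restriction stated in the abstract.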
Phoenics: a novel statistical approach for longitudinal metabolomic pathway analysis.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-04-16 DOI: 10.1186/s12859-025-06118-z
Camille Guilmineau, Marie Tremblay-Franco, Nathalie Vialaneix, Rémi Servien
Abstract
Background: Metabolomics describes the metabolic profile of an organism at a given time by the concentrations of its constituent metabolites. When studied over time, metabolite concentrations can help understand the dynamical evolution of a biological process. However, metabolites are involved in sequences of chemical reactions, called metabolic pathways, related to a given biological function. Accounting for these pathways in statistical methods for metabolomic data is thus a relevant way to directly express results in terms of biological functions and to increase their interpretability.
Methods: We propose a new method, phoenics, to perform differential analysis for longitudinal metabolomic data at the pathway level. In short, phoenics proceeds in two steps: first, the matrix of metabolite quantifications is transformed by a dimension reduction approach accounting for pathway information. Then, a mixed linear model is fitted on the transformed data.
Results: This method was applied to semi-synthetic NMR data and two real NMR datasets assessing the effects of antibiotics and irritable bowel syndrome on feces. Results showed that phoenics properly controls the Type I error rate and has a better ability to detect differential metabolic pathways and to extract new impacted biological functions than alternative methods. The method is implemented in the R package phoenics available on CRAN.
BMC Bioinformatics 26(1):105. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12001596/pdf/
Citations: 0
Redefining the high variable genes by optimized LOESS regression with positive ratio.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-04-15 DOI: 10.1186/s12859-025-06112-5
Yue Xie, Zehua Jing, Hailin Pan, Xun Xu, Qi Fang
Abstract
Background: Single-cell RNA sequencing allows for the exploration of transcriptomic features at the individual cell level, but the high dimensionality and sparsity of the data pose substantial challenges for downstream analysis. Feature selection, therefore, is a critical step to reduce dimensionality and enhance interpretability.
Results: We developed a robust feature selection algorithm that leverages optimized locally estimated scatterplot smoothing regression (LOESS) to precisely capture the relationship between gene average expression level and positive ratio while minimizing overfitting. Our evaluations showed that our algorithm consistently outperforms eight leading feature selection methods across three benchmark criteria and helps improve downstream analysis, thus offering a significant improvement in gene subset selection.
Conclusions: By preserving key biological information through feature selection, GLP provides informative features to enhance the accuracy and effectiveness of downstream analyses.
BMC Bioinformatics 26(1):104. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12001687/pdf/
Citations: 0
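The two per-gene quantities that the abstract above relates to each other can be computed directly from a count matrix. The sketch below covers only that first step and leaves out the LOESS fit and the selection rule; it assumes "positive ratio" means the fraction of cells in which a gene has a non-zero count, which the abstract does not define, and the function name `gene_stats` is hypothetical.

```python
def gene_stats(counts):
    """Per-gene mean expression and positive ratio.

    counts: list of cells, each a list of per-gene counts (cells x genes).
    Returns one (mean, positive_ratio) tuple per gene.
    Assumption: positive ratio = fraction of cells with count > 0.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    stats = []
    for g in range(n_genes):
        column = [cell[g] for cell in counts]
        mean = sum(column) / n_cells
        positive_ratio = sum(1 for v in column if v > 0) / n_cells
        stats.append((mean, positive_ratio))
    return stats


# Toy matrix: 4 cells x 2 genes.
toy = [[0, 2], [4, 0], [0, 1], [0, 3]]
print(gene_stats(toy))
```

In the toy matrix the first gene reaches a mean of 1.0 while being detected in only a quarter of the cells. A feature selection method built on these two quantities would presumably flag genes whose mean deviates strongly from the value a fitted curve predicts for their positive ratio.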