Jinjin Chen, Ahmed Mohamed, Dharmesh D Bhuva, Melissa J Davis, Chin Wee Tan
mastR: An R Package for Automated Identification of Tissue-Specific Gene Signatures in Multi-Group Differential Expression Analysis
Bioinformatics (Oxford, England), published 2025-03-17. DOI: 10.1093/bioinformatics/btaf114 (https://doi.org/10.1093/bioinformatics/btaf114)
Abstract
Motivation: Biomarker discovery is important and offers insight into the potential underlying mechanisms of disease. While existing biomarker identification methods primarily focus on single-cell RNA sequencing (scRNA-seq) data, there remains a need for automated methods designed for labeled bulk RNA-seq data from sorted cell populations or experiments. Current methods require curation of results or statistical thresholds and may not account for tissue background expression. Here we address these limitations with an automated marker identification method for labeled bulk RNA-seq data that explicitly considers background expression.
Results: We developed mastR, a novel tool for accurate marker identification using transcriptomic data. It leverages robust statistical pipelines such as edgeR and limma to perform pairwise comparisons between groups, and aggregates the results using a rank-product-based permutation test. A signal-to-noise ratio approach is implemented to minimize background signals. We assessed the performance of mastR-derived NK cell signatures against published curated signatures and found that the mastR-derived signature performs as well as, if not better than, the published signatures. We further demonstrated the utility of mastR on simulated scRNA-seq data and compared its marker selection performance with that of Seurat.
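The aggregation step described above can be illustrated with a minimal sketch. This is not mastR's implementation (mastR is an R package, and its internals are not given here); it is a generic rank-product permutation test written in Python with only the standard library. The function names, the score values, and the null model (ranks shuffled independently within each comparison) are all assumptions made for illustration.

```python
import bisect
import math
import random


def rank_descending(scores):
    """Return ranks where the largest score gets rank 1 (ties broken by position)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, gene in enumerate(order, start=1):
        ranks[gene] = rank
    return ranks


def rank_products(score_columns):
    """Multiply each gene's ranks across comparisons.

    score_columns holds one list of per-gene scores per pairwise comparison.
    The integer product is used instead of the geometric mean; the k-th root
    is monotone, so gene ordering and permutation p-values are unchanged.
    """
    rank_columns = [rank_descending(col) for col in score_columns]
    return [math.prod(gene_ranks) for gene_ranks in zip(*rank_columns)]


def permutation_pvalues(score_columns, n_perm=2000, seed=1):
    """Estimate how often random rankings give a product as small as observed."""
    rng = random.Random(seed)
    observed = rank_products(score_columns)
    n_genes, n_comp = len(observed), len(score_columns)
    ranks = list(range(1, n_genes + 1))
    null = []
    for _ in range(n_perm):
        # Assumed null model: ranks shuffled independently within each comparison.
        shuffled = [rng.sample(ranks, n_genes) for _ in range(n_comp)]
        null.extend(math.prod(r) for r in zip(*shuffled))
    null.sort()
    # Add-one correction keeps every p-value strictly positive.
    return [(bisect.bisect_right(null, rp) + 1) / (len(null) + 1) for rp in observed]


# Hypothetical per-gene scores (5 genes, 3 pairwise comparisons).
scores = [
    [4.1, 0.3, -1.2, 2.0, 0.1],
    [3.8, -0.5, 0.9, 1.1, 0.2],
    [5.0, 1.5, -0.7, 0.4, -2.0],
]
products = rank_products(scores)
pvalues = permutation_pvalues(scores)
print(products)  # [1, 30, 60, 12, 80]
```

In this toy input the first gene ranks highest in every comparison, so it has the smallest rank product and the smallest permutation p-value. The signal-to-noise filtering step is not sketched here because the abstract does not define it.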
Availability: mastR is freely available from https://bioconductor.org/packages/release/bioc/html/mastR.html. A vignette and guide are available at https://davislaboratory.github.io/mastR. All statistical analyses were carried out using R (version ≥ 4.3.0) and Bioconductor (version ≥ 3.17).
Supplementary information: Supplementary data are available at Bioinformatics online.