Projection Statistics (ProST): Online statistical assessment of group separation in data projection analysis
Danny Salem, Anuradha Surendra, Graeme SV McDowell, Miroslava Cuperlovic-Culf
bioRxiv - Bioinformatics, published 2024-09-08
DOI: 10.1101/2024.09.04.611273 (https://doi.org/10.1101/2024.09.04.611273)
Abstract
Motivation: Unsupervised data projection is a major step in data mining, whether it is used to identify trends in the data, to visualize multidimensional data in a reduced-dimension space, or to reduce the feature space by combining features. Methods such as Principal Component Analysis (PCA) or t-distributed Stochastic Neighbor Embedding (t-SNE) are regularly used as one of the first steps in computational biology or omics investigations. However, the significance of the separation of sample groups by these methods is generally judged by visual assessment. User-friendly applications for different projection methods, each focusing on distinct data properties, are needed, as is a rigorous method for statistically determining the significance of the separation of groups of interest in each dataset.
Results: We present Projection STatistics (ProST), a user-friendly solution for data projection analysis that provides three unsupervised approaches (PCA, t-SNE and UMAP) and one supervised approach (LDA). For each method, we include a novel statistical assessment of the significance of group separation using the Mann-Whitney U test or t-test, as well as the necessary preprocessing steps. ProST provides an unbiased, objective way to determine the significance of the separation of measurement groups through either linear or manifold projection analysis, with methods ranging from those that separate points based on major variances to those that rely on point proximity based on distance.
Availability: The ProST software application is freely available at https://complimet.ca/shiny/ProST/, with source code provided at https://github.com/complimet/prost.
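
The idea of testing group separation in a projection can be illustrated with a minimal sketch. This is not the ProST implementation: the abstract does not detail how the test is applied, so the sketch assumes that samples are projected onto a single axis (here the first principal component) and that the scores of two groups along that axis are compared with a two-sided Mann-Whitney U test. The function names `first_pc_scores` and `mann_whitney_u`, the power-iteration PCA, the normal approximation for the p-value, and the simulated data are all illustrative assumptions. Only the Python standard library is used to keep the example self-contained.

```python
import math
import random
from statistics import NormalDist, mean


def first_pc_scores(rows, iters=200):
    """Scores of each sample on the first principal component.

    Hypothetical helper: uses power iteration on the sample covariance
    matrix and assumes the data are not constant.
    """
    n, d = len(rows), len(rows[0])
    col_means = [mean(r[j] for r in rows) for j in range(d)]
    centered = [[r[j] - col_means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # starting direction
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project the centered samples onto the estimated leading direction.
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test for samples a and b.

    Returns (U statistic of a, p-value). The p-value uses the normal
    approximation without tie or continuity correction (an assumption
    made here for brevity).
    """
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for tied values
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    u1 = rank_sum_a - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


# Simulated data (an assumption): two groups of 20 samples, 5 features,
# with the second group shifted by 2 units in every feature.
random.seed(0)
group_a = [[random.gauss(0, 1) for _ in range(5)] for _ in range(20)]
group_b = [[random.gauss(2, 1) for _ in range(5)] for _ in range(20)]

scores = first_pc_scores(group_a + group_b)
u, p = mann_whitney_u(scores[:20], scores[20:])
print(f"U = {u:.1f}, p = {p:.3g}")
```

In practice, established routines such as `sklearn.decomposition.PCA` and `scipy.stats.mannwhitneyu` would normally replace the hand-written helpers; they are written out here only so that each step of the assumed procedure is visible.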