MultANOVA Followed by Post Hoc Analyses for Designed High-Dimensional Data: A Comprehensive Framework That Outperforms ASCA, rMANOVA, and VASCA
Authors: Benjamin Mahieu, Véronique Cariou
Journal of Chemometrics, volume 39, issue 7. Published 2025-07-14.
DOI: 10.1002/cem.70039 (https://onlinelibrary.wiley.com/doi/10.1002/cem.70039)
Analytical platforms generate high-dimensional data, in which the number of variables usually exceeds the number of observations. Such data frequently come from an experimental design, where samples are collected to identify variation attributable to the factors or interactions of interest. To circumvent the issues raised by such large data sizes when evaluating factor and interaction effects, ANOVA simultaneous component analysis (ASCA), regularized multivariate analysis of variance (rMANOVA), and variable selection ASCA (VASCA) have been proposed previously. However, they require computationally intensive methods to test the effects of factors and interactions. In the present paper, multiple ANOVAs (MultANOVA) is proposed as a simple yet effective alternative to these methods. MultANOVA is direct and fast because it does not rely on intensive calculation methods, and it incorporates a variable selection strategy. The method runs one ANOVA per variable and applies a multiple testing correction. Post hoc analyses are also introduced: multiple least-squares difference tests (MultLSD) for the pairwise comparison of multivariate least-squares means, and diagonal canonical discriminant analysis (DCDA) with approximate confidence ellipses to visualize significant effects. MultANOVA is compared to the aforementioned methods in simulations, which show that it holds the nominal alpha risk, unlike rMANOVA and VASCA, while being more powerful than ASCA and VASCA. Although MultANOVA is shown to be less powerful than VASCA for variable selection, it holds the nominal risk, whereas VASCA does not. Finally, the MultANOVA framework is illustrated on metagenomics, metabolomics, and spectroscopic data.
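The abstract describes the core of MultANOVA as one ANOVA per variable followed by a multiple testing correction. The sketch below illustrates that idea only, under two assumptions that the abstract does not confirm: a one-way design (a single factor) and the Benjamini-Hochberg correction. All function names (`mult_anova`, `one_way_anova_p`, `benjamini_hochberg`, `f_sf`, `reg_inc_beta`) and the toy data are hypothetical and are not taken from the paper or from any package. The post hoc steps (MultLSD, DCDA) are not covered.

```python
import math
from collections import defaultdict


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction of the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f_stat, d1, d2):
    """Upper-tail probability P(F > f_stat) for an F(d1, d2) distribution."""
    if f_stat <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f_stat))


def one_way_anova_p(values, labels):
    """p-value of a one-way ANOVA; needs at least 2 groups and n > groups."""
    groups = defaultdict(list)
    for v, g in zip(values, labels):
        groups[g].append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = ss_within = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        ss_between += len(members) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in members)
    if ss_within == 0.0:  # degenerate variable: F is undefined or infinite
        return 1.0 if ss_between == 0.0 else 0.0
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_sf(f_stat, k - 1, n - k)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def mult_anova(X, labels, alpha=0.05):
    """One ANOVA per column of X (rows = observations), then BH correction."""
    n_vars = len(X[0])
    pvals = [one_way_anova_p([row[j] for row in X], labels)
             for j in range(n_vars)]
    adjusted = benjamini_hochberg(pvals)
    selected = [j for j, q in enumerate(adjusted) if q <= alpha]
    return adjusted, selected


# Toy data (made up): 6 samples, 3 variables, 2 groups; only variable 0 shifts.
X = [[1.0, 5.0, 2.0], [1.2, 4.0, 2.5], [0.9, 6.0, 1.5],
     [5.1, 5.5, 2.2], [4.8, 4.5, 1.8], [5.0, 5.0, 2.0]]
labels = ["A", "A", "A", "B", "B", "B"]
adjusted, selected = mult_anova(X, labels)
print("adjusted p-values:", adjusted)
print("selected variables:", selected)
```

On this toy data only variable 0 is selected, since the other two variables have identical group means. The F-distribution tail is computed by hand here purely to keep the sketch dependency-free; in practice `scipy.stats.f_oneway` and `statsmodels.stats.multitest.multipletests` provide the same building blocks.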
About the journal:
The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.