Integration of multifactorial omics data from several sources using multiblock methods
Miguel de Figueiredo, Serge Rudaz, Julien Boccard
Chemometrics and Intelligent Laboratory Systems, volume 262, Article 105403 (published 2025-04-14)
DOI: 10.1016/j.chemolab.2025.105403
URL: https://www.sciencedirect.com/science/article/pii/S0169743925000887
Journal impact factor: 3.7 (JCR Q2, Automation & Control Systems)
Abstract
With advances in data acquisition methods and technical platforms, omics measurement collection yields increasingly complex data structures. While high-dimensional matrices with more variables than samples can be handled via multivariate methods, extracting information is more challenging in the case of experimental designs involving several factors. Multifactorial models combining ANOVA and multivariate approaches have been developed for this purpose, but analyzing unbalanced designs remains challenging, especially when several data blocks are integrated.
This study introduces integrative AComDim (iAComDim) and integrative AMOPLS (iAMOPLS) for the analysis of multifactorial data from multiple sources. These methods implement a rebalancing strategy tailored for multiblock settings, ensuring unbiased effect estimators and orthogonal effect matrices even with unbalanced designs. When applied to a multiomics benchmark dataset with two experimental factors, these approaches effectively separate the sources of variation related to the effects in the design while summarizing information into a single multiblock model. Rebalancing strategies prevent the mixing of variation sources in extracted components, and their integration with multiblock chemometric methods offers an efficient and versatile solution for analyzing complex data structures.
This work establishes a novel framework for analyzing data from single or multiple sources within multifactorial experimental designs. Furthermore, the proposed methods are flexible enough to analyze unbalanced designs with heterogeneously missing replicates across multiple tables, making them broadly applicable for handling multiomics or other datasets in various application domains.
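The role of balance in the abstract's argument can be illustrated with a small numerical sketch. It is not the authors' iAComDim or iAMOPLS implementation: the simulated data, the function names (`level_mean_matrix`, `max_cross_product`), and the level-mean estimation of main effects are all assumptions made for illustration. The final step collapses replicates to cell means, a deliberately simple stand-in that restores balance; the rebalancing strategy used in the paper is not described in the abstract and may well differ.

```python
import numpy as np

def level_mean_matrix(Xc, labels):
    """Replace each row of Xc by the mean of the rows sharing its factor level."""
    M = np.zeros_like(Xc)
    for level in np.unique(labels):
        rows = labels == level
        M[rows] = Xc[rows].mean(axis=0)
    return M

def max_cross_product(X, a, b):
    """Largest absolute entry of X_A^T X_B (about 0 when effect matrices are orthogonal)."""
    Xc = X - X.mean(axis=0)          # remove the grand mean
    XA = level_mean_matrix(Xc, a)    # main-effect matrix for factor A
    XB = level_mean_matrix(Xc, b)    # main-effect matrix for factor B
    return float(np.abs(XA.T @ XB).max())

# Hypothetical 2 x 2 design, 3 replicates per cell, 5 variables.
rng = np.random.default_rng(0)
a = np.repeat([0, 1], 6)
b = np.tile(np.repeat([0, 1], 3), 2)
effect_a = rng.normal(size=(2, 5))
effect_b = rng.normal(size=(2, 5))
X = effect_a[a] + effect_b[b] + 0.1 * rng.normal(size=(12, 5))

# Balanced design: the two effect matrices are orthogonal.
balanced = max_cross_product(X, a, b)

# Unbalanced design: drop two replicates from the cell (A=1, B=1).
keep = np.arange(12) < 10
Xu, au, bu = X[keep], a[keep], b[keep]
unbalanced = max_cross_product(Xu, au, bu)

# Illustrative stand-in for rebalancing (NOT the paper's method):
# keep one row per design cell, equal to that cell's mean.
cells = sorted({(int(i), int(j)) for i, j in zip(au, bu)})
X_cells = np.array([Xu[(au == i) & (bu == j)].mean(axis=0) for i, j in cells])
a_cells = np.array([i for i, _ in cells])
b_cells = np.array([j for _, j in cells])
rebalanced = max_cross_product(X_cells, a_cells, b_cells)

print(f"balanced:   {balanced:.2e}")
print(f"unbalanced: {unbalanced:.2e}")
print(f"rebalanced: {rebalanced:.2e}")
```

In the balanced and cell-mean cases the cross-product vanishes up to floating-point error, whereas the unbalanced case leaves a clearly nonzero overlap between the two effect matrices. That overlap is the kind of mixing of variation sources that the abstract says rebalancing is meant to prevent.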
About the journal:
Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on the development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines.
Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data.
The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well characterized data sets to test performance for the new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.