Hugues Annoye, Alessandro Beretta, Cédric Heuchenne
Knowledge-Based Systems (JCR Q1, Computer Science, Artificial Intelligence; impact factor 7.6), vol. 330, Article 114626, published 2025-10-10
DOI: 10.1016/j.knosys.2025.114626
https://www.sciencedirect.com/science/article/pii/S095070512501665X
Statistical matching using autoencoders-canonical correlation analysis, kernel canonical correlation analysis and multi-output multilayer perceptron
Large amounts of data are collected every day, through surveys and other sources. Analysts often need variables that are spread across different data sources, which calls for methods to combine them. Statistical matching is an established way to combine such data sets. In this paper, we investigate Autoencoders-Canonical Correlation Analysis (A-CCA) and extend it to statistical matching. A-CCA extends Kernel Canonical Correlation Analysis (KCCA): it reduces the need for kernels and additionally performs dimensionality reduction. It can also be regarded as an extension of Deep Canonical Correlation Analysis (DCCA), with added flexibility that makes it well suited to statistical matching. The method is designed to handle various variable types, sampling weights, and incompatibilities among categorical variables. We compare its performance with methods based on KCCA or on a Multi-output Multilayer Perceptron (MMLP), using the 2017 Belgian Statistics on Income and Living Conditions (SILC) data. We split this data set into two parts and treat them as if they came from two different sources.
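The abstract does not spell out the matching pipeline, so the following is only a rough sketch of the general idea, not the authors' A-CCA. It assumes NumPy and uses plain linear CCA in place of A-CCA, KCCA, or an MMLP. It mimics the evaluation setup by splitting one data set into file A (common variables X plus Y) and file B (X plus Z), learns canonical directions between X and Z in file B, and imputes Z for each record in A from its nearest donor in B in that canonical space. Sampling weights and categorical variables are ignored. The function names `split_as_two_sources`, `linear_cca`, and `match_and_impute`, the ridge term `reg`, and the synthetic data are all hypothetical.

```python
import numpy as np


def split_as_two_sources(data, x_cols, y_cols, z_cols, rng):
    """Hypothetical helper: split one data set into file A (X, Y) and file B (X, Z).

    The true Z of file A is kept aside only so imputations can be checked.
    """
    idx = rng.permutation(len(data))
    half = len(data) // 2
    a, b = idx[:half], idx[half:]
    file_a = {"X": data[a][:, x_cols], "Y": data[a][:, y_cols], "Z_true": data[a][:, z_cols]}
    file_b = {"X": data[b][:, x_cols], "Z": data[b][:, z_cols]}
    return file_a, file_b


def _inv_sqrt(c):
    # Inverse square root of a symmetric positive-definite matrix.
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def linear_cca(x, z, k, reg=1e-6):
    """Plain linear CCA: X-side weights for the first k canonical pairs and their correlations."""
    n = len(x)
    xc = x - x.mean(axis=0)
    zc = z - z.mean(axis=0)
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])  # small ridge for stability
    czz = zc.T @ zc / (n - 1) + reg * np.eye(z.shape[1])
    cxz = xc.T @ zc / (n - 1)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    t = _inv_sqrt(cxx) @ cxz @ _inv_sqrt(czz)
    u, s, _ = np.linalg.svd(t)
    wx = _inv_sqrt(cxx) @ u[:, :k]
    return wx, s[:k]


def match_and_impute(file_a, file_b, k):
    """Nearest-neighbour donor imputation of Z for file A in the canonical space of X."""
    wx, corr = linear_cca(file_b["X"], file_b["Z"], k)
    mu = file_b["X"].mean(axis=0)
    pa = (file_a["X"] - mu) @ wx  # recipients projected on canonical directions
    pb = (file_b["X"] - mu) @ wx  # donors projected on the same directions
    # Squared Euclidean distance between every recipient and every donor.
    d = ((pa[:, None, :] - pb[None, :, :]) ** 2).sum(axis=2)
    donors = d.argmin(axis=1)
    return file_b["Z"][donors], corr


# Synthetic toy data, for illustration only (not SILC).
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=(n, 3))
y = x[:, :1] + 0.5 * rng.normal(size=(n, 1))
z = x @ np.array([[1.0], [-1.0], [0.5]]) + 0.3 * rng.normal(size=(n, 1))
data = np.hstack([x, y, z])

file_a, file_b = split_as_two_sources(data, [0, 1, 2], [3], [4], rng)
z_hat, corr = match_and_impute(file_a, file_b, k=1)
print(z_hat.shape)  # (200, 1)
```

Y and Z are never observed together, so only the common variables X drive the match. The canonical projection simply chooses which directions of X matter most for predicting Z. A-CCA, as described in the abstract, would replace the linear projection with a learned, lower-dimensional one.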
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.