降维方法的两个关键性质

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI:10.1109/CIDM.2014.7008663

J. Lee, M. Verleysen

{"title":"降维方法的两个关键性质","authors":"J. Lee, M. Verleysen","doi":"10.1109/CIDM.2014.7008663","DOIUrl":null,"url":null,"abstract":"Dimensionality reduction aims at providing faithful low-dimensional representations of high-dimensional data. Its general principle is to attempt to reproduce in a low-dimensional space the salient characteristics of data, such as proximities. A large variety of methods exist in the literature, ranging from principal component analysis to deep neural networks with a bottleneck layer. In this cornucopia, it is rather difficult to find out why a few methods clearly outperform others. This paper identifies two important properties that enable some recent methods like stochastic neighborhood embedding and its variants to produce improved visualizations of high-dimensional data. The first property is a low sensitivity to the phenomenon of distance concentration. The second one is plasticity, that is, the capability to forget about some data characteristics to better reproduce the other ones. In a manifold learning perspective, breaking some proximities typically allow for a better unfolding of data. Theoretical developments as well as experiments support our claim that both properties have a strong impact. In particular, we show that equipping classical methods with the missing properties significantly improves their results.","PeriodicalId":117542,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Two key properties of dimensionality reduction methods\",\"authors\":\"J. Lee, M. Verleysen\",\"doi\":\"10.1109/CIDM.2014.7008663\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Dimensionality reduction aims at providing faithful low-dimensional representations of high-dimensional data. Its general principle is to attempt to reproduce in a low-dimensional space the salient characteristics of data, such as proximities. A large variety of methods exist in the literature, ranging from principal component analysis to deep neural networks with a bottleneck layer. In this cornucopia, it is rather difficult to find out why a few methods clearly outperform others. This paper identifies two important properties that enable some recent methods like stochastic neighborhood embedding and its variants to produce improved visualizations of high-dimensional data. The first property is a low sensitivity to the phenomenon of distance concentration. The second one is plasticity, that is, the capability to forget about some data characteristics to better reproduce the other ones. In a manifold learning perspective, breaking some proximities typically allow for a better unfolding of data. Theoretical developments as well as experiments support our claim that both properties have a strong impact. In particular, we show that equipping classical methods with the missing properties significantly improves their results.\",\"PeriodicalId\":117542,\"journal\":{\"name\":\"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CIDM.2014.7008663\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIDM.2014.7008663","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 22

摘要

降维的目的是为高维数据提供忠实的低维表示。它的一般原理是试图在低维空间中再现数据的显著特征，例如接近性。文献中存在各种各样的方法，从主成分分析到具有瓶颈层的深度神经网络。在这种丰富性中，很难找出为什么一些方法明显优于其他方法。本文确定了两个重要的性质，使一些最近的方法，如随机邻域嵌入及其变体，能够产生改进的高维数据的可视化。第一个性质是对距离集中现象的低灵敏度。第二个是可塑性，即忘记某些数据特征以更好地再现其他数据特征的能力。从多元学习的角度来看，打破一些接近性通常可以更好地展开数据。理论发展和实验都支持我们的说法，即这两种性质都有很强的影响。特别地，我们证明了在经典方法中加入缺失的性质可以显著改善它们的结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Two key properties of dimensionality reduction methods

Dimensionality reduction aims at providing faithful low-dimensional representations of high-dimensional data. Its general principle is to attempt to reproduce in a low-dimensional space the salient characteristics of data, such as proximities. A large variety of methods exist in the literature, ranging from principal component analysis to deep neural networks with a bottleneck layer. In this cornucopia, it is rather difficult to find out why a few methods clearly outperform others. This paper identifies two important properties that enable some recent methods like stochastic neighborhood embedding and its variants to produce improved visualizations of high-dimensional data. The first property is a low sensitivity to the phenomenon of distance concentration. The second one is plasticity, that is, the capability to forget about some data characteristics to better reproduce the other ones. In a manifold learning perspective, breaking some proximities typically allow for a better unfolding of data. Theoretical developments as well as experiments support our claim that both properties have a strong impact. In particular, we show that equipping classical methods with the missing properties significantly improves their results.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

自引率

0.00%

发文量