Joint Characterization of Multiscale Information in High Dimensional Data

Adv. Artif. Intell. Mach. Learn. Pub Date : 2021-02-18 DOI:10.54364/AAIML.2021.1113

D. Sousa, C. Small


Citations: 9

Abstract

High dimensional feature spaces can contain information on multiple scales. At global scales, spanning an entire feature space, covariance structure among dimensions can determine topology and intrinsic dimensionality. In addition, local scale information can be captured by the structure of low-dimensional manifolds embedded within the high-dimensional feature space. Such manifolds may not easily be resolved by the global covariance structure. Analysis tools that preferentially operate at one scale can be ineffective at capturing all the information present in cross-scale complexity. We propose a multiscale joint characterization approach designed to exploit synergies between global and local approaches to dimensionality reduction. We illustrate this approach using Principal Components Analysis (PCA) to characterize global variance structure and t-distributed Stochastic Neighbor Embedding (t-SNE) to characterize local manifold structure, and we compare against a second method for characterizing local manifold structure, Laplacian Eigenmaps (LE). Using both low dimensional synthetic images and high dimensional imaging spectroscopy data, we show that joint characterization is capable of detecting and isolating signals which are not evident from either algorithm alone. Broadly, t-SNE is effective at rendering a randomly oriented low-dimensional map of local manifolds (clustering), and PCA renders this map interpretable by providing global, physically meaningful structure. LE provides additional useful context by reinforcing and refining the feature space topology found by PCA, simplifying structural interpretation, clarifying endmember identification, and highlighting new potential endmembers which are not evident from other methods alone. This approach is illustrated using hyperspectral imagery of agriculture resolving crop-specific, field-scale differences in vegetation reflectance. The fundamental premise of joint characterization could easily be extended to other high dimensional datasets, including image time series and non-image data. The approach may prove particularly useful for other geospatial data, since both robust manifold structure (due to spatial autocorrelation) and physically interpretable global variance structure (due to physical generative processes) are frequently present.
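The idea of pairing a global and a local embedding can be sketched in a few lines of NumPy. The example below is only an illustration under stated assumptions, not the authors' implementation: it uses a synthetic spiral embedded in 10 dimensions instead of imaging spectroscopy data, it uses Laplacian Eigenmaps with a dense heat-kernel graph as the local method (t-SNE, for example via scikit-learn's `sklearn.manifold.TSNE`, could be substituted), and the function names, kernel width, and sample size are all hypothetical choices made for the sketch.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Global view: project centered data onto its leading principal axes."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return U[:, :n_components] * S[:n_components], explained[:n_components]


def laplacian_eigenmaps(X, n_components=2, sigma=0.5):
    """Local view: embed with a dense heat-kernel graph (sigma is an assumption)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Skip the trivial first eigenvector; rescale to solve L y = lambda D y
    Y = vecs[:, 1:n_components + 1] * d_inv_sqrt[:, None]
    return Y, vals[1:n_components + 1]


def spearman(a, b):
    """Rank correlation computed from argsort ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]


# Synthetic data: a 1-D manifold (spiral) rotated into 10-D, plus small noise.
rng = np.random.default_rng(0)
n = 400
t = np.linspace(1.5 * np.pi, 4.5 * np.pi, n)  # position along the manifold
spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # 10x2, orthonormal columns
X = spiral @ basis.T + 0.01 * rng.normal(size=(n, 10))

pc, explained = pca_scores(X)          # global variance structure
le, le_vals = laplacian_eigenmaps(X)   # local manifold structure

# Joint characterization: one row per sample, global and local coordinates
# side by side, so structure found in one view can be examined in the other.
joint = np.column_stack([pc, le])

print("variance fraction of PC1, PC2:", explained)
print("rank correlation with manifold position, PC1:", spearman(pc[:, 0], t))
print("rank correlation with manifold position, LE1:", spearman(le[:, 0], t))
```

In this toy setting the principal components describe the overall extent of the spiral, while the first Laplacian Eigenmaps coordinate follows position along the curve, which no single linear projection can do. Keeping both sets of coordinates in one per-sample table is the part that corresponds to joint characterization; how cleanly the two views separate on real data will depend on the data and on the neighborhood scale chosen.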



