Title: Joint Characterization of Multiscale Information in High Dimensional Data
Authors: D. Sousa, C. Small
Journal: Adv. Artif. Intell. Mach. Learn.
Publication date: 2021-02-18
Publication type: Journal Article
DOI: 10.54364/AAIML.2021.1113
URL: https://doi.org/10.54364/AAIML.2021.1113
Citations: 9
Abstract
High dimensional feature spaces can contain information on multiple scales. At global scales, spanning an entire feature space, covariance structure among dimensions can determine topology and intrinsic dimensionality. In addition, local scale information can be captured by the structure of low-dimensional manifolds embedded within the high-dimensional feature space. Such manifolds may not easily be resolved by the global covariance structure. Analysis tools that preferentially operate at one scale can be ineffective at capturing all the information present in cross-scale complexity. We propose a multiscale joint characterization approach designed to exploit synergies between global and local approaches to dimensionality reduction. We illustrate this approach using Principal Components Analysis (PCA) to characterize global variance structure and t-distributed Stochastic Neighbor Embedding (t-SNE) to characterize local manifold structure, also comparing against a second approach for characterization of local manifold structure, Laplacian Eigenmaps (LE). Using both low dimensional synthetic images and high dimensional imaging spectroscopy data, we show that joint characterization is capable of detecting and isolating signals which are not evident from either algorithm alone. Broadly, t-SNE is effective at rendering a randomly oriented low-dimensional map of local manifolds (clustering), and PCA renders this map interpretable by providing global, physically meaningful structure. LE provides additional useful context by reinforcing and refining the feature space topology found by PCA, simplifying structural interpretation, clarifying endmember identification and highlighting new potential endmembers which are not evident from other methods alone. This approach is illustrated using hyperspectral imagery of agriculture resolving crop-specific, field-scale differences in vegetation reflectance.
The fundamental premise of joint characterization could easily be extended to other high dimensional datasets, including image time series and nonimage data. The approach may prove particularly useful for other geospatial data since both robust manifold structure (due to spatial autocorrelation) and physically interpretable global variance structure (due to physical generative processes) are frequently present.
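The idea of pairing a global and a local view can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic spiral data, the helper names `pca_scores` and `laplacian_eigenmap`, the neighbor count, and the side-by-side `joint` table are all hypothetical choices made here. Laplacian Eigenmaps stands in for the local method because it is compact to write out; the abstract's primary local method is t-SNE.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Global view: project mean-centered data onto leading principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

def laplacian_eigenmap(X, n_neighbors=10, n_components=2):
    """Local view: embed samples with eigenvectors of a kNN graph Laplacian."""
    n = X.shape[0]
    # Dense pairwise squared distances (acceptable for small n only).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    # Binary k-nearest-neighbor adjacency, symmetrized.
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    # Skip the first eigenvector, which is trivial when the graph is connected,
    # and rescale by D^{-1/2} to recover the generalized eigenvectors.
    return vecs[:, 1:n_components + 1] * d_inv_sqrt[:, None]

# Hypothetical test data: a 2-D spiral linearly embedded in 10 dimensions plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.5, 3.0 * np.pi, 200)
curve = np.column_stack([t * np.cos(t), t * np.sin(t)])
X = curve @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(200, 10))

pcs, explained = pca_scores(X)
le = laplacian_eigenmap(X)

# One row per sample: [PC1, PC2, LE1, LE2]. Scatter plots of any PC column
# against any LE column give a joint global/local view of the same samples.
joint = np.hstack([pcs, le])
print(joint.shape)
```

To follow the abstract more closely, the local step could be swapped for a library t-SNE such as scikit-learn's `sklearn.manifold.TSNE`, leaving the PCA step and the joint table unchanged. The dense distance matrix here scales quadratically with the number of samples, so real imagery would need a sparse neighbor search.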