The Method of Making the Low-dimensional Map that Preserves the Distance Relationships from Selected Data Point
Koki Yoshioka, Gen Niina, H. Dozono
2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI), August 2023
DOI: 10.1109/IRI58017.2023.00035
Abstract
Data analyses are now conducted in many fields. When the distribution of the data is unknown before analysis, it often has to be determined first: hypotheses are formed from the observed distribution, and the analysis then proceeds according to the task. Because data are generally high-dimensional, with many features, dimensionality reduction is used to reveal the distribution. It maps high-dimensional data onto a two- or three-dimensional space while preserving the features of the data, which lets humans grasp the distribution easily. However, because the low-dimensional space has fewer dimensions available for representation, gaps inevitably arise between the distance relationships among data points in the high-dimensional space and those in the low-dimensional space. These gaps can lead to misinterpretation of the data distribution and, in turn, to analyses based on incorrect hypotheses. One possible remedy is to select a data point with a large gap in its distance relationships as a candidate and to inspect a low-dimensional map that preserves the distances from that candidate to all other data points while preserving the distances between noncandidate data points as much as possible. In this paper, we propose a method that creates a low-dimensional map in which the distance relationships from one selected data point to the other data points are preserved. Experiments confirmed that the proposed method preserves the distance relationships from the candidate data point to the other data points while preserving those between noncandidate data points as much as possible.
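The abstract does not state the objective or algorithm the authors use, so the following is only a minimal Python sketch of the two ideas it describes, built on our own assumptions. `select_candidate` scores each point by the mean absolute gap between its high- and low-dimensional distances and picks the largest; this gap measure is assumed. `anchored_map` places the candidate at the origin and every other point at a radius equal to its high-dimensional distance from the candidate, so those distances are preserved exactly. Only the angles are then tuned, by gradient descent with backtracking, to reduce the squared distance error (stress) over noncandidate pairs. The polar parametrization, the stress objective, and all function names and parameters (`iters`, `lr`, `seed`) are hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical sketch of the two ideas in the abstract; NOT the authors' algorithm.


def pairwise_dist(points):
    """Euclidean distance matrix for a list of equal-length coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def select_candidate(d_high, d_low):
    """Index of the point whose distances to all other points are most distorted,
    scored by mean absolute gap (an assumed gap measure)."""
    n = len(d_high)
    gaps = [sum(abs(d_high[i][j] - d_low[i][j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]
    return max(range(n), key=gaps.__getitem__)


def _polar_dist(r, theta, i, j):
    # Law of cosines for two points given in polar coordinates around the origin.
    sq = r[i] ** 2 + r[j] ** 2 - 2 * r[i] * r[j] * math.cos(theta[i] - theta[j])
    return math.sqrt(max(sq, 0.0))


def _stress(d_high, r, theta, others):
    # Sum of squared distance errors over pairs of noncandidate points only.
    return sum((_polar_dist(r, theta, i, j) - d_high[i][j]) ** 2
               for a, i in enumerate(others) for j in others[a + 1:])


def anchored_map(d_high, c, iters=500, lr=0.1, seed=0):
    """2-D map in which distances from point c to every other point equal the
    high-dimensional ones exactly; other distances are fitted by minimizing
    stress over the angles only (assumed objective)."""
    n = len(d_high)
    others = [i for i in range(n) if i != c]
    r = [d_high[c][i] for i in range(n)]  # fixed radii; r[c] == 0, so c sits at the origin
    rng = random.Random(seed)
    theta = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    scale = max(r) ** 2 or 1.0  # roughly normalizes the step for the data's scale
    cur = _stress(d_high, r, theta, others)
    for _ in range(iters):
        # Gradient of the stress with respect to each angle.
        grad = [0.0] * n
        for a, i in enumerate(others):
            for j in others[a + 1:]:
                d = _polar_dist(r, theta, i, j)
                if d < 1e-12:
                    continue
                g = 2 * (d - d_high[i][j]) * r[i] * r[j] * math.sin(theta[i] - theta[j]) / d
                grad[i] += g
                grad[j] -= g
        # Backtracking: accept a step only if it does not increase the stress.
        step = lr / scale
        while step > 1e-12:
            trial = [t - step * g for t, g in zip(theta, grad)]
            new = _stress(d_high, r, trial, others)
            if new <= cur:
                theta, cur = trial, new
                break
            step /= 2
        else:
            break  # no improving step found: treat as converged
    return [(r[i] * math.cos(theta[i]), r[i] * math.sin(theta[i])) for i in range(n)]
```

Consider six 4-D points where only one has a large third coordinate. A projection onto the first two coordinates distorts that point's distances most, so `select_candidate` picks it, and `anchored_map` then reproduces its distances to the other five points exactly. Fixing the radii is one way to make the constraint hold by construction. A weighted stress with a large weight on pairs involving the candidate would be a softer alternative that only approximates it.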