Data vulnerability index for the “crowding problem” in nonlinear dimensionality reduction
Dominik Olszewski
Neurocomputing, vol. 648, Article 130619, published 2025-06-10. DOI: 10.1016/j.neucom.2025.130619
https://www.sciencedirect.com/science/article/pii/S0925231225012913
Citations: 0
Abstract
We propose a data vulnerability index that measures the intensity and harmfulness of the “crowding problem” in nonlinear dimensionality reduction. The index supports nonlinear dimensionality reduction by increasing its robustness to this problem. It indicates whether methods safeguarded against the problem are needed and justifies their use. The index therefore provides auxiliary preliminary information that helps guide subsequent dimensionality reduction and data visualization. It is formulated on the basis of the k-Nearest Neighbors (k-NN) graph of the data. The graph allows us to estimate the intrinsic dimensionality of the low-dimensional manifold embedded in the high-dimensional linear Euclidean input space, and this estimate is required to compute the index. Experiments on thirteen real datasets confirm the usefulness of the index in nonlinear dimensionality reduction and its ability to detect the “crowding problem” and determine its severity. The index values ranged from 2 to 26, with higher values corresponding to a greater advantage of the methods using the t-distribution over those not using it. Moreover, we conducted additional experiments on tuning the neighborhood width parameter in Neighborhood Preserving Projections (NPPs). For most datasets, tuning improved the results as measured by Adjusted Mutual Information (AMI) and silhouette values. The largest increase in AMI was obtained for t-NeRV (0.9410 vs. 0.8169), and the largest increase in silhouette was obtained for t-SNE (0.8124 vs. 0.6992).
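The abstract says the index relies on an intrinsic-dimensionality estimate derived from the k-NN graph, but it does not name the estimator. As a hedged illustration only, the sketch below uses a standard k-NN-based estimator, the maximum-likelihood estimator of Levina and Bickel. This estimator is an assumption swapped in here and is not necessarily the one used in the paper, and the vulnerability index itself is not reproduced. For a point x whose sorted neighbor distances are T_1(x) ≤ … ≤ T_k(x), the quantity (1/(k−1)) Σ_{j=1}^{k−1} log(T_k(x)/T_j(x)) estimates 1/d. The sketch averages these per-point inverses over all points and then inverts the mean. The function names `knn_distances` and `mle_intrinsic_dimension` are hypothetical.

```python
import math
import random


def knn_distances(points, k):
    """Brute-force k-NN graph: for each point, the sorted distances to its k nearest other points."""
    graph = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        graph.append(dists[:k])
    return graph


def mle_intrinsic_dimension(points, k=10):
    """Levina-Bickel MLE of intrinsic dimension (illustrative stand-in, not the paper's estimator).

    For each point, (1/(k-1)) * sum_{j<k} log(T_k / T_j) estimates 1/d.
    These per-point inverses are averaged, and the mean is inverted.
    Assumes k >= 2 and no duplicate points (all neighbor distances > 0).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    inverse_estimates = []
    for dists in knn_distances(points, k):
        t_k = dists[-1]  # distance to the k-th nearest neighbor
        inverse_estimates.append(
            sum(math.log(t_k / t_j) for t_j in dists[:-1]) / (k - 1)
        )
    # Inverting the mean of the inverse estimates gives the global estimate of d.
    return len(inverse_estimates) / sum(inverse_estimates)


if __name__ == "__main__":
    rng = random.Random(0)
    # A 2-D unit square lying in a 10-D ambient space (8 zero coordinates).
    plane = [(rng.random(), rng.random()) + (0.0,) * 8 for _ in range(500)]
    # A 5-D standard Gaussian cloud padded with zeros to 10-D.
    cloud = [tuple(rng.gauss(0.0, 1.0) for _ in range(5)) + (0.0,) * 5 for _ in range(500)]
    print(f"plane: {mle_intrinsic_dimension(plane):.2f}")
    print(f"cloud: {mle_intrinsic_dimension(cloud):.2f}")
```

Zero-padding leaves pairwise distances unchanged, so the estimate for the square should be close to 2 even though the ambient space is 10-dimensional. The Gaussian cloud should give a clearly larger value, typically somewhat below 5 because the density varies within each neighborhood. In the motivation behind t-SNE, crowding arises when a manifold of intrinsic dimension well above 2 has more room for moderately distant neighbors than a 2-D map can offer. This is one reason an intrinsic-dimensionality estimate is a natural ingredient for an index of this kind.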
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.