Data vulnerability index for the “crowding problem” in nonlinear dimensionality reduction

IF 5.5; Zone 2 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence)

Neurocomputing; Pub Date: 2025-06-10; DOI: 10.1016/j.neucom.2025.130619

Dominik Olszewski

Neurocomputing, volume 648, Article 130619. Full text: https://www.sciencedirect.com/science/article/pii/S0925231225012913

Citations: 0

Abstract

We propose a data vulnerability index that measures the intensity and harmfulness of the “crowding problem” in nonlinear dimensionality reduction. The index supports nonlinear dimensionality reduction by increasing its robustness to this problem. It indicates whether methods that are protected against the problem are needed and justifies their use. The vulnerability index provides auxiliary preliminary information that helps in conducting and guiding further dimensionality reduction and data visualization. The introduced index is formulated on the basis of the k-Nearest Neighbors (k-NN) graph of the data. The graph allows for estimating the intrinsic dimensionality of the low-dimensional manifold embedded in the input high-dimensional linear Euclidean space, which is required during our index computation. The experiments on thirteen real datasets confirm the usefulness of our index in nonlinear dimensionality reduction and its ability to detect the “crowding problem” and determine its gravity. The index values ranged from 2 to 26, corresponding to an increasing superiority of the methods using the t-distribution over those not using it. Moreover, we conducted additional experiments on tuning the neighborhood width parameter in Neighborhood Preserving Projections (NPPs). For most datasets, an improvement was achieved based on Adjusted Mutual Information (AMI) and silhouette values. The highest increase in AMI was obtained for t-NeRV (0.9410 vs. 0.8169) and in silhouette for t-SNE (0.8124 vs. 0.6992).
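The abstract states that the k-NN graph is used to estimate the intrinsic dimensionality of the data manifold, but it does not say which estimator the paper uses. As a rough illustration only, the sketch below applies one well-known estimator based on k-NN distances, the Levina-Bickel maximum-likelihood estimator, to a synthetic 2-D plane embedded in 5-D space. The estimator choice, the function names (`knn_distances`, `mle_intrinsic_dimension`, `sample_plane_in_5d`), and the toy data are all assumptions made for this example, and the sketch does not compute the vulnerability index itself.

```python
import math
import random


def knn_distances(points, k):
    """For each point, return the sorted distances to its k nearest neighbors (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def mle_intrinsic_dimension(points, k=10):
    """Levina-Bickel MLE of intrinsic dimension, averaged over all points.

    Assumption: this estimator stands in for whatever the paper actually uses.
    """
    estimates = []
    for dists in knn_distances(points, k):
        t_k = dists[-1]  # distance to the k-th nearest neighbor
        # Log-ratios of the k-th neighbor distance to each closer neighbor distance.
        log_ratios = [math.log(t_k / t_j) for t_j in dists[:-1] if t_j > 0]
        mean_log = sum(log_ratios) / (k - 1)
        if mean_log > 0:
            estimates.append(1.0 / mean_log)
    return sum(estimates) / len(estimates)


def sample_plane_in_5d(n, seed=0):
    """Hypothetical toy data: a 2-D plane linearly embedded in 5-D space."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        points.append((u, v, u + v, u - v, 2.0 * u))
    return points


if __name__ == "__main__":
    data = sample_plane_in_5d(300)
    print("estimated intrinsic dimension:", mle_intrinsic_dimension(data, k=10))
```

Although the toy data live in five dimensions, the estimate should land near 2, the dimension of the plane. With the `k - 1` normalization used here the estimator is known to be slightly biased upward for small `k`; dividing by `k - 2` instead is a common correction.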


Source journal

Neurocomputing (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore: 13.10

Self-citation rate: 10.00%

Publication volume: 1382

Review time: 70 days

About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.