Data vulnerability index for the “crowding problem” in nonlinear dimensionality reduction

IF 5.5 · Zone 2 (Computer Science) · JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Dominik Olszewski
Neurocomputing, vol. 648, Article 130619. Published 2025-06-10. DOI: 10.1016/j.neucom.2025.130619
https://www.sciencedirect.com/science/article/pii/S0925231225012913
Citations: 0

Abstract

We propose a data vulnerability index that measures the intensity and harmfulness of the “crowding problem” in nonlinear dimensionality reduction. The index supports nonlinear dimensionality reduction by making it more robust to this problem: it indicates when methods protected against crowding are needed and justifies their use. It thus provides preliminary information that helps guide subsequent dimensionality reduction and data visualization. The index is built on the k-Nearest Neighbors (k-NN) graph of the data. The graph is used to estimate the intrinsic dimensionality of the low-dimensional manifold embedded in the high-dimensional Euclidean input space, which the index computation requires. Experiments on thirteen real datasets confirm the usefulness of the index in nonlinear dimensionality reduction and its ability to detect the “crowding problem” and determine its severity. The index values ranged from 2 to 26, with higher values corresponding to a growing advantage of the methods that use the t-distribution over those that do not. We also conducted additional experiments on tuning the neighborhood width parameter in Neighborhood Preserving Projections (NPPs). For most datasets, an improvement was achieved as measured by Adjusted Mutual Information (AMI) and silhouette values. The largest increase in AMI was obtained for t-NeRV (0.9410 vs. 0.8169) and the largest increase in silhouette for t-SNE (0.8124 vs. 0.6992).
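
The abstract states that the k-NN graph is used to estimate the intrinsic dimensionality of the data, but it does not say which estimator is used. As a rough illustration only, the sketch below estimates intrinsic dimensionality from k-NN distances with the Levina-Bickel maximum-likelihood estimator, which is a stand-in chosen here and not necessarily the author's method. The function names (`knn_distances`, `mle_intrinsic_dimension`, `make_plane_in_5d`), the choice of k = 10, and the toy dataset are all assumptions made for this example.

```python
import math
import random


def knn_distances(points, k):
    """Return, for each point, the sorted distances to its k nearest neighbors (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def mle_intrinsic_dimension(points, k=10):
    """Levina-Bickel maximum-likelihood estimate, averaged over all points.

    This estimator is an assumption of this sketch, not taken from the paper.
    """
    estimates = []
    for dists in knn_distances(points, k):
        t_k = dists[-1]  # distance to the k-th nearest neighbor
        # Sum of log(T_k / T_j) over the k-1 closer neighbors (skip zero distances)
        log_sum = sum(math.log(t_k / t_j) for t_j in dists[:-1] if t_j > 0)
        if log_sum > 0:
            estimates.append((k - 1) / log_sum)
    return sum(estimates) / len(estimates)


def make_plane_in_5d(n=400, seed=0):
    """Toy data (hypothetical): a 2-D plane embedded linearly in 5-D space."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        points.append((u, v, u + v, u - v, 2.0 * u))
    return points


if __name__ == "__main__":
    data = make_plane_in_5d()
    print(f"estimated intrinsic dimension: {mle_intrinsic_dimension(data):.2f}")
```

On this toy data the estimate should land near 2, the true dimension of the plane, even though the points live in a 5-D space. How such an estimate enters the vulnerability index itself is not specified in the abstract and is not modeled here.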
Source journal: Neurocomputing (Engineering and Technology; Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.