Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data

IF 23.9 | Zone 1 Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Nature Machine Intelligence | Pub Date: 2025-09-10 | DOI: 10.1038/s42256-025-01112-9

Dehua Peng, Zhipeng Gui, Wenzhang Wei, Fa Li, Jie Gui, Huayi Wu, Jianya Gong


Citations: 0

Abstract

As a pivotal branch of machine learning, manifold learning uncovers the intrinsic low-dimensional structure within complex non-linear manifolds in high-dimensional space for visualization, classification, clustering and gaining key insights. Although existing techniques have achieved remarkable successes, they suffer from extensive distortions of cluster structure, which hinders the understanding of underlying patterns. Scalability issues also limit their applicability for handling large-scale data. Here we propose a sampling-based scalable manifold learning technique that enables uniform and discriminative embedding (SUDE) for large-scale and high-dimensional data. It starts by seeking a set of landmarks to construct the low-dimensional skeleton of the entire data and then incorporates the non-landmarks into the learned space by constrained locally linear embedding. We empirically validated the effectiveness of SUDE on synthetic datasets and real-world benchmarks and applied it to analyse single-cell data and detect anomalies in electrocardiogram signals. SUDE exhibits a distinct advantage in scalability with respect to data size and embedding dimension and shows promising performance in cluster separation, integrity and global structure preservation. The experiments also demonstrate notable robustness in embedding quality as the sampling rate decreases.
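The two-stage idea in the abstract (embed a sampled set of landmarks first, then place the remaining points relative to them) can be illustrated with a small NumPy sketch. This is not the authors' SUDE implementation: the uniform random sampling, the PCA used for the landmark skeleton, and every function name and parameter below (`sample_landmarks`, `embed_landmarks_pca`, `lle_weights`, `embed_with_landmarks`, `rate`, `k`, `reg`) are assumptions made for illustration. Only the general pattern follows the abstract, with non-landmarks positioned by locally linear reconstruction weights that are constrained to sum to one.

```python
import numpy as np


def sample_landmarks(n_points, rate, rng):
    """Pick landmark row indices (assumption: uniform random sampling)."""
    n_landmarks = max(2, int(round(rate * n_points)))
    return rng.choice(n_points, size=n_landmarks, replace=False)


def embed_landmarks_pca(X_land, dim):
    """Stand-in skeleton embedding: project landmarks onto top principal axes."""
    centered = X_land - X_land.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T


def lle_weights(x, neighbors, reg=1e-3):
    """Weights that reconstruct x from its neighbors, constrained to sum to one."""
    diffs = neighbors - x                     # shape (k, d)
    gram = diffs @ diffs.T                    # local Gram matrix, shape (k, k)
    # Regularize so the system stays solvable when k exceeds the data dimension.
    gram += reg * (np.trace(gram) + 1e-12) * np.eye(len(neighbors))
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()


def embed_with_landmarks(X, rate=0.2, dim=2, k=5, seed=0):
    """Embed landmarks first, then place every other point from nearby landmarks."""
    rng = np.random.default_rng(seed)
    land_idx = sample_landmarks(len(X), rate, rng)
    Y = np.zeros((len(X), dim))
    Y[land_idx] = embed_landmarks_pca(X[land_idx], dim)

    is_landmark = np.zeros(len(X), dtype=bool)
    is_landmark[land_idx] = True
    for i in np.flatnonzero(~is_landmark):
        # k nearest landmarks of point i in the original space.
        dists = np.linalg.norm(X[land_idx] - X[i], axis=1)
        nearest = np.argsort(dists)[:k]
        w = lle_weights(X[i], X[land_idx][nearest])
        # Reuse the same weights on the landmarks' low-dimensional coordinates.
        Y[i] = w @ Y[land_idx][nearest]
    return Y, land_idx
```

Calling `embed_with_landmarks(X, rate=0.25)` on an `(n, d)` array returns an `(n, 2)` embedding plus the landmark indices. Lowering `rate` shrinks the expensive skeleton step, which is the scalability lever the abstract refers to; how well the embedding quality holds up as the rate drops is what the paper's experiments evaluate, and this sketch makes no claim about that.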


Source journal

Nature Machine Intelligence Multiple-

CiteScore: 36.90

Self-citation rate: 2.10%

Articles published: 127

About the journal: Nature Machine Intelligence is a distinguished publication that presents original research and reviews on various topics in machine learning, robotics, and AI. Our focus extends beyond these fields, exploring their profound impact on other scientific disciplines, as well as societal and industrial aspects. We recognize limitless possibilities wherein machine intelligence can augment human capabilities and knowledge in domains like scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. Simultaneously, we acknowledge the emergence of ethical, social, and legal concerns due to the rapid pace of advancements. To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects. Similar to all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process, ensuring high standards of copy-editing and production, swift publication, and editorial independence.