DVC2: Deep video cascade clustering from video structures
Authors: Zihua Wang, Siya Mi, Yu Zhang
Journal: Neurocomputing, volume 657, Article 131565
Published: 2025-09-16
DOI: 10.1016/j.neucom.2025.131565
URL: https://www.sciencedirect.com/science/article/pii/S0925231225022374
Video clustering is a critical unsupervised learning task in which, unlike supervised video classification, category labels are unavailable. The primary challenge is learning meaningful video representations without annotations so that similar videos can be grouped effectively. Most existing methods extract frame-level features and apply standard clustering algorithms such as K-means, but they often fail to capture the temporal relationships inherent in video data. In this paper, we introduce Deep Video Cascade Clustering (DVC²), a novel unsupervised video learning paradigm. Unlike image-based clustering methods, DVC² first learns an initial video representation through frame clustering, which serves as guidance, and then aligns video clustering results with both long-term and short-term structures as well as nearest neighbors. We evaluate DVC² on benchmark datasets, including UCF101 and Kinetics-400, achieving state-of-the-art results. Notably, even in annotation-free scenarios where self-supervised learning with K-means already yields reasonable clustering, DVC² demonstrates significantly superior performance.
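To make the baseline mentioned in the abstract concrete, the sketch below mean-pools frame-level features into one vector per video and runs plain K-means on the result. It illustrates only the generic "frame features plus K-means" pipeline, not the DVC² method, whose details the abstract does not give. The helper names (`mean_pool`, `kmeans`, `make_video`), the 2-D synthetic features, and all parameter values are assumptions chosen for illustration; a real pipeline would use features from a pretrained encoder.

```python
import random

def mean_pool(frames):
    """Average per-frame feature vectors into one video-level vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = mean_pool(members)
    return labels

def make_video(center, n_frames, rng, noise=0.1):
    """Hypothetical stand-in for frame features: noisy copies of a center."""
    return [[c + rng.gauss(0, noise) for c in center] for _ in range(n_frames)]

# Synthetic data (assumption): two groups of five videos, eight frames each.
rng = random.Random(42)
videos = ([make_video([0.0, 0.0], 8, rng) for _ in range(5)]
          + [make_video([5.0, 5.0], 8, rng) for _ in range(5)])

video_vectors = [mean_pool(v) for v in videos]
labels = kmeans(video_vectors, k=2)
print(labels)
```

Because mean pooling is invariant to frame order, shuffling or reversing the frames of a video leaves its vector essentially unchanged. That is one simple way to see why such a baseline cannot capture the temporal relationships the abstract refers to.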
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.