Generalized Cauchy–Schwarz divergence: Efficient estimation and applications in deep learning

IF 5.5 · CAS Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Mingfei Lu, Shujian Yu, Robert Jenssen, Badong Chen
{"title":"Generalized Cauchy–Schwarz divergence: Efficient estimation and applications in deep learning","authors":"Mingfei Lu ,&nbsp;Shujian Yu ,&nbsp;Robert Jenssen ,&nbsp;Badong Chen","doi":"10.1016/j.neucom.2025.130904","DOIUrl":null,"url":null,"abstract":"<div><div>Divergence measures play a fundamental role in machine learning and deep learning; however, efficient methods for handling multiple distributions (i.e., more than two) remain largely underexplored. This challenge is particularly critical in scenarios where managing multiple distributions simultaneously is both necessary and unavoidable, such as clustering, multi-source domain adaptation, and multi-view learning. A common approach to quantifying overall divergence involves computing the mean pairwise distances between distributions. However, this method suffers from two key limitations. First, it is restricted to pairwise comparisons and fails to capture higher-order interactions or dependencies among three or more distributions. Second, its implementation requires a double-loop traversal over all distribution pairs, leading to significant computational overhead, particularly when dealing with a large number of distributions. In this study, we introduce the generalized Cauchy–Schwarz divergence (GCSD), a novel divergence measure specifically designed for multiple distributions. To facilitate its practical application, we propose a kernel-based closed-form sample estimator, which enables efficient computation in various deep-learning contexts. Furthermore, we validate GCSD through two representative tasks: deep clustering, achieved by maximizing the generalized divergence between clusters, and multi-source domain adaptation, achieved by minimizing the generalized discrepancy among feature distributions. Extensive experimental evaluations highlight the robustness and effectiveness of GCSD in both tasks, underscoring its potential to advance machine learning techniques that require the quantification of multiple distributions.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 130904"},"PeriodicalIF":5.5000,"publicationDate":"2025-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225015760","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Divergence measures play a fundamental role in machine learning and deep learning; however, efficient methods for handling multiple distributions (i.e., more than two) remain largely underexplored. This challenge is particularly critical in scenarios where managing multiple distributions simultaneously is both necessary and unavoidable, such as clustering, multi-source domain adaptation, and multi-view learning. A common approach to quantifying overall divergence involves computing the mean pairwise distances between distributions. However, this method suffers from two key limitations. First, it is restricted to pairwise comparisons and fails to capture higher-order interactions or dependencies among three or more distributions. Second, its implementation requires a double-loop traversal over all distribution pairs, leading to significant computational overhead, particularly when dealing with a large number of distributions. In this study, we introduce the generalized Cauchy–Schwarz divergence (GCSD), a novel divergence measure specifically designed for multiple distributions. To facilitate its practical application, we propose a kernel-based closed-form sample estimator, which enables efficient computation in various deep-learning contexts. Furthermore, we validate GCSD through two representative tasks: deep clustering, achieved by maximizing the generalized divergence between clusters, and multi-source domain adaptation, achieved by minimizing the generalized discrepancy among feature distributions. Extensive experimental evaluations highlight the robustness and effectiveness of GCSD in both tasks, underscoring its potential to advance machine learning techniques that require the quantification of multiple distributions.
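As background for the abstract's critique of the mean-pairwise baseline: the classical (two-distribution) Cauchy–Schwarz divergence is D_CS(p, q) = -log( (∫pq)² / (∫p² ∫q²) ), and each integral admits a closed-form plug-in estimate from Gaussian Gram matrices. The sketch below is a hedged illustration of that baseline, not the paper's GCSD estimator (whose definition is not given in this excerpt); the function names, bandwidth handling, and toy data are ours. Note the double loop over all K(K-1)/2 pairs, which is exactly the quadratic overhead the abstract highlights.

```python
import numpy as np
from itertools import combinations

def gaussian_gram(X, Y, sigma):
    """Gram matrix of Gaussian kernels k(x, y) = exp(-||x - y||^2 / (2*sigma^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def cs_divergence(X, Y, sigma=1.0):
    """Plug-in kernel estimate of D_CS(p, q) = -log((∫pq)^2 / (∫p^2 ∫q^2)).

    Each information-potential term is approximated by the mean of a
    Gaussian Gram matrix (KDE bandwidth constants folded into sigma).
    """
    pq = gaussian_gram(X, Y, sigma).mean()
    pp = gaussian_gram(X, X, sigma).mean()
    qq = gaussian_gram(Y, Y, sigma).mean()
    return -np.log(pq ** 2 / (pp * qq))

def mean_pairwise_cs(samples, sigma=1.0):
    """The baseline the abstract critiques: average the pairwise CS
    divergence over all K(K-1)/2 pairs. The double loop is quadratic in
    the number of distributions and only ever sees pairwise structure."""
    pairs = list(combinations(range(len(samples)), 2))
    total = sum(cs_divergence(samples[i], samples[j], sigma) for i, j in pairs)
    return total / len(pairs)

# Toy usage: three 2-D point clouds standing in for three distributions.
rng = np.random.default_rng(0)
samples = [rng.normal(loc=mu, size=(100, 2)) for mu in (0.0, 1.0, 2.0)]
print(mean_pairwise_cs(samples))
```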
Source journal
Neurocomputing (Engineering Technology - Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Annual publications: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.