Euclidean and Poincaré space ensemble Xgboost
Ponnuthurai Nagaratnam Suganthan, Lingping Kong, Václav Snášel, Varun Ojha, Hussein Ahmed Hussein Zaky Aly
Information Fusion, Volume 115, Article 102746 (published 2024-10-23). Impact factor 14.7, JCR Q1 (Computer Science, Artificial Intelligence).
DOI: 10.1016/j.inffus.2024.102746
https://www.sciencedirect.com/science/article/pii/S1566253524005244
Citations: 0
Abstract
Hyperbolic space has garnered attention for its unique properties and its efficient representation of hierarchical structures. Recent studies have explored hyperbolic alternatives to hyperplane-based classifiers such as logistic regression and support vector machines. Hyperbolic methods have even been fused into random forests by constructing data splits with horospheres, which proved effective for hyperbolic datasets. However, the existing horosphere-based approach incurs substantial computation time, which has discouraged its application to most datasets. Against this backdrop, we introduce PXgboost, an extension of Xgboost, a renowned machine learning (ML) algorithm, to hyperbolic space. This extension redefines the node split using the Riemannian gradient and Riemannian Hessian. Comprehensive experiments on 64 datasets from the UCI ML repository and 8 datasets from WordNet, fusing both their Euclidean and hyperbolic-transformed (hyperbolic UCI) representations, show promising performance for PXgboost compared with the algorithms in the literature. Furthermore, our findings suggest that the Euclidean metric-based classifier performs well even on hyperbolic data. Building on this finding, we propose a space fusion classifier called EPboost. It harmonizes data processing across the different spaces and integrates their probability outcomes for predictive analysis. In our comparative analysis involving 19 algorithms on the UCI datasets, EPboost outperforms the others in most cases, underscoring its efficacy and potential significance in diverse ML applications. This research marks a step forward in harnessing hyperbolic geometry for ML tasks and showcases its potential to enhance algorithmic efficacy.
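The abstract names three ingredients without giving formulas: a hyperbolic transform of Euclidean data, a Riemannian gradient, and a fusion of class probabilities from the two spaces. The sketch below illustrates each with standard Poincaré-ball identities (curvature -1 by default). It is not the authors' implementation: the function names, the choice of the exponential map at the origin as the transform, and the weighted-average fusion rule are all assumptions made for illustration.

```python
import math


def exp_map_origin(v, c=1.0):
    """Map a Euclidean vector onto the Poincaré ball of curvature -c
    using the exponential map at the origin (assumed transform)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


def riemannian_grad(point, euclid_grad):
    """Rescale a Euclidean gradient by the inverse Poincaré metric (c = 1):
    grad_R = ((1 - ||x||^2)^2 / 4) * grad_E."""
    sq_norm = sum(x * x for x in point)
    factor = (1.0 - sq_norm) ** 2 / 4.0
    return [factor * g for g in euclid_grad]


def fuse_probabilities(p_euclid, p_hyper, weight=0.5):
    """Weighted average of two class-probability vectors, renormalized.
    Hypothetical fusion rule; the abstract does not specify one."""
    fused = [weight * a + (1.0 - weight) * b
             for a, b in zip(p_euclid, p_hyper)]
    total = sum(fused)
    return [f / total for f in fused]


# Usage: a point far from the origin lands just inside the unit ball.
x = exp_map_origin([3.0, 4.0])
x_norm = math.sqrt(sum(t * t for t in x))

# At the origin the metric factor is exactly 1/4.
g = riemannian_grad([0.0, 0.0], [4.0, 8.0])

# Combine the outputs of a Euclidean and a hyperbolic classifier.
p = fuse_probabilities([0.8, 0.2], [0.6, 0.4])
```

The exponential map always produces a point of norm below 1, so any real-valued feature vector has a valid image in the ball. The gradient rescaling shrinks toward zero near the boundary, which is why hyperbolic methods typically need care with points of norm close to 1.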
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.