A Wasserstein Distance-Based Cost-Sensitive Framework for Imbalanced Data Classification

IF 0.5; Zone 4 (Engineering & Technology); JCR Q4, ENGINEERING, ELECTRICAL & ELECTRONIC
R. Feng, H. Ji, Z. Zhu, L. Wang
DOI: 10.13164/re.2023.0451 (https://doi.org/10.13164/re.2023.0451)
Journal: Radioengineering
Published: 2023-09-01
Publication type: Journal Article
Citations: 0

Abstract

Class imbalance is a prevalent problem in many real-world applications, and an imbalanced data distribution can severely distort classifier performance. In general, the higher a dataset's imbalance ratio, the harder it is to classify. However, standard classifiers have been found to achieve good results on some highly imbalanced datasets. This suggests that class imbalance is only a superficial characteristic of the data, and that the underlying structural information is often the key factor affecting classification performance. Structural information, as implicit prior knowledge, has been shown to be crucial for designing good classifiers. This paper proposes a Wasserstein-based cost-sensitive support vector machine (CS-WSVM) for class-imbalance learning that combines prior structural information with a cost-sensitive strategy. The Wasserstein distance is used to model the distributions of the majority and minority samples and thereby capture their structural information, which is then used to weight the majority and minority samples. Comprehensive experiments on synthetic and real-world datasets, in particular a radar emitter signal dataset, demonstrate that CS-WSVM achieves outstanding performance in imbalanced scenarios.
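
The abstract does not state CS-WSVM's actual weighting rule, so the following is only a minimal sketch of the general idea under loudly stated assumptions: features are 1-D, the structural signal is the exact empirical Wasserstein-1 distance between the two class samples, and the cost rule `class_costs` is hypothetical (invented here, not taken from the paper). In that rule the minority cost moves from the imbalance ratio toward 1 as the class distributions separate, which mirrors the abstract's point that imbalance alone does not determine difficulty. `weighted_svm_objective` shows where such costs would enter a generic cost-sensitive SVM primal; the paper's formulation may differ.

```python
from math import exp
from statistics import pstdev


def wasserstein_1d(u, v):
    """Exact Wasserstein-1 distance between two non-empty 1-D samples.

    Uses W1 = integral of |F_u(x) - F_v(x)| dx over the empirical CDFs,
    whose difference is constant between consecutive merged sample points.
    """
    u, v = sorted(u), sorted(v)
    points = sorted(u + v)
    i = j = 0  # counts of u / v samples that are <= the current left point
    total = 0.0
    for left, right in zip(points, points[1:]):
        while i < len(u) and u[i] <= left:
            i += 1
        while j < len(v) and v[j] <= left:
            j += 1
        total += abs(i / len(u) - j / len(v)) * (right - left)
    return total


def class_costs(x_major, x_minor):
    """HYPOTHETICAL structure-aware class costs (not the CS-WSVM rule).

    Returns (majority_cost, minority_cost). The minority cost equals the
    imbalance ratio IR when both class distributions coincide and decays
    toward 1 as they move apart relative to their within-class spread.
    """
    ir = len(x_major) / len(x_minor)  # imbalance ratio
    spread = (pstdev(x_major) + pstdev(x_minor)) / 2 or 1.0  # avoid dividing by 0
    overlap = exp(-wasserstein_1d(x_major, x_minor) / spread)  # 1.0 = identical
    return 1.0, 1.0 + (ir - 1.0) * overlap


def weighted_svm_objective(w, b, xs, ys, costs, c=1.0):
    """Primal objective of a generic cost-sensitive linear SVM on 1-D inputs:
    0.5 * w**2 + c * sum_i costs[i] * max(0, 1 - ys[i] * (w * xs[i] + b)),
    with labels ys[i] in {-1, +1}.
    """
    hinge = sum(s * max(0.0, 1.0 - y * (w * x + b)) for x, y, s in zip(xs, ys, costs))
    return 0.5 * w * w + c * hinge


if __name__ == "__main__":
    majority = [0.0, 1.0, 2.0, 3.0] * 2  # 8 samples, label -1
    minority = [1.0, 2.0, 3.0, 4.0]      # 4 samples, label +1 (IR = 2)
    c_maj, c_min = class_costs(majority, minority)
    xs = majority + minority
    ys = [-1] * len(majority) + [1] * len(minority)
    costs = [c_maj] * len(majority) + [c_min] * len(minority)
    print(wasserstein_1d(majority, minority), c_min)
    print(weighted_svm_objective(1.0, -2.0, xs, ys, costs))
```

In the toy data the minority sample is the majority pattern shifted by 1, so W1 is 1 and the minority cost lands strictly between 1 and the imbalance ratio of 2. With scikit-learn, per-sample costs of this kind could be passed through the `sample_weight` argument of `sklearn.svm.SVC.fit`, or per class through its `class_weight` parameter; `scipy.stats.wasserstein_distance` computes the same 1-D quantity and can serve as a cross-check. Multivariate features would require a general optimal-transport solver instead.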
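Source journal: Radioengineering
Subject category: Engineering & Technology / Engineering: Electrical & Electronic
CiteScore: 2.00
Self-citation rate: 9.10%
Articles published: 0
Review time: 5.7 months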
Journal description: Since 1992, the Radioengineering Journal has been publishing original scientific and engineering papers on wireless communication and the application of wireless technologies. Submitted papers are expected to deal with electromagnetics (antennas, propagation, microwaves), signals, circuits, optics, and related fields. Each issue opens with a feature article; feature articles are organized by members of the Editorial Board to present the latest developments in selected areas of radio engineering. The journal makes every effort to publish submitted papers as quickly as possible: the first round of reviews should be completed within two months, after which authors are expected to revise their manuscript within one month. If the reviewers recommend substantial changes and request further reviews, the publication time is extended.