Generating Synthetic Samples to Improve Small Sample Learning with Mixed Numerical and Categorical Attributes

Yao-San Lin, Wan-Ni Cheng, C. Chen, Der-Chiang Li, Hung-Yu Chen
Published in: 2019 8th International Congress on Advanced Applied Informatics (IIAI-AAI), 2019-07-01
DOI: 10.1109/IIAI-AAI.2019.00121 (https://doi.org/10.1109/IIAI-AAI.2019.00121)

Abstract

Small-data learning has been a recognized problem for over one hundred years, dating to 1908, when Student's t-distribution was first developed. Few statistical tools can evaluate a population appropriately when the sample size is too small. Small samples can be remedied through virtual sample generation (VSG) methods, which are widely used in industry and machine learning. However, most VSG methods were developed for data with only numerical attributes; very few studies have dealt with nominal attributes, and this limits domain estimation. This paper therefore proposes a method that generates virtual samples based on the discrete degrees of nominal attributes and then estimates the general population domains with fuzzy membership functions. A backpropagation neural network model and a support vector regression model are used to test the effectiveness of the proposed method, and the Wilcoxon-sign test is used to test the difference from the raw data sets. The results show that, by generating virtual samples that have nominal attributes, the proposed method can reduce the mean absolute error and improve classification accuracy.
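The abstract does not give the formulas behind the method, so the following is only a rough sketch of the general idea of membership-guided virtual sample generation, not the authors' algorithm. It assumes a triangular membership function, a hypothetical domain-widening factor `expand`, and simple accept-reject sampling within each level of a nominal attribute; every function and parameter name here is illustrative.

```python
import random
from collections import defaultdict


def triangular_membership(x, low, peak, high):
    """Membership value in [0, 1] of x under a triangular fuzzy set."""
    if x < low or x > high:
        return 0.0
    if x == peak:
        return 1.0
    if x < peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def estimate_domain(values, expand=0.5):
    """Assumed domain estimate: widen the observed range on both sides.

    `expand` is a hypothetical factor, not a value taken from the paper.
    Returns (low, peak, high), with the sample mean as the peak.
    """
    lo, hi = min(values), max(values)
    spread = (hi - lo) or 1.0  # avoid a zero-width domain for one value
    peak = sum(values) / len(values)
    return lo - expand * spread, peak, hi + expand * spread


def generate_virtual_samples(samples, n_per_level, rng=None):
    """Generate virtual (level, value) pairs for each nominal level.

    samples: list of (nominal_level, numeric_value) observations.
    A uniform candidate is kept with probability equal to its membership
    value, so candidates near the peak are retained more often.
    """
    rng = rng or random.Random(0)
    by_level = defaultdict(list)
    for level, value in samples:
        by_level[level].append(value)

    virtual = []
    for level, values in by_level.items():
        low, peak, high = estimate_domain(values)
        kept = 0
        while kept < n_per_level:
            candidate = rng.uniform(low, high)
            if rng.random() <= triangular_membership(candidate, low, peak, high):
                virtual.append((level, candidate))
                kept += 1
    return virtual


if __name__ == "__main__":
    observed = [("A", 1.0), ("A", 2.0), ("A", 3.0), ("B", 10.0), ("B", 12.0)]
    for level, value in generate_virtual_samples(observed, n_per_level=3):
        print(level, round(value, 3))
```

Grouping by nominal level before estimating the numeric domain is one simple way to let a categorical attribute shape the generated values; the paper's treatment of "discrete degrees" may well differ.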