A Gaussian Data Augmentation Technique on Highly Dimensional, Limited Labeled Data for Multiclass Classification Using Deep Learning

J. Rochac, L. Liang, N. Zhang, T. Oladunni
2019 Tenth International Conference on Intelligent Control and Information Processing (ICICIP), December 2019.
DOI: 10.1109/ICICIP47338.2019.9012197
Cited by: 4

Abstract

In recent years, using oceans of data and virtually infinite cloud-based computation power, deep learning models have pushed the state of the art in classification to expert-level performance. Researchers continue to explore applications of deep learning models ranging from face, text, and voice recognition to signal and information processing. With continuously increasing data collection capabilities, datasets are becoming larger and more dimensional. However, manually labeled data points cannot keep up. It is this disparity between the high number of features and the low number of labeled samples that motivates a new approach integrating feature reduction and sample augmentation into deep learning classifiers. This paper explores the performance of this approach on three deep learning classifiers: MLP, CNN, and LSTM. First, we establish a baseline using the original dataset. Second, we preprocess the dataset using principal component analysis (PCA). Third, we preprocess the dataset with PCA followed by our Gaussian data augmentation (GDA) technique. To estimate performance, we add k-fold cross-validation to our experiments and compile our results in numerical and graphical form using the confusion matrix and a classification report that includes accuracy, recall, f-score, and support. Our experiments suggest that all three classifiers achieve superior classification accuracy with our PCA+GDA approach.
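The abstract does not spell out the GDA procedure, so the sketch below is only an interpretation: it treats Gaussian data augmentation as appending noise-perturbed copies of each training sample after PCA reduction, inside a k-fold cross-validation loop. The dataset is a synthetic high-dimensional, few-sample stand-in, and a logistic-regression classifier is used as a fast placeholder for the paper's MLP/CNN/LSTM models; the function name `gaussian_augment` and all parameters (`copies`, `sigma`) are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def gaussian_augment(X, y, copies=2, sigma=0.1, seed=0):
    """Hypothetical GDA sketch: append `copies` Gaussian-noise-perturbed
    duplicates of every sample, keeping the original labels."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for _ in range(copies):
        X_parts.append(X + rng.normal(0.0, sigma, size=X.shape))
        y_parts.append(y)
    return np.vstack(X_parts), np.concatenate(y_parts)

# High-dimensional, limited-label multiclass toy data (stand-in for the
# paper's dataset): 150 samples, 200 features, 3 classes.
X, y = make_classification(n_samples=150, n_features=200, n_informative=20,
                           n_classes=3, random_state=0)

accs = []
for train, test in StratifiedKFold(n_splits=5, shuffle=True,
                                   random_state=0).split(X, y):
    # Feature reduction: fit PCA on the training fold only.
    pca = PCA(n_components=20).fit(X[train])
    # Sample augmentation after reduction.
    X_tr, y_tr = gaussian_augment(pca.transform(X[train]), y[train])
    # Placeholder classifier standing in for the MLP/CNN/LSTM models.
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    accs.append(accuracy_score(y[test], clf.predict(pca.transform(X[test]))))

print(f"mean 5-fold accuracy: {np.mean(accs):.3f}")
```

Fitting PCA inside each fold (rather than on the full dataset) avoids leaking test-fold statistics into the reduced feature space, which matters when the whole point is estimating generalization from few labeled samples.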