A Gaussian Data Augmentation Technique on Highly Dimensional, Limited Labeled Data for Multiclass Classification Using Deep Learning
J. Rochac, L. Liang, N. Zhang, T. Oladunni. 2019 Tenth International Conference on Intelligent Control and Information Processing (ICICIP), December 2019. DOI: 10.1109/ICICIP47338.2019.9012197
In recent years, deep learning models trained on vast amounts of data with virtually unlimited cloud-based computing power have pushed state-of-the-art classification to expert-level performance. Researchers continue to explore applications of deep learning models ranging from face, text, and voice recognition to signal and information processing. As data collection capabilities keep growing, datasets are becoming larger and higher-dimensional, but manual labeling cannot keep pace. This disparity between the large number of features and the small number of labeled samples motivates a new approach that integrates feature reduction and sample augmentation into deep learning classifiers. This paper evaluates such an approach on three deep learning classifiers: MLP, CNN, and LSTM. First, we establish a baseline using the original dataset. Second, we preprocess the dataset using principal component analysis (PCA). Third, we preprocess the dataset with PCA followed by our Gaussian data augmentation (GDA) technique. To estimate performance, we apply k-fold cross-validation and report results in numerical and graphical form using confusion matrices and a classification report that includes accuracy, recall, F-score, and support. Our experiments suggest that the PCA+GDA approach yields superior classification accuracy for all three classifiers.
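The abstract names the pipeline (PCA, then GDA, evaluated with k-fold cross-validation) but does not say how GDA draws its samples. The dependency-free Python sketch below is one possible reading under stated assumptions, not the authors' implementation: PCA is computed by power iteration with deflation; GDA is *assumed* to fit an independent Gaussian per class and per principal component and to sample synthetic points from it; a nearest-centroid classifier stands in for the MLP, CNN, and LSTM models; and all function names and the toy data are hypothetical.

```python
import random
import statistics
from collections import defaultdict

def pca_fit(X, n_components, iters=200, seed=0):
    """Top principal components of X (list of rows) via power iteration + deflation."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random unit start vector
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:  # no variance left in the deflated covariance
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: subtract lam * v v^T so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, components

def pca_transform(X, mean, components):
    """Project each mean-centred row of X onto the principal components."""
    return [[sum((r[j] - mean[j]) * c[j] for j in range(len(mean))) for c in components]
            for r in X]

def by_class(Z, y):
    groups = defaultdict(list)
    for z, label in zip(Z, y):
        groups[label].append(z)
    return groups

def gaussian_augment(Z, y, n_new_per_class, seed=0):
    """ASSUMED GDA: per class, fit an independent Gaussian to each PCA component
    (mean, population std) and append n_new_per_class samples drawn from it."""
    rng = random.Random(seed)
    Z_aug, y_aug = list(Z), list(y)
    for label, rows in by_class(Z, y).items():
        cols = list(zip(*rows))
        mus = [statistics.mean(c) for c in cols]
        sds = [statistics.pstdev(c) for c in cols]
        for _ in range(n_new_per_class):
            Z_aug.append([rng.gauss(m, s) for m, s in zip(mus, sds)])
            y_aug.append(label)
    return Z_aug, y_aug

def fit_centroids(Z, y):
    """Stand-in classifier, NOT the paper's MLP/CNN/LSTM: one mean vector per class."""
    return {lab: [statistics.mean(c) for c in zip(*rows)] for lab, rows in by_class(Z, y).items()}

def predict(centroids, Z):
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda lab: dist2(z, centroids[lab])) for z in Z]

def cross_validate(X, y, k=5, n_components=2, n_new_per_class=20, seed=0):
    """k-fold CV of PCA -> assumed GDA -> classifier; returns (accuracy, confusion)."""
    labels = sorted(set(y))
    confusion = {(t, p): 0 for t in labels for p in labels}  # (true, predicted) -> count
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    for fold in range(k):
        test_idx = idx[fold::k]
        held_out = set(test_idx)
        X_tr = [X[i] for i in idx if i not in held_out]
        y_tr = [y[i] for i in idx if i not in held_out]
        # Fit PCA and augment on the training fold only, so the held-out fold
        # never influences the projection or the synthetic samples.
        mean, comps = pca_fit(X_tr, n_components)
        Z_tr, y_tr = gaussian_augment(pca_transform(X_tr, mean, comps), y_tr, n_new_per_class)
        model = fit_centroids(Z_tr, y_tr)
        preds = predict(model, pca_transform([X[i] for i in test_idx], mean, comps))
        for i, p in zip(test_idx, preds):
            confusion[(y[i], p)] += 1
    accuracy = sum(confusion[(c, c)] for c in labels) / len(X)
    return accuracy, confusion

# Hypothetical toy data: 3 classes x 15 samples x 6 features.
rng = random.Random(42)
X, y = [], []
for label, center in enumerate([0.0, 5.0, 10.0]):
    for _ in range(15):
        X.append([center + rng.gauss(0.0, 0.5) for _ in range(6)])
        y.append(label)
acc, cm = cross_validate(X, y)
print(f"5-fold accuracy: {acc:.3f}")
```

Fitting the Gaussian in the reduced PCA space means each class needs only a mean and a standard deviation per retained component rather than per original feature, which is one plausible reason to reduce dimensionality before augmenting when labeled samples are scarce. Replacing `fit_centroids` and `predict` with a deep model would leave the cross-validation loop unchanged, and the `(true, predicted)` counts in `cm` are the raw material for a confusion matrix and classification report.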