Semi-supervised text classification with deep convolutional neural network using feature fusion approach
Parvaneh Shayegh, Yuefeng Li, Jinglan Zhang, Qing Zhang
2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2019-10-01
DOI: https://doi.org/10.1145/3350546.3352548
Supervised learning algorithms rely on labeled training data for classification, but obtaining labels for large datasets is costly and time-consuming. Semi-supervised learning algorithms, by contrast, use a small set of labeled data together with a large set of unlabeled data to improve prediction performance, and thus may be a good alternative to supervised learning for large text datasets. Although many semi-supervised learning algorithms have been proposed in the data science literature, most of them are not feasible for discrete, unstructured text data.

This paper aims to improve the classification accuracy of semi-supervised learning algorithms applied to text data. To this end, a novel convolutional neural network design is employed within a co-training semi-supervised learning algorithm; the network takes augmented data as a second input when predicting the labels of text data. We also propose a novel approach that partitions the dataset into independent views via topic modeling in order to train independent classifiers. In so doing, neighbour classifiers are found, and confident predictions on unlabeled data are fused into the labeled data. The prediction accuracy of the combined algorithm is then compared with state-of-the-art supervised and semi-supervised learning algorithms. Our findings show that the proposed combined algorithm outperforms these supervised and semi-supervised algorithms in terms of prediction accuracy.

CCS CONCEPTS
• Information systems → Content analysis and feature selection.
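The co-training step mentioned in the abstract, in which confident predictions on unlabeled data are moved into the labeled set, can be illustrated with a minimal sketch. This is not the authors' implementation: a nearest-centroid classifier stands in for the convolutional neural network, the two views are assumed to be given rather than derived by topic modeling, and the function names (`fit_centroids`, `predict_proba`, `co_train`) and the `threshold` of 0.8 are hypothetical choices made only for illustration.

```python
import math


def fit_centroids(X, y):
    """Nearest-centroid stand-in for a real classifier: per-class mean vectors."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_proba(model, x):
    """Softmax over negative Euclidean distances to each class centroid."""
    scores = {
        c: -math.sqrt(sum((a - b) ** 2 for a, b in zip(x, mu)))
        for c, mu in model.items()
    }
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def most_confident(proba):
    """Return (probability, label) for the highest-probability class."""
    label = max(proba, key=proba.get)
    return proba[label], label


def co_train(lab_a, lab_b, labels, unl_a, unl_b, threshold=0.8, max_rounds=10):
    """Co-training sketch: one classifier per view; a document whose label is
    predicted with confidence >= threshold by either view joins the labeled set."""
    lab_a, lab_b, labels = list(lab_a), list(lab_b), list(labels)
    pool = list(zip(unl_a, unl_b))
    for _ in range(max_rounds):
        model_a = fit_centroids(lab_a, labels)
        model_b = fit_centroids(lab_b, labels)
        still_unlabeled = []
        for xa, xb in pool:
            # Take the more confident of the two view-specific predictions.
            conf, label = max(
                most_confident(predict_proba(model_a, xa)),
                most_confident(predict_proba(model_b, xb)),
                key=lambda pair: pair[0],
            )
            if conf >= threshold:
                lab_a.append(xa)
                lab_b.append(xb)
                labels.append(label)
            else:
                still_unlabeled.append((xa, xb))
        if len(still_unlabeled) == len(pool):
            break  # no new confident predictions; stop early
        pool = still_unlabeled
    return lab_a, lab_b, labels, pool
```

In this sketch, documents that neither view can label confidently stay in the returned pool, so the labeled set grows only with predictions at or above the chosen threshold.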