Semi-supervised text classification with deep convolutional neural network using feature fusion approach

Parvaneh Shayegh, Yuefeng Li, Jinglan Zhang, Qing Zhang
2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI), published 2019-10-01.
DOI: 10.1145/3350546.3352548 (https://doi.org/10.1145/3350546.3352548)
Citations: 2

Abstract

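Supervised learning algorithms use labeled training data for classification, but obtaining labeled data for large datasets is costly and time-consuming. Semi-supervised learning algorithms, by contrast, combine a small set of labeled data with a large set of unlabeled data to improve prediction performance, and may therefore be a good alternative to supervised learning algorithms for large text datasets. Although many semi-supervised learning algorithms have been proposed in the data science literature, most of them are not feasible for discrete, unstructured text data. This paper aims to improve the classification accuracy of semi-supervised learning algorithms applied to text data. To this end, a novel convolutional neural network (CNN) design is employed within a co-training semi-supervised learning algorithm; the CNN takes augmented data as a second input when predicting the labels of text data. We also propose a novel approach that partitions the dataset into independent views via topic modeling in order to train independent classifiers. In doing so, neighbour classifiers are identified, and confident predictions on unlabeled data are fused into the labeled data. The prediction accuracy of the combined algorithm is then compared with that of state-of-the-art supervised and semi-supervised learning algorithms. Our findings show that the proposed combined algorithm outperforms these supervised and semi-supervised algorithms in terms of prediction accuracy.
CCS CONCEPTS • Information systems → Content analysis and feature selection.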
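The abstract describes a co-training loop in which classifiers trained on separate views of the text add their confident predictions on unlabeled documents to the labeled set. The sketch below illustrates only that generic loop, under stated assumptions: a multinomial Naive Bayes classifier stands in for the paper's CNN (the augmented second input is not modeled), the two views come from a hypothetical fixed vocabulary split (`view_a_vocab`) rather than the topic-model partition the authors propose, and the names `co_train` and `split_views` as well as the `threshold` and `rounds` values are illustrative, not taken from the paper.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing.

    Illustrative stand-in classifier only; the paper itself uses a CNN.
    """

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        """Return a dict mapping each class to its posterior probability."""
        v = max(len(self.vocab), 1)  # guard against an empty view vocabulary
        log_p = {}
        for c in self.classes:
            score = math.log(self.prior[c] / self.n)
            for w in doc:
                score += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            log_p[c] = score
        top = max(log_p.values())  # subtract the max for numerical stability
        unnorm = {c: math.exp(s - top) for c, s in log_p.items()}
        z = sum(unnorm.values())
        return {c: p / z for c, p in unnorm.items()}


def split_views(doc, view_a_vocab):
    """Split a tokenized document into two disjoint feature views (hypothetical split)."""
    a = [w for w in doc if w in view_a_vocab]
    b = [w for w in doc if w not in view_a_vocab]
    return a, b


def co_train(labeled, unlabeled, view_a_vocab, threshold=0.8, rounds=5):
    """Generic co-training sketch: each round, retrain one classifier per view and
    move unlabeled documents whose most confident view prediction reaches
    `threshold` into the labeled set. `labeled` must be non-empty."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        docs, ys = zip(*labeled)
        views = [split_views(d, view_a_vocab) for d in docs]
        clf_a = NaiveBayes().fit([v[0] for v in views], ys)
        clf_b = NaiveBayes().fit([v[1] for v in views], ys)
        added, remaining = [], []
        for doc in pool:
            a, b = split_views(doc, view_a_vocab)
            pa, pb = clf_a.predict_proba(a), clf_b.predict_proba(b)
            ya, yb = max(pa, key=pa.get), max(pb, key=pb.get)
            # Fusion rule (an assumption): keep the more confident view's label.
            conf, y = max((pa[ya], ya), (pb[yb], yb))
            if conf >= threshold:
                added.append((doc, y))
            else:
                remaining.append(doc)
        if not added:  # stop early once no unlabeled document is confident enough
            break
        labeled.extend(added)
        pool = remaining
    return labeled, pool
```

On a toy corpus such as `co_train([(["goal", "match", "team"], "sport"), (["vote", "election", "party"], "politics")], [["goal", "team", "match"], ["vote", "election", "party"], ["weather"]], {"goal", "vote"}, threshold=0.75)`, the two topical documents are pseudo-labeled and the uninformative `["weather"]` stays in the pool. Taking the more confident of the two views is only one simple fusion rule; classic co-training instead lets each view's classifier label examples for the other.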