Identification of Spam Based on Dependency Syntax and Convolutional Neural Network

Qing Yu, R. Liu
Published in: 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)
Publication date: 2018-10-01
DOI: 10.1109/CISP-BMEI.2018.8633016

Abstract

Convolutional neural networks (CNNs) are well suited to image classification and natural language recognition. For Chinese spam identification, this paper proposes a hybrid DDTV-CNN model for short-text classification that combines deep dependency trait vectorization (DDTV) with a convolutional neural network. Dependency parsing is first applied to a short text to analyze its semantics, which yields a binary tree, and a matrix is constructed from the arcs of that tree. The matrix is then decomposed nonlinearly to obtain an eigenvector representation of the text's semantics. Finally, a convolutional neural network assigns the text to one of two categories. Performance metrics commonly used in text classification and information retrieval are used to build an evaluation system for spam identification. This system is applied to the experimental data obtained from simulation experiments and is also used to select appropriate kernel functions and their parameters. Comparative experiments show that the DDTV-CNN classifier is more effective and faster than traditional classifiers.
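The abstract does not specify how the matrix is built from the tree's arcs, so the following is only a minimal sketch of one plausible reading: each dependency arc (head, dependent) sets one entry of a square token-by-token matrix. The function name `arcs_to_matrix`, the 0-based `(head, dependent)` encoding, the binary entries, and the example arcs are all assumptions made for illustration, not the authors' DDTV procedure. The nonlinear decomposition and the CNN stages are not shown.

```python
from typing import Iterable, List, Tuple

# Assumed encoding: (head index, dependent index), both 0-based token positions.
Arc = Tuple[int, int]


def arcs_to_matrix(num_tokens: int, arcs: Iterable[Arc]) -> List[List[int]]:
    """Build a num_tokens x num_tokens matrix from dependency arcs.

    Entry [head][dependent] is 1 when an arc runs from head to dependent,
    and 0 otherwise. This is an illustrative choice, not the paper's method.
    """
    matrix = [[0] * num_tokens for _ in range(num_tokens)]
    for head, dep in arcs:
        if not (0 <= head < num_tokens and 0 <= dep < num_tokens):
            raise ValueError(f"arc ({head}, {dep}) is outside 0..{num_tokens - 1}")
        matrix[head][dep] = 1
    return matrix


# Hypothetical parse of a 4-token message in which token 1 is the root:
# token 1 governs tokens 0 and 3, and token 3 governs token 2.
example_arcs = [(1, 0), (1, 3), (3, 2)]
arc_matrix = arcs_to_matrix(4, example_arcs)

for row in arc_matrix:
    print(row)
```

In a fuller pipeline, the arcs would come from a dependency parser run on the segmented Chinese text, and the resulting matrix would be passed to the decomposition step that the abstract describes.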