Identification of Spam Based on Dependency Syntax and Convolutional Neural Network

2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). Pub Date: 2018-10-01. DOI: 10.1109/CISP-BMEI.2018.8633016

Qing Yu, R. Liu



Abstract

The convolutional neural network (CNN) is an algorithm well suited to image classification and natural language recognition. For Chinese spam identification, this paper proposes a hybrid DDTV-CNN model for short-text classification that combines deep dependency trait vectorization (DDTV) with a convolutional neural network. Dependency parsing of a short text yields a binary tree, and a matrix is constructed from the arcs of that tree; the matrix is then decomposed nonlinearly to obtain an eigenvector representation of the semantics; finally, the text is assigned to one of two categories by the convolutional neural network. An evaluation system for spam identification is built from the performance metrics commonly used in text classification and information retrieval. This system is used to evaluate the experimental data obtained from simulation experiments and to choose appropriate kernel functions and their parameters. Comparative experiments show that the DDTV-CNN classifier is more effective and faster than traditional classifiers.
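The abstract does not give the matrix construction or name the evaluation metrics, so the following is only a minimal sketch under stated assumptions: dependency arcs are taken to be (head, dependent) token-index pairs and are written into a 0/1 matrix, and precision, recall and F1 stand in for the "commonly used" metrics. The function names `arc_matrix` and `precision_recall_f1` and the toy data are hypothetical and are not taken from the paper; the nonlinear decomposition and the CNN are not reproduced here.

```python
from typing import List, Sequence, Tuple


def arc_matrix(n_tokens: int, arcs: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Build an n x n 0/1 matrix with a 1 at [head][dependent] for each arc.

    Assumption: arcs are (head_index, dependent_index) pairs, 0-based.
    """
    matrix = [[0] * n_tokens for _ in range(n_tokens)]
    for head, dependent in arcs:
        matrix[head][dependent] = 1
    return matrix


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for the positive (spam) class.

    Assumption: these three metrics are used for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical 3-token text whose middle token heads the other two.
toy_matrix = arc_matrix(3, [(1, 0), (1, 2)])

# Hypothetical labels: 1 = spam, 0 = not spam.
toy_scores = precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

In this toy case only row 1 of `toy_matrix` contains non-zero entries, because the middle token is the sole head. A real pipeline would take the arcs from a Chinese dependency parser and pass a decomposed representation of the matrix to the classifier.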


