使用聚类和分类检测趋势主题中的恶意推文

2014 International Conference on Recent Trends in Information Technology Pub Date : 2014-04-10 DOI:10.1109/ICRTIT.2014.6996188

Saini Jacob, Soman Research, Murugappan

{"title":"使用聚类和分类检测趋势主题中的恶意推文","authors":"Saini Jacob, Soman Research, Murugappan","doi":"10.1109/ICRTIT.2014.6996188","DOIUrl":null,"url":null,"abstract":"Detection of spam Twitter social networks is one of the significant research areas to discover unauthorized user accounts. A number of research works have been carried out to solve these issues but most of the existing techniques had not focused on various features and doesn't group similar user trending topics which become their major limitation. Trending topics collects the current Internet trends and topics of argument of each and every user. In order to overcome the problem of feature extraction,this work initially extracts many features such as user profile features, user activity features, location based features and text and content features. Then the extracted text features use Jenson-Shannon Divergence (JSD) measure to characterize each labeled tweet using natural language models. Different features are extracted from collected trending topics data in twitter. After features are extracted, clusters are formed to group similar trending topics of tweet user profile. Fuzzy K-means (FKM) algorithm primarily cluster the similar user profiles with same trending topics of tweet and centers are determined to similar user profiles with same trending topics of tweet from fuzzy membership function. Moreover, Extreme learning machine (ELM) algorithm is applied to analyze the growing characteristics of spam with similar topics in twitter from clustering result and acquire necessary knowledge in the detection of spam. The results are evaluated with F-measure, True Positive Rate (TPR), False Positive Rate (FPR) and Classification Accuracy with improved detection results.","PeriodicalId":422275,"journal":{"name":"2014 International Conference on Recent Trends in Information Technology","volume":"144 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":"{\"title\":\"Detecting malicious tweets in trending topics using clustering and classification\",\"authors\":\"Saini Jacob, Soman Research, Murugappan\",\"doi\":\"10.1109/ICRTIT.2014.6996188\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Detection of spam Twitter social networks is one of the significant research areas to discover unauthorized user accounts. A number of research works have been carried out to solve these issues but most of the existing techniques had not focused on various features and doesn't group similar user trending topics which become their major limitation. Trending topics collects the current Internet trends and topics of argument of each and every user. In order to overcome the problem of feature extraction,this work initially extracts many features such as user profile features, user activity features, location based features and text and content features. Then the extracted text features use Jenson-Shannon Divergence (JSD) measure to characterize each labeled tweet using natural language models. Different features are extracted from collected trending topics data in twitter. After features are extracted, clusters are formed to group similar trending topics of tweet user profile. Fuzzy K-means (FKM) algorithm primarily cluster the similar user profiles with same trending topics of tweet and centers are determined to similar user profiles with same trending topics of tweet from fuzzy membership function. Moreover, Extreme learning machine (ELM) algorithm is applied to analyze the growing characteristics of spam with similar topics in twitter from clustering result and acquire necessary knowledge in the detection of spam. The results are evaluated with F-measure, True Positive Rate (TPR), False Positive Rate (FPR) and Classification Accuracy with improved detection results.\",\"PeriodicalId\":422275,\"journal\":{\"name\":\"2014 International Conference on Recent Trends in Information Technology\",\"volume\":\"144 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-04-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"23\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 International Conference on Recent Trends in Information Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICRTIT.2014.6996188\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Recent Trends in Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRTIT.2014.6996188","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 23

摘要

垃圾Twitter社交网络的检测是发现未经授权用户账户的重要研究领域之一。为了解决这些问题，已经进行了大量的研究工作，但现有的技术大多没有关注各种特征，也没有对类似的用户趋势话题进行分组，这是它们的主要局限性。热门话题收集了当前的互联网趋势和每个用户的争论话题。为了克服特征提取问题，本工作初步提取了许多特征，如用户档案特征、用户活动特征、基于位置的特征以及文本和内容特征。然后，提取的文本特征使用JSD(詹森-香农散度)度量，使用自然语言模型对每个标记的tweet进行表征。从twitter中收集的趋势主题数据中提取不同的特征。提取特征后，形成聚类，对推特用户简介中相似的趋势话题进行分组。模糊K-means (FKM)算法主要对具有相同推文趋势主题的相似用户画像进行聚类，并通过模糊隶属函数确定中心为具有相同推文趋势主题的相似用户画像。利用极限学习机(Extreme learning machine, ELM)算法从聚类结果中分析twitter中主题相似的垃圾邮件的增长特征，获得垃圾邮件检测所需的知识。结果用f -测度、真阳性率(TPR)、假阳性率(FPR)和分类准确率进行评价，检测结果有所改善。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Detecting malicious tweets in trending topics using clustering and classification

Detection of spam Twitter social networks is one of the significant research areas to discover unauthorized user accounts. A number of research works have been carried out to solve these issues but most of the existing techniques had not focused on various features and doesn't group similar user trending topics which become their major limitation. Trending topics collects the current Internet trends and topics of argument of each and every user. In order to overcome the problem of feature extraction,this work initially extracts many features such as user profile features, user activity features, location based features and text and content features. Then the extracted text features use Jenson-Shannon Divergence (JSD) measure to characterize each labeled tweet using natural language models. Different features are extracted from collected trending topics data in twitter. After features are extracted, clusters are formed to group similar trending topics of tweet user profile. Fuzzy K-means (FKM) algorithm primarily cluster the similar user profiles with same trending topics of tweet and centers are determined to similar user profiles with same trending topics of tweet from fuzzy membership function. Moreover, Extreme learning machine (ELM) algorithm is applied to analyze the growing characteristics of spam with similar topics in twitter from clustering result and acquire necessary knowledge in the detection of spam. The results are evaluated with F-measure, True Positive Rate (TPR), False Positive Rate (FPR) and Classification Accuracy with improved detection results.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 International Conference on Recent Trends in Information Technology

自引率

0.00%

发文量