Detecting malicious tweets in trending topics using clustering and classification
Authors: Saini Jacob, Soman Research, Murugappan
Published in: 2014 International Conference on Recent Trends in Information Technology
Publication date: 2014-04-10
DOI: 10.1109/ICRTIT.2014.6996188 (https://doi.org/10.1109/ICRTIT.2014.6996188)
Citations: 23
Abstract
Detecting spam on the Twitter social network is a significant research area aimed at discovering unauthorized user accounts. Many studies have addressed this problem, but most existing techniques consider only a limited set of features and do not group users with similar trending topics, which is their major limitation. Trending topics capture the current Internet trends and the topics under discussion among users. To overcome the feature extraction problem, this work first extracts several kinds of features from collected Twitter trending-topic data: user profile features, user activity features, location-based features, and text and content features. The Jensen-Shannon Divergence (JSD) measure is then applied to the extracted text features to characterize each labeled tweet using natural language models. After feature extraction, clusters are formed to group user profiles with similar trending topics. The Fuzzy K-means (FKM) algorithm clusters similar user profiles that share trending topics, and the cluster centers are determined from the fuzzy membership function. An Extreme Learning Machine (ELM) is then applied to the clustering result to analyze the growing characteristics of spam with similar topics on Twitter and to acquire the knowledge needed to detect spam. The results are evaluated using F-measure, True Positive Rate (TPR), False Positive Rate (FPR), and classification accuracy, and show improved detection results.
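The abstract names the JSD measure and the evaluation metrics without giving formulas, so the following is only a minimal sketch of how both are commonly computed. It assumes whitespace tokenization, add-alpha smoothed unigram language models, and binary labels with 1 meaning spam; none of these choices is stated in the abstract, and the function names (`unigram_model`, `js_divergence`, `detection_metrics`) are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def unigram_model(text, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    Assumption: whitespace tokenization and add-alpha smoothing; the
    abstract does not say which language model the authors used.
    `vocab` must not contain duplicate words.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in bits; p and q are dicts over the same vocabulary."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric, between 0 and 1."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def _ratio(numerator, denominator):
    """Division that returns 0.0 instead of raising on an empty class."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(y_true, y_pred):
    """TPR, FPR, F-measure and accuracy for binary labels (1 = spam)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = _ratio(tp, tp + fn)        # recall on the spam class
    fpr = _ratio(fp, fp + tn)        # legitimate tweets flagged as spam
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * tpr, precision + tpr)
    accuracy = _ratio(tp + tn, len(pairs))
    return {"tpr": tpr, "fpr": fpr, "f_measure": f_measure,
            "accuracy": accuracy}


if __name__ == "__main__":
    # Illustrative strings only; not data from the paper.
    spam_text = "win free prize now click free link"
    ham_text = "team meeting moved to friday afternoon"
    tweet = "click now to win a free prize"
    vocab = sorted(set((spam_text + " " + ham_text + " " + tweet).split()))
    tweet_lm = unigram_model(tweet, vocab)
    print("JSD to spam model:", js_divergence(tweet_lm, unigram_model(spam_text, vocab)))
    print("JSD to ham model: ", js_divergence(tweet_lm, unigram_model(ham_text, vocab)))
    print(detection_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

A tweet whose language model lies closer (lower JSD) to the spam model than to the legitimate model would yield a text feature pointing toward spam; how the authors combine this with the profile, activity, and location features is not described in the abstract.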