Title: A Data Mining Framework for Social Graph Generation and Analysis
Authors: Danish Kumar, Md. Imran Hossain Showrov
Venue: 2019 2nd International Conference on Innovation in Engineering and Technology (ICIET)
Publication date: 2019-12-23
DOI: 10.1109/ICIET48527.2019.9290584
Citations: 2
Abstract
Social networking services are increasingly popular and easy to access, so the number of social network users is growing rapidly, and with it the size of these networks and the volume of user-generated content. This creates a need to capture such large amounts of data and analyze them for purposes such as targeted marketing, recommender system design, open-source intelligence, and cybersecurity. Twitter is one of the most popular social networking (microblogging) sites and is widely used for news updates, information sharing, viral marketing, and similar purposes, with posts limited to 280 characters. In this paper, we apply a text analytics framework to analyze Twitter data at different levels of granularity. A distinguishing feature of the proposed framework is that it exploits both content and structural information for tweet analysis. The framework first models the tweets as a multi-attributed graph, in which tweets are represented as nodes and inter-tweet relationships as edges. Nodes are labeled with features identified from the tweets using NLP techniques, whereas edges are labeled with the metadata (such as hashtags, mentions, and followers) common to the tweets they connect. To analyze the multi-attributed graph, we consider two algorithms, MAG-Dist and MAG-Sim, which convert it into a simple graph and a similarity graph. Finally, we apply the Markov Clustering (MCL) algorithm to cluster the nodes of the multi-attributed graph, with each cluster consisting of a subset of tweets that represents an event. The proposed approach is evaluated experimentally on a real dataset crawled from Twitter.
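The pipeline described in the abstract (tweets as nodes, shared metadata as edge information, a similarity graph, then MCL) can be illustrated with a minimal sketch. The abstract does not define MAG-Dist or MAG-Sim, so the edge weight below is a stand-in: an averaged Jaccard overlap of hashtags and mentions, which is an assumption and not the authors' measure. The toy tweets, the function names (`jaccard`, `similarity_matrix`, `mcl`), and the parameter values are likewise hypothetical; only the expansion and inflation steps follow the standard MCL procedure.

```python
import numpy as np

# Hypothetical toy tweets; only shared metadata is modelled (no NLP node labels).
tweets = [
    {"hashtags": {"worldcup", "final"}, "mentions": {"fifa"}},
    {"hashtags": {"worldcup"}, "mentions": {"fifa"}},
    {"hashtags": {"worldcup", "final"}, "mentions": set()},
    {"hashtags": {"election", "vote"}, "mentions": {"ec"}},
    {"hashtags": {"election"}, "mentions": {"ec"}},
]


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(items):
    """Collapse per-attribute overlaps into one edge weight per tweet pair.

    Assumption: a plain average of hashtag and mention overlap, used here
    only as a placeholder for the paper's MAG-Sim / MAG-Dist measures.
    """
    n = len(items)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = 0.5 * jaccard(items[i]["hashtags"], items[j]["hashtags"]) \
                + 0.5 * jaccard(items[i]["mentions"], items[j]["mentions"])
            sim[i, j] = sim[j, i] = w
    return sim


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-8):
    """Markov Clustering on a weighted adjacency matrix."""
    m = adjacency + np.eye(len(adjacency))   # add self-loops
    m = m / m.sum(axis=0)                    # make columns stochastic
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, 2)     # expansion: two-step random walk
        m = m ** inflation                   # inflation: boost strong flows
        m = m / m.sum(axis=0)                # re-normalise columns
        if np.abs(m - prev).max() < tol:
            break
    # Attractors keep mass on the diagonal; each attractor row lists its cluster.
    clusters = set()
    for i in range(len(m)):
        if m[i, i] > 1e-3:
            clusters.add(tuple(int(k) for k in np.flatnonzero(m[i] > 1e-3)))
    return sorted(clusters)


clusters = mcl(similarity_matrix(tweets))
print(clusters)
```

On this toy input the two topic groups share no hashtags or mentions, so the similarity matrix is block diagonal and MCL separates the three World Cup tweets from the two election tweets. A higher `inflation` value generally yields finer clusters, which is the usual knob for tuning event granularity.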