Mining Social Media Data Using Topological Data Analysis
Khaled Almgren, Minkyu Kim, JeongKyu Lee
2017 IEEE International Conference on Information Reuse and Integration (IRI), published 2017-08-01
DOI: 10.1109/IRI.2017.41
Citations: 11
Abstract
Topological data analysis is a novel method for analyzing high-dimensional qualitative data using a set of properties from topology. In this paper, we explore the feasibility of topological data analysis for mining social media data by investigating the problem of image popularity. We randomly crawl images from Instagram, convert their captions to 300-dimensional numerical vectors using Word2vec, calculate cosine distances to evaluate the similarity of the caption vectors, and then apply the distances to a topological data analysis algorithm called Mapper. With caption vectors, the results show that topological data analysis is able to cluster the images in a way that relates to their popularity. Moreover, the results reveal relationships between the clusters that appear as a monotonic increase in popularity. This approach is compared with traditional clustering algorithms, including k-means and hierarchical clustering, and the results show that topological data analysis outperforms both.
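The pipeline described above (caption vectors, then pairwise cosine distances, then Mapper) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The lens (each point's mean distance to all points), the cover parameters (`n_intervals`, `overlap`), the clustering step (single linkage with a distance threshold `eps`) and all function names are assumptions chosen for illustration. The Word2vec step is not shown: short toy 3-D vectors stand in for the 300-dimensional caption vectors.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    """Pairwise cosine distances between all caption vectors."""
    return [[cosine_distance(u, v) for v in vectors] for u in vectors]


def eccentricity(dist):
    """Assumed lens (filter): each point's mean distance to all points."""
    return [sum(row) / len(row) for row in dist]


def single_linkage(indices, dist, eps):
    """Connected components of the graph linking points whose distance is <= eps."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            for j in list(unvisited):
                if dist[i][j] <= eps:
                    unvisited.remove(j)
                    component.add(j)
                    stack.append(j)
        clusters.append(frozenset(component))
    return clusters


def mapper(dist, n_intervals=4, overlap=0.3, eps=0.2):
    """Build a Mapper graph (nodes, edges) from a precomputed distance matrix."""
    lens = eccentricity(dist)
    lo, hi = min(lens), max(lens)
    length = (hi - lo) / n_intervals if hi > lo else 1.0
    nodes = []
    for k in range(n_intervals):
        # Overlapping interval of the lens range; neighbouring intervals share points.
        start = lo + (k - overlap) * length
        end = lo + (k + 1 + overlap) * length
        members = [i for i, f in enumerate(lens) if start <= f <= end]
        # Cluster the points that fall in this interval; each cluster becomes a node.
        nodes.extend(single_linkage(members, dist, eps))
    # Connect two nodes whenever their clusters share at least one point.
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]}
    return nodes, edges


# Toy 3-D stand-ins for 300-dimensional Word2vec caption vectors (two groups).
vectors = [(1, 0.1, 0), (1, 0.2, 0), (0.9, 0.1, 0), (0, 1, 0.1), (0.1, 1, 0), (0, 0.9, 0.2)]
nodes, edges = mapper(distance_matrix(vectors))
for idx, node in enumerate(nodes):
    print(idx, sorted(node))
print(sorted(edges))
```

Passing a precomputed distance matrix, rather than the raw vectors, mirrors the abstract's step of applying cosine distances to Mapper. Each node is a cluster of images, and edges join clusters that share images, which is how relationships between clusters become visible in the resulting graph.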