Categorical Data Analysis and Pattern Mining of Top Colleges in India by Using Twitter Data
Nehal Mamgain, B. Pant, A. Mittal
2016 8th International Conference on Computational Intelligence and Communication Networks (CICN), published 2016-12-01
DOI: 10.1109/CICN.2016.73 (https://doi.org/10.1109/CICN.2016.73)
Citations: 5
Abstract
This paper summarizes work in the novel domain of categorical data analysis of eminent colleges in India: Twitter data is mined to uncover the traits and events characteristic of these institutes, which are expressed as key rules. The findings could benefit academia broadly. Students can use them to make informed decisions about which college to join, and institutes can use them to address potential weaknesses and maintain their strengths. Preprocessing was extensive and included spelling correction and netspeak expansion; irrelevant tweets were then filtered out using a unigram dictionary of education-oriented keywords. The Apriori algorithm was applied to the resulting dataset, yielding characteristic markers, or patterns, of these institutes.
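The two stages named in the abstract, dictionary-based filtering followed by Apriori, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the keyword set `EDU_KEYWORDS`, the sample tweets, the whitespace tokenizer, and the support and confidence thresholds are all assumptions made for illustration, and the spelling-correction and netspeak-expansion steps are omitted.

```python
from itertools import combinations

# Hypothetical education-oriented unigram dictionary (not taken from the paper).
EDU_KEYWORDS = {"placement", "faculty", "fest", "hostel", "campus", "exam", "research"}

# Made-up sample tweets, assumed to be already spell-corrected and expanded.
SAMPLE_TWEETS = [
    "great placement season on campus",
    "campus fest tonight",
    "placement stats for campus drive",
    "lunch was nice",
    "exam week at campus",
]

def filter_relevant(tweets, keywords=EDU_KEYWORDS):
    """Reduce each tweet to the dictionary keywords it contains; drop tweets with none."""
    transactions = []
    for text in tweets:
        items = set(text.lower().split()) & keywords  # naive whitespace tokenizer
        if items:
            transactions.append(frozenset(items))
    return transactions

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    if n == 0:
        return {}
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def rules(frequent, min_confidence):
    """Derive (antecedent, consequent, confidence) rules from frequent itemsets."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found
```

Calling `rules(apriori(filter_relevant(SAMPLE_TWEETS), 0.5), 0.8)` on the sample data yields a single rule, `placement -> campus`: the unrelated tweet is discarded by the dictionary filter, and the reverse rule falls below the confidence threshold. A production pipeline would need a real tokenizer and thresholds tuned to the dataset.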