{"title":"On the Dynamics of Username Changing Behavior on Twitter","authors":"Paridhi Jain, P. Kumaraguru","doi":"10.1145/2888451.2888452","DOIUrl":"https://doi.org/10.1145/2888451.2888452","url":null,"abstract":"People extensively use usernames to look up users, their profiles, and tweets that mention them via the Twitter search engine. Often, the searched username is outdated due to a recent username change and no longer refers to the user of interest. A search by the user's old username results in a failed attempt to reach the user's profile, thereby making others falsely believe that the user account has been deactivated. Such a search can also redirect to a different user who later picks the old username, thereby reaching a different person altogether. Past studies show that a substantial section of Twitter users change their username over time. We observe similar trends when tracking 8.7 million users on Twitter for two months. To this point, little is known about how and why these users change their usernames, given the consequences of unreachability. To answer this, we analyze the username changing behavior of carefully selected users on Twitter and find that users change usernames frequently within short time intervals (a day) and choose new usernames unrelated to the old ones. A few favor a username, choosing it repeatedly. We explore a few of the many reasons that may have caused username changes. We believe that studying username changing behavior can help correctly find the user of interest, in addition to learning username creation strategies and uncovering plausible malicious intentions for the username change.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114157842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning DTW-Shapelets for Time-Series Classification","authors":"Mit Shah, Josif Grabocka, Nicolas Schilling, Martin Wistuba, L. Schmidt-Thieme","doi":"10.1145/2888451.2888456","DOIUrl":"https://doi.org/10.1145/2888451.2888456","url":null,"abstract":"Shapelets are discriminative patterns in time series that best predict the target variable when their distances to the respective time series are used as features for a classifier. Since a shapelet is simply any time series of some length less than or equal to the length of the shortest time series in our data set, there is an enormous number of possible shapelets present in the data. Initially, shapelets were found by extracting numerous candidates and evaluating them for their prediction quality. Then, Grabocka et al. [2] proposed a novel approach of learning time series shapelets called LTS. A new mathematical formalization of the task via a classification objective function was proposed and a tailored stochastic gradient learning was applied. It enabled learning near-to-optimal shapelets without the overhead of trying out lots of candidates. The Euclidean distance measure was used as the distance metric in the proposed approach. As a limitation, it is not able to learn a single shapelet that can be representative of different subsequences of time series that are merely warped along the time axis. To consider these cases, we propose to use Dynamic Time Warping (DTW) as a distance measure in the framework of LTS. The proposed approach was evaluated on 11 real world data sets from the UCR repository and a synthetic data set created by ourselves. The experimental results show that the proposed approach outperforms the existing methods on these data sets.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123187974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Competing Algorithm Detection from Research Papers","authors":"S. Ganguly, Vikram Pudi","doi":"10.1145/2888451.2888473","DOIUrl":"https://doi.org/10.1145/2888451.2888473","url":null,"abstract":"We propose an unsupervised approach to extract all competing algorithms present in a given scholarly article. The algorithm names are treated as named entities and natural language processing techniques are used to extract them. All extracted entity names are linked with their respective original papers in the reference section by our novel entity-citation linking algorithm. Then these entity-citation pairs are ranked based on the number of comparison related cue-words present in the entity-citation context. We manually annotated a small subset of DBLP Computer Science conference papers and report both qualitative and quantitative results of our algorithm on it.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129092652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Sort-Union to Enhance Economically-Efficient Sentiment Stream Analysis","authors":"Prateek Goel, Manajit Chakraborty, C. R. Chowdary","doi":"10.1145/2888451.2888468","DOIUrl":"https://doi.org/10.1145/2888451.2888468","url":null,"abstract":"Sentiment drifts, caused by people changing their opinions instantly on microblogs such as Twitter, are a major challenge in sentiment analysis. In this paper, we have developed a method that selects the most frequent messages from a relevant message set constructed using state-of-the-art sampling approaches. Our proposed technique increases the robustness of the classifier against sentiment drifts. Experiments conducted on three publicly available standard Twitter datasets reveal that the modified version performs better in terms of reduction in training resources, error minimization and execution time.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126420830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audience Prism: Segmentation and Early Classification of Visitors Based on Reading Interests","authors":"Lilly Kumari, Sunny Dhamnani, Akshat Bhatnagar, Atanu R. Sinha, R. Sinha","doi":"10.1145/2888451.2888459","DOIUrl":"https://doi.org/10.1145/2888451.2888459","url":null,"abstract":"The largest Media and Entertainment (M&E) web portals today cater to more than 100 million unique visitors every month. In Customer Relationship Management, customer segmentation plays an important role, with the goal of targeting different products for different segments. Marketers segment their customers based on customer attributes. In the non-subscription based media business, the customer is analogous to the visitor, the product to the content, and a purchase to consumption. Knowing which segment an audience member belongs to enables better engagement. In this work, we address two problems: 1) How can we segment audience members of an M&E web property based on their media consumption interests? 2) When a new visitor arrives, how can we classify them into one of the above defined segments (without having to wait for consumption history)? We apply our proposed solution to a real world data-set and show that we can achieve coherent clusters and can predict cluster membership with a high level of accuracy. We also build a tool that editors may find valuable for understanding their audience.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124615212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smart filters for social retrieval","authors":"Balaji Vasan Srinivasan, Tanya Goyal, N. M. Nainani, Kartik K. Sreenivasan","doi":"10.1145/2888451.2888457","DOIUrl":"https://doi.org/10.1145/2888451.2888457","url":null,"abstract":"Social media platforms are increasingly becoming a rich source of information for capturing the views and opinions of online customers. Major brands listen to the social streams to understand the general pulse of their online community. The foremost task here is to construct a \"filter\" to fetch the brand-relevant data from the social streams. Due to the nature of social platforms, simple filters/queries for retrieval yield a lot of noise, leading to a need for complicated filters. Constructing such complicated filters is a non-trivial task and requires significant time-investment from a social marketer. In this paper, we propose a method to automate this task by expanding a seed set of watch keywords to maximize the number of retrieved relevant social feeds around the brand and combining them appropriately into a social query. We show the strengths and weaknesses of the proposed approach in the light of real-world social feeds for various brands.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114259044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Approach to Allocate Advertisement Slots for Banner Advertising","authors":"V. Kavya, P. Reddy","doi":"10.1145/2888451.2888472","DOIUrl":"https://doi.org/10.1145/2888451.2888472","url":null,"abstract":"In the banner advertising scenario, an advertiser aims to reach the maximum number of potential visitors and a publisher tries to meet the requests of an increased number of advertisers to maximize the revenue. In the literature, a model was introduced to extract the knowledge of coverage patterns from a transactional database. In this paper, we propose an ad slots allocation approach by extending the notion of coverage patterns to select distinct sets of ad slots to meet the requests of multiple advertisers. The preliminary experimental results on a real world dataset show that the proposed approach meets the requests of an increased number of advertisers when compared with the baseline approach of allocation.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125566694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consensus Clustering Approach for Discovering Overlapping Nodes in Social Networks","authors":"D. Shankar, S. Bhavani","doi":"10.1145/2888451.2888471","DOIUrl":"https://doi.org/10.1145/2888451.2888471","url":null,"abstract":"Community discovery is an important problem that has been addressed in social networks through multiple perspectives. Most of these algorithms discover disjoint communities and yield widely varying results with regard to number of communities as well as community membership. We utilize this information positively by interpreting the results as opinions of different algorithms regarding membership of a node in a community. A novel approach to discovering overlapping nodes is proposed based on Consensus Clustering and we design two algorithms, namely core-consensus and periphery-consensus. The algorithms are implemented on LFR networks, which are synthetic benchmark data sets created for community discovery, and comparative performance is presented. It is shown that overlapping nodes are detected with a high Recall of above 96% with an average F-measure of nearly 75% for dense networks and 65% for sparse networks, which are on par with high-performing algorithms in the literature.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131438514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Creation based Slicing for Privacy Preserving Data Mining","authors":"R. Priyadarsini, M. Valarmathi, S. Sivakumari","doi":"10.1145/2888451.2888462","DOIUrl":"https://doi.org/10.1145/2888451.2888462","url":null,"abstract":"In the digital era, vast amounts of data are collected and shared for the purpose of research and analysis. These data contain sensitive information about people and organizations that needs to be protected during the process of data mining. This work proposes the Feature Creation Based Slicing [FCBS] algorithm for preserving privacy such that sensitive data are not exposed during the process of data mining in a Multi Trust Level [MTL] environment. The proposed algorithm applies three layers of privacy preservation using both perturbation and non-perturbation techniques and creates new features from the existing attribute vector. Experiments are performed on real life and benchmarked datasets and the results are compared with the existing slicing and L-diversity algorithms. The results show that privacy preserved datasets generated using the proposed algorithm yield negligible hiding failure while protecting sensitive patterns during association mining and give comparable utility during classification. Due to the feature creation process in the proposed algorithm, linking and known background attacks are prevented. Also, the variance values of the proposed privacy preserved datasets show that they can prevent diversity attacks.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134623794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","authors":"M. Marathe, M. Mohania, Prateek Jain","doi":"10.1145/2888451","DOIUrl":"https://doi.org/10.1145/2888451","url":null,"abstract":"This volume contains the papers presented at CoDS 2016: Third IKDD Conference on Data Sciences held on March 13-16, 2016 in Pune.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"164 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127525668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}