{"title":"Learning to Collectively Link Entities","authors":"Ashish Kulkarni, Kanika Agarwal, Pararth Shah, Sunny Raj Rathod, Ganesh Ramakrishnan","doi":"10.1145/2888451.2888454","DOIUrl":"https://doi.org/10.1145/2888451.2888454","url":null,"abstract":"Recently Kulkarni et al. [20] proposed an approach for collective disambiguation of entity mentions occurring in natural language text. Their model achieves disambiguation by efficiently computing exact MAP inference in a binary labeled Markov Random Field. Here, we build on their disambiguation model and propose an approach to jointly learn the node and edge parameters of such a model. We use a max margin framework, which is efficiently implemented using projected subgradient, for collective learning. We leverage this in an online and interactive annotation system which incrementally trains the model as data gets curated progressively. We demonstrate the usefulness of our system by manually completing annotations for a subset of the Wikipedia collection. We have made this data publicly available. Evaluation shows that learning helps and our system performs better than several other systems including that of Kulkarni et al.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134179521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning from Gurus: Analysis and Modeling of Reopened Questions on Stack Overflow","authors":"Rishabh Gupta, P. Reddy","doi":"10.1145/2888451.2888460","DOIUrl":"https://doi.org/10.1145/2888451.2888460","url":null,"abstract":"Community-driven Question Answering (Q&A) platforms are gaining popularity nowadays, and the number of posts on such platforms is increasing tremendously. Thus, the challenge of keeping these platforms noise-free is attracting the interest of the research community. Stack Overflow is one such popular Q&A platform for computer programming. The established users on Stack Overflow have learnt the acceptable format and scope of questions in due course. Even if their questions get closed, they are aware of the required edits, and therefore the chances of their questions being reopened increase. On the other hand, non-established users have not adapted to the Stack Overflow system and find it difficult to edit their closed questions. In this work, we aim to identify features that help differentiate the editing approaches of established and non-established users, and motivate the need for a recommendation model. Such a recommendation model can assist every user in editing their closed questions by leveraging the edit style of the established users of the platform.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121798588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Urban Transportation through Social Media Analytics","authors":"Manjira Sinha, P. Varma, Gayatri Sivakumar, Mridula Singh, Tridib Mukherjee, D. Chander, K. Dasgupta","doi":"10.1145/2888451.2888478","DOIUrl":"https://doi.org/10.1145/2888451.2888478","url":null,"abstract":"Citizens tend to discuss issues in public forums, social media, and web blogs. Given that issues related to public transportation are most actively reported across web-based sources, we present a holistic framework for collection, categorization, aggregation and visualization of urban public transportation issues. The primary challenges in deriving useful insights from web-based sources stem from (a) the number of reports; (b) incomplete or implicit spatio-temporal context; and (c) the unstructured nature of text in these reports. The work starts with the formal complaint data from the largest public transportation agency in Bangalore, complemented by complaint reports from web-based and social media sources. Text data is categorized into different transportation-related problems, and spatio-temporal context is added to the text data for geo-tagging and identifying persistent issues. A well-organized dashboard is developed for efficient visualization. The dashboard is currently being piloted with the largest transportation agency in Bangalore.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116620735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AMEO 2015: A dataset comprising AMCAT test scores, biodata details and employment outcomes of job seekers","authors":"V. Aggarwal, Shashank Srikant, Harsh Nisar","doi":"10.1145/2888451.2892037","DOIUrl":"https://doi.org/10.1145/2888451.2892037","url":null,"abstract":"More than a million engineers enter the global workforce every year. A relevant question is what determines the jobs and salaries these engineers are offered right after graduation. Previous studies have shown the influence of various factors such as college reputation, grades, the field one specializes in and market conditions for specific industries. An important input which such analyses do not have is a standardized measure of job skills taken at the time of completion of studies. We present here Aspiring Minds' Employability Outcomes 2015 (AMEO 2015), a unique dataset which provides engineering graduates' employment outcomes (salaries, job titles and job locations) together with standardized assessment scores in three fundamental areas: cognitive skills, technical skills and personality. Coupled with biodata information, AMEO 2015 provides an opportunity for a unique and comprehensive study of the entry-level labor market. The data could be used to build an accurate salary predictor, and also to understand what influences salary and job titles in the labor market. In this paper we describe the details of the dataset and discuss a spectrum of questions around meritocracy in labor markets, biases in labor selection and other prevalent market forces it can help uncover and answer. You can download the dataset at: http://research.aspiringminds.com/resources/","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117091361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Events Describe Places: Tagging Places with Event Based Social Network Data","authors":"Vinod Hegde, A. Mileo, A. Pozdnoukhov","doi":"10.1145/2888451.2888477","DOIUrl":"https://doi.org/10.1145/2888451.2888477","url":null,"abstract":"Location-based services and geospatial web applications have become popular in recent years due to the wide adoption of mobile devices. Search and recommendation of places or Points of Interest (PoIs) are prominent services available on them. The effectiveness of these services crucially depends on the availability of tags that are descriptive of places. The major geospatial databases that contain data about places suffer from a lack of descriptive tags for places, since writing them is a time-consuming process and only a few users do it despite having knowledge about places. In order to tackle this issue and automatically generate descriptive tags for places, we propose a solution that utilizes data about the set of events that happen in a specific place and uses it to extract meaningful descriptive tags for that place. We use data about events held at places on Meetup, a well-known event-based social network, and apply Latent Dirichlet Allocation (LDA) to derive sets of probable descriptive tags for any place. In order to evaluate our approach, we measure semantic relatedness between tags derived for places on Meetup and manually assigned tags from Foursquare, a location-based service. Results show that event data can be used to derive semantically relevant place tags. This shows that location-based services can benefit from capturing data about events to derive place tags.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129671726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Community Structures in Social Networks by Graph Sparsification","authors":"Partha Basuchowdhuri, Satyaki Sikdar, Sonu Sreshtha, S. Majumder","doi":"10.1145/2888451.2888479","DOIUrl":"https://doi.org/10.1145/2888451.2888479","url":null,"abstract":"Community structures are inherent in social networks and finding them is an interesting and well-studied problem. Finding community structures in social networks is similar to locating densely connected clusters of nodes in a graph. One of the popular methods for finding communities is to first find the inter-community edges and then remove them to reveal the communities. It is well known that a network centrality measure named edge betweenness can be used to detect the inter-community edges. The edges with high edge betweenness are those that lie on a large number of the shortest paths between all pairs of nodes. Finding all-pairs shortest paths is a computationally expensive task, especially for large graphs. So we construct a t-spanner, a known graph sparsification technique, for finding edges with high betweenness and eventually find communities by removing such edges. Using the t-spanner, we then detect the inter-community edges in O(km) running time by building a distance oracle of size O(kn^{1+1/k}), where t = 2k-1. Compared to traditional community detection methods that depend on the calculation of betweenness values, our algorithm runs much faster. Experiments show that our algorithm finds communities of quality comparable to other state-of-the-art community detection algorithms.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134506644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CitizenPulse: A Text Analytics framework for Proactive e-Governance - A Case Study of Mygov.in","authors":"Ankit Lamba, Deepak Yadav, A. Lele","doi":"10.1145/2888451.2888463","DOIUrl":"https://doi.org/10.1145/2888451.2888463","url":null,"abstract":"Indian citizens are beginning to express themselves via social media on a regular basis on various issues. The Government of India has started an initiative called Mygov.in, a collaborative portal where citizens can voice their opinions via free-form comments. Analyzing this free-form data is a huge challenge. In this paper we present a work in progress, the CitizenPulse framework, capable of performing text analytics on unstructured text using off-the-shelf text analytics components such as Named Entity Recognition, Part of Speech and Stemming, to name a few. Apart from integrating the text analytics components, the CitizenPulse framework abstracts these building blocks as Objects, and different objects can be dragged, dropped and connected to construct a text analytics pipeline called an Analytics Softcore. As a case study we report the analysis of the Mygov.in portal, specifically for the topic of Cleanliness in School Curriculum.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123728898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some algorithms for correlated bandits with non-stationary rewards: Regret bounds and applications","authors":"Prathamesh Mayekar, N. Hemachandra","doi":"10.1145/2888451.2888475","DOIUrl":"https://doi.org/10.1145/2888451.2888475","url":null,"abstract":"We first propose an online learning model wherein rewards for different actions/arms used by the user can be correlated and the reward stream can be non-stationary. Thus, this extends the standard multi-armed bandit learning model. We propose two algorithms, Greedy and Regression based UCB, that attempt to minimize the expected regret. We also obtain non-trivial upper bounds for the expected regret through theoretical analysis. We also provide some evidence for sub-polynomial increase in expected regret upon appropriate tuning of algorithm input parameters. These models are motivated by the problem of dynamic pricing of a product faced by a typical online retailer.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126568183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}