R. Mohan, Muhammad Arif, Jobin Wilson, S. Chaudhury, Brejesh Lall
{"title":"Code-Borrowedness of English words in Hindi Language","authors":"R. Mohan, Muhammad Arif, Jobin Wilson, S. Chaudhury, Brejesh Lall","doi":"10.1145/3041823.3067693","DOIUrl":"https://doi.org/10.1145/3041823.3067693","url":null,"abstract":"1.1 Ground Truth. The user preference towards usage of a Hindi word in its Hindi form, as opposed to its English form, in a Hindi sentence is determined through a survey. From the survey responses, the difference between the total number of instances wherein the word is preferred in its Hindi form and the instances wherein it is preferred in its English form is calculated to form the ground truth metric. The survey responses for 12 words are available from 58 participants, to measure the effectiveness of our proposed metric.","PeriodicalId":173593,"journal":{"name":"Proceedings of the 4th ACM IKDD Conferences on Data Sciences","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129580707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Discovery of permanent land cover changes using time series segmentation approach","authors":"A. Aggarwal, D. Patel, M. Oza","doi":"10.1145/3041823.3041832","DOIUrl":"https://doi.org/10.1145/3041823.3041832","url":null,"abstract":"Sustainable land management is one of the crucial aspects that need to be considered in order to protect resources for future generations. Understanding the land cover changes that occurred during the past decade is necessary to formulate policies and actions for sustainable land management. Land cover changes affect the ecosystem and global climate. These changes transform the natural habitats of plant and animal species and also modify air temperature and near-surface moisture content, which leads to many drastic changes in climate. Land cover change studies help climate and ecosystem scientists understand the role of land cover changes in bringing about climate and ecosystem changes. This paper uses a segmentation-based data mining approach on MODIS NDVI (Normalized Difference Vegetation Index) data to understand land cover changes in the states of Gujarat and Rajasthan. Data smoothing using the Savitzky-Golay filtering method is performed before applying the algorithm for land cover change detection. The algorithm is able to identify the time point of change along with the type of land cover change. The findings show a lot of industrialization and urbanization in Surat and its satellite towns. Other major cities such as Ahmedabad and Jaipur have shown urban growth towards the periphery over time. Although agricultural area has been reduced due to urban growth, barren land has been converted into agricultural area as irrigation facilities have improved over time due to the emergence of the Narmada Canal Network.","PeriodicalId":173593,"journal":{"name":"Proceedings of the 4th ACM IKDD Conferences on Data Sciences","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114636327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling End-of-Online-Session From Streaming Data","authors":"Moumita Sinha, Harsh Jhamtani, Sanket Vaibhav Mehta, Balaji Vasan Srinivasan","doi":"10.1145/3041823.3041827","DOIUrl":"https://doi.org/10.1145/3041823.3041827","url":null,"abstract":"Engagement of consumers has become increasingly important for online marketers. When a potential consumer arrives on an online platform and interacts with it, two important and interrelated questions arise. First, whether the consumer is still engaged in the session or has completed it. Second, upon completion of a session, whether the consumer will return to the site. Real-time answers to both these questions benefit the marketer directly by facilitating more effective retargeting, the determination of which is a significant problem in online commerce. We address this problem of retargeting by using automated predictive models. Our model allows a marketer to decide in real time whether a click is the last click of the session. The model then identifies, in real time, the consumer's propensity to return when the session actually ends. This propensity is used to decide whether and whom to retarget with a message. Our model performs well in tests on real data from internet e-commerce sites. The proposed approach is a considerable improvement over the current approach of having to wait for a pre-specified amount of time after a click in order to identify the end of the session.","PeriodicalId":173593,"journal":{"name":"Proceedings of the 4th ACM IKDD Conferences on Data Sciences","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121546895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noisy Deep Dictionary Learning","authors":"Vanika Singhal, A. Majumdar","doi":"10.1145/3041823.3041826","DOIUrl":"https://doi.org/10.1145/3041823.3041826","url":null,"abstract":"In a recent work, the concept of deep dictionary learning was proposed. Learning a single level of dictionary is a well-researched topic in the image processing and computer vision community. In deep dictionary learning, the first level proceeds like standard dictionary learning; in subsequent layers the (scaled) output coefficients from the previous layer are used as inputs for dictionary learning. This is an unsupervised deep learning approach. The features from the final (deepest) layer are used as representations for subsequent analysis and classification. The seminal paper on stacked denoising autoencoders has shown that robust deep models can be learnt when augmented noisy data is used for training stacked autoencoders instead of clean data. We adopt this idea into the deep dictionary learning framework; instead of using only clean data, we augment the training dataset by adding noise; this improves robustness. Experimental evaluation on various benchmark datasets for classification and clustering shows that our proposal yields significant improvement.","PeriodicalId":173593,"journal":{"name":"Proceedings of the 4th ACM IKDD Conferences on Data Sciences","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122157214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Protein Structure Optimization in 3D AB off-lattice model using Biogeography Based Optimization with Chaotic Mutation","authors":"N. D. Jana, J. Sil, Swagatam Das","doi":"10.1145/3041823.3041833","DOIUrl":"https://doi.org/10.1145/3041823.3041833","url":null,"abstract":"Protein structure prediction (PSP) from an amino acid sequence is a challenging problem in computational biology and can be considered a global optimization problem. It is a multi-modal optimization problem and belongs to the NP-hard class. In this paper, a Biogeography Based Optimization with Chaotic Mutation (BBO-CM) algorithm is developed to optimize 3D protein structure. The proposed algorithm prevents premature convergence, jumps out of local minima during execution, and converges to the optimum solution. A chaotic system generates a chaotic pseudo-random sequence, which is used in the mutation operation of the BBO algorithm to increase population diversity. The experiments are carried out with artificial and real protein sequences of different lengths to confirm the performance and robustness of the BBO-CM algorithm. Results are compared with other algorithms, demonstrating the efficiency of the proposed approach.","PeriodicalId":173593,"journal":{"name":"Proceedings of the 4th ACM IKDD Conferences on Data Sciences","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128329415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Factual and Non-Factual Content in News Articles","authors":"Ishan Sahu, Debapriyo Majumdar","doi":"10.1145/3041823.3041837","DOIUrl":"https://doi.org/10.1145/3041823.3041837","url":null,"abstract":"News articles are a major source of facts about the current state and events of our surrounding world. However, not all news articles are equally rich in presenting the facts. In this paper, we consider the problem of detecting factual and non-factual parts in news articles. We present a comprehensive survey of the existing literature on fact classification in news articles, as well as on a related and more widely studied problem: subjectivity vs. objectivity classification of statements. Combining these techniques with some new features, we design a framework for classifying facts and non-facts in news articles. We present extensive experiments on this task using several features and combinations of those on two datasets, one of which was used for subjectivity classification in previous works. We show that standard dataset-dependent textual features such as n-grams produce good results on both datasets, but more general features such as part-of-speech tags and entity types produce inconsistent results. We analyze the results based on the nature of the datasets to present insights on the usefulness of the features and their applicability in the classification task we are considering.","PeriodicalId":173593,"journal":{"name":"Proceedings of the 4th ACM IKDD Conferences on Data Sciences","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131051586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Role of Temporal Diversity in Inferring Social Ties Based on Spatio-Temporal Data","authors":"D. Desai, Harsh Nisar, Rishab Bhardawaj","doi":"10.1145/3041823.3041836","DOIUrl":"https://doi.org/10.1145/3041823.3041836","url":null,"abstract":"The last two decades have seen a tremendous surge in research on social networks and their implications. The studies include inferring social relationships, which in turn have been used for targeted advertising, recommendations, search customization, etc. However, the offline experiences of humans, the conversations and face-to-face interactions that govern our lives, have received less attention. We introduce the DAIICT Spatio-Temporal Network (DSSN), a spatiotemporal dataset of 0.7 million data points of continuous location data logged at 1-minute intervals by the mobile phones of 46 subjects. Our research is focused on inferring relationship strength between students based on the spatiotemporal data and comparing the results with self-reported data. In that pursuit we introduce Temporal Diversity, which we show contributes more to predicting relationship strength than its counterparts. We also explore how Temporal Diversity evolves with time. Our rich dataset opens various other avenues of research that require fine-grained location data with bounded movement of participants within a limited geographical area. The advantage of a bounded geographical area such as a university campus is that it provides a microcosm of the real world, where each geographic zone has an internal context and function and a high percentage of mobility is governed by schedules and timetables. The bounded geographical region, together with the age-homogeneous population, gives us a close look at the active internal socialization of students in a university.","PeriodicalId":173593,"journal":{"name":"Proceedings of the 4th ACM IKDD Conferences on Data Sciences","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114154956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 4th ACM IKDD Conferences on Data Sciences","authors":"","doi":"10.1145/3041823","DOIUrl":"https://doi.org/10.1145/3041823","url":null,"abstract":"","PeriodicalId":173593,"journal":{"name":"Proceedings of the 4th ACM IKDD Conferences on Data Sciences","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116551606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}