{"title":"Effect of Feature Hashing on Fair Classification","authors":"Ritik Dutta, Varun Gohil, Atishay Jain","doi":"10.1145/3371158.3371230","DOIUrl":"https://doi.org/10.1145/3371158.3371230","url":null,"abstract":"Learning new representations of data to reduce correlation with sensitive attributes is one method to tackle algorithmic bias. In this paper, we explore the possibility of using feature hashing as a method for learning new representations of data for fair classification. Using Difference of Equal Odds as our metric to measure fairness, we observe that using feature hashing on the Adult Dataset leads to 5.4x improvement in metric score while losing an accuracy of 6.1% compared to when the data is used as is.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124984937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solving Eigenvalue problem as an optimization problem on Manifold","authors":"Siddhant Katyan, Shrutimoy Das","doi":"10.1145/3371158.3371225","DOIUrl":"https://doi.org/10.1145/3371158.3371225","url":null,"abstract":"We want to solve the generalized eigenvalue problem by posing it as an optimization problem on manifold. It is based on a constrained truncated-GMRES [8] trust-region strategy to optimize the Rayleigh quotient, in the framework of a recently-proposed trust-region scheme on Riemannian manifolds [1--5].","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130673821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Database and Caching Support for Adaptive Visualization of Large Sensor Data","authors":"Sapan Tanted, A. Agarwal, Shinjan Mitra, Chaitra Bahuman, K. Ramamritham","doi":"10.1145/3371158.3371170","DOIUrl":"https://doi.org/10.1145/3371158.3371170","url":null,"abstract":"Rapid deployment of Internet of Things (IoT) has led to ubiquitous and pervasive sensing of objects in the physical world, such as artifacts in buildings, agriculture, cities, the electric grid, etc. Meaningful visualization of large amounts of sensor data demands user-friendly, convenient and flexible tools. In this paper, we discuss the design, implementation and performance of a novel distributed caching & aggregation mechanism to handle the visualization of sensor data, which is time series data. Its features include a) bitmap indexing for capturing the dynamics of the cached data b) exploiting recency of data usage when making cache insertion and replacement decisions and c) integrating existing databases and open-source visualization platforms to provide quick and effective distributed caching solutions to handle time-series data. We evaluate our system on real-world data generated by sensors deployed in an academic building and demonstrate empirically that the system adapts to evolving workload patterns and makes it attractive for a variety of workloads.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134407223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Learned Bloom Filters under Incremental Workloads","authors":"Arindam Bhattacharya, Srikanta J. Bedathur, A. Bagchi","doi":"10.1145/3371158.3371171","DOIUrl":"https://doi.org/10.1145/3371158.3371171","url":null,"abstract":"The recently proposed paradigm of learned Bloom filters (LBF) seems to offer significant advantages over traditional Bloom filters in terms of low memory footprint and overall performance as evidenced by empirical evaluations over static data. Its behavior in presence of updates to the set of keys being stored in Bloom filters is not very well understood. At the same time, maintaining the false positive rates (FPR) of traditional Bloom filters in presence of dynamics has been studied and extensions to carefully expand memory footprint of the filters without sacrificing FPR have been proposed. Building on these, we propose two distinct approaches for handling data updates encountered in practical uses of LBF: (i) CA-LBF, where we adjust the learned model (e.g., by retraining) to accommodate the new \"unseen\" data, resulting in classifier adaptive methods, and (ii) IA-LBF, where we replace the traditional Bloom filter with its adaptive version while keeping the learned model unchanged, leading to an index adaptive method. In this paper, we explore these two approaches in detail under incremental workloads, evaluating them in terms of their adaptability, memory footprint and false positive rates. Our empirical results using a variety of datasets and learned models of varying complexity show that our proposed methods' ability to handle incremental updates is quite robust.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130231052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On-Device Information Extraction from Screenshots in form of tags","authors":"Sumit Kumar, Gopi Ramena, Manoj Goyal, D. Mohanty, Ankur Agarwal, Benu Changmai, Sukumar Moharana","doi":"10.1145/3371158.3371200","DOIUrl":"https://doi.org/10.1145/3371158.3371200","url":null,"abstract":"We propose a method to make mobile Screenshots easily searchable. In this paper, we present the workflow in which we: 1) pre-processed a collection of screenshots, 2) identified script present in image, 3) extracted unstructured text from images, 4) identified language of the extracted text, 5) extracted keywords from the text, 6) identified tags based on image features, 7) expanded tag set by identifying related keywords, 8) inserted image tags with relevant images after ranking and indexed them to make it searchable on device. We made the pipeline which supports multiple languages and executed it on-device, which addressed privacy concerns. We developed novel architectures for components in the pipeline, optimized performance and memory for on-device computation. We observed from experimentation that the solution developed can reduce overall user effort and improve end user experience while searching, whose results are published.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116268149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approaches to biomedical coreference resolution","authors":"Ishani Mondal","doi":"10.1145/3371158.3371217","DOIUrl":"https://doi.org/10.1145/3371158.3371217","url":null,"abstract":"Coreference resolution is an important task in natural language processing which aims to group the mention pairs referring to a single entity. In the biomedical domain, it significantly poses some unique challenges. In this work, we make use of both hand-crafted features and neural word embedding based features to solve the task of coreference resolution on a standard benchmark biomedical coreference dataset, i.e the BioNLP-2011 Protein Coreference data. Experimental results show that the neural model performs significantly better in terms of mention-referent linking when compared to the hand-crafted feature-based coreference resolution approaches.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128415807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cardinality Extraction from Text for Ontology Learning","authors":"Monika Jain, Paramita Mirza, Raghava Mutharaju","doi":"10.1145/3371158.3371223","DOIUrl":"https://doi.org/10.1145/3371158.3371223","url":null,"abstract":"","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116086705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Protecting Elections","authors":"Garima Shakya","doi":"10.1145/3371158.3371213","DOIUrl":"https://doi.org/10.1145/3371158.3371213","url":null,"abstract":"Election control considers the issues where an external agent wants to change the structure of the election to change the outcome. Researchers model the problem as a Stackelberg game with two players, Defender and Attacker. The questions we focus on are, is it possible for the attacker to influence the outcome of the election? How hard is it to change the winner by deleting or manipulating a finite number of voter groups? How hard is it to defend the election with a finite budget? How the problem of attack and defend varies with the limited information the defender and attacker have?","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127727656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Extensive Analysis on Deep Neural Architecture for Classification of Subject-Independent Cognitive States","authors":"Sumanto Dutta, Anup Nandy","doi":"10.1145/3371158.3371181","DOIUrl":"https://doi.org/10.1145/3371158.3371181","url":null,"abstract":"Human mental state can be measured by analyzing and understanding EEG (Electroencephalogram) signal in various applications such as neuro-science, brain-computer interfaces, etc. It is an important area of research where machine learning algorithms are being used to develop tools for mental state classification. The modern deep learning algorithms can be used on large EEG data set after applying the data augmentation process on them. In this paper, we apply the Deep Belief Network (DBN) model based on the Restricted Boltzmann Machine (RBM) for unsupervised feature learning of EEG signals to extract salient features for classification. This DBN model provides an unsupervised taxonomy-based system without human intervention. The efficiency of this model is evaluated on the ambulatory EEG signal with other deep learning algorithms. Experimental results demonstrate that DBN with Recurrent Neural Network-Long Short Term Memory (DBN-RNN-LSTM) provides an accuracy of 98.3% which is better than RNN-LSTM and other classical machine learning algorithm.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131662338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AVADHAN: System for Open-Domain Telugu Question Answering","authors":"Priyanka Ravva, Ashok Urlana, Manish Shrivastava","doi":"10.1145/3371158.3371193","DOIUrl":"https://doi.org/10.1145/3371158.3371193","url":null,"abstract":"This paper presents the Question Answering (QA) system for a low resource language like 'Telugu' named 'AVADHAN'. This work started with preparing a pre-tagged data set for Telugu Question Classification (QC). We also explained the ambiguities and complexities involved in the data set. AVADHAN exhibits the comparisons between Support Vector Machine (SVM), Logistic Regression (LR) and Multi-Layer Perceptron (MLP) classifiers for achieving the plausible answers. After performing various experiments the overall accuracies obtained, for both 'exact match' and 'partial match' based approaches, were for SVM (31.6%, 68.5%), LR (31%, 66.6%) and for MLP (30%, 67%) respectively.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131685792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}