{"title":"Robust Learning of Multi-Label Classifiers under Label Noise","authors":"Himanshu Kumar, Naresh Manwani, P. Sastry","doi":"10.1145/3371158.3371169","DOIUrl":"https://doi.org/10.1145/3371158.3371169","url":null,"abstract":"In this paper, we address the problem of robust learning of multi-label classifiers when the training data has label noise. We consider learning algorithms in the risk-minimization framework. We define what we call symmetric label noise in multi-label settings which is a useful noise model for many random errors in the labeling of data. We prove that risk minimization is robust to symmetric label noise if the loss function satisfies some conditions. We show that Hamming loss and a surrogate of Hamming loss satisfy these sufficient conditions and hence are robust. By learning feedforward neural networks on some benchmark multi-label datasets, we provide empirical evidence to illustrate our theoretical results on the robust learning of multi-label classifiers under label noise.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115594998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Unified System for Aggression Identification in English Code-Mixed and Uni-Lingual Texts","authors":"Anant Khandelwal, Niraj Kumar","doi":"10.1145/3371158.3371165","DOIUrl":"https://doi.org/10.1145/3371158.3371165","url":null,"abstract":"Wide usage of social media platforms has increased the risk of aggression, which results in mental stress and negatively affects people's lives through psychological agony, fighting behavior, and disrespect to others. The majority of such conversations contain code-mixed languages [28]. Additionally, the communication style used to express thoughts changes from one social media platform to another (e.g., communication styles differ between Twitter and Facebook). Together, these factors increase the complexity of the problem. To solve these problems, we introduce a unified and robust multi-modal deep learning architecture that works for both English code-mixed and uni-lingual English datasets. The devised system uses psycho-linguistic features and very basic linguistic features. Our multi-modal deep learning architecture contains Deep Pyramid CNN, Pooled BiLSTM, and Disconnected RNN (with both GloVe and FastText embeddings). Finally, the system makes its decision based on model averaging. We evaluated our system on English Code-Mixed TRAC1 2018 dataset and uni-lingual English dataset obtained from Kaggle2. 
Experimental results show that our proposed system outperforms all the previous approaches on English code-mixed dataset and uni-lingual English dataset.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114137477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Learning for Air Quality Station Location Recommendation","authors":"S. Deepak Narayanan, Apoorv Agnihotri, Nipun Batra","doi":"10.1145/3371158.3371208","DOIUrl":"https://doi.org/10.1145/3371158.3371208","url":null,"abstract":"Motivation: Recent years have seen a decline in air quality across the planet, with studies suggesting that a significant proportion of global population has reduced life expectancy by up to 4 years [1, 2, 5]. To tackle this increasing growth in air pollution and its adverse effects, governments across the world have set up air quality monitoring stations that measure concentrations of various pollutants like NO2, SO2 and PM2.5, of which PM2.5 especially has significant health impact and is used for measuring air quality. One major issue with the deployment of these stations is the massive cost involved. Owing to the high installation and maintenance costs, the spatial resolution of air quality monitoring is generally poor. In this current work, we propose active learning methods to choose the next location to install an air quality monitor, motivated by sparse spatial air quality monitoring and expensive sensing equipment. Related Work: Previous work has predominantly focused on interpolation and forecasting of air quality [7, 8]. Work on air quality station location recommendation has largely been limited [4]. Previous work [4, 7, 8] has shown that installing air quality stations uniformly to maximize spatial coverage does not work well in practice, which acts as a major motivation for our work. Problem Statement: Given a set S of air quality monitoring stations, along with their corresponding values of PM2.5 over a period of time {d1, d2, ..., dn}, where di represents day i, we want to choose a new location s′, such that installing a station at s′ gives us the best estimate of air quality at unknown locations. 
Approach: We perform active learning using Query by Committee (QBC) [6]. We maintain three sets of stations: the train set, the test set, and the pool set. The train set contains currently monitored locations, the test set contains the locations where we wish to estimate the air quality, and the pool set contains candidate stations for querying, i.e., we query from the pool set and observe how our estimation improves on the test set. To query from the pool set, we need a measure of uncertainty for the stations in the pool set. To obtain this uncertainty, we train an ensemble of learners, and take the standard deviation of their predictions for each station in the pool set. We add the station with maximum standard deviation to our train set, and remove the same station from the pool set. We repeat this process as time progresses. We use K Neighbors Regressor (KNN) as our main model inspired by the fact that nearby days will likely have similar air quality (temporal locality), and so will nearby stations (spatial","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117167341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarial Demotion of Bias in Natural Language Generation","authors":"M. Jegadeesan","doi":"10.1145/3371158.3371229","DOIUrl":"https://doi.org/10.1145/3371158.3371229","url":null,"abstract":"Natural Language Generation models have been a critical area of research in application-oriented artificial intelligence tasks, such as dialogue systems, machine translation, and question answering. The next crucial step in this direction is to ensure quality of generated text. This work proposes a novel method based on adversarial training to mitigate gender bias in generation systems, and can be extended to remove any unwanted characteristics in the generated text.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121867579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Outcomes in Limited-Overs Cricket Matches","authors":"Natwar Modani, Manoj Kilaru, Anjan Kaur, Ritwik Sinha, Harsh Khetan","doi":"10.1145/3371158.3371166","DOIUrl":"https://doi.org/10.1145/3371158.3371166","url":null,"abstract":"Cricket is a popular sport in the commonwealth countries, particularly the limited over formats. As with any sport, predicting the outcome of the game of cricket is of popular interest. For the first innings, the task is to predict the eventual score that the team batting first will reach. For the second innings, the task is to predict the match result. Existing algorithms for predicting the outcome of limited over cricket matches are simplistic and their performance leaves room for improvement. In this paper, we provide novel features including team strength indicators that capture the situation of the match more comprehensively and accurately. We use a collection of state-of-the-art supervised Machine Learning (ML) approaches for the prediction tasks. Further, we also present an approach based on Long-Short Term Memory (LSTM) Networks to incorporate the oft-mentioned concept of 'momentum' for predicting the outcomes. We show with real data that the mentioned ML models outperform the current state of art (WASP) in outcome prediction for cricket. Further, we show that incorporating the proposed features improves prediction accuracy. 
Finally, the LSTM model outperforms all other models with the same set of features, thereby confirming that 'momentum' indeed helps us in better prediction of outcomes.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125441101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topic Influence Graph Based Analysis of Seminal Papers","authors":"Abhirut Gupta, Sandipan Sikdar, P. Mohapatra, Niloy Ganguly","doi":"10.1145/3371158.3371191","DOIUrl":"https://doi.org/10.1145/3371158.3371191","url":null,"abstract":"Every scientific article attempts to derive knowledge from existing literature and augment it with new insights. This dynamics of knowledge is commonly explored through references (it draws knowledge from) and citations (its impact on the field). In this paper, we propose to explore this phenomenon through construction of a topic influence graph (TIG) based on topic similarity between articles. More importantly, out of the set of possible TIGs, we determine an optimal TIG by using knowledge from citation graphs. Construction of TIG leverages traditional network analysis tools like community (sub-field) identification. In this paper, we construct the TIG on the ACL Anthology Network (AAN) dataset and leverage it to analyze the properties of seminal papers. Interestingly, we observe that seminal papers disseminate knowledge across different communities, trigger more research within its own community and apart from introducing new ideas, string together ideas from different communities.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126247811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study of Efficacy of Cross-lingual Word Embeddings for Indian Languages","authors":"Jyotsana Khatri, V. Rudra Murthy, P. Bhattacharyya","doi":"10.1145/3371158.3371219","DOIUrl":"https://doi.org/10.1145/3371158.3371219","url":null,"abstract":"Cross-lingual word embeddings have become ubiquitous for various NLP tasks. Existing literature primarily evaluates the quality of cross-lingual word embeddings on the task of Bilingual Lexicon Induction and reports very high accuracies for European languages. In this paper, we report the accuracy on the Bilingual Lexicon Induction (BLI) task for cross-lingual word embeddings generated using two mapping-based unsupervised approaches, VecMap and MUSE, for Indian languages on a dataset created using linked Indian Wordnet. We also compare these approaches with a simple baseline where the embeddings for all languages are trained using fastText on the combined corpora of 11 Indian languages. Our experiments show that existing cross-lingual word embedding approaches give low accuracy on bilingual lexicon induction for cognate words. Given the high cognate overlap of several Indian languages, this is a serious limitation of existing approaches.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133271136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge Graph based Automated Generation of Test Cases in Software Engineering","authors":"Anmol Nayak, Vaibhav Kesri, R. Dubey","doi":"10.1145/3371158.3371202","DOIUrl":"https://doi.org/10.1145/3371158.3371202","url":null,"abstract":"Knowledge Graph (KG) is extremely efficient in storing and retrieving information from data that contains complex relationships between entities. Such a representation is relevant in software engineering projects, which contain large amounts of inter-dependencies between classes, modules, functions etc. In this paper, we propose a methodology to create a KG from software engineering documents that will be used for automated generation of test cases from natural (domain) language requirement statements. We propose a KG creation tool that includes a novel Constituency Parse Tree (CPT) based path finding algorithm for test intent extraction, Conditional Random field (CRF) based Named Entity Recognition (NER) model with automatic feature engineering and a Sentence vector embedding based signal extraction. This paper demonstrates the contributions on an automotive domain software project.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115429741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stance Detection in Hindi-English Code-Mixed Data","authors":"Jethva Utsav, Dhaiwat Kabaria, Ribhu Vajpeyi, Mohit Mina, Vivek Srivastava","doi":"10.1145/3371158.3371226","DOIUrl":"https://doi.org/10.1145/3371158.3371226","url":null,"abstract":"Social media sites such as Twitter, Facebook, and many other microblogging forums have emerged as a platform for people to express their opinions and perspectives on different events. People often take a stance (in favor, against, or neutral) towards a particular topic on these platforms. Hindi and English are the most widely used languages on social media platforms in India, and users predominantly express their opinions in Hindi-English code-mixed texts. As a result, knowing the diverse opinions of the masses is difficult. We aim to classify Hindi-English code-mixed tweets based on their stance. A dataset consisting of 3545 English-Hindi code-mixed tweets with Demonetisation as the target has been used in the experiments so far. We present a new stance-annotated dataset of 4219 English-Hindi code-mixed tweets with the abrogation of Article 370 in focus.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126496954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interaction dynamics between hate and counter users on Twitter","authors":"Binny Mathew, Navish Kumar, Pawan Goyal, Animesh Mukherjee","doi":"10.1145/3371158.3371172","DOIUrl":"https://doi.org/10.1145/3371158.3371172","url":null,"abstract":"Social media platforms usually tackle the proliferation of hate speech by blocking/suspending the message or account. One of the major drawbacks of such measures is the restriction of free speech. In this paper, we investigate the interaction of hatespeech and the responses that counter it (aka counter-speech). One of the prime contributions of this work is that we developed and released a dataset where we annotate pairs of hate and counter users. We perform several lexical, linguistic and psycholinguistic analyses on these annotated accounts and observe that the counterspeakers of the target communities employ different strategies to tackle the hatespeech. The hate users seem to be more popular as we observe that they are more subjective, express more negative sentiment, tweet more and have more followers. While the hate users seem to use more words about envy, hate, negative emotion, swearing terms, and ugliness, the counter users use more words related to government, law, and leaders. Finally, we build a classifier to detect if a user is a hateful or counter speaker. This identification can help the platform to devise different incentive mechanisms to demote hate and promote counter speakers. 
Overall, our study unfolds for the first time, the interaction dynamics of the hate and counter users which could pave a more effective way for combating hate content on Twitter than just suspending the hate accounts.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124376953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}