Proceedings of the 7th ACM IKDD CoDS and 25th COMAD: Latest Papers, Page 4

Innovation and Revenue: Deep Diving into the Temporal Rank-shifts of Fortune 500 Companies

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371199

M. Singh, Arindam Pal, Lipika Dey, Animesh Mukherjee

Citations: 3

Causal Inference and Counterfactual Reasoning

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371231

Amit Sharma, Emre Kıcıman

Abstract: As computing systems are more frequently and more actively intervening to improve people's work and daily lives, it is critical to correctly predict and understand the causal effects of these interventions. Conventional machine learning methods, built on pattern recognition and correlational analyses, are insufficient for causal analysis. This tutorial will introduce participants to concepts in causal inference and counterfactual reasoning, drawing from a broad literature on the topic from statistics, social sciences and machine learning. We will motivate the use of causal inference through examples in domains such as recommender systems, social media datasets, health, education and governance. To tackle such questions, we will introduce the key ingredient that causal analysis depends on, counterfactual reasoning, and describe the two most popular frameworks based on Bayesian graphical models and potential outcomes. Based on this, we will cover methods suitable for doing causal inference with large-scale online data, including randomized experiments, observational methods like matching and stratification, and natural experiment-based methods such as instrumental variables and regression discontinuity. We will also focus on best practices for evaluation and validation of causal inference techniques, drawing from our own experiences. We will show application of these techniques using DoWhy, a Python library for causal inference. Throughout, the emphasis will be on considerations of working with large-scale data, such as logs of user interactions or social data.
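To make the stratification method named in the abstract concrete, here is a minimal sketch in plain Python. It compares treated and untreated outcomes inside each stratum of a single confounder, then averages the per-stratum differences weighted by stratum size. The toy data, the `stratified_effect` helper and the tuple layout are all hypothetical, and the sketch does not use or imitate DoWhy's API.

```python
from collections import defaultdict

def stratified_effect(rows):
    """Estimate an average treatment effect by stratifying on a confounder.

    rows: iterable of (stratum, treated, outcome) tuples, where `treated`
    is a bool. Strata lacking either treated or control units are skipped.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for stratum, treated, outcome in rows:
        groups[stratum][treated].append(outcome)

    total, weighted = 0, 0.0
    for arms in groups.values():
        if not arms[True] or not arms[False]:
            continue  # no overlap in this stratum, so no comparison is possible
        n = len(arms[True]) + len(arms[False])
        diff = (sum(arms[True]) / len(arms[True])
                - sum(arms[False]) / len(arms[False]))
        weighted += n * diff
        total += n
    if total == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted / total

# Hypothetical toy data: heavy users are treated more often AND have higher
# outcomes, so the naive treated-vs-control comparison is confounded.
rows = [
    ("heavy", True, 12.0), ("heavy", True, 12.0),
    ("heavy", True, 12.0), ("heavy", False, 10.0),
    ("light", True, 4.0), ("light", False, 2.0),
    ("light", False, 2.0), ("light", False, 2.0),
]
treated = [o for _, t, o in rows if t]
control = [o for _, t, o in rows if not t]
naive = sum(treated) / len(treated) - sum(control) / len(control)
adjusted = stratified_effect(rows)
```

In this toy data the naive difference of means overstates the effect because usage level drives both treatment and outcome; comparing within strata removes that particular source of bias, under the assumption that the stratifying variable is the only confounder.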

Citations: 6

Benchmarking Synchronous and Asynchronous Stream Processing Systems

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371206

V. E. Venugopal, M. Theobald

Abstract: With the recent advancements in Big Data and Internet-of-Things (IoT) applications, we observe a continued growth in the generation of streaming data produced by sensor and social networks, broadcasting systems, e-commerce, and many others. Even though Big Data platforms such as Apache Hadoop [12], Spark [13], Storm [11] and Kafka [9] would serve the purpose, their underlying batch mode of operation makes it necessary to first split the incoming data streams into batches, and to then synchronously execute a given analytical workflow over these data batches. To overcome the limitations of these synchronous stream-processing architectures, asynchronous stream-processing (ASP) engines such as Apache Flink [1], Samza [10] and Naiad [7, 8] have recently emerged. Although the asynchronous way of handling streams is reported to be the prime reason for the performance gains (in terms of sustainable throughput [4, 6] and per-window latencies) of ASP engines, we believe that their architectural similarity with the original design of Hadoop has still not been investigated critically enough. Given the inherent deviances of distributed computations (due to communication and network delays, scheduling algorithms, time spent on processing, serialization/deserialization, etc.), the performance of platforms built on a master-client architecture is still often bound by hidden synchronization barriers and the constant need for state exchange (and hence communication) between the master and the client nodes. To understand the upper bound of the maximum sustainable throughput [5] that is possible for a given node configuration, we have designed multiple hard-coded multi-threaded processes (called ad-hoc dataflows) in C++ using the Message Passing Interface (MPI) and Pthread libraries, for two use-cases, namely the Yahoo! streaming benchmark (YSB) [2] and Simple Windowed Aggregation (SWA), such that they could collectively process an input stream based on the logic of the use-case. These dataflows, once deployed, could asynchronously communicate with each other to perform the use-case specific operations with 100% accuracy. The performance of these light-weight ad-hoc dataflows is compared against the main competitors among the stream data processing
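The abstract names a Simple Windowed Aggregation (SWA) use-case without spelling out its logic. As a rough, single-process illustration of what a windowed aggregation computes, the sketch below sums values per key over fixed, non-overlapping (tumbling) windows. The event layout, the `tumbling_window_sums` helper and the window size are assumptions made for illustration; this is not the authors' C++/MPI dataflow.

```python
from collections import defaultdict

def tumbling_window_sums(events, window_size):
    """Sum event values per (window_start, key) for fixed, non-overlapping windows.

    events: iterable of (timestamp, key, value) with non-negative integer
    timestamps. Events may arrive in any order because each one is assigned
    to its window purely from its own timestamp.
    """
    sums = defaultdict(int)
    for timestamp, key, value in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_size) * window_size
        sums[(window_start, key)] += value
    return dict(sums)

# Hypothetical events: (timestamp, key, value)
events = [(0, "a", 1), (3, "a", 2), (9, "b", 5), (10, "a", 4), (14, "b", 1)]
result = tumbling_window_sums(events, window_size=10)
```

A real stream processor would additionally decide when a window is complete and can be emitted, which is where the synchronization costs discussed in the abstract come into play.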

Citations: 4

SwaGrader

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371205

Somu Prajapati, Ayushi Gupta, S. Nigam, Swaprava Nath

Abstract: Massive open online courses pose a massive challenge for grading the answer scripts with high accuracy. Peer grading is often viewed as a scalable solution to this challenge, which largely depends on the altruism of the peer graders. In this paper, we propose to demonstrate a tool designed for strategic peer-grading with the help of a structured and typical grading workflow. SwaGrader, a modular, secure and customizable (to any grading workflow) peer-grading tool, enables the instructor to handle large courses (MOOCs and offline) with limited participation by teaching staff via a web-based application (extensible to any front-end framework based application) and a mechanism called TRUPEQA [1]. TRUPEQA (a) uses a constant number of instructor-graded answer-scripts to quantitatively measure the accuracies of the peer graders and corrects the scores accordingly, and (b) penalizes deliberate under-performing. We show that this mechanism is unique in its class to satisfy certain properties. Our human subject experiments show that TRUPEQA improves the grading quality over the mechanisms currently used in standard MOOCs. Our mechanism outperforms several standard peer grading techniques used in practice, even at times when the graders are non-manipulative.
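The idea of using a few instructor-graded scripts to measure grader accuracy and then correct peer scores can be illustrated with a deliberately simplified sketch. It estimates each grader's average signed error on the instructor-graded "probe" scripts and subtracts that bias from the grader's other scores. This is not the TRUPEQA mechanism, whose details the abstract does not give; the helper names and data are hypothetical.

```python
def estimate_bias(probe_scores, true_scores):
    """Mean signed error of one grader on instructor-graded probe scripts."""
    errors = [probe_scores[s] - true_scores[s] for s in probe_scores]
    return sum(errors) / len(errors)

def corrected_score(peer_scores, biases):
    """Average the peer scores for one script after removing each grader's bias."""
    adjusted = [score - biases[grader] for grader, score in peer_scores.items()]
    return sum(adjusted) / len(adjusted)

# Instructor's scores for the probe scripts (hypothetical).
true_scores = {"probe1": 8.0, "probe2": 6.0}
# What each peer grader gave the same probe scripts (hypothetical).
probes = {
    "alice": {"probe1": 9.0, "probe2": 7.0},  # consistently 1 point too generous
    "bob": {"probe1": 7.0, "probe2": 5.0},    # consistently 1 point too strict
}
biases = {grader: estimate_bias(p, true_scores) for grader, p in probes.items()}

# Scores both graders gave to a script the instructor did not grade.
final = corrected_score({"alice": 8.0, "bob": 6.0}, biases)
```

A constant additive bias is the simplest possible grader model; it says nothing about grader variance or about the incentive properties that the abstract attributes to TRUPEQA.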

Citations: 1

Memeify

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371403

S. R. Vyalla, Vishaal Udandarao

Citations: 5

An improved human-in-the-loop model for fine-grained object recognition with batch-based question answering

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371174

V. Gutta, N. Unnam, P. Reddy

Abstract: Fine-grained object recognition refers to a subordinate level of object recognition such as recognition of bird species and car models. It has become crucial for recognition of previously unknown classes. While fine-grained object recognition has seen unprecedented progress with the advent of neural networks, many of the existing works are cost-sensitive as they are acutely picture-dependent and fail without an adequate number of quality pictures. Efforts have been made in the literature towards picture-independent recognition with hybrid human-computer recognition methods via single question answering with a human-in-the-loop. To this end, we propose an improved batch-based local question answering method for making the recognition efficient and picture-independent. When pictures are unavailable, at each time-step, the proposed method mines a batch of binary cluster-centric local questions to pose to a human-in-the-loop and incorporates the responses received to the questions into the model. After a preset number of time-steps, the most probable class of the target object is returned as the final prediction. When pictures are available, our model facilitates the plug-in of computer vision algorithms into the framework for better performance. Experiments on three challenging datasets show significant performance improvement with respect to accuracy and computation time as compared to the existing schemes.
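The loop described in the abstract (pose a batch of binary questions, fold the answers into the model, return the most probable class) can be sketched with a plain Bayesian update over class probabilities. The class names, the per-class "yes" probabilities and the `update_posterior` helper are hypothetical; how the paper actually mines its cluster-centric questions and updates its model is not stated in the abstract.

```python
def update_posterior(prior, yes_prob, answer):
    """Bayes update of class probabilities after one yes/no answer.

    prior: dict class -> probability.
    yes_prob: dict class -> P(answer is yes | class) for the asked question.
    answer: True for yes, False for no.
    """
    unnormalized = {
        c: p * (yes_prob[c] if answer else 1.0 - yes_prob[c])
        for c, p in prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("answer has zero probability under every class")
    return {c: v / total for c, v in unnormalized.items()}

classes = ["sparrow", "finch", "warbler"]
posterior = {c: 1 / 3 for c in classes}  # uniform prior when no picture is available

# One batch of (question model, human answer) pairs, all values hypothetical.
batch = [
    ({"sparrow": 0.9, "finch": 0.2, "warbler": 0.1}, True),
    ({"sparrow": 0.8, "finch": 0.7, "warbler": 0.1}, True),
]
for yes_prob, answer in batch:
    posterior = update_posterior(posterior, yes_prob, answer)

prediction = max(posterior, key=posterior.get)
```

When pictures are available, the uniform prior could be replaced by the output of a vision model, which is one way to read the "plug-in" remark in the abstract.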

Citations: 2

Ranking and Discovering Anomalous Neighborhoods in Attributed Multiplex Networks

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371164

Monika Bansal, Dolly Sharma

Abstract: The attributed multiplex network is a set of attributed networks in which each network represents a different type of interaction between the same set of nodes. Individual networks are termed layers or dimensions, and network nodes are characterized by attribute vectors. Neighborhood, in general, refers to any dense connected subgraph. We refer to a neighborhood as the subgraph induced on a graph node and its neighbors. It is usually observed that the majority of the nodes in multilayer networks are active only on a small number of layers, except some outliers [18]. However, node activity is not strictly correlated to the edges incident on a node. A node might be active on a few layers with a relatively large number of incident edges and, at the same time, a multi-active node might not have many links even on a single layer. Moreover, each layer has distinct importance in the multiplex network, and the structure and size of the neighborhoods formed by these multiplex nodes are different on each layer. Nodes with different attributes come together on different layers in the attributed multiplex networks. This node and layer heterogeneity should be considered while identifying anomalous neighborhoods in the attributed multiplex networks. Thus, a measure is required to quantify the quality of neighborhoods formed by active nodes on different layers. Existing approaches do not consider heterogeneity among network layers; they quantify the structure of networks either separately for each layer or for the aggregated network, and ignore the attributes of nodes. In this work, we define a novel quality measure, Multi-Normality, which utilizes the structure and attributes of each layer together and detects attribute coherence in neighborhoods between layers. We also propose an algorithm exhausting multi-normality to identify anomalous neighborhoods in multiplex networks, named Anomaly Detection of Entity Neighborhoods in Multiplex Networks (ADENMN). We evaluate the effectiveness of the proposed algorithm in anomaly detection by comparing its performance with three existing baseline approaches, ADOMS, AMM and AGG+AD, on five real-world attributed multiplex networks: Amazon, YouTube, Noordin top terrorist network, DBLP_C, and Aarhus. The results of experiments demonstrate that multi-normality outperforms baseline algorithms.
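The abstract defines a neighborhood as the subgraph induced on a node and its neighbors, computed separately on each layer. The sketch below extracts exactly that structure from per-layer edge lists. It does not compute Multi-Normality, which the abstract does not define; the layer names, edges and the `neighborhood` helper are hypothetical.

```python
def neighborhood(layer_edges, node):
    """Return (members, induced_edges) for the subgraph induced on `node`
    and its neighbors within one layer.

    layer_edges: iterable of undirected edges (u, v) for that layer.
    """
    edges = {frozenset(e) for e in layer_edges}
    members = {node}
    for e in edges:
        if node in e:
            members |= e  # add the neighbor at the other end of the edge
    # Keep every edge whose endpoints both lie inside the neighborhood.
    induced = {e for e in edges if e <= members}
    return members, induced

# A hypothetical two-layer multiplex network over the same node set.
layers = {
    "coauthor": [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")],
    "citation": [("a", "d")],
}
per_layer = {name: neighborhood(edges, "a") for name, edges in layers.items()}
```

Here node `a` has a three-node, three-edge neighborhood on the first layer and a two-node, one-edge neighborhood on the second, which illustrates the abstract's point that the size and structure of a node's neighborhood differ from layer to layer.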

Citations: 8

A Natural Language and Interactive End-to-End Querying and Reporting System

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371198

S. Joshi, Bharath Venkatesh, Dawn Thomas, Yue Jiao, Shourya Roy

Abstract: Natural language query understanding for unstructured textual sources has seen significant progress over the last couple of decades. For structured data, while the ecosystem has evolved with regard to data storage and retrieval mechanisms, the query language has remained predominantly SQL (or SQL-like). Towards making the latter more natural, there has been recent research emphasis on Natural Language Interface to DataBases (NLIDB) systems. Piggybacking on the rise of 'deep learning' systems, the state-of-the-art NLIDB solutions over large parallel and standard benchmarks (viz. WikiSQL and Spider) primarily rely on attention-based sequence-to-sequence models. Building industry-grade NLIDB solutions that make the big data ecosystem accessible through a truly natural and unstructured querying mechanism presents several challenges. These range from lack of availability of parallel corpora, diversity in underlying data schema and wide variability in the nature of queries to context and dialog management in interactive systems. In this paper, we present an end-to-end system, Query Enterprise Data (QED), towards making enterprise descriptive analytics and reporting easier and more natural. We elaborate in detail on how we addressed the challenges mentioned above and on novel features such as handling incomplete queries in an incremental fashion, and we highlight the role of an assistive user interface that provides a better user experience. Finally, we conclude the paper with observations and lessons learnt from the experience of transferring a research solution to an industry-grade practical deployment.

Citations: 1

Enhancing Neural Sentiment Analysis with Aspect Weights

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371211

Urmi Saha, Abhijeet Dubey, Aditya Joshi, Pushpak Bhattacharyya

Citations: 0

A Hybrid Distributed Model for Learning Representation of Short Texts with Attribute Labels

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI: 10.1145/3371158.3371195

Shashi Kumar, S. Roy, Vishal Pathak

Citations: 0