Proceedings of the 7th ACM IKDD CoDS and 25th COMAD: Latest Publications

Innovation and Revenue: Deep Diving into the Temporal Rank-shifts of Fortune 500 Companies
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date: 2020-01-05 DOI: 10.1145/3371158.3371199
M. Singh, Arindam Pal, Lipika Dey, Animesh Mukherjee
Abstract: Research and innovation is an important agenda for any company that wants to remain competitive in the market. The relationship between innovation and revenue is a key metric for companies when deciding how much to invest in future research. Two important parameters for evaluating innovation are the quantity and quality of scientific papers and patents. Our work studies the relationship between innovation and patenting activities for several Fortune 500 companies over a period of time. We perform a comprehensive study of the patent citation dataset available in the Reed Technology Index, collected from the US Patent Office. We observe several interesting relations between the Fortune 500 ranks of companies and parameters such as the number of (i) patent applications, (ii) patent grants, and (iii) patent citations. We also study how these parameters vary over the years and derive causal explanations for the trends with qualitative and intuitive reasoning. To facilitate reproducible research, we make the processed patent dataset publicly available.
Citations: 3
Causal Inference and Counterfactual Reasoning
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date: 2020-01-05 DOI: 10.1145/3371158.3371231
Amit Sharma, Emre Kıcıman
Abstract: As computing systems are more frequently and more actively intervening to improve people's work and daily lives, it is critical to correctly predict and understand the causal effects of these interventions. Conventional machine learning methods, built on pattern recognition and correlational analyses, are insufficient for causal analysis. This tutorial will introduce participants to concepts in causal inference and counterfactual reasoning, drawing from a broad literature on the topic from statistics, social sciences and machine learning. We will motivate the use of causal inference through examples in domains such as recommender systems, social media datasets, health, education and governance. To tackle such questions, we will introduce the key ingredient that causal analysis depends on, counterfactual reasoning, and describe the two most popular frameworks, based on Bayesian graphical models and potential outcomes. Based on this, we will cover methods suitable for doing causal inference with large-scale online data, including randomized experiments, observational methods like matching and stratification, and natural experiment-based methods such as instrumental variables and regression discontinuity. We will also focus on best practices for evaluation and validation of causal inference techniques, drawing from our own experiences. We will show application of these techniques using DoWhy, a Python library for causal inference. Throughout, the emphasis will be on considerations of working with large-scale data, such as logs of user interactions or social data.
Citations: 6
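The tutorial abstract above names stratification as one of the observational methods it covers. The following is a minimal sketch of that idea in plain Python, not the tutorial's DoWhy code. The function name `stratified_effect`, the strata labels and the data are all hypothetical. Units are grouped by a confounder, the treated-minus-control difference in mean outcome is computed inside each stratum, and the differences are averaged with weights proportional to stratum size.

```python
from collections import defaultdict


def stratified_effect(rows):
    """Estimate an average treatment effect by stratifying on a confounder.

    rows: iterable of (stratum, treated, outcome) tuples, where `treated`
    is a bool. Strata lacking either treated or control units are skipped.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for stratum, treated, outcome in rows:
        groups[stratum][treated].append(outcome)

    weighted_sum = 0.0
    total_units = 0
    for arms in groups.values():
        if not arms[True] or not arms[False]:
            continue  # no overlap in this stratum, so no comparison is possible
        size = len(arms[True]) + len(arms[False])
        diff = (sum(arms[True]) / len(arms[True])
                - sum(arms[False]) / len(arms[False]))
        weighted_sum += size * diff
        total_units += size

    if total_units == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted_sum / total_units


# Hypothetical data: heavy users are both more likely to be treated
# and have higher outcomes, which confounds a naive comparison.
rows = [
    ("heavy", True, 10.0), ("heavy", True, 12.0), ("heavy", False, 9.0),
    ("light", True, 4.0), ("light", False, 2.0),
    ("light", False, 3.0), ("light", False, 1.0),
]

treated = [o for _, t, o in rows if t]
control = [o for _, t, o in rows if not t]
naive = sum(treated) / len(treated) - sum(control) / len(control)
effect = stratified_effect(rows)

print(naive)   # about 4.92: inflated by the confounder
print(effect)  # 2.0: the within-stratum difference
```

In this toy data the naive difference overstates the effect because treatment is concentrated among heavy users. Stratifying on usage level recovers the within-group difference.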
Benchmarking Synchronous and Asynchronous Stream Processing Systems
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date: 2020-01-05 DOI: 10.1145/3371158.3371206
V. E. Venugopal, M. Theobald
Abstract: With the recent advancements in Big Data and Internet-of-Things (IoT) applications, we observe a continued growth in the generation of streaming data produced by sensor and social networks, broadcasting systems, e-commerce, and many others. Even though Big Data platforms such as Apache Hadoop [12], Spark [13], Storm [11] and Kafka [9] would serve the purpose, their underlying batch mode of operation makes it necessary to first split the incoming data streams into batches, and then to synchronously execute a given analytical workflow over these data batches. To overcome the limitations of these synchronous stream-processing architectures, asynchronous stream-processing (ASP) engines such as Apache Flink [1], Samza [10] and Naiad [7, 8] have recently emerged. Although the asynchronous way of handling streams is reported to be the prime reason for the performance gains of ASP engines (in terms of sustainable throughput [4, 6] and per-window latencies), we believe that their architectural similarity with the original design of Hadoop has not yet been investigated critically enough. Given the inherent deviances of distributed computations (due to communication and network delays, scheduling algorithms, time spent on processing, serialization/deserialization, etc.), the performance of platforms built on a master-client architecture is still often bound by hidden synchronization barriers and the constant need for state exchange (and hence communication) between the master and the client nodes. To understand the upper bound of the maximum sustainable throughput [5] that is possible for a given node configuration, we have designed multiple hard-coded multi-threaded processes (called ad-hoc dataflows) in C++ using the Message Passing Interface (MPI) and Pthread libraries, for two use-cases, namely the Yahoo! streaming benchmark (YSB) [2] and Simple Windowed Aggregation (SWA), such that they can collectively process an input stream based on the logic of the use-case. Once deployed, these dataflows can asynchronously communicate with each other to perform the use-case specific operations with 100% accuracy. The performance of these light-weight ad-hoc dataflows is compared against the main competitors among the stream data processing
Citations: 4
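The abstract above names Simple Windowed Aggregation (SWA) as one of its two use-cases but does not define it. As a rough illustration of what a windowed aggregation computes, here is a generic tumbling-window sum in Python. This is not the authors' C++/MPI dataflow: the function name `tumbling_window_sums`, the event layout `(timestamp, key, value)` and the choice of sum as the aggregate are all assumptions made for this sketch.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_size):
    """Sum values per (window, key) over fixed, non-overlapping windows.

    events: iterable of (timestamp, key, value) with integer timestamps.
    Returns a dict mapping (window_start, key) to the summed value.
    This sketch ignores late or out-of-order handling and window eviction.
    """
    sums = defaultdict(float)
    for timestamp, key, value in events:
        # Each event belongs to exactly one window, found by integer division.
        window_start = (timestamp // window_size) * window_size
        sums[(window_start, key)] += value
    return dict(sums)


# Hypothetical input stream.
events = [
    (0, "a", 1.0), (3, "b", 2.0), (9, "a", 4.0),
    (10, "a", 8.0), (19, "b", 16.0),
]
result = tumbling_window_sums(events, window_size=10)
print(result)
```

A synchronous engine would run such an aggregate once per batch of events, while an asynchronous one updates the running sums as each event arrives. The loop above mirrors the latter at the level of a single process.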
SwaGrader
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date: 2020-01-05 DOI: 10.1145/3371158.3371205
Somu Prajapati, Ayushi Gupta, S. Nigam, Swaprava Nath
Abstract: Massive open online courses pose a massive challenge for grading answer scripts with high accuracy. Peer grading is often viewed as a scalable solution to this challenge, but it largely depends on the altruism of the peer graders. In this paper, we propose to demonstrate a tool designed for strategic peer grading with the help of a structured and typical grading workflow. SwaGrader is a modular, secure peer-grading tool that can be customized to any grading workflow. It enables the instructor to handle large courses (MOOCs and offline) with limited participation by teaching staff, via a web-based application (extensible to any application based on a front-end framework) and a mechanism called TRUPEQA [1]. TRUPEQA (a) uses a constant number of instructor-graded answer scripts to quantitatively measure the accuracies of the peer graders and corrects the scores accordingly, and (b) penalizes deliberate under-performing. We show that this mechanism is unique in its class in satisfying certain properties. Our human subject experiments show that TRUPEQA improves the grading quality over the mechanisms currently used in standard MOOCs. Our mechanism outperforms several standard peer grading techniques used in practice, even when the graders are non-manipulative.
Citations: 1
Memeify
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date: 2020-01-05 DOI: 10.1145/3371158.3371403
S. R. Vyalla, Vishaal Udandarao
Abstract: Interest in the research areas related to meme propagation and generation has been increasing rapidly in the last couple of years. Meme datasets available online are either specific to a context or contain no class information. Here, we prepare a large-scale dataset of memes with captions and class labels. The dataset consists of 1.1 million meme captions from 128 classes. We also provide reasoning for the existence of broad categories, called 'themes', across the meme dataset; each theme consists of multiple meme classes. Our generation system uses a trained state-of-the-art transformer-based model with an encoder-decoder architecture for caption generation. We develop a web interface, called Memeify, for users to generate memes of their choice, and explain in detail the working of the individual components of the system. We also perform a qualitative evaluation of the generated memes by conducting a user study. A demonstration of the Memeify system is available at https://youtu.be/P_Tfs0X-czs.
Citations: 5
An improved human-in-the-loop model for fine-grained object recognition with batch-based question answering
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date: 2020-01-05 DOI: 10.1145/3371158.3371174
V. Gutta, N. Unnam, P. Reddy
Abstract: Fine-grained object recognition refers to a subordinate level of object recognition, such as recognition of bird species and car models. It has become crucial for recognition of previously unknown classes. While fine-grained object recognition has seen unprecedented progress with the advent of neural networks, many of the existing works are cost-sensitive, as they are acutely picture-dependent and fail without an adequate number of quality pictures. Efforts have been made in the literature towards picture-independent recognition with hybrid human-computer recognition methods via single question answering with a human-in-the-loop. To this end, we propose an improved batch-based local question answering method for making the recognition efficient and picture-independent. When pictures are unavailable, at each time-step, the proposed method mines a batch of binary cluster-centric local questions to pose to a human-in-the-loop and incorporates the responses received into the model. After a preset number of time-steps, the most probable class of the target object is returned as the final prediction. When pictures are available, our model facilitates the plug-in of computer vision algorithms into the framework for better performance. Experiments on three challenging datasets show significant performance improvement with respect to accuracy and computation time as compared to the existing schemes.
Citations: 2
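The abstract above describes posing a batch of binary questions to a human and folding the answers into the model before returning the most probable class. The abstract does not say how the answers are incorporated or how questions are mined, so the sketch below substitutes a simple naive-Bayes update over class probabilities as an illustration only. The function name `update_posterior`, the class names, the questions and all probabilities are hypothetical.

```python
def update_posterior(prior, yes_prob, answers):
    """Fold a batch of yes/no answers into a distribution over classes.

    prior:    {class_name: probability}
    yes_prob: {question: {class_name: P(answer is yes | class)}}
    answers:  {question: bool} for the batch posed at this time-step
    Assumes answers are conditionally independent given the class.
    """
    posterior = dict(prior)
    for question, answer in answers.items():
        for cls in posterior:
            p_yes = yes_prob[question][cls]
            posterior[cls] *= p_yes if answer else (1.0 - p_yes)

    total = sum(posterior.values())
    if total == 0:
        raise ValueError("answers are impossible under every class")
    return {cls: p / total for cls, p in posterior.items()}


# Hypothetical bird-species example with a batch of two questions.
prior = {"sparrow": 1 / 3, "finch": 1 / 3, "warbler": 1 / 3}
yes_prob = {
    "has_red_crown": {"sparrow": 0.1, "finch": 0.8, "warbler": 0.1},
    "has_yellow_belly": {"sparrow": 0.1, "finch": 0.2, "warbler": 0.9},
}
batch_answers = {"has_red_crown": True, "has_yellow_belly": False}

posterior = update_posterior(prior, yes_prob, batch_answers)
prediction = max(posterior, key=posterior.get)
print(prediction)  # finch
```

Running the update once per time-step and taking the arg-max after the last step matches the loop structure the abstract describes. Choosing which questions to ask in each batch is the part this sketch leaves out.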
Ranking and Discovering Anomalous Neighborhoods in Attributed Multiplex Networks
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date: 2020-01-05 DOI: 10.1145/3371158.3371164
Monika Bansal, Dolly Sharma
Abstract: An attributed multiplex network is a set of attributed networks in which each network represents a different type of interaction between the same set of nodes. The individual networks are termed layers or dimensions, and network nodes are characterized by attribute vectors. A neighborhood, in general, refers to any dense connected subgraph. We refer to a neighborhood as the subgraph induced by a graph node and its neighbors. It is usually observed that the majority of the nodes in multilayer networks are active on only a small number of layers, except for some outliers [18]. However, node activity is not strictly correlated with the edges incident on a node. A node might be active on a few layers with a relatively large number of incident edges, while a multi-active node might not have many links even on a single layer. Moreover, each layer has distinct importance in a multiplex network, and the structure and size of the neighborhoods formed by these multiplex nodes differ on each layer. Nodes with different attributes come together on different layers in attributed multiplex networks. This node and layer heterogeneity should be considered while identifying anomalous neighborhoods in attributed multiplex networks. Thus, a measure is required to quantify the quality of neighborhoods formed by active nodes on different layers. Existing approaches do not consider heterogeneity among network layers; they quantify the structure of networks either separately for each layer or on the aggregated network, and they ignore the attributes of nodes. In this work, we define a novel quality measure, Multi-Normality, which uses the structure and attributes of each layer together and detects attribute coherence in neighborhoods between layers. We also propose an algorithm exhausting multi-normality to identify anomalous neighborhoods in multiplex networks, named Anomaly Detection of Entity Neighborhoods in Multiplex Networks (ADENMN). We evaluate the effectiveness of the proposed algorithm in anomaly detection by comparing its performance with three existing baseline approaches (ADOMS, AMM and AGG+AD) on five real-world attributed multiplex networks: Amazon, YouTube, the Noordin Top terrorist network, DBLP_C, and Aarhus. The results of experiments demonstrate that multi-normality outperforms the baseline algorithms.
Citations: 8
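The abstract above defines a neighborhood as the subgraph induced by a node and its neighbors, and notes that its size and structure differ from layer to layer. A small Python sketch of that definition follows. It covers only the neighborhood extraction, not the Multi-Normality measure or ADENMN. The function name `ego_neighborhood`, the layer names and the toy graph are hypothetical, and each layer is assumed to be an undirected graph stored as a dict of neighbor sets.

```python
def ego_neighborhood(adjacency, node):
    """Return (members, edges) of the subgraph induced by node and its neighbors.

    adjacency: {node: set of neighbor nodes} for one undirected layer.
    Edges are returned as frozensets so that (u, v) and (v, u) coincide.
    """
    members = {node} | adjacency.get(node, set())
    edges = {
        frozenset((u, v))
        for u in members
        for v in adjacency.get(u, set())
        if v in members
    }
    return members, edges


# Hypothetical two-layer multiplex network over the same node set.
layers = {
    "friendship": {
        "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set(),
    },
    "work": {
        "a": {"d"}, "d": {"a"}, "b": {"c"}, "c": {"b"},
    },
}

neighborhoods = {
    name: ego_neighborhood(adjacency, "a") for name, adjacency in layers.items()
}
for name, (members, edges) in neighborhoods.items():
    print(name, sorted(members), len(edges))
```

Node `a` has a three-node triangle as its neighborhood on the first layer and a single edge on the second, which is the kind of per-layer difference the abstract argues a quality measure has to account for.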
A Natural Language and Interactive End-to-End Querying and Reporting System
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date: 2020-01-05 DOI: 10.1145/3371158.3371198
S. Joshi, Bharath Venkatesh, Dawn Thomas, Yue Jiao, Shourya Roy
Abstract: Natural language query understanding for unstructured textual sources has seen significant progress over the last couple of decades. For structured data, while the ecosystem has evolved with regard to data storage and retrieval mechanisms, the query language has remained predominantly SQL (or SQL-like). To make the latter more natural, there has been a recent research emphasis on Natural Language Interface to DataBases (NLIDB) systems. Piggybacking on the rise of 'deep learning' systems, the state-of-the-art NLIDB solutions over large parallel and standard benchmarks (viz., WikiSQL and Spider) primarily rely on attention-based sequence-to-sequence models. Building industry-grade NLIDB solutions that make the big data ecosystem accessible through truly natural and unstructured querying mechanisms presents several challenges. These include the lack of parallel corpora, diversity in the underlying data schema, wide variability in the nature of queries, and context and dialog management in interactive systems. In this paper, we present an end-to-end system, Query Enterprise Data (QED), aimed at making enterprise descriptive analytics and reporting easier and more natural. We elaborate in detail on how we addressed the challenges mentioned above and on novel features such as handling incomplete queries in an incremental fashion, and we highlight the role of an assistive user interface that provides a better user experience. Finally, we conclude the paper with observations and lessons learnt from the experience of transferring a research solution to an industry-grade practical deployment.
Citations: 1
Enhancing Neural Sentiment Analysis with Aspect Weights
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date: 2020-01-05 DOI: 10.1145/3371158.3371211
Urmi Saha, Abhijeet Dubey, Aditya Joshi, Pushpak Bhattacharyya
Abstract: Sentiment analysis is a challenging task and has impactful applications, including analyzing customer feedback on social media. In this paper, we propose a novel approach which enhances a neural architecture to predict the overall sentiment of restaurant reviews which may contain multiple aspect-level sentiments. We calculate the weights of different aspects of a restaurant and incorporate them in a neural architecture. We also compare our results with the current state-of-the-art approach (ULMFiT [1]) and show an absolute improvement of 7% in the F-score and 6% in the accuracy. To the best of our knowledge, this is the first work in the line of research investigating the incorporation of aspect weights into a neural architecture for sentiment analysis, culminating in a detector thereof.
Citations: 0
A Hybrid Distributed Model for Learning Representation of Short Texts with Attribute Labels
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date: 2020-01-05 DOI: 10.1145/3371158.3371195
Shashi Kumar, S. Roy, Vishal Pathak
Abstract: Short text documents in real-world applications, such as incident tickets, bug tickets and feedback texts, contain fixed field entries in the form of certain attribute instances as well as free text entries capturing summaries of them. We propose an approach based on the Paragraph Vector (due to Le and Mikolov) to learn fixed-length feature representations from these short texts of varying lengths appended with attribute instances. Our method extends the existing approach by learning representations from the summaries of tickets as well as from their attribute contents captured in fixed field entries. Further, we show that such a representation of short texts produces better performance on a few learning tasks compared to other popular representations.
Citations: 0