{"title":"Interpretable Complex Question Answering","authors":"Soumen Chakrabarti","doi":"10.1145/3366423.3380764","DOIUrl":"https://doi.org/10.1145/3366423.3380764","url":null,"abstract":"We will review cross-community co-evolution of question answering (QA) with the advent of large-scale knowledge graphs (KGs), continuous representations of text and graphs, and deep sequence analysis. Early QA systems were information retrieval (IR) systems enhanced to extract named entity spans from high-scoring passages. Starting with WordNet, a series of structured curations of language and world knowledge, called KGs, enabled further improvements. Corpus is unstructured and messy to exploit for QA. If a question can be answered using the KG alone, it is attractive to ‘interpret’ the free-form question into a structured query, which is then executed on the structured KG. This process is called KGQA. Answers can be high-quality and explainable if the KG has an answer, but manual curation results in low coverage. KGs were soon found useful to harness corpus information. Named entity mention spans could be tagged with fine-grained types (e.g., scientist), or even specific entities (e.g., Einstein). The QA system can learn to decompose a query into functional parts, e.g., “which scientist” and “played the violin”. With increasing success of such systems, ambition grew to address multi-hop or multi-clause queries, e.g., “the father of the director of La La Land teaches at which university?” or “who directed an award-winning movie and is the son of a Princeton University professor?” Questions limited to simple path traversals in KGs have been encoded to a vector representation, which a decoder then uses to guide the KG traversal. Recently the corpus counterpart of such strategies has also been proposed. However, for general multi-clause queries that do not necessarily translate to paths, and seek to bind multiple variables to satisfy multiple clauses, or involve logic, comparison, aggregation and other arithmetic, neural programmer-interpreter systems have seen some success. Our key focus will be on identifying situations where manual introduction of structural bias is essential for accuracy, as against cases where sufficient data can get around distant or no supervision.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87923520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measurements, Analyses, and Insights on the Entire Ethereum Blockchain Network","authors":"Xi Tong Lee, Arijit Khan, Sourav Sengupta, Yu-Han Ong, Xu Liu","doi":"10.1145/3366423.3380103","DOIUrl":"https://doi.org/10.1145/3366423.3380103","url":null,"abstract":"Blockchains are increasingly becoming popular due to the prevalence of cryptocurrencies and decentralized applications. Ethereum is a distributed public blockchain network that focuses on running code (smart contracts) for decentralized applications. More simply, it is a platform for sharing information in a global state that cannot be manipulated or changed. Ethereum blockchain introduces a novel ecosystem of human users and autonomous agents (smart contracts). In this network, we are interested in all possible interactions: user-to-user, user-to-contract, contract-to-user, and contract-to-contract. This requires us to construct interaction networks from the entire Ethereum blockchain data, where vertices are accounts (users, contracts) and arcs denote interactions. Our analyses on the networks reveal new insights by combining information from the four networks. We perform an in-depth study of these networks based on several graph properties consisting of both local and global properties, discuss their similarities and differences with social networks and the Web, draw interesting conclusions, and highlight important, future research directions.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"73 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81733198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PG2S+: Stack Distance Construction Using Popularity, Gap and Machine Learning","authors":"Jiangwei Zhang, Y. Tay","doi":"10.1145/3366423.3380176","DOIUrl":"https://doi.org/10.1145/3366423.3380176","url":null,"abstract":"Stack distance characterizes temporal locality of workloads and plays a vital role in cache analysis since the 1970s. However, exact stack distance calculation is too costly, and impractical for online use. Hence, much work was done to optimize the exact computation, or approximate it through sampling or modeling. This paper introduces a new approximation technique PG2S that is based on reference popularity and gap distance. This approximation is exact under the Independent Reference Model (IRM). The technique is further extended, using machine learning, to PG2S+ for non-IRM reference patterns. Extensive experiments show that PG2S+ is much more accurate and robust than other state-of-the-art algorithms for determining stack distance. PG2S+ is the first technique to exploit the strong correlation among reference popularity, gap distance and stack distance.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83022829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Intent-Based Automation Framework for Securing Dynamic Consumer IoT Infrastructures","authors":"Vasudevan Nagendra, A. Bhattacharya, V. Yegneswaran, Amir Rahmati, Samir R Das","doi":"10.1145/3366423.3380234","DOIUrl":"https://doi.org/10.1145/3366423.3380234","url":null,"abstract":"Consumer IoT networks are characterized by heterogeneous devices with diverse functionality and programming interfaces. This lack of homogeneity makes the integration and secure management of IoT infrastructures a daunting task for users and administrators. In this paper, we introduce VISCR, a Vendor-Independent policy Specification and Conflict Resolution engine that enables intent-based conflict-free policy specification and enforcement in IoT environments. VISCR converts the topology of the IoT infrastructure into a tree-based abstraction and translates existing policies from heterogeneous vendor-specific programming languages, such as Groovy-based SmartThings, OpenHAB, IFTTT-based templates, and MUD-based profiles, into a vendor-independent graph-based specification. These are then used to automatically detect rogue policies, policy conflicts, and automation bugs. We evaluated VISCR using a dataset of 907 IoT apps, programmed using heterogeneous automation specifications, in a simulated smart-building IoT infrastructure. In our experiments, among 907 IoT apps, VISCR exposed 342 of IoT apps as exhibiting one or more violations, while also running 14.2x faster than the state-of-the-art tool (Soteria). VISCR detected 100% of violations reported by Soteria while also detecting new types of violations in 266 additional apps.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84596450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-preserving AI Services Through Data Decentralization","authors":"Christian Meurisch, Bekir Bayrak, M. Mühlhäuser","doi":"10.1145/3366423.3380106","DOIUrl":"https://doi.org/10.1145/3366423.3380106","url":null,"abstract":"User services increasingly base their actions on AI models, e.g., to offer personalized and proactive support. However, the underlying AI algorithms require a continuous stream of personal data—leading to privacy issues, as users typically have to share this data out of their territory. Current privacy-preserving concepts are either not applicable to such AI-based services or to the disadvantage of any party. This paper presents PrivAI, a new decentralized and privacy-by-design platform for overcoming the need for sharing user data to benefit from personalized AI services. In short, PrivAI complements existing approaches to personal data stores, but strictly enforces the confinement of raw user data. PrivAI further addresses the resulting challenges by (1) dividing AI algorithms into cloud-based general model training, subsequent local personalization, and community-based sharing of model updates for new users; by (2) loading confidential AI models into a trusted execution environment, and thus, protecting provider’s intellectual property (IP). Our experiments show the feasibility and effectiveness of PrivAI with comparable performance as currently-practiced approaches.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74247680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Category-Aware Deep Model for Successive POI Recommendation on Sparse Check-in Data","authors":"Fuqiang Yu, Li-zhen Cui, Wei Guo, Xudong Lu, Qingzhong Li, Hua Lu","doi":"10.1145/3366423.3380202","DOIUrl":"https://doi.org/10.1145/3366423.3380202","url":null,"abstract":"As considerable amounts of POI check-in data have been accumulated, successive point-of-interest (POI) recommendation is increasingly popular. Existing successive POI recommendation methods only predict where user will go next, ignoring when this behavior will occur. In this work, we focus on predicting POIs that will be visited by users in the next 24 hours. As check-in data is very sparse, it is challenging to accurately capture user preferences in temporal patterns. To this end, we propose a category-aware deep model CatDM that incorporates POI category and geographical influence to reduce search space to overcome data sparsity. We design two deep encoders based on LSTM to model the time series data. The first encoder captures user preferences in POI categories, whereas the second exploits user preferences in POIs. Considering clock influence in the second encoder, we divide each user’s check-in history into several different time windows and develop a personalized attention mechanism for each window to facilitate CatDM to exploit temporal patterns. Moreover, to sort the candidate set, we consider four specific dependencies: user-POI, user-category, POI-time and POI-user current preferences. Extensive experiments are conducted on two large real datasets. The experimental results demonstrate that our CatDM outperforms the state-of-the-art models for successive POI recommendation on sparse check-in data.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"54 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74289044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Online Multi-Task Learning via Adaptive Kernel Selection","authors":"Peng Yang, P. Li","doi":"10.1145/3366423.3379993","DOIUrl":"https://doi.org/10.1145/3366423.3379993","url":null,"abstract":"Conventional multi-task model restricts the task structure to be linearly related, which may not be suitable when data is linearly nonseparable. To remedy this issue, we propose a kernel algorithm for online multi-task classification, as the large approximation space provided by reproducing kernel Hilbert spaces often contains an accurate function. Specifically, it maintains a local-global Gaussian distribution over each task model that guides the direction and scale of parameter updates. Nonetheless, optimizing over this space is computationally expensive. Moreover, most multi-task learning methods require accessing to the entire training instances, which is luxury unavailable in the large-scale streaming learning scenario. To overcome this issue, we propose a randomized kernel sampling technique across multiple tasks. Instead of requiring all inputs’ labels, the proposed algorithm determines whether to query a label or not via considering the confidence from the related tasks over label prediction. Theoretically, the algorithm trained on actively sampled labels can achieve a comparable result with one learned on all labels. Empirically, the proposed algorithm is able to achieve promising learning efficacy, while reducing the computational complexity and labeling cost simultaneously.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89504862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Implicit Unsupervised Text Hashing using Adversarial Autoencoder","authors":"Khoa D Doan, C. Reddy","doi":"10.1145/3366423.3380150","DOIUrl":"https://doi.org/10.1145/3366423.3380150","url":null,"abstract":"Searching for documents with semantically similar content is a fundamental problem in the information retrieval domain with various challenges, primarily, in terms of efficiency and effectiveness. Despite the promise of modeling structured dependencies in documents, several existing text hashing methods lack an efficient mechanism to incorporate such vital information. Additionally, the desired characteristics of an ideal hash function, such as robustness to noise, low quantization error and bit balance/uncorrelation, are not effectively learned with existing methods. This is because of the requirement to either tune additional hyper-parameters or optimize these heuristically and explicitly constructed cost functions. In this paper, we propose a Denoising Adversarial Binary Autoencoder (DABA) model which presents a novel representation learning framework that captures structured representation of text documents in the learned hash function. Also, adversarial training provides an alternative direction to implicitly learn a hash function that captures all the desired characteristics of an ideal hash function. Essentially, DABA adopts a novel single-optimization adversarial training procedure that minimizes the Wasserstein distance in its primal domain to regularize the encoder’s output of either a recurrent neural network or a convolutional autoencoder. We empirically demonstrate the effectiveness of our proposed method in capturing the intrinsic semantic manifold of the related documents. The proposed method outperforms the current state-of-the-art shallow and deep unsupervised hashing methods for the document retrieval task on several prominent document collections.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"52 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81116063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Democratizing Content Creation and Dissemination through AI Technology","authors":"Wei-Ying Ma","doi":"10.1145/3366423.3382669","DOIUrl":"https://doi.org/10.1145/3366423.3382669","url":null,"abstract":"With the rise of mobile video, user-generated content, and social networks, there is a massive opportunity for disruptive innovations in the media and content industry. It is now a fast-changing landscape with rapid advances in AI-powered content creation, dissemination and interaction technologies. I believe the current trends are leading us towards a world where everyone is equally empowered to produce high-quality content in video, music, augmented reality or more – and to share their information, knowledge, and stories with a large global audience. This new AI- powered content platform can further lead to innovations in advertising, e-commerce, online education, and productivity. I will share the current research efforts at ByteDance connected to this emerging new platform through products such as Douyin and TikTok, and discuss the challenges and the direction of our future research.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"116 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80371635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Task-Oriented Genetic Activation for Large-Scale Complex Heterogeneous Graph Embedding","authors":"Zhuoren Jiang, Zheng Gao, Jinjiong Lan, Hongxia Yang, Yao Lu, Xiaozhong Liu","doi":"10.1145/3366423.3380230","DOIUrl":"https://doi.org/10.1145/3366423.3380230","url":null,"abstract":"The recent success of deep graph embedding innovates the graphical information characterization methodologies. However, in real-world applications, such a method still struggles with the challenges of heterogeneity, scalability, and multiplex. To address these challenges, in this study, we propose a novel solution, Genetic hEterogeneous gRaph eMbedding (GERM), which enables flexible and efficient task-driven vertex embedding in a complex heterogeneous graph. Unlike prior efforts for this track of studies, we employ a task-oriented genetic activation strategy to efficiently generate the “Edge Type Activated Vector” (ETAV) over the edge types in the graph. The generated ETAV can not only reduce the incompatible noise and navigate the heterogeneous graph random walk at the graph-schema level, but also activate an optimized subgraph for efficient representation learning. By revealing the correlation between the graph structure and task information, the model interpretability can be enhanced as well. Meanwhile, an activated heterogeneous skip-gram framework is proposed to encapsulate both topological and task-specific information of a given heterogeneous graph. Through extensive experiments on both scholarly and e-commerce datasets, we demonstrate the efficacy and scalability of the proposed methods via various search/recommendation tasks. GERM can significantly reduces the running time and remove expert-intervention without sacrificing the performance (or even modestly improve) by comparing with baselines.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"52 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90469714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}