Proceedings of The Web Conference 2020: Latest Papers (Page 7)

A Generic Solver Combining Unsupervised Learning and Representation Learning for Breaking Text-Based Captchas

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380166

Sheng Tian, T. Xiong

Abstract: Although many alternative captcha schemes are available, text-based captchas remain one of the most popular security mechanisms for maintaining Internet security and preventing malicious attacks, owing to user preference and ease of design. Over the past decade, various methods of breaking captchas have been proposed, which has helped captchas keep evolving and become more robust. However, these previous works generally require heavy expert involvement and gradually become ineffective as new security features are introduced. This paper proposes a generic solver combining unsupervised learning and representation learning to automatically remove the noisy background of captchas and solve text-based captchas. We introduce a new training scheme for constructing mini-batches, which contain a large number of unlabeled hard examples, to improve the efficiency of representation learning. Unlike existing deep learning algorithms, our method requires significantly fewer labeled samples and surpasses the recognition performance of a fully supervised model with the same network architecture. Moreover, extensive experiments show that the proposed method outperforms the state of the art by delivering higher accuracy on various captcha schemes. We provide further discussion of potential applications of the proposed unified framework. We hope that our work can inspire the community to enhance the security of text-based captchas.

Citations: 12

Multi-Context Attention for Entity Matching

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380017

Dongxiang Zhang, Yuyang Nie, Sai Wu, Yanyan Shen, K. Tan

Citations: 24

Multimodal Post Attentive Profiling for Influencer Marketing

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380052

Seungbae Kim, Jyun-Yu Jiang, Masaki Nakada, Jinyoung Han, Wei Wang

Abstract: Influencer marketing has become a key marketing method for brands in recent years. Brands have increasingly used influencers’ social networks to reach niche markets, and researchers have studied various aspects of influencer marketing. However, brands often struggle to find and hire the right influencers with specific interests/topics for their marketing, owing to a lack of available influencer data and/or the limited capacity of marketing agencies. This paper proposes a multimodal deep learning model that uses text and image information from social media posts (i) to classify influencers into specific interests/topics (e.g., fashion, beauty) and (ii) to classify their posts into certain categories. We use the attention mechanism to select the posts that are more relevant to the topics of influencers, thereby generating useful influencer representations. We conduct experiments on a dataset crawled from Instagram, which is the most popular social media platform for influencer marketing. The experimental results show that our proposed model significantly outperforms existing user profiling methods, achieving 98% and 96% accuracy in classifying influencers and their posts, respectively. We release our influencer dataset of 33,935 influencers labeled with specific topics based on 10,180,500 posts to facilitate future research.
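The attention step mentioned in the abstract (weighting posts by relevance before building an influencer representation) can be illustrated with a minimal sketch. This is not the authors' model: the dot-product scoring, the `topic_query` vector, and the two-dimensional post vectors are all assumptions made for illustration, and the real model works on learned text and image features.

```python
import math


def attention_pool(post_vectors, query):
    """Softmax-weight each post vector by its dot product with the query, then sum.

    Hypothetical helper: dot-product scoring is an assumption, not the paper's method.
    """
    scores = [sum(p * q for p, q in zip(post, query)) for post in post_vectors]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    pooled = [
        sum(w * post[i] for w, post in zip(weights, post_vectors))
        for i in range(dim)
    ]
    return weights, pooled


# Made-up post embeddings: the first and third posts align with the query.
posts = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
topic_query = [2.0, 0.0]
weights, influencer_vec = attention_pool(posts, topic_query)
```

Posts that score higher against the query receive larger weights, so the pooled vector is dominated by the posts most relevant to the topic.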

Citations: 20

Few-Sample and Adversarial Representation Learning for Continual Stream Mining

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380153

Zhuoyi Wang, Yigong Wang, Yu Lin, Evan Delord, L. Khan

Abstract: Deep neural networks (DNNs) have primarily been shown to be useful for closed-world classification problems, where the number of categories is fixed. However, DNNs notoriously fail at label prediction in a non-stationary data stream, where unknown or novel classes (categories not in the training set) emerge continuously. For example, new topics continually emerge in social media and e-commerce. To meet this challenge, a DNN should not only detect novel classes effectively but also incrementally learn new concepts from limited samples over time. Literature that addresses both problems simultaneously is limited. In this paper, we focus on improving the model's generalization to novel classes and on making the model learn continually from only a few samples of the novel categories. Unlike existing approaches that rely on abundant labeled instances to re-train or update the model, we propose a new approach based on Few-Sample and Adversarial Representation Learning (FSAR). The key novelty is that we introduce an adversarial confusion term into both the representation learning and the few-sample learning processes, which reduces the model's over-confidence on the seen classes and further enhances its ability to detect and learn new categories from only a few samples. FSAR is trained in two stages: first, it learns an intra-class compact and inter-class separated feature embedding to detect novel classes; next, we collect a few labeled samples belonging to the new categories and use episode training to exploit the intrinsic features for few-sample learning. We evaluate FSAR on different datasets; extensive experimental results on various simulated stream benchmarks show that FSAR outperforms current state-of-the-art approaches.

Citations: 13

Deconstructing Google’s Web Light Service

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380168

Ammar Tahir, Muhammad Tahir Munir, Shaiq Munir Malik, Z. Qazi, I. Qazi

Abstract: Web Light is a transcoding service introduced by Google to show lighter and faster webpages to users searching on slow mobile clients. The service detects slow clients (e.g., users on 2G) and tries to convert webpages on the fly into a version optimized for these clients. Web Light claims to significantly reduce page load times, save user data, and substantially increase traffic to such webpages. However, there are several concerns around this service, including its effectiveness in preserving relevant content on a page, showing third-party advertisements, and improving user performance, as well as privacy concerns for users and publishers. In this paper, we perform the first independent, empirical analysis of Google’s Web Light service to shed light on these concerns. Through a combination of experiments with thousands of real Web Light pages and controlled experiments with synthetic Web Light pages, we (i) deconstruct how Web Light modifies webpages, (ii) investigate how ads are shown on Web Light and which ad networks are supported, (iii) measure and compare Web Light’s page load performance, (iv) discuss privacy concerns for users and publishers, and (v) investigate the potential use of Web Light as a censorship circumvention tool.

Citations: 8

Dynamic Composition for Conversational Domain Exploration

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380167

Idan Szpektor, Deborah Cohen, G. Elidan, Michael Fink, A. Hassidim, Orgad Keller, Sayali Kulkarni, E. Ofek, S. Pudinsky, Asaf Revach, Shimi Salant

Abstract: We study conversational domain exploration (CODEX), where the user’s goal is to enrich her knowledge of a given domain by conversing with an informative bot. Such conversations should be well grounded in high-quality domain knowledge as well as engaging and open-ended. A CODEX bot should be proactive and introduce relevant information even if the user does not directly ask for it. The bot should also appropriately pivot the conversation to undiscovered regions of the domain. To address these dialogue characteristics, we introduce a novel approach termed dynamic composition, which decouples candidate content generation from the flexible composition of bot responses. This allows the bot to control the source, correctness, and quality of the offered content, while achieving flexibility via a dialogue manager that selects the most appropriate content in a compositional manner. We implemented a CODEX bot based on dynamic composition and integrated it into the Google Assistant. As an example domain, the bot conversed about the NBA basketball league in a seamless experience, such that users were not aware whether they were conversing with the vanilla system or the one augmented with our CODEX bot. Results are positive and offer insights into what makes for a good conversation. To the best of our knowledge, this is the first real-user experiment of open-ended dialogues as part of a commercial assistant system.

Citations: 10

Fast Computation of Explanations for Inconsistency in Large-Scale Knowledge Graphs

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380014

T. Tran, Mohamed H. Gad-Elrab, D. Stepanova, E. Kharlamov, Jannik Strotgen

Citations: 9

Measurements, Analyses, and Insights on the Entire Ethereum Blockchain Network

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380103

Xi Tong Lee, Arijit Khan, Sourav Sengupta, Yu-Han Ong, Xu Liu

Abstract: Blockchains are becoming increasingly popular due to the prevalence of cryptocurrencies and decentralized applications. Ethereum is a distributed public blockchain network that focuses on running code (smart contracts) for decentralized applications. More simply, it is a platform for sharing information in a global state that cannot be manipulated or changed. The Ethereum blockchain introduces a novel ecosystem of human users and autonomous agents (smart contracts). In this network, we are interested in all possible interactions: user-to-user, user-to-contract, contract-to-user, and contract-to-contract. This requires us to construct interaction networks from the entire Ethereum blockchain data, where vertices are accounts (users, contracts) and arcs denote interactions. Our analyses reveal new insights by combining information from the four networks. We perform an in-depth study of these networks based on several local and global graph properties, discuss their similarities to and differences from social networks and the Web, draw interesting conclusions, and highlight important future research directions.
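The four interaction networks named in the abstract can be sketched as a simple partition of directed arcs by account type. This is a minimal illustration under stated assumptions: the account identifiers, the `contract_accounts` set, and the `(sender, receiver)` record format are hypothetical, and the abstract does not specify how interactions are extracted from the chain data.

```python
from collections import defaultdict


def build_interaction_networks(interactions, contract_accounts):
    """Split directed (sender, receiver) arcs into four networks by account type.

    Hypothetical helper: accounts not listed in contract_accounts are treated as users.
    """
    networks = defaultdict(list)
    for sender, receiver in interactions:
        src = "contract" if sender in contract_accounts else "user"
        dst = "contract" if receiver in contract_accounts else "user"
        networks[f"{src}-to-{dst}"].append((sender, receiver))
    return dict(networks)


# Made-up account identifiers, for illustration only.
contracts = {"0xC1", "0xC2"}
interactions = [
    ("0xA", "0xB"),    # user-to-user
    ("0xA", "0xC1"),   # user-to-contract
    ("0xC1", "0xB"),   # contract-to-user
    ("0xC1", "0xC2"),  # contract-to-contract
    ("0xB", "0xA"),    # user-to-user
]
networks = build_interaction_networks(interactions, contracts)
```

Each resulting arc list can then be loaded into a graph library to compute local and global properties such as degree distributions.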

Citations: 54

PG2S+: Stack Distance Construction Using Popularity, Gap and Machine Learning

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380176

Jiangwei Zhang, Y. Tay

Citations: 3

Interpretable Complex Question Answering

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380764

Soumen Chakrabarti

Abstract: We will review the cross-community co-evolution of question answering (QA) with the advent of large-scale knowledge graphs (KGs), continuous representations of text and graphs, and deep sequence analysis. Early QA systems were information retrieval (IR) systems enhanced to extract named entity spans from high-scoring passages. Starting with WordNet, a series of structured curations of language and world knowledge, called KGs, enabled further improvements. A corpus is unstructured and messy to exploit for QA. If a question can be answered using the KG alone, it is attractive to ‘interpret’ the free-form question into a structured query, which is then executed on the structured KG. This process is called KGQA. Answers can be high-quality and explainable if the KG has an answer, but manual curation results in low coverage. KGs were soon found useful for harnessing corpus information. Named entity mention spans could be tagged with fine-grained types (e.g., scientist), or even specific entities (e.g., Einstein). The QA system can learn to decompose a query into functional parts, e.g., “which scientist” and “played the violin”. With the increasing success of such systems, ambition grew to address multi-hop or multi-clause queries, e.g., “the father of the director of La La Land teaches at which university?” or “who directed an award-winning movie and is the son of a Princeton University professor?” Questions limited to simple path traversals in KGs have been encoded into a vector representation, which a decoder then uses to guide the KG traversal. Recently, the corpus counterpart of such strategies has also been proposed. However, for general multi-clause queries that do not necessarily translate to paths, that seek to bind multiple variables to satisfy multiple clauses, or that involve logic, comparison, aggregation, and other arithmetic, neural programmer-interpreter systems have seen some success. Our key focus will be on identifying situations where manual introduction of structural bias is essential for accuracy, as against cases where sufficient data can get around distant or no supervision.
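The multi-hop example in the abstract ("the father of the director of La La Land teaches at which university?") is a simple path traversal once the question has been interpreted into a sequence of relations. The sketch below shows only that execution step on a hand-written toy KG. The relation names, the `follow_path` helper, and the triple layout are assumptions for illustration; the interpretation step itself (question to relation path) is not modeled here.

```python
# Toy knowledge graph: (subject, relation) -> list of objects.
# Hand-written triples and hypothetical relation names, for illustration only.
TOY_KG = {
    ("La La Land", "directed_by"): ["Damien Chazelle"],
    ("Damien Chazelle", "father"): ["Bernard Chazelle"],
    ("Bernard Chazelle", "teaches_at"): ["Princeton University"],
}


def follow_path(kg, start, relations):
    """Follow a sequence of relations from a start entity, one hop at a time."""
    frontier = [start]
    for relation in relations:
        next_frontier = []
        for entity in frontier:
            next_frontier.extend(kg.get((entity, relation), []))
        frontier = next_frontier
    return frontier


# Structured form of the question: director -> father -> employer.
answer = follow_path(TOY_KG, "La La Land", ["directed_by", "father", "teaches_at"])
```

A path with a missing edge returns an empty list, which mirrors the low-coverage limitation of manually curated KGs noted in the abstract.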

Citations: 4