{"title":"Evaluating Human-AI Hybrid Conversational Systems with Chatbot Message Suggestions","authors":"Zihan Gao, Jiepu Jiang","doi":"10.1145/3459637.3482340","DOIUrl":"https://doi.org/10.1145/3459637.3482340","url":null,"abstract":"AI chatbots can offer suggestions to help humans answer questions by reducing text entry effort and providing relevant knowledge for unfamiliar questions. We study whether chatbot suggestions can help people answer knowledge-demanding questions in a conversation and influence response quality and efficiency. We conducted a large-scale crowdsourcing user study and evaluated 20 hybrid system variants and a human-only baseline. The hybrid systems used four chatbots of varied response quality and differed in the number of suggestions and whether to preset the message box with top suggestions. Experimental results show that chatbot suggestions, even using poor-performing chatbots, have consistently improved response efficiency. Compared with the human-only setting, hybrid systems have reduced response time by 12%-35% and keystrokes by 33%-60%, and users have adopted a suggestion for the final response without any changes in 44%-68% of the cases. In contrast, crowd workers in the human-only setting typed most of the response texts and copied 5% of the answers from other sites. However, we also found that chatbot suggestions did not always help response quality. Specifically, in hybrid systems equipped with poor-performing chatbots, users responded with lower-quality answers than others in the human-only setting. It seems that users would not simply ignore poor suggestions and compose responses as they could without seeing the suggestions. Besides, presetting the message box has improved reply efficiency without hurting response quality. We did not find that showing more suggestions helps or hurts response quality or efficiency consistently. Our study reveals how and when AI chatbot suggestions can help people answer questions in hybrid conversational systems.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115093375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LiteGT","authors":"Cong Chen, Chaofan Tao, Ngai Wong","doi":"10.1145/3459637.3482272","DOIUrl":"https://doi.org/10.1145/3459637.3482272","url":null,"abstract":"Transformers have shown great potential for modeling long-term dependencies for natural language processing and computer vision. However, few studies have applied transformers to graphs, which is challenging due to the poor scalability of the attention mechanism and the under-exploration of graph inductive bias. To bridge this gap, we propose a Lite Graph Transformer (LiteGT) that learns on arbitrary graphs efficiently. First, a node sampling strategy is proposed to sparsify the considered nodes in self-attention with only O(N log N) time. Second, we devise two kernelization approaches to form two-branch attention blocks, which not only leverage graph-specific topology information, but also reduce computation further to O(1/2 N log N). Third, the nodes are updated with different attention schemes during training, thus largely mitigating over-smoothing problems when the model layers deepen. Extensive experiments demonstrate that LiteGT achieves competitive performance on both node classification and link prediction on datasets with millions of nodes. Specifically, the Jaccard + Sampling + Dim. reducing setting reduces computation by more than 100x and halves the model size without performance degradation.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114257420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Failure Prediction for Large-scale Water Pipe Networks Using GNN and Temporal Failure Series","authors":"Shuming Liang, Zhidong Li, Binxin Liang, Yu Ding, Yang Wang, Fang Chen","doi":"10.1145/3459637.3481918","DOIUrl":"https://doi.org/10.1145/3459637.3481918","url":null,"abstract":"Pipe failure prediction in the water industry aims to prioritize the pipes that are at high risk of failure for proactive maintenance. However, existing statistical or machine learning models that rely on historical failures and asset attributes can hardly leverage the structure information of pipe networks. In this work, we develop a failure prediction framework for pipe networks by jointly considering the pipes' features, the network structure, the geographical neighboring effect, and the temporal failure series. We apply a multi-hop Graph Neural Network (GNN) to failure prediction. We propose a method of constructing a geographical graph structure depending on not only the physical connections but also geographical distances between pipes. To differentiate the pipes with diverse properties, we employ an attention mechanism in the neighborhood aggregation process of each GNN layer. Also, residual connections and layer-wise aggregation are used to avoid the over-smoothing issue in deep GNNs. The historical failures exhibit a strong temporal pattern. Inspired by point processes, we develop a module to learn the pipes' evolutionary effect and the time-decayed excitement of historical failures on the current state of the pipe. The proposed framework is evaluated on two real-world large-scale pipe networks. It outperforms the existing statistical, machine learning, and state-of-the-art GNN baselines. Our framework provides the water utility with core data-driven support for proactive maintenance including regular pipe inspection, pipe renewal planning, and sensor system deployment. It can be extended to other infrastructure networks in the future.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"206 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114213850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UQJG","authors":"Toon Koppelaars, Xavier Oriol, Ernest Teniente, Sérgio Curto, E. Pujol","doi":"10.1145/3459637.3482210","DOIUrl":"https://doi.org/10.1145/3459637.3482210","url":null,"abstract":"An SQL assertion is a declarative statement about data that must always be satisfied in any database state. Assertions were introduced in the SQL92 standard but no commercial DBMS has implemented them so far. Some approaches have been proposed to incrementally determine whether a transaction violates an SQL assertion, but they assume that transactions are applied in isolation, hence not considering the problem of concurrent transaction executions that collaborate to violate an assertion. This is the main obstacle to their commercial implementation. To handle this problem, we have developed a technique for efficiently serializing concurrent transactions that might interact to violate an SQL assertion.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114850514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To Be or not to Be, Tail Labels in Extreme Multi-label Learning","authors":"Zhiqi Ge, Ximing Li","doi":"10.1145/3459637.3482303","DOIUrl":"https://doi.org/10.1145/3459637.3482303","url":null,"abstract":"EXtreme Multi-label Learning (XML) aims to predict, for each instance, its most relevant subset of labels from an extremely large label space, often containing one million labels or more in many real applications. In XML scenarios, the labels exhibit a long-tail distribution, where a significant number of labels appear in very few instances, referred to as tail labels. Unfortunately, due to the lack of positive instances, the tail labels are difficult to learn and to predict. Several previous studies even suggested that the tail labels can be directly removed by referring to their label frequencies. We argue that such a crude principle may discard many significant tail labels, because the predictive accuracy is not strictly consistent with the label frequency, especially for tail labels. In this paper, we are interested in finding a reasonable principle to determine whether a tail label should be removed, rather than depending only on its label frequency. To this end, we investigate a method named Nearest Neighbor Positive Proportion Score (N2P2S) to score the tail labels by annotations of the instance neighbors. Extensive empirical results indicate that the proposed N2P2S can effectively screen the tail labels, where many preserved tail labels can be learned and accurately predicted even with very few positive instances.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115877915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Supervised Learning based on Sentiment Analysis with Word Weight Calculation","authors":"Dongcheol Son, Youngjoong Ko","doi":"10.1145/3459637.3482180","DOIUrl":"https://doi.org/10.1145/3459637.3482180","url":null,"abstract":"Learning domain information for a downstream task is important to improve the performance of sentiment analysis. However, the labeling task to obtain a sufficient amount of training data in an application domain tends to be highly time-consuming and tedious. To solve this problem, we propose a novel method to effectively learn domain information and improve sentiment analysis performance with a small amount of training data. We use the masked language model (MLM), which is a self-supervised learning model, to calculate word weights and improve a downstream fine-tuning task for sentiment analysis. In particular, the MLM with the calculated word weights is executed simultaneously with the fine-tuning task. The results show that the proposed model achieves better performance than previous models on four different datasets for sentiment analysis.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115140127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DLQ","authors":"You Peng, Wenjie Zhao, Wenjie Zhang, Xuemin Lin, Ying Zhang","doi":"10.1145/3459637.3481978","DOIUrl":"https://doi.org/10.1145/3459637.3481978","url":null,"abstract":"The Label-Constrained Reachability (LCR) query, which extracts reachability information from large edge-labeled graphs, has attracted tremendous interest. Various LCR algorithms have been proposed to solve this fundamental query, which has a wide range of applications in social networks, biological networks, economic networks, etc. In this paper, we implement the state-of-the-art P2H+ algorithm as well as functions to analyze its effectiveness. Moreover, our Dynamic LCR Query (DLQ) system also supports dynamic updates with the 2-hop labeling method. In this demonstration, we present the DLQ system for Label-Constrained Reachability Queries that utilizes the 2-hop labeling algorithm with dynamic graph maintenance.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115394549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Task Allocation with Geographic Partition in Spatial Crowdsourcing","authors":"Guanyu Ye, Yan Zhao, Xuanhao Chen, Kai Zheng","doi":"10.1145/3459637.3482300","DOIUrl":"https://doi.org/10.1145/3459637.3482300","url":null,"abstract":"Recent years have witnessed a revolution in Spatial Crowdsourcing (SC), in which people with mobile connectivity can perform spatio-temporal tasks that involve travel to specified locations. In this paper, we identify and study in depth a new multi-center-based task allocation problem in the context of SC, where multiple allocation centers exist. In particular, we aim to maximize the total number of the allocated tasks while minimizing the average allocated task number difference. To solve the problem, we propose a two-phase framework, called Task Allocation with Geographic Partition, consisting of a geographic partition phase and a task allocation phase. The first phase is to divide the whole study area based on the allocation centers by using both a basic Voronoi diagram-based algorithm and an adaptive weighted Voronoi diagram-based algorithm. In the allocation phase, we utilize a Reinforcement Learning method to achieve the task allocation, where a graph neural network with the attention mechanism is used to learn the embeddings of allocation centers, delivery points and workers. Extensive experiments give insight into the effectiveness and efficiency of the proposed solutions.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115468905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-dimensional Alignment for Cross-Domain Recommendation","authors":"Tian-Qi Wang, Fuzhen Zhuang, Zhiqiang Zhang, Daixin Wang, Jun Zhou, Qing He","doi":"10.1145/3459637.3482137","DOIUrl":"https://doi.org/10.1145/3459637.3482137","url":null,"abstract":"The cold-start problem is one of the most challenging and long-standing problems in recommender systems, and cross-domain recommendation (CDR) methods are effective for tackling it. Most cold-start related CDR methods require training a mapping function between high-dimensional embedding spaces using overlapping user data. However, overlapping data is scarce in many recommendation tasks, which makes it difficult to train the mapping function. In this paper, we propose a new approach for CDR, which aims to alleviate the training difficulty. The proposed method can be viewed as a special parameterization of the mapping function without hurting expressiveness, which makes use of non-overlapping user data and leads to effective optimization. Extensive experiments on two real-world CDR tasks are performed to evaluate the proposed method. When there is little overlapping data, the proposed method outperforms the existing state-of-the-art method by 14% (relative improvement).","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"618 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123201056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matches Made in Heaven: Toolkit and Large-Scale Datasets for Supervised Query Reformulation","authors":"Negar Arabzadeh, A. Bigdeli, Shirin Seyedsalehi, Morteza Zihayat, E. Bagheri","doi":"10.1145/3459637.3482009","DOIUrl":"https://doi.org/10.1145/3459637.3482009","url":null,"abstract":"Researchers have already shown that it is possible to improve retrieval effectiveness through the systematic reformulation of users' queries. Traditionally, most query reformulation techniques relied on unsupervised approaches such as query expansion through pseudo-relevance feedback. More recently and with the increasing effectiveness of neural sequence-to-sequence architectures, the problem of query reformulation has been studied as a supervised query translation problem, which learns to rewrite a query into a more effective alternative. While quite effective in practice, such supervised query reformulation methods require a large number of training instances. In this paper, we present three large-scale query reformulation datasets, namely Diamond, Platinum and Gold datasets, based on the queries in the MS MARCO dataset. The Diamond dataset consists of over 188,000 query pairs where the original source query is matched with an alternative query that has a perfect retrieval effectiveness (an average precision of 1). To the best of our knowledge, this is the first set of datasets for supervised query reformulation that offers perfect query reformulations for a large number of queries. The implementation of our fully automated tool, which is based on a transformer architecture, and our three datasets are made publicly available. We also establish a neural query reformulation baseline performance on our datasets by reporting the performance of strong neural query reformulation baselines. It is our belief that our datasets will significantly impact the development of supervised query reformulation methods in the future.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121638382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}