{"title":"CrisICSum: Interpretable Classification and Summarization Platform for Crisis Events from Microblogs","authors":"Thi-Huyen Nguyen, M. Shaltev, Koustav Rudra","doi":"10.1145/3511808.3557191","DOIUrl":"https://doi.org/10.1145/3511808.3557191","url":null,"abstract":"Microblogging platforms such as Twitter, receive massive messages during crisis events. Real-time insights are crucial for emergency response. Hence, there is a need to develop faithful tools for efficiently digesting information. In this paper, we present CrisICSum, a platform for classification and summarization of crisis events. The objective of CrisICSum is to classify user posts during disaster events into different humanitarian classes (i.e., damage, affected people, etc.) and generate summaries of class-level messages. Unlike existing systems, CrisICSum employs an interpretable by design backend classifier. It can generate explanations for output decisions. Besides, the platform allows user feedback on both classification and summarization phases. CrisICSum is designed and run as an easily integrated web application. Backend models are interchangeable. The system can assist users and human organizations in improving response efforts during disaster situations. CrisICSum is available at https://crisicsum.l3s.uni-hannover.de","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130368141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identify Relevant Entities Through Text Understanding","authors":"Poojan Oza","doi":"10.1145/3511808.3557819","DOIUrl":"https://doi.org/10.1145/3511808.3557819","url":null,"abstract":"An Entity Retrieval system is a fundamental task of Information Retrieval that provides direct answer to an information need of user. Prior work of entity retrieval utilizes either the Knowledge Graph fields or the text relevant to the query via pseudo-relevance feedback to improve the performance. Recently, Knowledge Graph embeddings or other entity representations, which capture the entity information from a knowledge graph are shown to be beneficial for entity retrieval. However, such embeddings are query-agnostic. In this dissertation work, we aim to improve entity retrieval by exploring the pseudo-relevance feedback to generate entity representations that capture query-aware entity information to determine the relevance of entities. We study the effectiveness of pseudo-relevance feedback against Knowledge Graph fields and investigate the efficacy of the Knowledge Graph embeddings for entity retrieval. We aim to understand the importance of utilization of query-aware signals and modeling of such signals with Knowledge Graph embeddings. Our results show that pseudo-relevance feedback is more effective than the Knowledge Graph fields by 30%.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125205976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Real-time Post-processing System for Itinerary Recommendation","authors":"Linge Jiang, Guiyang Wang, Zhibo Zhu, Binghao Wang, Runsheng Gan, Ziqi Liu, Jun Zhou","doi":"10.1145/3511808.3557190","DOIUrl":"https://doi.org/10.1145/3511808.3557190","url":null,"abstract":"Post-processing is crucial to modern recommendation systems to achieve various purposes, e.g., improving diversity, and giving reasonable itineraries which consist of combinations of items, but is merely studied in the literature. We decouple the recommendation system into two modules including a reward estimation module and a post-processing module. Our real-time post-processing module built on Ray abstracts the common post-processing problems in the itinerary recommendation as combinatorial optimization problems. Under the goal of maximizing the click-through rate, the more reasonable recommendation results are obtained by imposing various constraints on the candidate items. However, the optimization problems are typically mixed integer programming problems with quadratic terms in practice, which are NP-hard. In real-time scenarios, there are extremely high requirements for the speed of the solving process. We speed up the problem solving by linearizing and relaxing the original problem and use Ray serving as the underlying service to provide stable and efficient technical support. At last, we provide services to users by deploying the post-processing module in the itinerary recommendation scenario at Alipay's built-in applet named ''What's nearby''. The online A/B experiment shows that the user exposure click rate can be significantly improved.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131715281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Study on How People Perceive AI-generated Music","authors":"Hyeshin Chu, Joohee Kim, S. Kim, H. Lim, Hyunwoo Lee, Seungmin Jin, Jongeun Lee, Taehwan Kim, Sungahn Ko","doi":"10.1145/3511808.3557235","DOIUrl":"https://doi.org/10.1145/3511808.3557235","url":null,"abstract":"Music creation is difficult because one must express one's creativity while following strict rules. The advancement of deep learning technologies has diversified the methods to automate complex processes and express creativity in music composition. However, prior research has not paid much attention to exploring the audiences' subjective satisfaction to improve music generation models. In this paper, we evaluate human satisfaction with the state-of-the-art automatic symbolic music generation models using deep learning. In doing so, we define a taxonomy for music generation models and suggest nine subjective evaluation metrics. Through an evaluation study, we obtained more than 700 evaluations from 100 participants, using the suggested metrics. Our evaluation study reveals that the token representation method and models' characteristics affect subjective satisfaction. Through our qualitative analysis, we deepen our understanding of AI-generated music and suggested evaluation metrics. Lastly, we present lessons learned and discuss future research directions of deep learning models for music creation.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131736579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The SimIIR 2.0 Framework: User Types, Markov Model-Based Interaction Simulation, and Advanced Query Generation","authors":"Saber Zerhoudi, S. Günther, Kim Plassmeier, Timo Borst, C. Seifert, Matthias Hagen, M. Granitzer","doi":"10.1145/3511808.3557711","DOIUrl":"https://doi.org/10.1145/3511808.3557711","url":null,"abstract":"Simulated user retrieval system interactions enable studies with controlled user behavior. To this end, the SimIIR framework offers static, rule-based methods. We present an extended SimIIR 2.0 version with new components for dynamic user type-specific Markov model-based interactions and more realistic query generation. A flexible modularization ensures that the SimIIR 2.0 framework can serve as a platform to implement, combine, and run the growing number of proposed search behavior and query simulation ideas.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132029398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AutoForecast: Automatic Time-Series Forecasting Model Selection","authors":"Mustafa Abdallah, Ryan A. Rossi, K. Mahadik, Sungchul Kim, Handong Zhao, S. Bagchi","doi":"10.1145/3511808.3557241","DOIUrl":"https://doi.org/10.1145/3511808.3557241","url":null,"abstract":"In this work, we develop techniques for fast automatic selection of the best forecasting model for a new unseen time-series dataset, without having to first train (or evaluate) all the models on the new time-series data to select the best one. In particular, we develop a forecasting meta-learning approach called AutoForecast that allows for the quick inference of the best time-series forecasting model for an unseen dataset. Our approach learns both forecasting models' performances over the time horizon of the same dataset and task similarity across different datasets. The experiments demonstrate the effectiveness of the approach over state-of-the-art (SOTA) single and ensemble methods and several SOTA meta-learners (adapted to our problem) in terms of selecting better forecasting models (i.e., 2X gain) for unseen tasks for univariate and multivariate testbeds.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131538321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dually Enhanced Propensity Score Estimation in Sequential Recommendation","authors":"Chen Xu, Jun Xu, Xu Chen, Zhenhua Dong, Jirong Wen","doi":"10.1145/3511808.3557299","DOIUrl":"https://doi.org/10.1145/3511808.3557299","url":null,"abstract":"Sequential recommender systems train their models based on a large amount of implicit user feedback data and may be subject to biases when users are systematically under/over-exposed to certain items. Unbiased learning based on inverse propensity scores (IPS), which estimate the probability of observing a user-item pair given the historical information, has been proposed to address the issue. In these methods, propensity score estimation is usually limited to the view of item, that is, treating the feedback data as sequences of items that interacted with the users. However, the feedback data can also be treated from the view of user, as the sequences of users that interact with the items. Moreover, the two views can jointly enhance the propensity score estimation. Inspired by the observation, we propose to estimate the propensity scores from the views of user and item, called Dually Enhanced Propensity Score Estimation (DEPS). Specifically, given a target user-item pair and the corresponding item and user interaction sequences, DEPS first constructs a time-aware causal graph to represent the user-item observational probability. According to the graph, two complementary propensity scores are estimated from the views of item and user, respectively, based on the same set of user feedback data. Finally, two transformers are designed to make use of the two propensity scores and make the final preference prediction. Theoretical analysis showed the unbiasedness and variance of DEPS. Experimental results on three publicly available benchmarks and a proprietary industrial dataset demonstrated that DEPS can significantly outperform the state-of-the-art baselines.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134042508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning-to-Spell: Weak Supervision based Query Correction in E-Commerce Search with Small Strong Labels","authors":"Madhura Pande, Vishal Kakkar, M. Bansal, Surender Kumar, Chinmay Sharma, Himanshu Malhotra, Praneet Mehta","doi":"10.1145/3511808.3557113","DOIUrl":"https://doi.org/10.1145/3511808.3557113","url":null,"abstract":"For an E-commerce search engine, users finding the right product critically depend on spell correction. A misspelled query can fetch totally unrelated results which in turn leads to a bad customer experience. Around 32% of queries have spelling mistakes on our e-commerce search engine. The spell problem becomes more challenging when most spell errors arise from customers with little or no exposure to the English language besides the usual source of accidental mistyping on keyboard. These spell errors are heavily influenced by the colloquial and spoken accents of the customers. This limits the benefit from using generic spell correction systems which are learnt from cleaner English sources like Brown Corpus and Wikipedia with a very low focus on phonetic/vernacular spell errors. In this work, we present a novel approach towards spell correction that effectively solves a very diverse set of spell errors and outperforms several state-of-the-art systems in the domain of E-commerce search. Our strategy combines Learning-to-Rank on a small strongly labelled data with multiple learners trained with weakly labelled data. We report the effectiveness of our solution WellSpell (Weak and strong Labels for Learning to Spell) with both the offline evaluations and online A/B experiment.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115872770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Calibrated Conversion Rate Prediction via Knowledge Distillation under Delayed Feedback in Online Advertising","authors":"Yuyao Guo, Haoming Li, Xiang Ao, Min Lu, Dapeng Liu, Lei Xiao, Jie Jiang, Qing He","doi":"10.1145/3511808.3557557","DOIUrl":"https://doi.org/10.1145/3511808.3557557","url":null,"abstract":"Prevailing calibration methods may fail to generalize well due to the pervasively delayed feedback issue in online advertising. That is, the labels of recent samples are more likely to be inaccurate because of the delayed feedback by users, while the old samples with complete feedback may suffer from the data shift compared to the recent ones. In this paper, we propose to calibrate conversion rate prediction models considering delayed feedback via the knowledge distillation technique. Specifically, we deploy a teacher model modeling by the samples with complete feedback to learn long-term conversion patterns and a student model modeling by the recent data to reduce the impact of data shift. We also devise a distillation loss to buoy the student model to learn from the teacher. Experimental results on two real-world advertising conversion rate prediction datasets demonstrate that our method can provide more calibrated predictions compared with the existing ones. We also exhibit that our method can be extended to different base models.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131946261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discriminative Language Model via Self-Teaching for Dense Retrieval","authors":"Luyao Chen, Ruqing Zhang, J. Guo, Yixing Fan, Xueqi Cheng","doi":"10.1145/3511808.3557582","DOIUrl":"https://doi.org/10.1145/3511808.3557582","url":null,"abstract":"Dense retrieval (DR) has shown promising results in many information retrieval (IR) related tasks, whose foundation is high-quality text representations for effective search. Taking the pre-trained language models (PLMs) as the text encoders has become a popular choice in DR. However, the learned representations based on these PLMs often lose the discriminative power, and thus hurt the recall performance, particularly as PLMs consider too much content of the input texts. Therefore, in this work, we propose to pre-train a discriminative language representation model, called DiscBERT, for DR. The key idea is that a good text representation should be able to automatically keep those discriminative features that could well distinguish different texts from each other in the semantic space. Specifically, inspired by knowledge distillation, we employ a simple yet effective training method, called self-teaching, to distill the model's knowledge constructed when training on the sampled representative tokens of a text sequence into the model's knowledge for the entire text sequence. By further fine-tuning on publicly available retrieval benchmark datasets, DiscBERT can outperform the state-of-the-art retrieval methods.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131947898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}