{"title":"Learning to Infer Product Attribute Values From Descriptive Texts and Images","authors":"Pablo Montalvo, Aghiles Salah","doi":"10.1145/3539597.3575786","DOIUrl":"https://doi.org/10.1145/3539597.3575786","url":null,"abstract":"Online marketplaces are able to offer a staggering array of products that no physical store can match. While this makes it more likely for customers to find what they want, in order for online providers to ensure a smooth and efficient user experience, they must maintain well-organized catalogs, which depends greatly on the availability of per-product attribute values such as color, material, brand, to name a few. Unfortunately, such information is often incomplete or even missing in practice, and therefore we have to resort to predictive models as well as other sources of information to impute missing attribute values. In this talk we present the deep learning-based approach that we have developed at Rakuten Group to extract attribute values from product descriptive texts and images. Starting from pretrained architectures to encode textual and visual modalities, we discuss several refinements and improvements that we find necessary to achieve satisfactory performance and meet strict business requirements, namely improving recall while maintaining a high precision (>= 95%). Our methodology is driven by a systematic investigation into several practical research questions surrounding multimodality, which we revisit in this talk. At the heart of our multimodal architecture, is a new method to combine modalities inspired by empirical cross-modality comparisons. We present the latter component in details, point out one of its major limitations, namely exacerbating the issue of modality collapse, i.e., when the model forgets one modality, and describe our mitigation to this problem based on a principled regularization scheme. We present various empirical results on both Rakuten data as well as public benchmark datasets, which provide evidence of the benefits of our approach compared to several strong baselines. We also share some insights to characterise the circumstances in which the proposed model offers the most significant improvements. We conclude this talk by criticising the current model and discussing possible future developments and improvements. Our model is successfully deployed in Rakuten Ichiba - a Rakuten marketplace - and we believe that our investigation into multimodal attribute value extraction for e-commerce will benefit other researchers and practitioners alike embarking on similar journeys.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133935246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Understand Audio and Multimodal Content","authors":"Rosie Jones","doi":"10.1145/3539597.3572333","DOIUrl":"https://doi.org/10.1145/3539597.3572333","url":null,"abstract":"Music, podcasts and audiobooks are rich audio content types with strong listener engagement. Search and recommendation across these content types can be greatly enhanced with a deep understanding of their content; across audio, text, and other multimodal content. In this talk, I discuss some of the challenges and opportunities in understanding this content. This deep understanding of content enables us to delight our users and expand the reach of our content creators. As part of enabling wider academic research into podcast content understanding, Spotify Research [1] has released a podcast dataset [2] with 120,000 hours of podcasts in English [3] and Portuguese [4].","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133252446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qi Yang, Christos Tzelepis, Sergey I. Nikolenko, I. Patras, Aleksandr Farseev
{"title":"\"Just To See You Smile\": SMILEY, a Voice-Guided GUY GAN","authors":"Qi Yang, Christos Tzelepis, Sergey I. Nikolenko, I. Patras, Aleksandr Farseev","doi":"10.1145/3539597.3573031","DOIUrl":"https://doi.org/10.1145/3539597.3573031","url":null,"abstract":"In this technical demonstration, we present SMILEY, a voice-guided virtual assistant. The system utilizes a deep neural architecture ContraCLIP to manipulate facial attributes using voice instructions, allowing for deeper speaker engagement and smoother customer experience when being used in the \"virtual concierge\" scenario. We validate the effectiveness of SMILEY and ContraCLIP via a successful real-world case study in Singapore and a large-scale quantitative evaluation.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124397045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jun Zhou, Ke Zhang, Feng Zhu, Qitao Shi, Wenjing Fang, Lin Wang, Yi Wang
{"title":"ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling","authors":"Jun Zhou, Ke Zhang, Feng Zhu, Qitao Shi, Wenjing Fang, Lin Wang, Yi Wang","doi":"10.1145/3539597.3573037","DOIUrl":"https://doi.org/10.1145/3539597.3573037","url":null,"abstract":"The power of artificial intelligence (AI) models originates with sophisticated model architecture as well as the sheer size of the model. These large-scale AI models impose new and challenging system requirements regarding scalability, reliability, and flexibility. One of the most promising solutions in the industry is to train these large-scale models on distributed deep-learning frameworks. With the power of all distributed computations, it is desired to achieve a training process with excellent scalability, elastic scheduling (flexibility), and fault tolerance (reliability). In this paper, we demonstrate the scalability, flexibility, and reliability of our open-source Elastic Deep Learning (ElasticDL) framework. Our ElasticDL utilizes an open-source system, i.e., Kubernetes, for automating deployment, scaling, and management of containerized application features to provide fault tolerance and support elastic scheduling for DL tasks.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116913594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trustworthy Algorithmic Ranking Systems","authors":"M. Schedl, Emilia Gómez, E. Lex","doi":"10.1145/3539597.3572723","DOIUrl":"https://doi.org/10.1145/3539597.3572723","url":null,"abstract":"This tutorial aims at providing its audience an interdisciplinary overview about the topics of fairness and non-discrimination, diversity, and transparency as relevant dimensions of trustworthy AI systems, tailored to algorithmic ranking systems such as search engines and recommender systems. We will equip the mostly technical audience of WSDM with the necessary understanding of the social and ethical implications of their research and development on the one hand, and of recent ethical guidelines and regulatory frameworks addressing the aforementioned dimensions on the other hand. While the tutorial foremost takes a European perspective, starting from the concept of trustworthy AI and discussing EU regulation in this area currently in the implementation stages, we also consider related initiatives worldwide. Since ensuring non-discrimination, diversity, and transparency in retrieval and recommendation systems is an endeavor in which academic institutions and companies in different parts of the world should collaborate, this tutorial is relevant for researchers and practitioners interested in the ethical, social, and legal impact of their work. The tutorial, therefore, targets both academic scholars and practitioners around the globe, by reviewing recent research and providing practical examples addressing these particular trustworthiness aspects, and showcasing how new regulations affect the audience's daily work.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129399808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social Public Health Infrastructure for a Smart City Citizen Patient: Advances and Opportunities for AI Driven Disruptive Innovation","authors":"A. Teredesai","doi":"10.1145/3539597.3575779","DOIUrl":"https://doi.org/10.1145/3539597.3575779","url":null,"abstract":"Promoting health, preventing disease, and prolonging life are central to the success of any smart city initiative. Today, wireless communication, data infrastructure, and low-cost sensors such as lifestyle and activity trackers are making it increasingly possible for cities to collect, collate, and innovate on developing a smart infrastructure. Combining this with AI driven disruptions for human behavior change can fundamentally transform delivery of public health for the citizen patient[4]. Most urban development government bodies consider such infrastructure to be a distributed ecosystem consisting of physical infrastructure, institutional infrastructure, social infrastructure and economic infrastructure[1]. In this talk we will focus mostly on the social health infrastructure component and first showcase some of the recent initiatives created with the purpose of addressing health which is a key social goal, as a smart city goal. Specifically we will discuss how technical advances in IoT, recommendation systems, geospatial computing, and digital health therapeutics are creating a new future. Yet, such advances are not addressing the issues of making healthy behavior change sustainable with broad health equity for those that need it most[3]. Overwhelming nature of siloed apps and digital health solutions which often leave the citizens overwhelmed and those most in need underserved[2]. This talk highlights the advances and opportunities created when behavioral economics and public health combine with AI and cloud infrastructure to make smart public health initiatives personalized to each individual citizen patient.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130529750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graphs: Privacy and Generation through ML","authors":"Rucha Bhalchandra Joshi","doi":"10.1145/3539597.3572987","DOIUrl":"https://doi.org/10.1145/3539597.3572987","url":null,"abstract":"Graphs are ubiquitous, which makes machine learning on graphs an important research area. While there are many aspects to this field, our research is focused primarily on two aspects of it. The first research question concerns privacy in graphs, where our work primarily focuses on preserving structural privacy in graphs. The second research question is about generating graphs. With applications in various fields such as drug discovery, designing novel proteins, etc., graph generation is emerging as an essential problem. This paper briefly describes the problems and the methodology to address them.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132853055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simplifying Graph-based Collaborative Filtering for Recommendation","authors":"Li He, Xianzhi Wang, Dingxian Wang, Haoyuan Zou, Hongzhi Yin, Guandong Xu","doi":"10.1145/3539597.3570451","DOIUrl":"https://doi.org/10.1145/3539597.3570451","url":null,"abstract":"Graph Convolutional Networks (GCNs) are a popular type of machine learning models that use multiple layers of convolutional aggregation operations and non-linear activations to represent data. Recent studies apply GCNs to Collaborative Filtering (CF)-based recommender systems (RSs) by modeling user-item interactions as a bipartite graph and achieve superior performance. However, these models face difficulty in training with non-linear activations on large graphs. Besides, most GCN-based models could not model deeper layers due to the over-smoothing effect with the graph convolution operation. In this paper, we improve the GCN-based CF models from two aspects. First, we remove non-linearities to enhance recommendation performance, which is consistent with the theories in simple graph convolutional networks. Second, we obtain the initialization of the embedding for each node in the graph by computing the network embedding on the condensed graph, which alleviates the over smoothing problem in graph convolution aggregation operation with sparse interaction data. The proposed model is a linear model that is easy to train, scalable to large datasets, and shown to yield better efficiency and effectiveness on four real datasets.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129226295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shilei Liu, Lin Li, Jun Song, Yonghua Yang, Xiaoyi Zeng
{"title":"Multimodal Pre-Training with Self-Distillation for Product Understanding in E-Commerce","authors":"Shilei Liu, Lin Li, Jun Song, Yonghua Yang, Xiaoyi Zeng","doi":"10.1145/3539597.3570423","DOIUrl":"https://doi.org/10.1145/3539597.3570423","url":null,"abstract":"Product understanding refers to a series of product-centric tasks, such as classification, alignment and attribute values prediction, which requires fine-grained fusion of various modalities of products. Excellent product modeling ability will enhance the user experience and benefit search and recommendation systems. In this paper, we propose MBSD, a pre-trained vision-and-language model which can integrate the heterogeneous information of product in a single stream BERT-style architecture. Compared with current approaches, MBSD uses a lightweight convolutional neural network instead of a heavy feature extractor for image encoding, which has lower latency. Besides, we cleverly utilize user behavior data to design a two-stage pre-training task to understand products from different perspectives. In addition, there is an underlying imbalanced problem in multimodal pre-training, which will impairs downstream tasks. To this end, we propose a novel self-distillation strategy to transfer the knowledge in dominated modality to weaker modality, so that each modality can be fully tapped during pre-training. Experimental results on several product understanding tasks demonstrate that the performance of MBSD outperforms the competitive baselines.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125526220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual Matching is Enough for Scene Text Retrieval","authors":"L. Wen, Yingrong Wang, Dongxiang Zhang, Gang Chen","doi":"10.1145/3539597.3570428","DOIUrl":"https://doi.org/10.1145/3539597.3570428","url":null,"abstract":"Given a text query, the task of scene text retrieval aims at searching and localizing all the text instances that are contained in an image gallery. The state-of-the-art method learns a cross-modal similarity between the query text and the detected text regions in natural images to facilitate retrieval. However, this cross-modal approach still cannot well bridge the heterogeneity gap between the text and image modalities. In this paper, we propose a new paradigm that converts the task into a single-modality retrieval problem. Unlike previous works that rely on character recognition or embedding, we directly leverage pictorial information by rendering query text into images to learn the glyph feature of each character, which can be utilized to capture the similarity between query and scene text images. With the extracted visual features, we devise a synthetic label image guided feature alignment mechanism that is robust to different scene text styles and layouts. The modules of glyph feature learning, text instance detection, and visual matching are jointly trained in an end-to-end framework. Experimental results show that our proposed paradigm achieves the best performance in multiple benchmark datasets. As a side product, our method can also be easily generalized to support text queries with unseen characters or languages in a zero-shot manner.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125538527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}