Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining最新文献_第2页

Learning to Infer Product Attribute Values From Descriptive Texts and Images 学习从描述性文本和图像中推断产品属性值

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3575786

Pablo Montalvo, Aghiles Salah

{"title":"Learning to Infer Product Attribute Values From Descriptive Texts and Images","authors":"Pablo Montalvo, Aghiles Salah","doi":"10.1145/3539597.3575786","DOIUrl":"https://doi.org/10.1145/3539597.3575786","url":null,"abstract":"Online marketplaces are able to offer a staggering array of products that no physical store can match. While this makes it more likely for customers to find what they want, in order for online providers to ensure a smooth and efficient user experience, they must maintain well-organized catalogs, which depends greatly on the availability of per-product attribute values such as color, material, brand, to name a few. Unfortunately, such information is often incomplete or even missing in practice, and therefore we have to resort to predictive models as well as other sources of information to impute missing attribute values. In this talk we present the deep learning-based approach that we have developed at Rakuten Group to extract attribute values from product descriptive texts and images. Starting from pretrained architectures to encode textual and visual modalities, we discuss several refinements and improvements that we find necessary to achieve satisfactory performance and meet strict business requirements, namely improving recall while maintaining a high precision (>= 95%). Our methodology is driven by a systematic investigation into several practical research questions surrounding multimodality, which we revisit in this talk. At the heart of our multimodal architecture, is a new method to combine modalities inspired by empirical cross-modality comparisons. We present the latter component in details, point out one of its major limitations, namely exacerbating the issue of modality collapse, i.e., when the model forgets one modality, and describe our mitigation to this problem based on a principled regularization scheme. We present various empirical results on both Rakuten data as well as public benchmark datasets, which provide evidence of the benefits of our approach compared to several strong baselines. We also share some insights to characterise the circumstances in which the proposed model offers the most significant improvements. We conclude this talk by criticising the current model and discussing possible future developments and improvements. Our model is successfully deployed in Rakuten Ichiba - a Rakuten marketplace - and we believe that our investigation into multimodal attribute value extraction for e-commerce will benefit other researchers and practitioners alike embarking on similar journeys.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133935246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Learning to Understand Audio and Multimodal Content 学习理解音频和多模态内容

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3572333

Rosie Jones

引用次数: 0

"Just To See You Smile": SMILEY, a Voice-Guided GUY GAN “只是为了看到你的微笑”:SMILEY，一个声音引导的GUY GAN

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3573031

Qi Yang, Christos Tzelepis, Sergey I. Nikolenko, I. Patras, Aleksandr Farseev

引用次数: 2

ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling ElasticDL:一个kubernetes原生的深度学习框架，具有容错和弹性调度功能

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3573037

Jun Zhou, Ke Zhang, Feng Zhu, Qitao Shi, Wenjing Fang, Lin Wang, Yi Wang

引用次数: 4

Trustworthy Algorithmic Ranking Systems 值得信赖的算法排名系统

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3572723

M. Schedl, Emilia Gómez, E. Lex

{"title":"Trustworthy Algorithmic Ranking Systems","authors":"M. Schedl, Emilia Gómez, E. Lex","doi":"10.1145/3539597.3572723","DOIUrl":"https://doi.org/10.1145/3539597.3572723","url":null,"abstract":"This tutorial aims at providing its audience an interdisciplinary overview about the topics of fairness and non-discrimination, diversity, and transparency as relevant dimensions of trustworthy AI systems, tailored to algorithmic ranking systems such as search engines and recommender systems. We will equip the mostly technical audience of WSDM with the necessary understanding of the social and ethical implications of their research and development on the one hand, and of recent ethical guidelines and regulatory frameworks addressing the aforementioned dimensions on the other hand. While the tutorial foremost takes a European perspective, starting from the concept of trustworthy AI and discussing EU regulation in this area currently in the implementation stages, we also consider related initiatives worldwide. Since ensuring non-discrimination, diversity, and transparency in retrieval and recommendation systems is an endeavor in which academic institutions and companies in different parts of the world should collaborate, this tutorial is relevant for researchers and practitioners interested in the ethical, social, and legal impact of their work. The tutorial, therefore, targets both academic scholars and practitioners around the globe, by reviewing recent research and providing practical examples addressing these particular trustworthiness aspects, and showcasing how new regulations affect the audience's daily work.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129399808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Social Public Health Infrastructure for a Smart City Citizen Patient: Advances and Opportunities for AI Driven Disruptive Innovation 智慧城市市民患者的社会公共卫生基础设施:人工智能驱动的颠覆性创新的进展和机遇

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3575779

A. Teredesai

{"title":"Social Public Health Infrastructure for a Smart City Citizen Patient: Advances and Opportunities for AI Driven Disruptive Innovation","authors":"A. Teredesai","doi":"10.1145/3539597.3575779","DOIUrl":"https://doi.org/10.1145/3539597.3575779","url":null,"abstract":"Promoting health, preventing disease, and prolonging life are central to the success of any smart city initiative. Today, wireless communication, data infrastructure, and low-cost sensors such as lifestyle and activity trackers are making it increasingly possible for cities to collect, collate, and innovate on developing a smart infrastructure. Combining this with AI driven disruptions for human behavior change can fundamentally transform delivery of public health for the citizen patient[4]. Most urban development government bodies consider such infrastructure to be a distributed ecosystem consisting of physical infrastructure, institutional infrastructure, social infrastructure and economic infrastructure[1]. In this talk we will focus mostly on the social health infrastructure component and first showcase some of the recent initiatives created with the purpose of addressing health which is a key social goal, as a smart city goal. Specifically we will discuss how technical advances in IoT, recommendation systems, geospatial computing, and digital health therapeutics are creating a new future. Yet, such advances are not addressing the issues of making healthy behavior change sustainable with broad health equity for those that need it most[3]. Overwhelming nature of siloed apps and digital health solutions which often leave the citizens overwhelmed and those most in need underserved[2]. This talk highlights the advances and opportunities created when behavioral economics and public health combine with AI and cloud infrastructure to make smart public health initiatives personalized to each individual citizen patient.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130529750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Graphs: Privacy and Generation through ML 图:隐私和通过ML生成

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3572987

Rucha Bhalchandra Joshi

引用次数: 1

Simplifying Graph-based Collaborative Filtering for Recommendation 简化基于图的推荐协同过滤

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3570451

Li He, Xianzhi Wang, Dingxian Wang, Haoyuan Zou, Hongzhi Yin, Guandong Xu

{"title":"Simplifying Graph-based Collaborative Filtering for Recommendation","authors":"Li He, Xianzhi Wang, Dingxian Wang, Haoyuan Zou, Hongzhi Yin, Guandong Xu","doi":"10.1145/3539597.3570451","DOIUrl":"https://doi.org/10.1145/3539597.3570451","url":null,"abstract":"Graph Convolutional Networks (GCNs) are a popular type of machine learning models that use multiple layers of convolutional aggregation operations and non-linear activations to represent data. Recent studies apply GCNs to Collaborative Filtering (CF)-based recommender systems (RSs) by modeling user-item interactions as a bipartite graph and achieve superior performance. However, these models face difficulty in training with non-linear activations on large graphs. Besides, most GCN-based models could not model deeper layers due to the over-smoothing effect with the graph convolution operation. In this paper, we improve the GCN-based CF models from two aspects. First, we remove non-linearities to enhance recommendation performance, which is consistent with the theories in simple graph convolutional networks. Second, we obtain the initialization of the embedding for each node in the graph by computing the network embedding on the condensed graph, which alleviates the over smoothing problem in graph convolution aggregation operation with sparse interaction data. The proposed model is a linear model that is easy to train, scalable to large datasets, and shown to yield better efficiency and effectiveness on four real datasets.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129226295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Multimodal Pre-Training with Self-Distillation for Product Understanding in E-Commerce 基于自蒸馏的电子商务产品理解多模态预训练

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3570423

Shilei Liu, Lin Li, Jun Song, Yonghua Yang, Xiaoyi Zeng

{"title":"Multimodal Pre-Training with Self-Distillation for Product Understanding in E-Commerce","authors":"Shilei Liu, Lin Li, Jun Song, Yonghua Yang, Xiaoyi Zeng","doi":"10.1145/3539597.3570423","DOIUrl":"https://doi.org/10.1145/3539597.3570423","url":null,"abstract":"Product understanding refers to a series of product-centric tasks, such as classification, alignment and attribute values prediction, which requires fine-grained fusion of various modalities of products. Excellent product modeling ability will enhance the user experience and benefit search and recommendation systems. In this paper, we propose MBSD, a pre-trained vision-and-language model which can integrate the heterogeneous information of product in a single stream BERT-style architecture. Compared with current approaches, MBSD uses a lightweight convolutional neural network instead of a heavy feature extractor for image encoding, which has lower latency. Besides, we cleverly utilize user behavior data to design a two-stage pre-training task to understand products from different perspectives. In addition, there is an underlying imbalanced problem in multimodal pre-training, which will impairs downstream tasks. To this end, we propose a novel self-distillation strategy to transfer the knowledge in dominated modality to weaker modality, so that each modality can be fully tapped during pre-training. Experimental results on several product understanding tasks demonstrate that the performance of MBSD outperforms the competitive baselines.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125526220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Visual Matching is Enough for Scene Text Retrieval 视觉匹配足以实现场景文本检索

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3570428

L. Wen, Yingrong Wang, Dongxiang Zhang, Gang Chen

{"title":"Visual Matching is Enough for Scene Text Retrieval","authors":"L. Wen, Yingrong Wang, Dongxiang Zhang, Gang Chen","doi":"10.1145/3539597.3570428","DOIUrl":"https://doi.org/10.1145/3539597.3570428","url":null,"abstract":"Given a text query, the task of scene text retrieval aims at searching and localizing all the text instances that are contained in an image gallery. The state-of-the-art method learns a cross-modal similarity between the query text and the detected text regions in natural images to facilitate retrieval. However, this cross-modal approach still cannot well bridge the heterogeneity gap between the text and image modalities. In this paper, we propose a new paradigm that converts the task into a single-modality retrieval problem. Unlike previous works that rely on character recognition or embedding, we directly leverage pictorial information by rendering query text into images to learn the glyph feature of each character, which can be utilized to capture the similarity between query and scene text images. With the extracted visual features, we devise a synthetic label image guided feature alignment mechanism that is robust to different scene text styles and layouts. The modules of glyph feature learning, text instance detection, and visual matching are jointly trained in an end-to-end framework. Experimental results show that our proposed paradigm achieves the best performance in multiple benchmark datasets. As a side product, our method can also be easily generalized to support text queries with unseen characters or languages in a zero-shot manner.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125538527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2