No Two Users Are Alike: Generating Audiences with Neural Clustering for Temporal Point Processes

IF 0.5 | Zone 4 (Mathematics) | JCR Q3 MATHEMATICS

Doklady Mathematics Pub Date : 2024-03-25 DOI:10.1134/S1064562423701661

V. Zhuzhel, V. Grabar, N. Kaploukhaya, R. Rivera-Castro, L. Mironova, A. Zaytsev, E. Burnaev

Doklady Mathematics, vol. 108, suppl. 2, pp. S511-S528. Journal Article. https://link.springer.com/article/10.1134/S1064562423701661

Citations: 0

Abstract

Identifying the right users to target is a common problem across Internet platforms. Although numerous systems address this task, they are heavily tailored to specific environments and settings, which makes it hard for practitioners to apply their findings to their own problems. Most systems are designed for settings with millions of highly active users and access to personal information, as in social networks or other services with high virality. The literature lacks systems built for medium-sized data where the only data available are users' event sequences. This gap motivates Look-A-Liker (LAL), an unsupervised deep clustering system that uses temporal point processes to identify similar users for targeting tasks. For experiments, we use data from the leading Internet marketplace for the gastronomic sector. LAL generalizes beyond proprietary data. Using only users' event sequences, it obtains state-of-the-art results compared to recent methods such as Transformer architectures and multimodal learning. Our approach improves the ROC AUC score on real-world datasets by up to 20%, from 0.803 to 0.959. Although LAL focuses on hundreds of thousands of sequences, we show how it quickly expands to millions of user sequences. We provide a fully reproducible implementation with code and datasets at https://github.com/adasegroup/sequence_clusterers.
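The headline figure can be checked with simple arithmetic. The sketch below assumes that "up to 20%" refers to the relative gain over the baseline ROC AUC, which the abstract does not state explicitly. The function and variable names are illustrative and do not come from the paper or its repository.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of `baseline`."""
    return (improved - baseline) / baseline


# ROC AUC values quoted in the abstract.
baseline_auc = 0.803
lal_auc = 0.959

# Absolute gain in ROC AUC points versus gain relative to the baseline.
absolute_gain = lal_auc - baseline_auc
relative_gain = relative_improvement(baseline_auc, lal_auc)

print(f"absolute gain: {absolute_gain:.3f} ROC AUC")  # absolute gain: 0.156 ROC AUC
print(f"relative gain: {relative_gain:.1%}")          # relative gain: 19.4%
```

Under this reading, the reported jump is about 0.156 ROC AUC points in absolute terms and about 19.4% in relative terms, which is consistent with the "up to 20%" wording.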




Source journal: Doklady Mathematics (Mathematics)

CiteScore: 1.00

Self-citation rate: 16.70%

Review time: 3-6 weeks

Journal description: Doklady Mathematics is a journal of the Presidium of the Russian Academy of Sciences (RAS). It contains English translations of papers published in Doklady Akademii Nauk (Proceedings of the Russian Academy of Sciences), which was founded in 1933 and is published 36 times a year. Doklady Mathematics covers the following areas: mathematics, mathematical physics, computer science, control theory, and computers. It publishes brief scientific reports on previously unpublished, significant new research in mathematics and its applications. The main contributors are Members and Corresponding Members of the RAS, and scientists from the former Soviet Union and other countries. Contributors include outstanding Russian mathematicians.