An ensemble approach for multi-label classification of item click sequences

A. Murat Yağcı, Tevfik Aytekin, F. Gürgen
Published in: Proceedings of the 2015 International ACM Recommender Systems Challenge, 2015-09-16
DOI: 10.1145/2813448.2813516 (https://doi.org/10.1145/2813448.2813516)
Citations: 6

Abstract

In this paper, we describe our approach to the RecSys 2015 Challenge problem. Given a dataset of item click sessions, the problem is to predict whether a session results in a purchase and, if so, which items are purchased. We define a simpler, analogous problem: given an item and its session, we predict the probability that the item is purchased. For each session, the predictions yield a set of purchased items or, often, an empty set. We apply monthly time windows over the dataset. For each item in a session, we engineer features describing the session, the item properties, and the time window. A balanced random forest classifier is then trained to make predictions on the test set. The dataset is particularly challenging because of the privacy-preserving definition of a session, the class imbalance problem, and the volume of data. We report our findings on feature engineering, the choice of sampling schemes, and classifier ensembles. We discuss experimental results together with the benefits and shortcomings of the proposed approach. The solution is efficient and practical on commodity computers.
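
The reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the two features, the 0.5 threshold, and the function names build_rows, balanced_bootstrap, and predicted_sets are all assumptions made for illustration. The sketch turns click sessions into one labelled row per (session, item) pair tagged with a monthly window, draws a class-balanced bootstrap sample of the kind a balanced random forest typically uses for each tree, and groups per-item purchase probabilities back into one predicted set per session.

```python
from collections import Counter, defaultdict
from datetime import datetime
import random

# Hypothetical click records: (session_id, item_id, ISO timestamp).
clicks = [
    ("s1", "A", "2014-04-01T10:00:00"),
    ("s1", "B", "2014-04-01T10:02:00"),
    ("s1", "A", "2014-04-01T10:05:00"),
    ("s2", "C", "2014-05-03T09:00:00"),
]
# Hypothetical purchase labels: (session_id, item_id) pairs that were bought.
buys = {("s1", "A")}


def build_rows(clicks, buys):
    """Turn click sessions into one labelled row per (session, item) pair."""
    per_session = defaultdict(list)
    for session_id, item_id, ts in clicks:
        per_session[session_id].append((item_id, datetime.fromisoformat(ts)))

    rows = []
    for session_id, events in per_session.items():
        item_counts = Counter(item for item, _ in events)
        first_ts = min(ts for _, ts in events)
        for item_id, n_clicks in item_counts.items():
            rows.append({
                "session": session_id,
                "item": item_id,
                "window": first_ts.strftime("%Y-%m"),  # monthly time window key
                "session_clicks": len(events),         # session-level feature
                "item_clicks": n_clicks,               # item-in-session feature
                "label": int((session_id, item_id) in buys),
            })
    return rows


def balanced_bootstrap(rows, rng):
    """Sample with replacement so both classes are equally represented.

    Assumes purchases (label 1) are the minority class and that both
    classes are present in `rows`.
    """
    pos = [r for r in rows if r["label"] == 1]
    neg = [r for r in rows if r["label"] == 0]
    n = len(pos)
    return rng.choices(pos, k=n) + rng.choices(neg, k=n)


def predicted_sets(scored, threshold=0.5):
    """Group per-item purchase probabilities into one set per session."""
    result = defaultdict(set)
    for session_id, item_id, prob in scored:
        items = result[session_id]  # sessions with no predicted buy keep an empty set
        if prob >= threshold:
            items.add(item_id)
    return dict(result)


rows = build_rows(clicks, buys)
sample = balanced_bootstrap(rows, random.Random(0))
# Probabilities here are made up; in practice they would come from the classifier.
print(predicted_sets([("s1", "A", 0.8), ("s1", "B", 0.1), ("s2", "C", 0.2)]))
```

A session whose items all score below the threshold maps to an empty set, which matches the abstract's remark that the predicted set is often empty. The threshold and the per-tree sampling scheme would need to be tuned on held-out data; the abstract does not state which values the authors used.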