{"title":"An ensemble approach for multi-label classification of item click sequences","authors":"A. Murat, Ya Gcı, Tevfik Aytekin, F. Gürgen","doi":"10.1145/2813448.2813516","DOIUrl":null,"url":null,"abstract":"In this paper, we describe our approach to RecSys 2015 challenge problem. Given a dataset of item click sessions, the problem is to predict whether a session results in a purchase and which items are purchased if the answer is yes. We define a simpler analogous problem where given an item and its session, we try to predict the probability of purchase for the given item. For each session, the predictions result in a set of purchased items or often an empty set. We apply monthly time windows over the dataset. For each item in a session, we engineer features regarding the session, the item properties, and the time window. Then, a balanced random forest classifier is trained to perform predictions on the test set. The dataset is particularly challenging due to privacy-preserving definition of a session, the class imbalance problem, and the volume of data. We report our findings with respect to feature engineering, the choice of sampling schemes, and classifier ensembles. Experimental results together with benefits and shortcomings of the proposed approach are discussed. The solution is efficient and practical in commodity computers.","PeriodicalId":324873,"journal":{"name":"Proceedings of the 2015 International ACM Recommender Systems Challenge","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 International ACM Recommender Systems Challenge","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2813448.2813516","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
In this paper, we describe our approach to RecSys 2015 challenge problem. Given a dataset of item click sessions, the problem is to predict whether a session results in a purchase and which items are purchased if the answer is yes. We define a simpler analogous problem where given an item and its session, we try to predict the probability of purchase for the given item. For each session, the predictions result in a set of purchased items or often an empty set. We apply monthly time windows over the dataset. For each item in a session, we engineer features regarding the session, the item properties, and the time window. Then, a balanced random forest classifier is trained to perform predictions on the test set. The dataset is particularly challenging due to privacy-preserving definition of a session, the class imbalance problem, and the volume of data. We report our findings with respect to feature engineering, the choice of sampling schemes, and classifier ensembles. Experimental results together with benefits and shortcomings of the proposed approach are discussed. The solution is efficient and practical in commodity computers.