基于滑动窗口的移动商务购买预测多类特征工程

Qiang Li, Maojie Gu, Keren Zhou, Xiaoming Sun
{"title":"基于滑动窗口的移动商务购买预测多类特征工程","authors":"Qiang Li, Maojie Gu, Keren Zhou, Xiaoming Sun","doi":"10.1109/ICDMW.2015.172","DOIUrl":null,"url":null,"abstract":"Mobile devices become more and more prevalent in recent years, especially in young groups. The rapid progress of mobile devices promotes the development of M-Commerce business. The purchase on mobile terminals accounts for a considerable percentage in the total trading volume of E-Commerce and begins to draw the attention of E-Commerce corporation. Alibaba held a Mobile Recommendation Algorithm Competition aiming to recommend appropriate items for mobile users at the right time and place. The dataset provided by Alibaba consists of about 6 billion operation logs made by 5 million Taobao users towards over 150 million items spanning a period of one month. Compared with traditional scenarios in purchase predicting, the competition raised three challenges: (1)The dataset is too large to be processed in personal computers, (2)Some days with great discounts provided by Taobao Marketplace are within the period of dataset, (3)Positive samples are too few compared to the dimension of features. In this paper we study the problem of predicting the purchase behaviour of M-Commerce users, by exploring the solution for Alibaba's Mobile Recommendation Algorithm Competition. We first deeply study the habit of customers and filter many outliers. After that we adopt the method of \"sliding window\" to supply positive samples of training dataset and smooth the burst of sales near Dec 12th. We design a feature engineering framework to extract 6 categories of features that aim to capture the buying potential of user-item pairs. Our features exploit the interaction of user-item pair, user's shopping habit and item' attraction for users. Then we apply Gradient Boost Decision Trees (GBDT) as the training model. In the end, we combine outputs of individual GBDT together by Logistic Regression to get the final predictions. Our solution achieves 8.66% F1 score, and ranks the third place in the final round.","PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Multi-Classes Feature Engineering with Sliding Window for Purchase Prediction in Mobile Commerce\",\"authors\":\"Qiang Li, Maojie Gu, Keren Zhou, Xiaoming Sun\",\"doi\":\"10.1109/ICDMW.2015.172\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Mobile devices become more and more prevalent in recent years, especially in young groups. The rapid progress of mobile devices promotes the development of M-Commerce business. The purchase on mobile terminals accounts for a considerable percentage in the total trading volume of E-Commerce and begins to draw the attention of E-Commerce corporation. Alibaba held a Mobile Recommendation Algorithm Competition aiming to recommend appropriate items for mobile users at the right time and place. The dataset provided by Alibaba consists of about 6 billion operation logs made by 5 million Taobao users towards over 150 million items spanning a period of one month. Compared with traditional scenarios in purchase predicting, the competition raised three challenges: (1)The dataset is too large to be processed in personal computers, (2)Some days with great discounts provided by Taobao Marketplace are within the period of dataset, (3)Positive samples are too few compared to the dimension of features. In this paper we study the problem of predicting the purchase behaviour of M-Commerce users, by exploring the solution for Alibaba's Mobile Recommendation Algorithm Competition. We first deeply study the habit of customers and filter many outliers. After that we adopt the method of \\\"sliding window\\\" to supply positive samples of training dataset and smooth the burst of sales near Dec 12th. We design a feature engineering framework to extract 6 categories of features that aim to capture the buying potential of user-item pairs. Our features exploit the interaction of user-item pair, user's shopping habit and item' attraction for users. Then we apply Gradient Boost Decision Trees (GBDT) as the training model. In the end, we combine outputs of individual GBDT together by Logistic Regression to get the final predictions. Our solution achieves 8.66% F1 score, and ranks the third place in the final round.\",\"PeriodicalId\":192888,\"journal\":{\"name\":\"2015 IEEE International Conference on Data Mining Workshop (ICDMW)\",\"volume\":\"134 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-11-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE International Conference on Data Mining Workshop (ICDMW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDMW.2015.172\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW.2015.172","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

摘要

近年来,移动设备变得越来越普遍,尤其是在年轻群体中。移动设备的快速进步促进了移动商务业务的发展。移动端采购在电子商务总交易额中占有相当大的比重,并开始引起电子商务企业的重视。阿里巴巴举办了移动推荐算法大赛,旨在为移动用户在合适的时间和地点推荐合适的商品。阿里巴巴提供的数据集包括500万淘宝用户在一个月内对1.5亿多件商品的约60亿次操作日志。与传统的购买预测场景相比,竞争提出了三个挑战:(1)数据集太大,无法在个人电脑上处理;(2)淘宝提供的大折扣天数在数据集的周期内;(3)与特征维数相比,正样本太少。本文通过探索阿里巴巴移动推荐算法竞赛的解决方案,研究移动商务用户购买行为预测问题。我们首先深入研究顾客的习惯,过滤掉很多异常值。之后我们采用“滑动窗口”的方法提供训练数据集的正样本,平滑12月12日附近的销售爆发。我们设计了一个特征工程框架来提取6类特征,旨在捕捉用户-物品对的购买潜力。我们的特征利用了用户-物品对的交互、用户的购物习惯和物品对用户的吸引力。然后应用梯度提升决策树(GBDT)作为训练模型。最后,我们通过逻辑回归将各个GBDT的输出组合在一起,得到最终的预测结果。我们的方案达到了8.66%的F1得分,在最后一轮中排名第三。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Multi-Classes Feature Engineering with Sliding Window for Purchase Prediction in Mobile Commerce
Mobile devices become more and more prevalent in recent years, especially in young groups. The rapid progress of mobile devices promotes the development of M-Commerce business. The purchase on mobile terminals accounts for a considerable percentage in the total trading volume of E-Commerce and begins to draw the attention of E-Commerce corporation. Alibaba held a Mobile Recommendation Algorithm Competition aiming to recommend appropriate items for mobile users at the right time and place. The dataset provided by Alibaba consists of about 6 billion operation logs made by 5 million Taobao users towards over 150 million items spanning a period of one month. Compared with traditional scenarios in purchase predicting, the competition raised three challenges: (1)The dataset is too large to be processed in personal computers, (2)Some days with great discounts provided by Taobao Marketplace are within the period of dataset, (3)Positive samples are too few compared to the dimension of features. In this paper we study the problem of predicting the purchase behaviour of M-Commerce users, by exploring the solution for Alibaba's Mobile Recommendation Algorithm Competition. We first deeply study the habit of customers and filter many outliers. After that we adopt the method of "sliding window" to supply positive samples of training dataset and smooth the burst of sales near Dec 12th. We design a feature engineering framework to extract 6 categories of features that aim to capture the buying potential of user-item pairs. Our features exploit the interaction of user-item pair, user's shopping habit and item' attraction for users. Then we apply Gradient Boost Decision Trees (GBDT) as the training model. In the end, we combine outputs of individual GBDT together by Logistic Regression to get the final predictions. Our solution achieves 8.66% F1 score, and ranks the third place in the final round.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信