Chanyoung Park, Dong Hyun Kim, Jinoh Oh, Hwanjo Yu
Proceedings of the 2015 International ACM Recommender Systems Challenge. Published 2015-09-16. DOI: https://doi.org/10.1145/2813448.2813517
Predicting User Purchase in E-commerce by Comprehensive Feature Engineering and Decision Boundary Focused Under-Sampling
The goal of RecSys Challenge 2015 [2] is (1) to predict which users will end up making a purchase and, if so, (2) to predict the items they will buy, given click and purchase data provided by YOOCHOOSE. The task is hard because (1) the data contains no user demographic information and has many missing values, and (2) the dataset is massive, with about 33 million clicks and 1 million purchase records, and its class distribution (the ratio of non-purchased clicks to purchased clicks) is highly imbalanced. To address these problems efficiently, we propose (1) a Comprehensive Feature Engineering method (CFE), which includes imputation of missing values to compensate for the lack of information, and (2) a Decision Boundary Focused Under-Sampling method (DBFUS), which copes with the class imbalance problem and reduces learning time and memory usage. Our proposed approach obtained 54403.6 points on the final leaderboard.
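The abstract names DBFUS but does not spell out its procedure, so the following is only an illustrative sketch of one plausible reading of "decision boundary focused" under-sampling, not the authors' algorithm. It assumes that a preliminary model has already produced a purchase score for every row, keeps all minority-class (purchase) rows, and keeps only the majority-class rows whose scores lie closest to a decision threshold. The function name `boundary_focused_undersample` and the parameters `keep_ratio` and `threshold` are hypothetical.

```python
from typing import List, Sequence


def boundary_focused_undersample(
    scores: Sequence[float],
    labels: Sequence[int],
    keep_ratio: float = 1.0,
    threshold: float = 0.5,
) -> List[int]:
    """Return the indices of the rows to keep for training.

    scores: purchase score per row from a preliminary model (assumed input).
    labels: 1 for the minority class (purchase), 0 for the majority class.
    keep_ratio: majority rows kept per minority row (hypothetical parameter).
    threshold: assumed location of the decision boundary in score space.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]

    # Rank majority rows by distance to the threshold, closest first,
    # so the rows that are hardest to separate are the ones retained.
    majority.sort(key=lambda i: abs(scores[i] - threshold))

    n_keep = min(len(majority), int(round(keep_ratio * len(minority))))
    return sorted(minority + majority[:n_keep])


# Toy data: two purchases (label 1) among eight rows.
scores = [0.05, 0.45, 0.90, 0.70, 0.10, 0.48, 0.02, 0.60]
labels = [0, 0, 1, 0, 0, 0, 0, 1]
kept = boundary_focused_undersample(scores, labels, keep_ratio=1.0)
print(kept)  # [1, 2, 5, 7]
```

With `keep_ratio=1.0` the retained set is balanced: both purchase rows plus the two non-purchase rows scored nearest 0.5. Discarding majority rows far from the boundary shrinks the training set, which is in line with the abstract's stated aim of reducing learning time and memory usage.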