Click-Through Rate Prediction Using Feature Engineered Boosting Algorithms

Mohamadreza Bakhtyari, S. Mirzaei
Published in: 2021 26th International Computer Conference, Computer Society of Iran (CSICC)
Publication date: 2021-03-03
DOI: 10.1109/CSICC52343.2021.9420546 (https://doi.org/10.1109/CSICC52343.2021.9420546)
Citations: 2

Abstract

Click-through rate (CTR) prediction plays a critical role in online advertising campaigns and recommendation systems. Most state-of-the-art models are based on Factorization Machines, and some of them feed mapped field features to a deep learning component that learns users' interests by modelling feature interactions. Deploying a CTR model is an online task, so the model should perform well with a limited amount of data and time. While these models are very good at prediction and at learning feature interactions, their deep component needs a vast amount of data and time and does not perform well under such constraints. A recent article proposed combining boosting algorithms with deep factorization machines (the XDBoost algorithm). In this paper, we use a boosting algorithm for prediction with limited raw data and time. We show that, with appropriate feature engineering and fine parameter tuning, a raw boosting model can outperform the XDBoost method. We use exploratory data analysis to extract the main characteristics of the dataset and eliminate redundant data. Then, by applying a grid search scheme, we select the best values for the hyperparameters of our model.
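The tuning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the boosting library, the dataset, or the hyperparameters searched, so everything below is an assumption made for illustration. The features (`hour_bucket`, `device`, `position`), the synthetic click data, the tiny stump-based gradient booster, and the grid over `n_rounds` and `learning_rate` are all hypothetical. The sketch shows only the general pattern: fit a boosting model for each point on a hyperparameter grid and keep the setting with the lowest validation log loss.

```python
import itertools
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true, probs, eps=1e-12):
    """Mean binary cross-entropy, a common metric for CTR models."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def fit_stump(X, residuals):
    """Least-squares regression stump: (feature, threshold, left, right)."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: no split
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def fit_boosting(X, y, n_rounds, learning_rate):
    """Gradient boosting on log loss with depth-1 trees (stumps)."""
    rate = min(max(sum(y) / len(y), 1e-6), 1.0 - 1e-6)
    base = math.log(rate / (1.0 - rate))  # prior log-odds of a click
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log loss with respect to the raw score.
        residuals = [t - sigmoid(s) for t, s in zip(y, scores)]
        j, thr, lv, rv = fit_stump(X, residuals)
        stumps.append((j, thr, lv, rv))
        scores = [s + learning_rate * (lv if row[j] <= thr else rv)
                  for row, s in zip(X, scores)]
    return base, learning_rate, stumps


def predict_proba(model, X):
    base, lr, stumps = model
    return [sigmoid(base + lr * sum(lv if row[j] <= thr else rv
                                    for j, thr, lv, rv in stumps))
            for row in X]


def grid_search(X_tr, y_tr, X_va, y_va, grid):
    """Try every grid point; return the best (loss, n_rounds, lr) and all results."""
    results = []
    for n_rounds, lr in itertools.product(grid["n_rounds"],
                                          grid["learning_rate"]):
        model = fit_boosting(X_tr, y_tr, n_rounds, lr)
        loss = log_loss(y_va, predict_proba(model, X_va))
        results.append((loss, n_rounds, lr))
    return min(results), results


# Hypothetical synthetic click log: three small categorical features.
random.seed(0)


def make_row():
    hour_bucket = random.randrange(4)
    device = random.randrange(3)
    position = random.randrange(5)
    logit = -2.0 + 1.5 * (device == 1) + 1.0 * (position <= 1)
    clicked = 1 if random.random() < sigmoid(logit) else 0
    return [hour_bucket, device, position], clicked


rows = [make_row() for _ in range(600)]
X = [features for features, _ in rows]
y = [label for _, label in rows]
X_train, y_train = X[:400], y[:400]
X_valid, y_valid = X[400:], y[400:]

grid = {"n_rounds": [10, 40], "learning_rate": [0.1, 0.3, 1.0]}
best, results = grid_search(X_train, y_train, X_valid, y_valid, grid)
print("best validation log loss, n_rounds, learning_rate:", best)
```

In practice the hand-written booster would likely be replaced by a library implementation, and the single holdout split by cross-validation. The grid loop itself stays the same shape.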