Using online user behavior to predict demographics

2016 IEEE Systems and Information Engineering Design Symposium (SIEDS) | Pub Date: 2016-04-29 | DOI: 10.1109/SIEDS.2016.7489331

Katie Owens, Conor Mettenburg, Evan Cohen, A. Ripley, Ruben Aghayan, W. Scherer


Citations: 1

Abstract

Videology, an online video advertising company, is often unable to obtain gender information about incoming online advertisement requests. The company purchases aggregate gender statistics on groups of requests from a third party. This project explores creating groups of requests in which at least 80% of the advertisement requests have the same gender, using three approaches: 1) traditional clustering algorithms, 2) an iterative linear regression algorithm (ITRA), and 3) a qualitative clustering algorithm (ROCK). In all cases, the data used was either browsing history data or synthetic attributes created by dimensionality reduction to describe that history more simply. None of the three approaches consistently created the desired gender separation, and none proved to be the preferred alternative, because the performance of each varied drastically as the test data set, and even subsets of that test data set, changed. However, among the data sets used, these methods were able in some instances to create small buckets (fewer than 3,000 requests) with the desired gender distribution. The success or failure of these algorithms depended on how similar individual requests were to one another (i.e., how many attributes were shared, on average, between requests). The approaches performed better when more attributes were shared between requests, that is, when the requests contained information that allowed them to be classified.
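The two quantities the abstract relies on, the share of the majority gender within a group and the average number of attributes shared between requests, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy data, and the assumption that each request has a known gender label and a set of attributes (as it might in a labeled evaluation set) are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def bucket_purity(labels):
    """Return (majority_label, share_of_majority) for one bucket of labels."""
    counts = Counter(labels)
    label, n = counts.most_common(1)[0]
    return label, n / len(labels)


def qualifying_buckets(assignments, genders, threshold=0.8):
    """Return {bucket_id: (majority_gender, share)} for buckets whose
    majority gender share is at least `threshold` (0.8 mirrors the 80% target)."""
    buckets = {}
    for bucket_id, gender in zip(assignments, genders):
        buckets.setdefault(bucket_id, []).append(gender)
    result = {}
    for bucket_id, labels in buckets.items():
        label, share = bucket_purity(labels)
        if share >= threshold:
            result[bucket_id] = (label, share)
    return result


def mean_shared_attributes(requests):
    """Average number of attributes shared by each pair of requests,
    where every request is represented as a set of attribute names."""
    pairs = list(combinations(requests, 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: bucket assignments from any clustering step,
# with gender labels assumed known for evaluation purposes only.
assignments = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
genders = ["F", "F", "F", "F", "M", "F", "M", "M", "F", "M"]
print(qualifying_buckets(assignments, genders))

# Hypothetical attribute sets for three requests.
requests = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
print(mean_shared_attributes(requests))
```

In this toy data only bucket 0 meets the 80% target (four of five requests share a gender), while bucket 1 falls short at three of five. The pairwise overlap is one simple reading of "attributes shared on average"; the paper may define similarity differently.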

