Katie Owens, Conor Mettenburg, Evan Cohen, A. Ripley, Ruben Aghayan, W. Scherer
2016 IEEE Systems and Information Engineering Design Symposium (SIEDS), published 2016-04-29. DOI: 10.1109/SIEDS.2016.7489331 (https://doi.org/10.1109/SIEDS.2016.7489331)
Using online user behavior to predict demographics
Videology, an online video advertising company, is often unable to obtain gender information about incoming online advertisement requests. The company purchases aggregate gender statistics on groups of requests from a third party. This project explores creating groups of requests in which at least 80% of the advertisement requests have the same gender, using: 1) traditional clustering algorithms, 2) an iterative linear regression algorithm (ITRA), and 3) a qualitative clustering algorithm (ROCK). In all cases, the data used were either browsing history data or synthetic attributes created by dimensionality reduction to describe that history more simply. None of the three approaches consistently created the desired gender discrimination, and none proved to be the preferred alternative, as the performance of each varied drastically as the test data set, and even subsets of that test data set, changed. However, among the data sets used, these methods were able in some instances to create small buckets (fewer than 3,000 requests) with the desired gender distribution. The success or failure of these algorithms depended on how similar individual requests were to one another (i.e., how many attributes were, on average, shared between requests). The approaches performed better when more attributes were shared between requests, i.e., when the requests contained information that allowed them to be classified.
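The two quantities the abstract relies on, the share of a bucket that has the same gender and the average number of attributes shared between requests, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the representation of a request as a set of attribute labels, and the toy data are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def bucket_purity(genders):
    """Return (majority_label, fraction of the bucket carrying that label)."""
    counts = Counter(genders)
    label, n = counts.most_common(1)[0]
    return label, n / len(genders)


def meets_target(genders, threshold=0.80):
    """True if at least `threshold` of the bucket shares one gender.

    The 0.80 default mirrors the 80% target stated in the abstract.
    """
    return bucket_purity(genders)[1] >= threshold


def mean_shared_attributes(requests):
    """Average number of attributes shared by each pair of requests.

    Each request is assumed to be a set of attribute labels
    (for example, categories drawn from browsing history).
    """
    pairs = list(combinations(requests, 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) for a, b in pairs) / len(pairs)


# Hypothetical toy bucket; labels and attributes are invented.
bucket_genders = ["F", "F", "F", "F", "M"]
bucket_requests = [
    {"news", "sports"},
    {"news", "cooking"},
    {"news", "sports", "cooking"},
]

label, purity = bucket_purity(bucket_genders)
print(label, purity, meets_target(bucket_genders))
print(mean_shared_attributes(bucket_requests))
```

Under these assumptions, a bucket counts as successful when `meets_target` returns `True`, and a higher `mean_shared_attributes` value corresponds to the situation in which the abstract reports the approaches performing better.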