Visited Websites May Reveal Users’ Demographic Information and Personality
Cheng-You Lien, Guo-Jhen Bai, Hung-Hsuan Chen
2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI), October 2019
DOI: 10.1145/3350546.3352525
Citations: 5
Abstract
This study shows that simple supervised learning algorithms can readily predict a user’s personality and demographic information from features derived from the user’s browsing logs, even when the logs are not recorded at the finest granularity (i.e., every URL the user visited). This differs from the analytical approach of Cambridge Analytica (CA), which reportedly required each user’s detailed liked objects on Facebook (e.g., articles, pages) at a fine granularity (i.e., the specific liked articles, not merely their types) to predict user information. In contrast, we used only the categories of the visited websites to predict a user’s gender, age, relationship status, and big six personality scores, an established index that represents an individual’s personality along six dimensions. We also show that applying simple clustering as a preprocessing step improves predictive performance. Consequently, data collectors that store only coarse-grained records of users’ visited URLs may still use this information to infer users’ preferences and tastes, as well as their private information, without notifying them.
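The pipeline the abstract describes (coarse category-level features, a clustering preprocessing step, then a simple supervised learner) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the category list, the toy logs and labels, the use of k-means for the clustering step, the one-hot cluster feature, and the nearest-centroid classifier are all hypothetical stand-ins, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical category taxonomy; the paper's actual website categories are not listed in the abstract.
CATEGORIES = ["news", "sports", "shopping", "games", "travel"]


def category_features(visits):
    """Map a user's visited-site categories to a normalized frequency vector."""
    counts = Counter(visits)
    total = sum(counts.values()) or 1  # avoid division by zero for empty logs
    return [counts[c] / total for c in CATEGORIES]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (assumed clustering method), initialized with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return centroids


def add_cluster_feature(vec, centroids):
    """Hypothetical preprocessing step: append a one-hot encoding of the nearest cluster."""
    nearest = min(range(len(centroids)), key=lambda j: sq_dist(vec, centroids[j]))
    return vec + [1.0 if j == nearest else 0.0 for j in range(len(centroids))]


class NearestCentroidClassifier:
    """Stand-in simple supervised learner: predict the label with the closest mean vector."""

    def fit(self, X, y):
        # sorted() keeps label order deterministic across runs
        self.centroids = {
            label: mean([x for x, t in zip(X, y) if t == label]) for label in sorted(set(y))
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda label: sq_dist(x, self.centroids[label]))


# Hypothetical toy data: browsing logs reduced to categories, plus a binary attribute label.
train_logs = [
    ["news", "news", "sports", "sports"],
    ["sports", "sports", "news"],
    ["shopping", "travel", "travel"],
    ["shopping", "shopping", "travel"],
]
train_labels = ["group_a", "group_a", "group_b", "group_b"]

features = [category_features(log) for log in train_logs]
centroids = kmeans(features, k=2)
X_train = [add_cluster_feature(f, centroids) for f in features]
model = NearestCentroidClassifier().fit(X_train, train_labels)


def predict_attribute(visits):
    """Predict the hypothetical attribute from category-level visits only."""
    return model.predict(add_cluster_feature(category_features(visits), centroids))


print(predict_attribute(["travel", "shopping", "travel", "games"]))  # group_b
print(predict_attribute(["news", "sports"]))  # group_a
```

Appending the cluster membership as extra features is just one simple way to let clustering act as a preprocessing step; a realistic study would use real labeled logs, a library implementation of the learners, and evaluation on held-out users.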