Influence of User Profile Attributes on e-Cigarette-Related Searches on YouTube: Machine Learning Clustering and Classification
Dhiraj Murthy, Juhan Lee, Hassan Dashtian, Grace Kong
JMIR infodemiology, volume 3, e42218. Published 2023-04-12. DOI: 10.2196/42218 (https://doi.org/10.2196/42218)
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10139687/pdf/
Citations: 0
Abstract
Background: The proliferation of e-cigarette content on YouTube is concerning because of its possible effect on youth use behaviors. YouTube has a personalized search and recommendation algorithm that derives attributes from a user's profile, such as age and sex. However, little is known about whether e-cigarette content is shown differently based on user characteristics.
Objective: The aim of this study was to understand the influence of age and sex attributes of user profiles on e-cigarette-related YouTube search results.
Methods: We created 16 fictitious YouTube profiles that varied in age (16 and 24 years), sex (female and male), and ethnicity/race, and used each profile to search for 18 e-cigarette-related terms. We used unsupervised (k-means clustering and classification) and supervised (graph convolutional network) machine learning and network analysis to characterize the variation in the search results of each profile. We further examined whether user attributes may play a role in e-cigarette-related content exposure by using networks and degree centrality.
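As a rough illustration of the degree centrality measure mentioned in the Methods, the sketch below computes normalized degree centrality (degree divided by n - 1, where n is the number of nodes) on a small undirected graph. The abstract does not say how the networks were built, so the graph construction here is an assumption: each edge links a profile to a video that appeared in its search results, and all profile and video names are hypothetical toy values.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return normalized degree centrality for each node of an undirected graph.

    edges: iterable of (u, v) pairs.
    Centrality of a node = (number of distinct neighbors) / (n - 1),
    where n is the total number of nodes.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical toy data (not from the study): an edge means
# "this video appeared in this profile's search results".
toy_edges = [
    ("profile_f16", "video_a"),
    ("profile_f16", "video_b"),
    ("profile_m16", "video_a"),
    ("profile_f24", "video_a"),
    ("profile_f24", "video_c"),
]

centrality = degree_centrality(toy_edges)
for node, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.2f}")
```

In this toy graph, a video returned to many profiles receives a higher score than a video returned to only one, which is the kind of exposure difference a degree-based comparison across profiles could surface.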
Results: We analyzed 4201 nonduplicate videos. Our k-means clustering suggested that the videos could be clustered into 3 categories. The graph convolutional network achieved an accuracy of 0.72. Videos were classified based on content into 4 categories: product review (49.3%), health information (15.1%), instruction (26.9%), and other (8.5%). Underage profiles (16 years) were exposed mostly to instructional videos (37.5%), with some indication that more female 16-year-old profiles were exposed to this content, while young adult profiles (24 years) were exposed mostly to product review videos (39.2%).
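For readers unfamiliar with k-means, the following is a minimal sketch of the standard algorithm (Lloyd's iterations) with k = 3. It is not the authors' pipeline: the abstract does not state which video features were clustered, so the two-dimensional points are hypothetical, and the deterministic farthest-first initialization is a choice made here only to keep the example reproducible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from all centroids chosen so far (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Cluster points into k groups; return (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D feature vectors for nine videos (not from the study).
toy_points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (0, 10), (1, 10), (0, 11),
]

labels, centroids = kmeans(toy_points, k=3)
print(labels)
```

On real data the number of clusters is usually chosen by comparing several values of k (for example with an elbow or silhouette criterion); the abstract reports 3 clusters but does not say which criterion was used.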
Conclusions: Our results indicate that demographic attributes factor into YouTube's algorithmic systems in the context of e-cigarette-related queries. Specifically, differences in the age and sex attributes of user profiles result in variance in both the videos presented in YouTube search results and the types of these videos. We find that underage profiles were exposed to e-cigarette content despite YouTube's age-restriction policy that ostensibly prohibits certain e-cigarette content. Greater enforcement of policies to restrict youth access to e-cigarette content is needed.