“Too Soon” to count? How gender and race cloud notability considerations on Wikipedia
Authors: Mackenzie Lemieux, Rebecca Zhang, F. Tripodi
Journal: Big Data & Society (Impact Factor 6.5; JCR Q1, Social Sciences, Interdisciplinary)
Publication date: 2023-01-01
Publication type: Journal Article
DOI: 10.1177/20539517231165490 (https://doi.org/10.1177/20539517231165490)
Citations: 3
Abstract
While research has explored the extent of gender bias and the barriers to women's inclusion on English-language Wikipedia, very little research has focused on the problem of racial bias within the encyclopedia. Despite advocacy groups' efforts to incrementally improve representation on Wikipedia, much is unknown regarding how biographies are assessed after creation. Applying a combination of web-scraping, deep learning, natural language processing, and qualitative analysis to pages of academics nominated for deletion on Wikipedia, we demonstrate how Wikipedia's notability guidelines are unequally applied across race and gender. We find that online presence predicts whether a Wikipedia page is kept or deleted for white male academics but that this metric is idiosyncratically applied for female and BIPOC academics. Further, women's pages, regardless of race, were more likely to be deemed “too soon” for Wikipedia. A deeper analysis of the deletion archives reveals that when the tag is used on a woman's biography it is done so outside of the community guidelines, referring to one's career stage rather than media/online coverage. We argue that awareness of hidden biases on Wikipedia is critical to the objective and equitable application of the notability criteria across race and gender both on the encyclopedia and beyond.
About the journal:
Big Data & Society (BD&S) is an open access, peer-reviewed scholarly journal that publishes interdisciplinary work principally in the social sciences, humanities, and computing and their intersections with the arts and natural sciences. The journal focuses on the implications of Big Data for societies and aims to connect debates about Big Data practices and their effects on various sectors such as academia, social life, industry, business, and government.
BD&S considers Big Data as an emerging field of practices, not solely defined by but generative of unique data qualities such as high volume, granularity, data linking, and mining. The journal pays attention to digital content generated both online and offline, encompassing social media, search engines, closed networks (e.g., commercial or government transactions), and open networks like digital archives, open government, and crowdsourced data. Rather than providing a fixed definition of Big Data, BD&S encourages interdisciplinary inquiries, debates, and studies on various topics and themes related to Big Data practices.
BD&S seeks contributions that analyze Big Data practices, involve empirical engagements and experiments with innovative methods, and reflect on the consequences of these practices for the representation, realization, and governance of societies. As a digital-only journal, BD&S's platform can accommodate multimedia formats such as complex images, dynamic visualizations, videos, and audio content. The contents of the journal encompass peer-reviewed research articles, colloquia, bookcasts, think pieces, state-of-the-art methods, and work by early career researchers.