Ghosts in the Data: The Contested Politics of Absence in Data Infrastructures
Will Orr
Social Science Computer Review, published 2025-08-02
DOI: 10.1177/08944393251365277
Citations: 0
Abstract
Absences are inescapable in data. Data collection always focuses on some elements while occluding others. Yet how absences are considered and recorded within data infrastructures markedly transforms the inferences that can be made. Tracing a genealogy from early databases to contemporary AI datasets, this paper explores how data infrastructures have grappled with the inherent incompleteness of data. Specifically, I uncover a tension, at the foundation of data science, between a desire for certainty and an acknowledgment of partiality, a tension that continues to pervade contemporary AI datasets. Drawing on archival studies and sociological perspectives, I argue that data science must embrace uncertainty by recognizing the "ghosts in the data" (the uncounted, the unrepresented, and the silenced) and how their absence shapes the outcomes of automated systems.
About the journal:
Unique Scope
Social Science Computer Review is an interdisciplinary journal covering social science instructional and research applications of computing, as well as the societal impacts of information technology. Topics include artificial intelligence, business, computational social science theory, computer-assisted survey research, computer-based qualitative analysis, computer simulation, economic modeling, electronic modeling, electronic publishing, geographic information systems, instrumentation and research tools, public administration, social impacts of computing and telecommunications, software evaluation, and world-wide web resources for social scientists.

Interdisciplinary Nature
Because the uses and impacts of computing are interdisciplinary, so is Social Science Computer Review. The journal is of direct relevance to scholars and scientists in a wide variety of disciplines. In its pages you'll find work in the following areas: sociology, anthropology, political science, economics, psychology, computer literacy, computer applications, and methodology.