Does calibration mean what they say it means; or, the reference class problem rises again
Author: Lily Hu
Journal: Philosophical Studies (impact factor 1.1)
Published: 2025-05-05
Publication type: Journal Article
DOI: 10.1007/s11098-025-02322-y (https://doi.org/10.1007/s11098-025-02322-y)
Citation count: 0
Abstract
Discussions of statistical criteria for fairness commonly convey the normative significance of calibration within groups by invoking what risk scores “mean.” On the Same Meaning picture, group-calibrated scores “mean the same thing” (on average) across individuals from different groups and, accordingly, guard against disparate treatment of individuals based on group membership. My contention is that calibration guarantees no such thing. Since concrete actual people belong to many groups, calibration cannot ensure the kind of consistent score interpretation that the Same Meaning picture implies matters for fairness, unless calibration is met within every group to which an individual belongs. Alas, only perfect predictors may meet this bar. The Same Meaning picture thus commits a reference class fallacy by inferring from calibration within some group to the “meaning” or evidential value of an individual’s score, because they are a member of that group. The reference class answer it presumes not only lacks justification; it is very likely wrong. I then show that the reference class problem besets not just calibration but other group statistical criteria that claim a close connection to fairness. Reflecting on the origins of this oversight opens a wider lens onto the predominant methodology in algorithmic fairness based on stylized cases.
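The central claim, that calibration within one grouping does not carry over to the other groups a person belongs to, can be illustrated with a small numerical sketch. The example below is not taken from the paper. The population, the attribute names `g` and `h`, and the helper `observed_rate_by` are all hypothetical, chosen only to make the arithmetic easy to check. Everyone receives the same score of 0.5. The score is calibrated within each `g` group but not within the cross-cutting `h` groups.

```python
from collections import defaultdict

def build_population():
    """Hypothetical toy population: 16 people, each in one g-group and one h-group."""
    people = []
    for g in ("a", "b"):
        # In every g-group, 3 of 4 people in h="x" and 1 of 4 in h="y" have outcome 1.
        for h, positives in (("x", 3), ("y", 1)):
            for i in range(4):
                people.append({
                    "g": g,
                    "h": h,
                    "score": 0.5,  # assumed: the predictor gives everyone 0.5
                    "outcome": 1 if i < positives else 0,
                })
    return people

def observed_rate_by(people, attribute, score):
    """Observed positive rate among people with the given score, per value of `attribute`."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for person in people:
        if person["score"] == score:
            totals[person[attribute]] += 1
            positives[person[attribute]] += person["outcome"]
    return {value: positives[value] / totals[value] for value in totals}

people = build_population()
rates_g = observed_rate_by(people, "g", 0.5)
rates_h = observed_rate_by(people, "h", 0.5)

print(rates_g)  # {'a': 0.5, 'b': 0.5}   -> a score of 0.5 is calibrated within g
print(rates_h)  # {'x': 0.75, 'y': 0.25} -> the same score is miscalibrated within h
```

In this toy case, a person in groups `a` and `x` holds a score that "means" 0.5 relative to `a` but 0.75 relative to `x`. That is the kind of ambiguity the abstract labels a reference class problem.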
About the journal:
Philosophical Studies was founded in 1950 by Herbert Feigl and Wilfrid Sellars to provide a periodical dedicated to work in analytic philosophy. The journal remains devoted to the publication of papers in exclusively analytic philosophy. Papers applying formal techniques to philosophical problems are welcome. The principal aim is to publish articles that are models of clarity and precision in dealing with significant philosophical issues. It is intended that readers of the journal will be kept abreast of the central issues and problems of contemporary analytic philosophy.
Double-blind review procedure
The journal follows a double-blind reviewing procedure. Authors are therefore requested to place their name and affiliation on a separate page. Self-identifying citations and references in the article text should either be avoided or left blank when manuscripts are first submitted. Authors are responsible for reinserting self-identifying citations and references when manuscripts are prepared for final submission.