Authors: Sitao Min, Hafiz Asif, Jaideep Vaidya
Journal: IEEE Intelligent Systems, vol. 40, no. 3, pp. 28-38 (Journal Article)
Published: 2025-05-01 (Epub 2025-03-11)
DOI: https://doi.org/10.1109/mis.2025.3549484
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204593/pdf/
Impact factor: 6.1; JCR Q1 (Computer Science, Artificial Intelligence)
Exploring the inequitable impact of data missingness on fairness in machine learning.
Today, data-driven models and artificial intelligence / machine learning underlie decision making in almost all aspects of society. However, significant concerns have been raised over the fairness of such models. While various aspects of algorithmic fairness have been studied, the effect of missing data on fairness remains understudied. This is a significant problem since data in real-world settings is almost never complete, and may often suffer from systemic missingness. This article systematically evaluates how missing data, particularly when correlated with protected classes and outcome variables, affects the fairness of classifiers. Utilizing a comprehensive framework covering various missing data patterns, rates, and mitigation methods, we analyze 150 experimental dataset variants derived from real-world scenarios, and find that missing data correlated with sensitive attributes and outcomes can exacerbate disparities, even for little missingness, making it crucial to address missingness in fairness evaluations.
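To make the central claim concrete, the following is a minimal sketch of how missingness tied to both a protected group and the outcome can open a fairness gap. It is not the authors' framework: the toy data, the function names, the 60% missingness rate, the threshold classifier, mean imputation as the mitigation step, and demographic parity as the fairness measure are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import random
from statistics import mean

def make_toy_data(n=2000, seed=1):
    """Hypothetical data: one feature x that is higher for positive labels."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        rows.append({
            "group": rng.choice(["a", "b"]),   # assumed protected attribute
            "label": label,
            "x": rng.gauss(1.0 if label else -1.0, 1.0),
        })
    return rows

def inject_missingness(rows, rates, seed=0):
    """Copy rows, setting x to None with a probability that depends on
    (group, label). `rates` maps (group, label) -> probability."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        new_row = dict(row)
        if rng.random() < rates.get((row["group"], row["label"]), 0.0):
            new_row["x"] = None
        out.append(new_row)
    return out

def mean_impute(rows):
    """Replace missing x with the mean of the observed x values."""
    fill = mean(r["x"] for r in rows if r["x"] is not None)
    return [dict(r, x=fill if r["x"] is None else r["x"]) for r in rows]

def demographic_parity_difference(preds, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = []
    for g in set(groups):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rates.append(sum(members) / len(members))
    return max(rates) - min(rates)

def parity_gap(rows, threshold=0.0):
    """Fairness gap of a fixed threshold classifier (an assumed stand-in model)."""
    preds = [int(r["x"] > threshold) for r in rows]
    return demographic_parity_difference(preds, [r["group"] for r in rows])

rows = make_toy_data()
baseline = parity_gap(rows)

# Assumed mechanism: positives in group "b" lose the feature 60% of the time.
damaged = mean_impute(inject_missingness(rows, {("b", 1): 0.6}))
after = parity_gap(damaged)

print(f"parity gap on complete data:     {baseline:.3f}")
print(f"parity gap after missing+impute: {after:.3f}")
```

In this toy setup the imputed value falls below the decision threshold, so group "b" positives whose feature went missing are pushed toward negative predictions while group "a" is unaffected. Other imputation methods, models, or fairness metrics may behave differently, which is the kind of variation the article's experiments are described as covering.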
About the journal:
IEEE Intelligent Systems serves users, managers, developers, researchers, and purchasers who are interested in intelligent systems and artificial intelligence, with particular emphasis on applications. Typically they are degreed professionals, with backgrounds in engineering, hard science, or business. The publication emphasizes current practice and experience, together with promising new ideas that are likely to be used in the near future. Sample topic areas for feature articles include knowledge-based systems, intelligent software agents, natural-language processing, technologies for knowledge management, machine learning, data mining, adaptive and intelligent robotics, knowledge-intensive processing on the Web, and social issues relevant to intelligent systems. Also encouraged are application features, covering practice at one or more companies or laboratories; full-length product stories (which require refereeing by at least three reviewers); tutorials; surveys; and case studies. Often issues are theme-based and collect articles around a contemporary topic under the auspices of a Guest Editor working with the EIC.