Analyzing and Addressing Data-driven Fairness Issues in Machine Learning Models used for Societal Problems

Vishnu S. Pendyala, Hyun-Gee Kim
DOI: 10.1109/ICCECE51049.2023.10085470
Published in: 2023 International Conference on Computer, Electrical & Communication Engineering (ICCECE), 2023-01-20
Citations: 1

Abstract

This work aims to systematically analyze and address fairness issues that arise in machine learning models because of class imbalances in data, specifically in models used to address societal problems and provide unique insights. Using a specific data set, spectral analysis is first performed to present evidence of and characterize the fairness issues. Subsequently, a series of class imbalance correction techniques is applied before the data is used to generate various machine learning models. The models so generated are then evaluated using multiple metrics, and the results are analyzed to compare the approaches and determine the relative merits of each. As the experiments described in this paper confirm, not all oversampling techniques help in correcting data-induced model biases. Based on the Kappa statistic, the F-1 score, and accuracy measured by the area under the Receiver Operating Characteristic curve, the Majority Weighted Minority Oversampling Technique (MWMOTE) addresses the fairness issues best among the approaches evaluated and also improves model performance, at least for the dataset under consideration. The experiments also demonstrate that some oversampling techniques can degrade the models in terms of both performance and fairness. The results are interpreted using the evaluation metrics.
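The evaluation pipeline the abstract describes, oversampling the minority class before training and then scoring models with Cohen's kappa, the F-1 score, and ROC AUC, can be sketched as below. This is an illustrative sketch, not the paper's pipeline: MWMOTE is not part of scikit-learn, so plain random oversampling stands in for it, and the dataset here is synthetic rather than the one the paper studies.

```python
# Sketch: compare a classifier trained on imbalanced data against one
# trained after oversampling, using the metrics named in the abstract
# (Cohen's kappa, F-1, ROC AUC). Random oversampling is a stand-in for
# MWMOTE, which scikit-learn does not provide.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 95% majority / 5% minority class.
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until classes balance."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    n_extra = int((y == 0).sum()) - minority.size
    extra = rng.choice(minority, size=n_extra, replace=True)
    idx = np.concatenate([np.arange(y.size), extra])
    return X[idx], y[idx]

def evaluate(X_train, y_train):
    """Fit on the given training set, score on the held-out test set."""
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    pred = clf.predict(X_te)
    return {
        "kappa": cohen_kappa_score(y_te, pred),
        "f1": f1_score(y_te, pred),
        "auc": roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]),
    }

before = evaluate(X_tr, y_tr)                 # imbalanced training data
after = evaluate(*oversample(X_tr, y_tr))     # rebalanced training data
print("before:", before)
print("after: ", after)
```

Comparing the two metric dictionaries mirrors the paper's experimental design: an imbalance-correction technique is judged by whether kappa, F-1, and AUC improve after rebalancing, and, as the abstract notes, some techniques can make these numbers worse rather than better.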