Analyzing and Addressing Data-driven Fairness Issues in Machine Learning Models used for Societal Problems

Vishnu S. Pendyala, Hyun-Gee Kim
DOI: 10.1109/ICCECE51049.2023.10085470
Published in: 2023 International Conference on Computer, Electrical & Communication Engineering (ICCECE), 2023-01-20
Citations: 1

Abstract

This work aims to systematically analyze and address fairness issues that arise in machine learning models because of class imbalances in data, specifically in models used to address societal problems and provide unique insights. Using a specific data set, spectral analysis is first performed to present evidence of and characterize the fairness issues. Subsequently, a series of class imbalance correction techniques is applied before the data is used to generate various machine learning models. The models so generated are then evaluated using multiple metrics, and the results are analyzed to compare the approaches and determine the relative merits of each. As the experiments described in this paper confirm, not all oversampling techniques help in correcting data-induced model biases. Based on the Kappa statistic, the F-1 score, and accuracy measured by the area under the Receiver Operating Characteristic curve, the Majority Weighted Minority Oversampling Technique (MWMOTE) addresses the fairness issues best among the approaches evaluated and also improves model performance, at least for the dataset under consideration. The experiments also demonstrate that some oversampling techniques can degrade the models in terms of both performance and fairness. The results are interpreted using the evaluation metrics.
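The evaluation pipeline the abstract describes, oversampling the minority class before training and then scoring models with Cohen's kappa, the F-1 score, and ROC AUC, can be sketched as below. This is an illustrative sketch, not the paper's pipeline: MWMOTE is not part of scikit-learn, so plain random oversampling stands in for it, and the dataset here is synthetic rather than the one the paper studies.

```python
# Sketch: compare a classifier trained on imbalanced data against one
# trained after oversampling, using the metrics named in the abstract
# (Cohen's kappa, F-1, ROC AUC). Random oversampling is a stand-in for
# MWMOTE, which scikit-learn does not provide.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 95% majority / 5% minority class.
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until classes balance."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    n_extra = int((y == 0).sum()) - minority.size
    extra = rng.choice(minority, size=n_extra, replace=True)
    idx = np.concatenate([np.arange(y.size), extra])
    return X[idx], y[idx]

def evaluate(X_train, y_train):
    """Fit on the given training set, score on the held-out test set."""
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    pred = clf.predict(X_te)
    return {
        "kappa": cohen_kappa_score(y_te, pred),
        "f1": f1_score(y_te, pred),
        "auc": roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]),
    }

before = evaluate(X_tr, y_tr)                 # imbalanced training data
after = evaluate(*oversample(X_tr, y_tr))     # rebalanced training data
print("before:", before)
print("after: ", after)
```

Comparing the two metric dictionaries mirrors the paper's experimental design: an imbalance-correction technique is judged by whether kappa, F-1, and AUC improve after rebalancing, and, as the abstract notes, some techniques can make these numbers worse rather than better.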