Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation

Gavin Butts, Pegah Emdad, Jethro Lee, Shannon Song, Chiman Salavati, Willmar Sosa Diaz, Shiri Dori-Hacohen, Fabricio Murai
{"title":"Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation","authors":"Gavin Butts, Pegah Emdad, Jethro Lee, Shannon Song, Chiman Salavati, Willmar Sosa Diaz, Shiri Dori-Hacohen, Fabricio Murai","doi":"arxiv-2409.07424","DOIUrl":null,"url":null,"abstract":"There have been growing concerns around high-stake applications that rely on\nmodels trained with biased data, which consequently produce biased predictions,\noften harming the most vulnerable. In particular, biased medical data could\ncause health-related applications and recommender systems to create outputs\nthat jeopardize patient care and widen disparities in health outcomes. A recent\nframework titled Fairness via AI posits that, instead of attempting to correct\nmodel biases, researchers must focus on their root causes by using AI to debias\ndata. Inspired by this framework, we tackle bias detection in medical curricula\nusing NLP models, including LLMs, and evaluate them on a gold standard dataset\ncontaining 4,105 excerpts annotated by medical experts for bias from a large\ncorpus. We build on previous work by coauthors which augments the set of\nnegative samples with non-annotated text containing social identifier terms.\nHowever, some of these terms, especially those related to race and ethnicity,\ncan carry different meanings (e.g., \"white matter of spinal cord\"). To address\nthis issue, we propose the use of Word Sense Disambiguation models to refine\ndataset quality by removing irrelevant sentences. We then evaluate fine-tuned\nvariations of BERT models as well as GPT models with zero- and few-shot\nprompting. We found LLMs, considered SOTA on many NLP tasks, unsuitable for\nbias detection, while fine-tuned BERT models generally perform well across all\nevaluated metrics.","PeriodicalId":501112,"journal":{"name":"arXiv - CS - Computers and Society","volume":"157 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computers and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07424","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

There have been growing concerns around high-stakes applications that rely on models trained with biased data, which consequently produce biased predictions, often harming the most vulnerable. In particular, biased medical data could cause health-related applications and recommender systems to create outputs that jeopardize patient care and widen disparities in health outcomes. A recent framework titled Fairness via AI posits that, instead of attempting to correct model biases, researchers must focus on their root causes by using AI to debias data. Inspired by this framework, we tackle bias detection in medical curricula using NLP models, including LLMs, and evaluate them on a gold-standard dataset of 4,105 excerpts from a large corpus, annotated for bias by medical experts. We build on previous work by coauthors, which augments the set of negative samples with non-annotated text containing social identifier terms. However, some of these terms, especially those related to race and ethnicity, can carry different meanings (e.g., "white matter of spinal cord"). To address this issue, we propose the use of Word Sense Disambiguation models to refine dataset quality by removing irrelevant sentences. We then evaluate fine-tuned variations of BERT models as well as GPT models with zero- and few-shot prompting. We found LLMs, considered SOTA on many NLP tasks, unsuitable for bias detection, while fine-tuned BERT models generally perform well across all evaluated metrics.
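To make the Word Sense Disambiguation filtering step concrete, the sketch below shows one way ambiguous social identifier terms could be screened out of an augmented negative-sample pool. It uses NLTK's Lesk algorithm purely as a stand-in for the WSD models evaluated in the paper; the identifier lexicon, the "social" sense list, and the example sentences are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch of WSD-based filtering of ambiguous identifier terms.
# Assumptions: NLTK's Lesk algorithm stands in for the paper's WSD models;
# IDENTIFIER_TERMS and SOCIAL_SENSES are hypothetical placeholders.
# Requires: nltk.download("punkt"); nltk.download("wordnet")
from nltk import word_tokenize
from nltk.wsd import lesk

# Placeholder lexicon of ambiguous identifier terms.
IDENTIFIER_TERMS = {"white", "black"}
# Placeholder WordNet synset IDs treated as person-denoting ("social") senses;
# verify against WordNet's actual sense inventory before use.
SOCIAL_SENSES = {"white.n.01", "black.n.05"}

def keep_sentence(sentence: str) -> bool:
    """Keep a sentence only if its identifier terms are used in a social sense."""
    tokens = word_tokenize(sentence.lower())
    for term in IDENTIFIER_TERMS & set(tokens):
        synset = lesk(tokens, term)  # disambiguate the term in its sentence context
        if synset is None:
            continue  # no sense assigned; leave the sentence in the pool
        if synset.name() not in SOCIAL_SENSES:
            return False  # e.g., anatomical use such as "white matter of spinal cord"
    return True

sentences = [
    "Lesions in the white matter of the spinal cord were observed.",
    "White patients were more likely to receive the referral.",
]
filtered = [s for s in sentences if keep_sentence(s)]
```

In this sketch, sentences whose identifier terms resolve to non-social senses are dropped before the negative-sample augmentation, which is the role the paper assigns to its WSD step; a production pipeline would likely use a stronger neural WSD model and a curated sense inventory rather than Lesk.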