IF 3.7 · CAS Zone 2 (Engineering & Technology) · JCR Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC)
Brian Belgodere;Pierre Dognin;Adam Ivankay;Igor Melnyk;Youssef Mroueh;Aleksandra Mojsilović;Jiri Navratil;Apoorva Nitsure;Inkit Padhi;Mattia Rigotti;Jerret Ross;Yair Schiff;Radhika Vedpathak;Richard A. Young
DOI: 10.1109/JETCAS.2024.3477976
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 14, no. 4, pp. 773-788
Publication date: 2024-10-10
Publication type: Journal Article
Open access: No
URL: https://ieeexplore.ieee.org/document/10713321/
Citations: 0

Auditing and Generating Synthetic Data With Controllable Trust Trade-Offs
Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues by enabling a paradigm that relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models. It focuses on preventing bias and discrimination, ensuring fidelity to the source data, and assessing utility, robustness, and privacy preservation. We demonstrate our framework’s effectiveness by auditing various generative models across diverse use cases like education, healthcare, banking, and human resources, spanning different data modalities such as tabular, time-series, vision, and natural language. This holistic assessment is essential for compliance with regulatory safeguards. We introduce a trustworthiness index to rank synthetic datasets based on their safeguards trade-offs. Furthermore, we present a trustworthiness-driven model selection and cross-validation process during training, exemplified with “TrustFormers” across various data types. This approach allows for controllable trustworthiness trade-offs in synthetic data creation. Our auditing framework fosters collaboration among stakeholders, including data scientists, governance experts, internal reviewers, external certifiers, and regulators. This transparent reporting should become a standard practice to prevent bias, discrimination, and privacy violations, ensuring compliance with policies and providing accountability, safety, and performance guarantees.
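The abstract does not state how the trustworthiness index is computed, so the following is only a minimal sketch of the general idea: ranking audited synthetic datasets under user-chosen trade-off weights. It assumes each safeguard dimension has already been scored in [0, 1] and that the index is a simple weighted mean. The dimension names, the functions `trust_index` and `rank_datasets`, and the weighted-mean aggregation are illustrative assumptions, not the paper's definition.

```python
from typing import Dict, List

# Hypothetical dimension names, taken from the safeguards listed in the abstract.
DIMENSIONS = ("fairness", "fidelity", "utility", "robustness", "privacy")


def trust_index(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-dimension scores (each assumed to lie in [0, 1]).

    This aggregation is an assumption made for illustration; the paper's
    index may be defined differently.
    """
    total = sum(weights[d] for d in DIMENSIONS)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total


def rank_datasets(audits: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> List[str]:
    """Return dataset names ordered from highest to lowest index."""
    return sorted(audits,
                  key=lambda name: trust_index(audits[name], weights),
                  reverse=True)
```

Under this sketch, changing the weights changes the ranking: a dataset with strong privacy but weaker utility can move ahead of a higher-utility one once privacy is weighted more heavily, which is the kind of controllable trade-off the abstract describes.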
Source journal
CiteScore: 8.50
Self-citation rate: 2.20%
Articles published: 86
About the journal: The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.