‘Equality and Privacy by Design’: Ensuring Artificial Intelligence (AI) Is Properly Trained & Fed: A New Model of AI Data Transparency & Certification As Safe Harbor Procedures

S. Yanisky-Ravid, Sean Hallisey
{"title":"‘Equality and Privacy by Design’: Ensuring Artificial Intelligence (AI) Is Properly Trained & Fed: A New Model of AI Data Transparency & Certification As Safe Harbor Procedures","authors":"S. Yanisky-Ravid, Sean Hallisey","doi":"10.2139/SSRN.3278490","DOIUrl":null,"url":null,"abstract":"Artificial Intelligence systems (“AI”) are often described as a technological breakthrough that will completely transform our society and economy. AI systems have been implemented in all facets of the economy, from medicine to transportation, finance, art, legal, social, and weapons; making decisions previously determined by humans. While this article recognizes that AI systems promise benefits, it also identifies urgent challenges to our everyday life. Just as the technology has become prolific, so has the literature concerning its legal implications. However, the literature suffers from a lack of solutions that address the legal and engineering perspectives. This leaves technology firms without guidelines and increases the risk of societal harm. Policymakers, including judges, operate without a regulatory regime to turn to when addressing these novel and unpredictable outcomes. This article tries to fill the void by focusing on the use of data by these systems, rather than on the software and software programmers. It suggests a new Model that stems from a recognition of the significant role that the data plays in the development and functioning of AI systems. \n \nOne of the most important phases of teaching AI systems to operate starts with a preexisting massive dataset that the data providers use to train the system. The data providers are programmers, trainers; the stakeholders who enable access to data or the systems’ users. In this article, we analyze and discuss the threats the use of data by AI systems pose in terms of producing discriminatory outcomes as well as violations of privacy. \n \nThe data can be illegal, discriminatory, manufacture, unreliable, or simply incomplete. The more data that AI systems “swallow,” the likelihood increases that AI systems could produce biased, discriminatory decisions and/or violate privacy. The article discusses how discrimination can arise, even inadvertently, from the operation of “trusted” and \"objective\" AI systems. The article addresses, on the one hand, the hurdles and challenges behind the use of big data by AI systems, and on the other, suggests a possible, new solution. \n \nWe propose a new AI data transparency Model that focuses on disclosure of the data being used by AI systems, when necessary. To perfect the Model we recommend an auditing regime and a certification program, either by governmental body or, in the absence of such entity, by private institutes. This Model will encourage the industry to take steps, proactively, to ensure that the dataset is trustworthy and then, to publicly exhibit the quality of the data (that their AI systems rely on). By receiving and publicizing a quality “stamp” the firms will fairly build their competitive reputation and will strengthen the public control of the systems. \n \nWe envision that the implementation of this Model will help firms and individuals become educated about the potential issues concerning AI, discrimination and the continued weakening of societal expectations of privacy. 
In this sense, the AI data transparency Model operates as a safe harbor mechanism that incentivizes the industry, from the first steps of developing and training AI systems, to the actual operation of the AI systems, to implement effective standards, that we coin Equality and Privacy by Design. \n \nThe suggested AI Transparency Model functions as a safe harbor, even without massive regulatory steps. From an engineering point of view, not only does the model recognize the data providers and the big data as the most important components in the process of creating, training and operating AI systems, but the AI Data Transparency Model is also technologically feasible as data can be easily absorbed and kept by a technological tool. This Model is feasible from a practical perspective, as it follows already existing legal frameworks of data transparency, such as the ones being implemented by the FDA and SEC. \n \nWe argue that improving transparency in data systems should result in less harmful AI systems, better protect societal rights and norms, and produce improved outcomes in this emerging field, specifically for minority communities, who often lack resources or representation to combat the use of AI systems. We assert that improvements in transparency regarding the data used while developing, training or operating AI systems could mitigate and reduce these harms. We recommend critical evaluations and audits of data used to train AI systems to identify such risks, and propose a certification system whereby AI systems can publicize good faith efforts to reduce the possibility of discriminatory outcomes and privacy violations. We do not purport to solve the riddle of every possible negative outcome created by AI systems; instead, we are trying to incentivize the creation of new standards that the industry could implement, from day one of developing AI systems that addresses the possibility of harm, rather than post-hoc assignments of liability.","PeriodicalId":218558,"journal":{"name":"AARN: Science & Technology Studies (Sub-Topic)","volume":"385 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AARN: Science & Technology Studies (Sub-Topic)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/SSRN.3278490","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Artificial Intelligence systems (“AI”) are often described as a technological breakthrough that will completely transform our society and economy. AI systems have been implemented in all facets of the economy, from medicine to transportation, finance, art, law, social services, and weapons, making decisions previously made by humans. While this article recognizes that AI systems promise benefits, it also identifies urgent challenges to our everyday life. Just as the technology has become prolific, so has the literature concerning its legal implications. However, that literature suffers from a lack of solutions that address both the legal and engineering perspectives. This leaves technology firms without guidelines and increases the risk of societal harm. Policymakers, including judges, operate without a regulatory regime to turn to when addressing these novel and unpredictable outcomes. This article tries to fill the void by focusing on the use of data by these systems, rather than on the software and software programmers. It suggests a new Model that stems from a recognition of the significant role that data plays in the development and functioning of AI systems.

One of the most important phases of teaching AI systems to operate starts with a preexisting massive dataset that the data providers use to train the system. The data providers are the programmers, trainers, stakeholders who enable access to data, or the systems’ users. In this article, we analyze and discuss the threats that the use of data by AI systems poses in terms of producing discriminatory outcomes as well as violations of privacy.

The data can be illegal, discriminatory, manufactured, unreliable, or simply incomplete. The more data AI systems “swallow,” the greater the likelihood that they will produce biased, discriminatory decisions and/or violate privacy. The article discusses how discrimination can arise, even inadvertently, from the operation of “trusted” and “objective” AI systems. The article addresses, on the one hand, the hurdles and challenges behind the use of big data by AI systems and, on the other, suggests a possible new solution.

We propose a new AI data transparency Model that focuses on disclosure of the data being used by AI systems, when necessary. To perfect the Model, we recommend an auditing regime and a certification program, run either by a governmental body or, in the absence of such an entity, by private institutes. This Model will encourage the industry to take steps, proactively, to ensure that the dataset is trustworthy and then to publicly exhibit the quality of the data that their AI systems rely on. By receiving and publicizing a quality “stamp,” firms will fairly build their competitive reputations and strengthen public oversight of the systems.
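As a rough illustration of the kind of automated check such an auditing regime might run before a dataset is used for training, consider the minimal Python sketch below. It is not drawn from the article; the function name, the protected attribute, and the 10% threshold are hypothetical assumptions used only to make the idea concrete.

```python
# Illustrative sketch only -- not the authors' method. It flags groups that are
# under-represented in a training dataset and counts records where the
# protected attribute is missing, the sort of signal an auditor or certifier
# might review before "stamping" the data.
from collections import Counter

def audit_dataset(records, protected_attribute, min_share=0.10):
    """Report groups whose share of the data falls below min_share and the
    number of records missing the protected attribute."""
    values = [r.get(protected_attribute) for r in records]
    missing = sum(1 for v in values if v is None)
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    under_represented = {
        group: round(count / total, 3)
        for group, count in counts.items()
        if count / total < min_share
    } if total else {}
    return {"missing_attribute": missing, "under_represented": under_represented}

# Toy usage: a dataset heavily skewed toward one group is flagged for review.
records = [{"gender": "f"}] * 5 + [{"gender": "m"}] * 90 + [{"gender": None}] * 5
print(audit_dataset(records, "gender"))
# -> {'missing_attribute': 5, 'under_represented': {'f': 0.053}}
```

A real audit would of course cover many more dimensions (provenance, legality, reliability, label quality), but even a check this simple shows how a certification body could evaluate a dataset without seeing the model's code.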
We envision that the implementation of this Model will help firms and individuals become educated about the potential issues concerning AI, discrimination, and the continued weakening of societal expectations of privacy. In this sense, the AI data transparency Model operates as a safe harbor mechanism that incentivizes the industry, from the first steps of developing and training AI systems to the actual operation of those systems, to implement effective standards that we coin “Equality and Privacy by Design.”

The suggested AI Transparency Model functions as a safe harbor, even without massive regulatory steps. From an engineering point of view, not only does the Model recognize the data providers and the big data as the most important components in the process of creating, training, and operating AI systems, but the AI Data Transparency Model is also technologically feasible, as the data can easily be absorbed and kept by a technological tool. The Model is feasible from a practical perspective as well, since it follows already existing legal frameworks of data transparency, such as those implemented by the FDA and the SEC.

We argue that improving transparency in data systems should result in less harmful AI systems, better protect societal rights and norms, and produce improved outcomes in this emerging field, specifically for minority communities, who often lack the resources or representation to challenge the use of AI systems. We assert that improvements in transparency regarding the data used while developing, training, or operating AI systems could mitigate and reduce these harms. We recommend critical evaluations and audits of the data used to train AI systems to identify such risks, and propose a certification system whereby AI systems can publicize good-faith efforts to reduce the possibility of discriminatory outcomes and privacy violations. We do not purport to solve the riddle of every possible negative outcome created by AI systems; instead, we are trying to incentivize the creation of new standards that the industry could implement, from day one of developing AI systems, that address the possibility of harm, rather than post-hoc assignments of liability.
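One way the “technological tool” for absorbing and keeping the data could be realized is a hash-based provenance log that a certifying body later audits. The sketch below is a hypothetical illustration, not the authors' implementation; the field names, file format, and JSON log are assumptions.

```python
# Illustrative sketch only -- one possible way to record what data an AI system
# was trained on, so that a later audit or certification can verify it.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def record_training_data(dataset_path, source, license_name,
                         log_path="data_provenance.json"):
    """Append an auditable entry describing one training-data file."""
    digest = hashlib.sha256(Path(dataset_path).read_bytes()).hexdigest()
    entry = {
        "file": str(dataset_path),          # which file was used
        "sha256": digest,                   # fingerprint of its exact contents
        "source": source,                   # where the data came from
        "license": license_name,            # terms under which it may be used
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    log = Path(log_path)
    entries = json.loads(log.read_text()) if log.exists() else []
    entries.append(entry)
    log.write_text(json.dumps(entries, indent=2))
    return entry
```

Because the log records only fingerprints and descriptive metadata, a firm could publish it to demonstrate what its system was “fed” without disclosing the raw data itself, which is the disclosure-when-necessary balance the Model aims for.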