A Taxonomy of Training Data: Disentangling the Mismatched Rights, Remedies, and Rationales for Restricting Machine Learning

The electronic journal of human sexuality. Pub Date: 2020-08-19. DOI: 10.2139/ssrn.3677548

Benjamin Sobel

Citations: 9

Abstract

A Taxonomy of Training Data: Disentangling the Mismatched Rights, Remedies, and Rationales for Restricting Machine Learning

This chapter addresses a crucial problem in artificial intelligence: many applications of machine learning depend on unauthorized uses of copyrighted data. Scholars and lawmakers often articulate this problem as a deficiency in copyright’s exceptions and limitations, reasoning that legal uncertainties surrounding today’s AI stem from the lack of a clear exception or limitation, and that such an exception or limitation could resolve the current predicament. In fact, the current predicament is a product of two systemic features of the copyright regime (the absence of formalities and the low threshold of copyrightable originality) combined with a technological environment that turns routine activities into acts of authorship. Equilibrating the economy for human expression in the AI age requires a solution that focuses not only on exceptions to existing copyrights, but also on the aforementioned doctrinal features that determine the ownership and scope of copyright entitlements at their inception.

The chapter taxonomizes different applications of machine learning according to the qualities of their training data. Four categories emerge: (1) public-domain training data, (2) licensed training data, (3) market-encroaching uses of copyrighted training data, and (4) non-market-encroaching uses of copyrighted training data. Copyright can only regulate market-encroaching uses of data, but these uses represent a narrow subset of AI applications and exclude many of the most socially harmful uses of copyrighted materials. Moreover, paradoxically, copyright’s property-style remedies are ill-suited to addressing market-encroaching uses, and are in fact much more appropriate remedies for the categories of worrisome AI that fall outside copyright’s normative mandate.

Finally, this chapter discusses a variety of remedies to the “AI problems” it identifies, with an emphasis on facilitating market-encroaching uses while affording human creators due compensation. It concludes that the exception for Text and Data Mining in the European Union’s Directive on Copyright in the Digital Single Market represents a positive development precisely because the exception addresses some structural causes of the training data problem that this chapter identifies. The TDM provision styles itself as an exception, but it may in fact be better understood as a formality: it requires rights holders to take positive action to exercise a right to exclude their materials from training datasets. Thus, the TDM exception addresses a root cause of the AI dilemma rather than trying to patch up the copyright regime post hoc. The chapter concludes that the next step for an equitable AI framework will be to transition towards rules that not only clarify that non-market-encroaching uses do not infringe copyright, but also facilitate remunerated uses of copyrighted works for market-encroaching purposes.

Source journal

The electronic journal of human sexuality

Self-citation rate

0.00%

Articles published