A ‘Human-in-the-Loop’ Approach for Information Extraction from Privacy Policies under Data Scarcity

2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW) | Pub Date: 2023-05-24 | DOI: 10.1109/EuroSPW59978.2023.00014

M. Gebauer, Faraz Maschhur, Nicola Leschke, Elias Grünewald, Frank Pallas


Citations: 2

Abstract



Machine-readable representations of privacy policies are door openers for a broad variety of novel privacy-enhancing and, in particular, transparency-enhancing technologies (TETs). To generate such representations, transparency information needs to be extracted from written privacy policies. However, the manual annotation and extraction processes this requires are laborious and demand expert knowledge. Approaches for fully automated annotation, in turn, have so far not succeeded because of overly high error rates in the specific domain of privacy policies. As a result, a lack of properly annotated privacy policies and corresponding machine-readable representations persists and continues to hinder the development and establishment of novel technical approaches that foster policy perception and data subject informedness. In this work, we present a prototype system for a ‘Human-in-the-Loop’ approach to privacy policy annotation that integrates ML-generated suggestions with final human annotation decisions. We propose an ML-based suggestion system specifically tailored to the constraint of data scarcity prevalent in the domain of privacy policy annotation. On this basis, we provide meaningful predictions to users, thereby streamlining the annotation process. We also evaluate our approach through a prototypical implementation to show that our ML-based extraction approach outperforms other recently used extraction models for legal documents.

