An Investigation towards Differentially Private Sequence Tagging in a Federated Framework

Abhik Jana, Chris Biemann
{"title":"联邦框架中差分私有序列标记的研究","authors":"Abhik Jana, Chris Biemann","doi":"10.18653/V1/2021.PRIVATENLP-1.4","DOIUrl":null,"url":null,"abstract":"To build machine learning-based applications for sensitive domains like medical, legal, etc. where the digitized text contains private information, anonymization of text is required for preserving privacy. Sequence tagging, e.g. as done in Named Entity Recognition (NER) can help to detect private information. However, to train sequence tagging models, a sufficient amount of labeled data are required but for privacy-sensitive domains, such labeled data also can not be shared directly. In this paper, we investigate the applicability of a privacy-preserving framework for sequence tagging tasks, specifically NER. Hence, we analyze a framework for the NER task, which incorporates two levels of privacy protection. Firstly, we deploy a federated learning (FL) framework where the labeled data are not shared with the centralized server as well as the peer clients. Secondly, we apply differential privacy (DP) while the models are being trained in each client instance. While both privacy measures are suitable for privacy-aware models, their combination results in unstable models. To our knowledge, this is the first study of its kind on privacy-aware sequence tagging models.","PeriodicalId":270632,"journal":{"name":"Proceedings of the Third Workshop on Privacy in Natural Language Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"An Investigation towards Differentially Private Sequence Tagging in a Federated Framework\",\"authors\":\"Abhik Jana, Chris Biemann\",\"doi\":\"10.18653/V1/2021.PRIVATENLP-1.4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To build machine learning-based applications for sensitive domains like medical, legal, etc. where the digitized text contains private information, anonymization of text is required for preserving privacy. Sequence tagging, e.g. as done in Named Entity Recognition (NER) can help to detect private information. However, to train sequence tagging models, a sufficient amount of labeled data are required but for privacy-sensitive domains, such labeled data also can not be shared directly. In this paper, we investigate the applicability of a privacy-preserving framework for sequence tagging tasks, specifically NER. Hence, we analyze a framework for the NER task, which incorporates two levels of privacy protection. Firstly, we deploy a federated learning (FL) framework where the labeled data are not shared with the centralized server as well as the peer clients. Secondly, we apply differential privacy (DP) while the models are being trained in each client instance. While both privacy measures are suitable for privacy-aware models, their combination results in unstable models. 
To our knowledge, this is the first study of its kind on privacy-aware sequence tagging models.\",\"PeriodicalId\":270632,\"journal\":{\"name\":\"Proceedings of the Third Workshop on Privacy in Natural Language Processing\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Third Workshop on Privacy in Natural Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/V1/2021.PRIVATENLP-1.4\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Third Workshop on Privacy in Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/V1/2021.PRIVATENLP-1.4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9

Abstract

To build machine-learning applications for sensitive domains such as medicine and law, where digitized text contains private information, the text must be anonymized to preserve privacy. Sequence tagging, e.g. as done in Named Entity Recognition (NER), can help to detect such private information. However, training sequence tagging models requires a sufficient amount of labeled data, and in privacy-sensitive domains this labeled data cannot be shared directly either. In this paper, we investigate the applicability of a privacy-preserving framework to sequence tagging tasks, specifically NER. To that end, we analyze a framework for the NER task that incorporates two levels of privacy protection. First, we deploy a federated learning (FL) framework in which the labeled data are shared neither with the centralized server nor with the peer clients. Second, we apply differential privacy (DP) while the models are trained on each client. While each privacy measure on its own is suitable for privacy-aware models, their combination results in unstable models. To our knowledge, this is the first study of its kind on privacy-aware sequence tagging models.
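
The framework described in the abstract combines two mechanisms: federated averaging, where only model weights (never the labeled text) leave the clients, and differential privacy applied during each client's local training. The sketch below is a minimal illustration of that combination, not the authors' implementation: it assumes a DP-SGD-style local update with per-example gradient clipping and Gaussian noise, and uses a toy PyTorch tagger (`TinyTagger`) in place of a full NER model. All names, dimensions, and hyper-parameters (`clip`, `noise_mult`, `lr`) are illustrative assumptions.

```python
# Minimal sketch of DP-SGD-style local training inside a federated averaging
# loop. TinyTagger, the dimensions, and all hyper-parameters are illustrative
# placeholders, not the setup used in the paper.
import copy
import torch
import torch.nn as nn

VOCAB, NUM_TAGS, EMB_DIM, HIDDEN = 100, 5, 32, 64  # assumed toy sizes

class TinyTagger(nn.Module):
    """Toy token-level tagger standing in for a full NER model."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB_DIM)
        self.lstm = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * HIDDEN, NUM_TAGS)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        hidden, _ = self.lstm(self.emb(tokens))
        return self.out(hidden)                      # (batch, seq_len, NUM_TAGS)

def dp_local_update(global_model, batches, clip=1.0, noise_mult=1.0, lr=0.1):
    """One client pass of DP-SGD: per-example gradient clipping + Gaussian noise."""
    model = copy.deepcopy(global_model)              # client starts from the global weights
    loss_fn = nn.CrossEntropyLoss()
    for tokens, tags in batches:
        accum = [torch.zeros_like(p) for p in model.parameters()]
        for x, y in zip(tokens, tags):               # microbatches of size 1 give per-example grads
            model.zero_grad()
            logits = model(x.unsqueeze(0))
            loss_fn(logits.view(-1, NUM_TAGS), y.view(-1)).backward()
            grads = [p.grad for p in model.parameters()]
            norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
            scale = min(1.0, clip / (float(norm) + 1e-12))   # clip each example's gradient norm
            for a, g in zip(accum, grads):
                a.add_(g, alpha=scale)
        with torch.no_grad():                        # Gaussian mechanism on the summed clipped grads
            for p, a in zip(model.parameters(), accum):
                noisy = a + noise_mult * clip * torch.randn_like(a)
                p -= lr * noisy / len(tokens)
    return model.state_dict()

def fed_avg(client_states):
    """Server step: average client weights; raw labeled data never leaves the clients."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in client_states]).mean(dim=0)
    return avg

if __name__ == "__main__":
    global_model = TinyTagger()
    fake_batch = [(torch.randint(0, VOCAB, (4, 10)),      # 4 sentences of 10 tokens
                   torch.randint(0, NUM_TAGS, (4, 10)))]  # and their tag ids
    for _ in range(2):                                     # two simulated FL rounds, 3 clients each
        states = [dp_local_update(global_model, fake_batch) for _ in range(3)]
        global_model.load_state_dict(fed_avg(states))
```

In a sketch like this, the clipping bound and the noise multiplier are the knobs that govern the privacy/utility trade-off; the abstract's observation that combining DP with federated averaging yields unstable models corresponds to the noisy per-client updates being further mixed at the server.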