Automatic depersonalization of confidential information

Rossijskij tehnologičeskij žurnal, Pub Date: 2023-10-05, DOI: 10.32362/2500-316x-2023-11-5-7-18

N. G. Babak, L. Yu. Belorybkin, S. A. Otsokov, A. T. Terenin, A. I. Shabrova

Citations: 0

Abstract

Objectives. As the volume of personal data transmitted online continues to grow, national legislatures increasingly regulate the storage and processing of digital information. This paper addresses the protection of personal data and other confidential information about individuals, such as bank secrecy and medical confidentiality. One approach to protecting confidential data is to depersonalize it, i.e., to transform it so that the specific subject to whom the data belongs can no longer be identified. The aim of the work is to develop a method for automating the depersonalization process quickly and safely using machine learning.

Methods. The authors propose using artificial intelligence models to build a system that depersonalizes personal data automatically, without manual labor, and that detects confidential information with sufficient accuracy even in unstructured data. Rule-based algorithms that improve the precision of the depersonalization system are also described.

Results. A named entity recognition (NER) model is trained on confidential data provided by the authors. Combined with the rule-based algorithms, it achieves an F1 score greater than 0.9. For specific depersonalization tasks, users can choose among several implemented variants of the anonymization algorithm.

Conclusions. The developed system solves the problem of automatically anonymizing confidential data. This makes secure processing and transmission of confidential information possible in many areas, such as banking, government administration, and advertising campaigns. Automating depersonalization makes it possible to transfer confidential information in cases where such transfer is necessary but is currently impossible because of legal restrictions. A distinctive feature of the solution is that it depersonalizes both structured and unstructured data while preserving context.
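The abstract does not specify the rules, the NER architecture, or the anonymization variants, so the following is only a minimal sketch under stated assumptions, not the authors' method. The `Span` type, the regex rules, the function names, and the two variants ("mask" and "pseudonym") are all hypothetical, and a hard-coded list stands in for the output of a trained NER model. It shows one plausible way to union rule matches with NER spans, replace the detected values, and score detection with exact-match entity-level F1.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type, e.g. "PERSON"


# Hypothetical rule set: regexes for entities with a rigid surface format.
RULES = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\+?\d[\d\- ]{8,}\d"),
}


def rule_spans(text):
    """Return every span matched by the regex rules."""
    return [Span(m.start(), m.end(), label)
            for label, pattern in RULES.items()
            for m in pattern.finditer(text)]


def merge_spans(ner_spans, extra_spans):
    """Union of two span lists; on overlap keep the longer span. Result is sorted."""
    merged = []
    # Sort by start offset, longer spans first when starts are equal.
    for span in sorted(ner_spans + extra_spans, key=lambda s: (s.start, s.start - s.end)):
        if merged and span.start < merged[-1].end:
            if span.end - span.start > merged[-1].end - merged[-1].start:
                merged[-1] = span
            continue
        merged.append(span)
    return merged


def anonymize(text, spans, variant="pseudonym"):
    """Replace sorted, non-overlapping spans in text.

    "mask" replaces every value with its label; "pseudonym" gives equal values
    the same numbered placeholder, so repeated mentions stay linked."""
    placeholders, counters = {}, {}
    pieces, pos = [], 0
    for s in spans:
        if variant == "mask":
            replacement = f"[{s.label}]"
        else:
            key = (s.label, text[s.start:s.end])
            if key not in placeholders:
                counters[s.label] = counters.get(s.label, 0) + 1
                placeholders[key] = f"{s.label}_{counters[s.label]}"
            replacement = placeholders[key]
        pieces.append(text[pos:s.start])
        pieces.append(replacement)
        pos = s.end
    pieces.append(text[pos:])
    return "".join(pieces)


def entity_f1(predicted, gold):
    """Exact-match entity-level F1 between predicted and gold spans."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)


# Toy usage; offsets and the "NER output" are illustrative only.
text = "Ivan Petrov (ivan.petrov@example.com) called Ivan Petrov's bank at +7 495 123-45-67."
ner_output = [Span(0, 11, "PERSON"), Span(45, 56, "PERSON")]  # stand-in for a trained model
spans = merge_spans(ner_output, rule_spans(text))
gold = [Span(0, 11, "PERSON"), Span(13, 36, "EMAIL"),
        Span(45, 56, "PERSON"), Span(67, 83, "PHONE")]

print(anonymize(text, spans))          # PERSON_1 (EMAIL_1) called PERSON_1's bank at PHONE_1.
print(anonymize(text, spans, "mask"))  # [PERSON] ([EMAIL]) called [PERSON]'s bank at [PHONE].
print(entity_f1(spans, gold))          # 1.0
```

The "pseudonym" variant is one simple way to preserve context: a reader can still tell that both mentions refer to the same person, while "mask" discards that link. Real systems would need far richer rules and a trained model; this toy example says nothing about the accuracy reported in the paper.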

Source journal

Rossijskij tehnologičeskij žurnal

Self-citation rate

0.00%

Number of articles published