A mobile edge computing-focused transferable sensitive data identification method based on product quantization

Journal of Cloud Computing, published 2024-05-08, DOI: 10.1186/s13677-024-00662-4

Xinjian Zhao, Guoquan Yuan, Shuhan Qiu, Chenwei Xu, Shanming Wei



Abstract

Sensitive data identification is the first and a crucial step in safeguarding sensitive information. As the industrial internet evolves and interconnects sectors such as the electric power industry, sensitive data is increasingly likely to cross domain boundaries, which changes the composition of sensitive data. Consequently, traditional approaches that rely on sensitive vocabularies struggle to identify sensitive data in an era of information abundance. Drawing on advances in deep-learning-based natural language processing, we propose a transferable Sensitive Data Identification method based on Product Quantization, named PQ-SDI. The method uses both the composition of textual data and its contextual cues to locate sensitive information accurately in Mobile Edge Computing (MEC) settings. PQ-SDI performs well within a single domain and, after training on heterogeneous datasets, also adapts to new domains. Moreover, it identifies sensitive data automatically throughout the entire process, eliminating the need for humans to maintain sensitive vocabularies. Extensive experiments on four real-world datasets show that PQ-SDI improves performance over the baseline model by 2% to 5% and reaches an accuracy of up to 94.41%. In cross-domain trials, PQ-SDI achieved accuracy comparable to that obtained when training and identification take place in the same domain. Furthermore, our experiments show that the product quantization technique reduces the parameter size for the subsequent sensitive data identification phase by a factor of tens, which is particularly beneficial in the resource-constrained environments characteristic of MEC scenarios. This advantage not only strengthens sensitive data protection but also reduces the risk of data leakage during transmission, thereby improving overall security in MEC environments.
