Detecting Data Semantic: A Data Leakage Prevention Approach

Sultan Alneyadi, E. Sithirasenan, V. Muthukkumarasamy
{"title":"Detecting Data Semantic: A Data Leakage Prevention Approach","authors":"Sultan Alneyadi, E. Sithirasenan, V. Muthukkumarasamy","doi":"10.1109/Trustcom.2015.464","DOIUrl":null,"url":null,"abstract":"Data leakage prevention systems (DLPSs) are increasingly being implemented by organizations. Unlike standard security mechanisms such as firewalls and intrusion detection systems, DLPSs are designated systems used to protect in use, at rest and in transit data. DLPSs analytically use the content and surrounding context of confidential data to detect and prevent unauthorized access to confidential data. DLPSs that use content analysis techniques are largely dependent upon data fingerprinting, regular expressions, and statistical analysis to detect data leaks. Given that data is susceptible to change, data fingerprinting and regular expressions suffer from shortcomings in detecting the semantics of evolved confidential data. However, statistical analysis can manage any data that appears fuzzy in nature or has other variations. Thus, DLPSs with statistical analysis capabilities can approximate the presence of data semantics. In this paper, a statistical data leakage prevention (DLP) model is presented to classify data on the basis of semantics. This study contributes to the data leakage prevention field by using data statistical analysis to detect evolved confidential data. The approach was based on using the well-known information retrieval function Term Frequency-Inverse Document Frequency (TF-IDF) to classify documents under certain topics. A Singular Value Decomposition (SVD) matrix was also used to visualize the classification results. The results showed that the proposed statistical DLP approach could correctly classify documents even in cases of extreme modification. It also had a high level of precision and recall scores.","PeriodicalId":277092,"journal":{"name":"2015 IEEE Trustcom/BigDataSE/ISPA","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE Trustcom/BigDataSE/ISPA","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/Trustcom.2015.464","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 41

Abstract

Data leakage prevention systems (DLPSs) are increasingly being implemented by organizations. Unlike standard security mechanisms such as firewalls and intrusion detection systems, DLPSs are designated systems used to protect in use, at rest and in transit data. DLPSs analytically use the content and surrounding context of confidential data to detect and prevent unauthorized access to confidential data. DLPSs that use content analysis techniques are largely dependent upon data fingerprinting, regular expressions, and statistical analysis to detect data leaks. Given that data is susceptible to change, data fingerprinting and regular expressions suffer from shortcomings in detecting the semantics of evolved confidential data. However, statistical analysis can manage any data that appears fuzzy in nature or has other variations. Thus, DLPSs with statistical analysis capabilities can approximate the presence of data semantics. In this paper, a statistical data leakage prevention (DLP) model is presented to classify data on the basis of semantics. This study contributes to the data leakage prevention field by using data statistical analysis to detect evolved confidential data. The approach was based on using the well-known information retrieval function Term Frequency-Inverse Document Frequency (TF-IDF) to classify documents under certain topics. A Singular Value Decomposition (SVD) matrix was also used to visualize the classification results. The results showed that the proposed statistical DLP approach could correctly classify documents even in cases of extreme modification. It also had a high level of precision and recall scores.
数据语义检测:一种防止数据泄漏的方法
数据泄漏预防系统(dlps)越来越多地被组织所采用。与防火墙和入侵检测系统等标准安全机制不同,dlps是用于保护使用中、静态和传输中的数据的指定系统。dlps分析地使用机密数据的内容和周围上下文来检测和防止对机密数据的未经授权的访问。使用内容分析技术的dlps在很大程度上依赖于数据指纹、正则表达式和统计分析来检测数据泄漏。由于数据容易发生变化,数据指纹和正则表达式在检测演化的机密数据的语义方面存在缺陷。然而,统计分析可以管理本质上看起来模糊或有其他变化的任何数据。因此,具有统计分析功能的dlps可以近似地表示数据语义的存在。本文提出了一种基于语义的统计数据泄漏预防(DLP)模型。本研究通过数据统计分析来检测演变的机密数据,为数据泄漏预防领域做出了贡献。该方法基于使用著名的信息检索函数术语频率-逆文档频率(TF-IDF)对特定主题下的文档进行分类。采用奇异值分解(SVD)矩阵对分类结果进行可视化处理。结果表明,即使在极端修改的情况下,统计DLP方法也能正确地对文档进行分类。它也有很高的精确度和回忆分数。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信