Word N-Gram Based Classification for Data Leakage Prevention

Sultan Alneyadi, E. Sithirasenan, V. Muthukkumarasamy
Published in: 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications
Publication date: 2013-07-16
DOI: 10.1109/TrustCom.2013.71 (https://doi.org/10.1109/TrustCom.2013.71)
Citations: 17

Abstract

Revealing sensitive data to unauthorised personnel is a serious problem for many organizations and can lead to devastating consequences. Traditionally, data leaks were prevented through firewalls, VPNs and IDS, but without much consideration of the sensitivity of the data. In recent years, new technologies such as data leakage prevention systems (DLPs) have been developed, either to identify and protect sensitive data or to monitor and detect sensitive data leakage. One of the most popular approaches used in DLPs is content analysis, where the content of exchanged documents, stored data or even network traffic is monitored for sensitive data. The contents of documents are examined mainly using text analysis and text clustering methods. Text analysis, in turn, can be performed using methods such as pattern recognition, style variation and N-gram frequency. In this paper, we investigate the use of N-grams for data classification. Our method uses N-gram frequency to classify documents in order to detect and prevent leakage of sensitive data. We have studied the effectiveness of N-grams in measuring the similarity between regular documents and existing classified documents.
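The idea of classifying a document by word N-gram frequency can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which similarity measure or N-gram length the paper uses, so the cosine similarity, the bigram setting (`n=2`), the function names and the toy training texts below are all assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter


def word_ngrams(text, n=2):
    """Return the word n-grams (as tuples) of the lower-cased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def profile(texts, n=2):
    """Build an n-gram frequency profile from an iterable of texts."""
    counts = Counter()
    for text in texts:
        counts.update(word_ngrams(text, n))
    return counts


def cosine(a, b):
    """Cosine similarity of two frequency profiles (assumed measure)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


def classify(document, category_profiles, n=2):
    """Return (best_label, scores) for the most similar category."""
    doc_profile = profile([document], n)
    scores = {
        label: cosine(doc_profile, prof)
        for label, prof in category_profiles.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical, made-up category examples (not data from the paper).
category_profiles = {
    "sensitive": profile([
        "quarterly payroll report with employee salary details",
        "employee salary details and bank account numbers",
    ]),
    "public": profile([
        "press release announcing the new office opening",
        "the new office opening is scheduled for next month",
    ]),
}

label, scores = classify(
    "please forward the employee salary details today",
    category_profiles,
)
print(label, scores)
```

In a DLP setting, one plausible use of such a score is to compare an outgoing document against the profile of already classified documents and flag it when the similarity is high; where to set that threshold is a tuning decision this sketch does not address.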