Detection of SQL Injection Attack Using Machine Learning Based On Natural Language Processing

Joko Triloka, H. Hartono, Sutedi Sutedi
International Journal of Artificial Intelligence Research, published 2022-08-23
DOI: 10.29099/ijair.v6i2.355
Citations: 2

Abstract

The number of cyberattacks has increased significantly, not only in Indonesia but in many other countries, so cyberattacks deserve attention as a subject of study. Regarding known security vulnerabilities, the Open Web Application Security Project (OWASP) has published its Top 10 list of website vulnerabilities, and SQL injection is still one of the vulnerabilities most often exploited by attackers. This research implemented and tested five algorithms: Naïve Bayes, Logistic Regression, Gradient Boosting, K-Nearest Neighbor, and Support Vector Machine. In addition, the study uses natural language processing as part of text processing to increase detection accuracy: in the feature engineering stage, the main dataset was converted into a corpus so that it could be analyzed more easily. Two SQL injection datasets were used; the first was used to train the classifiers and the second to test their performance. In the tests carried out, the Support Vector Machine achieved the highest detection accuracy, 0.9977, with a processing time of 0.00100 microseconds per query, and in performance testing the Support Vector Machine classifier detected 99.37% of the second dataset. The study also reports the detection accuracy of the other tested algorithms: K-Nearest Neighbor (0.9970), Logistic Regression (0.9960), Gradient Boosting (0.99477), and Naïve Bayes (0.9754).
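The abstract does not say how queries were tokenized, which features were built from the corpus, or which SVM formulation was used, so the following is only a minimal sketch under stated assumptions, not the authors' pipeline. It assumes a hypothetical regex tokenizer (`TOKEN_RE`) that keeps SQL keywords, numbers, quotes, operators, and comment markers as tokens; bag-of-words counts plus a bias feature; and, as a swapped-in training method, a Pegasos-style sub-gradient solver for a linear SVM (`train_linear_svm`). The two tiny query lists are invented to mirror the "train on one dataset, test on a second" setup and are not the paper's data.

```python
import re

# Assumed token pattern (not from the paper): identifiers/keywords, integers,
# SQL comment markers, and any other single non-space symbol such as ' or =.
TOKEN_RE = re.compile(r"[a-z_][a-z0-9_]*|\d+|--|/\*|\*/|[^\sa-z0-9_]")


def tokenize(query):
    """Lower-case a raw query and split it into word and symbol tokens."""
    return TOKEN_RE.findall(query.lower())


def build_vocab(corpus):
    """Map each distinct token in the training corpus to a feature index."""
    vocab = {}
    for tokens in corpus:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(tokens, vocab):
    """Bag-of-words counts plus a constant bias feature; unseen tokens are dropped."""
    x = [0.0] * (len(vocab) + 1)
    x[-1] = 1.0  # bias feature
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:
            x[idx] += 1.0
    return x


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style sub-gradient descent on the L2-regularised hinge loss.

    Labels must be +1 (injection) or -1 (benign). Examples are visited in a
    fixed order instead of being sampled at random, so training is deterministic.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                 # step size 1 / (lambda * t)
            violated = yi * dot(w, xi) < 1.0      # hinge loss is active
            w = [(1.0 - 1.0 / t) * wj for wj in w]  # regulariser shrink: 1 - eta*lam
            if violated:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


def accuracy(w, X, y):
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Toy data, invented for illustration only (not the paper's datasets).
train = [
    ("SELECT name FROM users WHERE id = 5", -1),
    ("' OR 1=1 --", 1),
    ("SELECT title FROM books", -1),
    ("admin' --", 1),
]
test = [
    ("1' OR '1'='1", 1),
    ("SELECT email FROM users WHERE id = 7", -1),
]

corpus = [tokenize(q) for q, _ in train]
vocab = build_vocab(corpus)
X_train = [vectorize(toks, vocab) for toks in corpus]
y_train = [label for _, label in train]
w = train_linear_svm(X_train, y_train)

X_test = [vectorize(tokenize(q), vocab) for q, _ in test]
y_test = [label for _, label in test]
print(accuracy(w, X_train, y_train), accuracy(w, X_test, y_test))  # prints: 1.0 1.0
```

Keeping quotes, `=`, and `--` as separate tokens is the design choice that matters here: on this toy data those symbols, rather than the SQL keywords shared with benign queries, are what push the linear score positive. A production setup would more likely use an established library vectorizer and SVM implementation on the real datasets.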