Count vectorizer model based web application vulnerability detection using artificial intelligence approach

K. Manjunatha, M. Kempanna
Journal of Discrete Mathematical Sciences & Cryptography, vol. 25, no. 1, pp. 2039-2048, published 3 October 2022. DOI: 10.1080/09720529.2022.2133243. JCR Q2 (Mathematics, Applied); impact factor 1.2.

Abstract

A web application is a dynamic, intricate, and interactive program that provides end users with information and services such as utility payments, online communication, e-learning, social networking, shopping, online banking, and income tax filing. Web applications have become a major target for attackers because of their accessibility, availability, and ubiquity. Web application vulnerabilities are hazardous because attackers who exploit them can damage an organization's image and reputation. Implementation flaws in a web application allow an intruder to inject user input that violates the syntactic structure of a query, or to inject malicious code. Among the various types of injection flaws, SQL injection (SQLI) is more prominent than XML injection; both are common application-layer web attacks that allow the attacker to bypass security mechanisms, and they are therefore ranked among the most common vulnerabilities. Hence, this research considers a methodology for detecting and evaluating both SQLI and XML injection vulnerabilities in web applications. To address these flaws, this work proposes an ensemble method to classify Structured Query Language (SQL) injection vulnerabilities, using a benchmark dataset of 33,758 rows containing various types of SQL and XML injection attacks. Raw data is preprocessed to remove artifacts, and feature engineering is then performed with Natural Language Processing techniques to clean the data and extract six types of features: TF-IDF, Word-to-Vector, Skip-gram, Count Vectorizer, GloVe, and Continuous Bag of Words. Imbalanced data is handled with sampling techniques, and the best features are selected using four validation techniques: significance test, PCA, Variance Threshold, and Sbest. The prepared data is passed to a two-stage ensemble model. Stage-2 accepts a URL from the user and detects the presence of vulnerabilities in its subdomains and domains. Stage-1 comprises nine types of machine learning models (Multinomial, Gaussian, and Bernoulli Naive Bayes, Logistic Regression, Decision Tree, Random Forest, AdaBoost, and SVC with poly, rbf, and linear kernels); these models are trained on additional pre-trained word vectors, such as Google News and GloVe, to classify a new SQL or XML query by the presence or absence of a vulnerability. The proposed ensemble approach achieved an accuracy of 99%.
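
The title focuses on the Count Vectorizer representation, which the abstract lists among the six feature types without spelling it out. The sketch below shows the idea in plain Python: each query becomes a row of raw token counts over a vocabulary learned from the training queries. The tokenizer (`TOKEN_PATTERN`) and the helper names are hypothetical choices for illustration, not the authors' preprocessing; the pattern keeps single punctuation characters such as `'`, `=` and `-` as tokens because they often carry the injection signal. In practice scikit-learn's `CountVectorizer` does the same job, but its default `token_pattern` only keeps tokens of two or more word characters, so SQL punctuation would be dropped unless the pattern is changed.

```python
import re
from collections import Counter

# Hypothetical tokenizer: words, digit runs, and every other non-space
# character as its own token (so ' = - < > survive tokenization).
TOKEN_PATTERN = re.compile(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]")


def tokenize(text: str) -> list[str]:
    """Lower-case the query and split it into word, number and symbol tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def fit_vocabulary(docs: list[str]) -> dict[str, int]:
    """Assign a column index to every distinct training token, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def transform(docs: list[str], vocab: dict[str, int]) -> list[list[int]]:
    """Map each document to a row of raw token counts; unseen tokens are ignored."""
    rows = []
    for doc in docs:
        counts = Counter(tok for tok in tokenize(doc) if tok in vocab)
        row = [0] * len(vocab)
        for tok, n in counts.items():
            row[vocab[tok]] = n
        rows.append(row)
    return rows


if __name__ == "__main__":
    queries = ["SELECT * FROM users WHERE id = 1", "' OR 1=1 --"]
    vocab = fit_vocabulary(queries)
    print(transform(queries, vocab))
    # → [[0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1], [1, 0, 2, 2, 1, 0, 0, 1, 0, 0, 0]]
```

The vocabulary is fitted on training queries only; at prediction time, tokens never seen in training simply contribute nothing to the row.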
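
The abstract says imbalanced data is handled with sampling techniques but does not name them. One common option is random oversampling, sketched below under that assumption: rows of each smaller class are duplicated at random until every class matches the largest one. `random_oversample` is a hypothetical helper; libraries such as imbalanced-learn provide equivalent and more elaborate samplers (for example `RandomOverSampler` and SMOTE).

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows of smaller classes until every class has as
    many rows as the largest one. Returns new lists; inputs are not modified."""
    rng = random.Random(seed)  # fixed seed keeps the resampling reproducible
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(members)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    X = [[3, 0], [2, 1], [4, 0], [0, 5]]
    y = [0, 0, 0, 1]  # 1 = injection, the minority class in this toy data
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))
```

Oversampling should be applied to the training split only: duplicating rows before the train/test split lets copies of the same query land on both sides and inflates the measured accuracy.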
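
Of the four feature-selection techniques named in the abstract, Variance Threshold is the simplest to illustrate. The sketch drops every feature column whose population variance does not exceed a threshold; with the default of 0.0 it removes tokens that have the same count in every query and therefore cannot help separate the classes. It follows the same rule as scikit-learn's `VarianceThreshold`, but the functions here are illustrative, and the threshold the authors used is not stated.

```python
def column_variances(X):
    """Population variance (ddof=0) of each column of a row-major matrix."""
    n = len(X)
    columns = list(zip(*X))
    means = [sum(col) / n for col in columns]
    return [sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)]


def variance_threshold(X, threshold=0.0):
    """Keep only the columns whose variance is strictly greater than threshold.
    Returns the reduced matrix and the indices of the kept columns."""
    keep = [j for j, var in enumerate(column_variances(X)) if var > threshold]
    return [[row[j] for j in keep] for row in X], keep


if __name__ == "__main__":
    X = [[0, 1, 5], [0, 3, 5], [0, 2, 5]]  # columns 0 and 2 are constant
    X_sel, kept = variance_threshold(X)
    print(kept)   # → [1]
    print(X_sel)  # → [[1], [3], [2]]
```

The returned column indices matter in practice: the same columns must be dropped from any later query vector before it reaches the classifiers.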
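
The abstract does not say how the nine Stage-1 models are combined into one decision. A common choice for an ensemble of heterogeneous classifiers is hard (majority) voting, which the sketch below assumes: each model casts a 0/1 vote (1 = vulnerable) and the majority label wins. The three rule-based "models" are toy, hypothetical stand-ins so the example runs on its own; they are not the paper's classifiers. With scikit-learn, trained estimators would instead be wrapped in `VotingClassifier(estimators=..., voting="hard")`.

```python
from typing import Callable, Sequence

# A classifier maps a raw query string to 1 (vulnerable) or 0 (benign).
Classifier = Callable[[str], int]


def majority_vote(classifiers: Sequence[Classifier], query: str) -> int:
    """Hard voting: every model casts one 0/1 vote; ties resolve toward 1."""
    if not classifiers:
        raise ValueError("need at least one classifier")
    positive = sum(1 for clf in classifiers if clf(query) == 1)
    return 1 if 2 * positive >= len(classifiers) else 0


# Toy, rule-based stand-ins for trained models (hypothetical, illustration only).
def has_quote(query: str) -> int:
    return int("'" in query)


def has_sql_comment(query: str) -> int:
    return int("--" in query or "/*" in query)


def has_markup(query: str) -> int:
    return int("<" in query and ">" in query)


MODELS = [has_quote, has_sql_comment, has_markup]

if __name__ == "__main__":
    print(majority_vote(MODELS, "' OR 1=1 --"))             # → 1
    print(majority_vote(MODELS, "SELECT name FROM users"))  # → 0
```

Ties, which can occur with an even number of models, are resolved toward 1 here so that an uncertain query is flagged rather than passed; this fail-safe rule is a design choice of the sketch, not something the abstract specifies.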
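
Stage-2 is described only as accepting a URL and checking its domains and subdomains. A minimal sketch of that input handling, assuming Stage-2 splits the user-supplied host into its parent domains and extracts the query-string values that Stage-1 would then score, needs only Python's `urllib.parse`. Both helper names are hypothetical. The domain split is naive: it stops at the last two labels, so multi-label suffixes such as `co.uk` would need the Public Suffix List, and it does not discover subdomains that are absent from the URL.

```python
from urllib.parse import parse_qsl, urlsplit


def host_hierarchy(url: str) -> list[str]:
    """Return the URL's host followed by each parent domain, stopping at the
    last two labels. Naive: multi-label suffixes such as co.uk are not handled."""
    host = urlsplit(url).hostname  # lower-cased, with port and credentials removed
    if not host:
        return []
    labels = host.split(".")
    return [".".join(labels[i:]) for i in range(max(len(labels) - 1, 1))]


def query_values(url: str) -> list[str]:
    """Percent-decoded values of the query-string parameters, i.e. the
    user-controlled strings a Stage-1 classifier could be asked to score."""
    return [value for _, value in parse_qsl(urlsplit(url).query, keep_blank_values=True)]


if __name__ == "__main__":
    url = "https://shop.eu.example.com/item?id=1%27%20OR%201%3D1--&lang=en"
    print(host_hierarchy(url))  # → ['shop.eu.example.com', 'eu.example.com', 'example.com']
    print(query_values(url))    # → ["1' OR 1=1--", 'en']
```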