Wei-qing Huang, Chenggang Jia, Min Yu, Gang Li, Chao Liu, Jianguo Jiang
UTANSA: Static Approach for Multi-Language Malicious Web Scripts Detection
2021 IEEE Symposium on Computers and Communications (ISCC), published 2021-09-05
DOI: 10.1109/ISCC53001.2021.9631400 (https://doi.org/10.1109/ISCC53001.2021.9631400)
Citations: 1
Abstract
To detect malicious web scripts automatically, many detection methods based on static features and machine learning have been proposed. However, existing methods can only detect web scripts written in specific programming languages. This paper proposes the unified text features and abstract syntax tree (AST) node sequence features algorithm (UTANSA), which combines a text feature classification method and an AST node classification method with corresponding unification methods to enhance the generalization ability of the model. Using this algorithm, two unified approaches are proposed, based on text features and AST node features respectively, so that a single detection model can detect web scripts in multiple languages. We evaluate the approach experimentally on scripts written in JavaScript (JS) and PHP. The results show that a detection model trained with the proposed method achieves detection performance similar to that of models trained on only JS samples or only PHP samples.
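
The abstract does not spell out how AST node features are unified across languages. The sketch below is one possible reading, not the authors' implementation: it maps language-specific AST node type names onto a shared label set so that JS and PHP scripts yield comparable node sequences. The mapping table, the label names (`FUNC_DECL`, `CALL`, and so on), and the helper functions are all hypothetical and chosen only for illustration; the node type names follow the style of common JS and PHP parsers, but the pairing between them is an assumption.

```python
from collections import Counter

# Hypothetical mapping from language-specific AST node type names to
# shared labels. The pairing is an illustrative assumption, not taken
# from the paper.
UNIFIED_NODE_LABELS = {
    "js": {
        "FunctionDeclaration": "FUNC_DECL",
        "CallExpression": "CALL",
        "IfStatement": "IF",
        "Literal": "LITERAL",
    },
    "php": {
        "Stmt_Function": "FUNC_DECL",
        "Expr_FuncCall": "CALL",
        "Stmt_If": "IF",
        "Scalar_String": "LITERAL",
    },
}

# Fallback label for node types that have no entry in the mapping.
UNKNOWN_LABEL = "OTHER"


def unify_node_sequence(language, node_types):
    """Translate a sequence of parser-specific node types to shared labels."""
    mapping = UNIFIED_NODE_LABELS[language]
    return [mapping.get(node_type, UNKNOWN_LABEL) for node_type in node_types]


def node_label_counts(sequence):
    """Count unified labels, giving a simple language-independent feature."""
    return Counter(sequence)


# Two structurally similar scripts, one per language, as node type sequences.
js_sequence = unify_node_sequence(
    "js", ["FunctionDeclaration", "CallExpression", "Literal"]
)
php_sequence = unify_node_sequence(
    "php", ["Stmt_Function", "Expr_FuncCall", "Scalar_String"]
)
```

Under these assumptions, `js_sequence` and `php_sequence` are identical label sequences, so one classifier could be trained on samples from both languages. A real system would obtain the node type sequences from actual JS and PHP parsers and would need a far larger mapping.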