Text Analysis for Honeypot Misuse Inference
Authors: Toivo Herman Kamati, D. Jat, Saurabh Chamotra
Published in: 2020 5th International Conference on Innovative Technologies in Intelligent Systems and Industrial Applications (CITISIA), 2020-11-25
DOI: 10.1109/CITISIA50690.2020.9371771 (https://doi.org/10.1109/CITISIA50690.2020.9371771)
Citations: 1
Abstract
Computational text analysis with Natural Language Processing (NLP) methods requires raw text to first be transformed into a structured form. Such analysis overcomes the limitations of the human brain by automatically indexing documents for retrieval and by generating topics whose distributions can be correlated across a large corpus of documents. Parametric and non-parametric topic models, fitted with Expectation Maximization and Gibbs sampling, provide techniques for building machine learning models that are evaluated with log-likelihood, topic coherence, and the coefficient of determination on held-out documents. This research extends NLP to automate the analysis of system call documents from a high-interaction honeypot, in order to infer misuse of system resources by malicious code (malcode) during its real-time engagement with the user-space applications of the deployed honeypot.
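The abstract does not describe the implementation, so the following is only an illustrative sketch of the general idea under stated assumptions, not the authors' pipeline. It assumes each honeypot trace is an strace-style log with one system call per line, treats the call names as the "words" of a document, fits a parametric topic model (LDA) with collapsed Gibbs sampling, and then scores the same documents by log-likelihood. The function names, toy traces, and hyperparameters (`alpha`, `beta`, `num_topics`, `iters`) are hypothetical. The EM-fitted and non-parametric variants, topic coherence, and the held-out evaluation mentioned in the abstract are not shown.

```python
import math
import random


def preprocess(raw_trace):
    """Hypothetical step: turn a raw strace-style log into system call tokens.

    Assumes one call per line in the form 'name(args) = ret'; the text before
    the first '(' becomes the token, and lines without '(' are skipped.
    """
    tokens = []
    for line in raw_trace.splitlines():
        line = line.strip()
        if "(" in line:
            tokens.append(line.split("(", 1)[0].strip().lower())
    return tokens


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with a collapsed Gibbs sampler."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # tokens per (document, topic)
    nkw = [[0] * V for _ in range(K)]  # tokens per (topic, word)
    nk = [0] * K                       # tokens per topic
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):     # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Point estimates of topic-word (phi) and document-topic (theta) distributions.
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


def log_likelihood(docs, vocab, phi, theta):
    """Sum of log p(w | d) = log sum_k theta[d][k] * phi[k][w] over all tokens."""
    w_id = {w: i for i, w in enumerate(vocab)}
    total = 0.0
    for d, doc in enumerate(docs):
        for w in doc:
            total += math.log(sum(theta[d][k] * phi[k][w_id[w]] for k in range(len(phi))))
    return total


# Hypothetical toy traces standing in for honeypot system call documents.
trace_a = "open(/etc/passwd) = 3\nread(3) = 512\nclose(3) = 0\nopen(/etc/shadow) = -1\n"
trace_b = "socket(AF_INET) = 4\nconnect(4) = 0\nsendto(4) = 64\nrecvfrom(4) = 128\n"
trace_c = "fork() = 1201\nexecve(/tmp/x) = 0\nsocket(AF_INET) = 5\nconnect(5) = 0\n"

docs = [preprocess(t) for t in (trace_a, trace_b, trace_c)]
vocab, phi, theta = lda_gibbs(docs, num_topics=2, iters=100)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print(f"topic {k}:", [vocab[v] for v in top])
print("log-likelihood:", log_likelihood(docs, vocab, phi, theta))
```

Using only the call name as a token discards arguments such as file paths, which may matter for spotting misuse. Scoring held-out traces would also require estimating `theta` for the unseen documents before computing their likelihood.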