Big Data Analytics in Cybersecurity: Network Data and Intrusion Prediction
Lidong Wang, Randy Jones
2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2019-10-01
DOI: 10.1109/UEMCON47517.2019.8993037 (https://doi.org/10.1109/UEMCON47517.2019.8993037)
Citations: 4
Abstract
Intrusion detection in computer networks is an important issue in cybersecurity. Networks generate streaming data at big-data scale, which often makes intrusion detection challenging. This paper studies the ‘Variety’ and ‘Veracity’ characteristics of big data in network data using R and its functions. The statistics, correlations, and associations of variables in the spam email database ‘spambase’ are analysed. Cluster analysis based on k-means is performed, and principal component analysis is used to reduce the dimensionality of the database. Spam-email intrusion is predicted using Naïve Bayesian classification and deep learning, respectively. Missing values are analysed in a large ‘VAST 2013’ data set (with multiple data types and a huge volume of missing values), and its missing data patterns are obtained.
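To make "missing data patterns" concrete, here is a minimal sketch of how they can be tabulated: each record is mapped to the set of fields it lacks, and identical sets are counted. This is an illustration only, not the authors' method: the paper works in R, whereas this sketch uses Python, and the function name `missing_patterns`, the column names, and the sample records are all hypothetical rather than taken from VAST 2013.

```python
from collections import Counter

def missing_patterns(rows, columns):
    """Count how often each combination of missing columns occurs."""
    patterns = Counter()
    for row in rows:
        # A pattern is the tuple of column names whose value is missing (None).
        pattern = tuple(col for col in columns if row.get(col) is None)
        patterns[pattern] += 1
    return patterns

# Hypothetical records for illustration; not taken from VAST 2013.
columns = ["src_ip", "dst_port", "bytes"]
rows = [
    {"src_ip": "10.0.0.1", "dst_port": 80, "bytes": 512},
    {"src_ip": "10.0.0.2", "dst_port": None, "bytes": 128},
    {"src_ip": None, "dst_port": None, "bytes": 64},
    {"src_ip": "10.0.0.3", "dst_port": None, "bytes": 256},
]

patterns = missing_patterns(rows, columns)
for pattern, count in patterns.most_common():
    # An empty tuple means the record has no missing fields.
    print(count, pattern if pattern else "(complete)")
```

Grouping records by pattern, rather than counting missing values per column, shows which fields tend to be absent together. That distinction matters when choosing between dropping records and imputing values.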