Topic detections in Arabic Dark websites using improved Vector Space Model
Authors: H. Alghamdi, Ali Selamat
Published in: 2012 4th Conference on Data Mining and Optimization (DMO), 2012-10-15
DOI: https://doi.org/10.1109/DMO.2012.6329790
Citations: 28
Abstract
Forums run by terrorist groups remain a threat to all web users, and algorithms that detect their informative content are still needed. In this paper, we investigate the most discussed topics on Arabic Dark Web forums. Arabic textual content was extracted from selected Arabic Dark Web forums. The Vector Space Model (VSM) was used for text representation with two different term-weighting schemes: Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF). The pre-processing phase, which consists of filtering, tokenization, and stemming, plays a significant role in processing the extracted terms. The stemming step is based on a proposed stemmer that does not require a root dictionary. The well-known k-means clustering algorithm was used to cluster the terms. The experimental results show the terms most shared between the selected forums.
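To make the pipeline in the abstract concrete, the following is a minimal sketch of VSM weighting (TF and TF-IDF) followed by k-means. It is not the authors' implementation. The abstract does not give the exact formulas, so the IDF variant `log(N / df)`, the squared-Euclidean distance, the seeded random initialization, and the function names `tf_vectors`, `tfidf_vectors`, and `kmeans` are all assumptions. The toy tokens stand in for stemmed Arabic terms, and the sketch clusters document vectors; clustering terms instead would use the transposed matrix.

```python
import math
import random
from collections import Counter


def tf_vectors(docs):
    """Return (vocabulary, raw term-frequency vectors) for tokenized documents."""
    vocab = sorted({term for doc in docs for term in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([float(counts[term]) for term in vocab])
    return vocab, vectors


def tfidf_vectors(docs):
    """Weight each TF entry by log(N / df); this IDF variant is an assumption."""
    vocab, tf = tf_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term (always >= 1).
    df = [sum(1 for vec in tf if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = [[vec[j] * idf[j] for j in range(len(vocab))] for vec in tf]
    return vocab, weighted


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means with squared Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy, already-tokenized documents (hypothetical placeholders, not forum data).
docs = [
    ["alpha", "beta"],
    ["alpha", "beta", "beta"],
    ["gamma", "delta"],
    ["gamma", "delta", "delta"],
]
vocab, tfidf = tfidf_vectors(docs)
labels, _ = kmeans(tfidf, k=2)
print(vocab)
print(labels)
```

In this sketch a term that occurs in every document receives an IDF of zero, which is the property that distinguishes the TF-IDF scheme from plain TF: terms shared by all documents stop influencing the clustering.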