Positing the problem: enhancing classification of extremist web content through textual analysis

2016 IEEE International Conference on Cybercrime and Computer Forensic (ICCCF). Pub Date: 2016-11-17. DOI: 10.1109/ICCCF.2016.7740431

G. Weir, Emanuel Dos Santos, B. Cartwright, Richard Frank


Citations: 14

Abstract


Webpages with terrorist and extremist content are key factors in the recruitment and radicalization of disaffected young adults who may then engage in terrorist activities at home or fight alongside terrorist groups abroad. This paper reports on advances in techniques for classifying data collected by the Terrorism and Extremism Network Extractor (TENE) webcrawler, a custom-written program that browses the World Wide Web, collecting vast amounts of data, retrieving the pages it visits, analyzing them, and recursively following the links out of those pages. The textual content is subjected to enhanced classification through software analysis, using the Posit textual analysis toolset, generating a detailed frequency analysis of the syntax, including multi-word units and associated part-of-speech components. Results are then deployed in a knowledge extraction process using knowledge extraction algorithms, e.g., from the WEKA system. Indications are that the use of the data enrichment through application of Posit analysis affords a greater degree of match between automatic and manual classification than previously attained. Furthermore, the incorporation and deployment of these technologies promises to provide public safety officials with techniques that can help to detect terrorist webpages, gauge the intensity of their content, discriminate between webpages that do or do not require a concerted response, and take appropriate action where warranted.
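The frequency-analysis step mentioned in the abstract can be illustrated with a minimal sketch. This is not the Posit toolset and makes no claim about how Posit works internally: it only counts single words and contiguous multi-word units (n-grams) for one page of text, and it omits part-of-speech tagging entirely. The function names `tokenize`, `ngram_counts`, and `frequency_features` are hypothetical, chosen for this example. Counts of this kind could plausibly serve as input features for a classifier such as those provided by WEKA.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count contiguous n-token sequences (a simple stand-in for multi-word units)."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def frequency_features(text, max_n=3):
    """Return summary counts and one frequency table per n-gram length.

    Hypothetical feature layout for illustration only; part-of-speech
    information, which the abstract also mentions, is not computed here.
    """
    tokens = tokenize(text)
    features = {
        "total_tokens": len(tokens),
        "distinct_tokens": len(set(tokens)),
    }
    for n in range(1, max_n + 1):
        features[f"{n}-grams"] = ngram_counts(tokens, n)
    return features


if __name__ == "__main__":
    sample = "the web crawler follows links and the web crawler stores pages"
    feats = frequency_features(sample)
    print(feats["total_tokens"], feats["distinct_tokens"])
    print(feats["2-grams"].most_common(2))
```

Keeping the n-gram tables as `Counter` objects makes it straightforward to select the most frequent units per page before exporting them to whatever format the downstream classifier expects.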
