Criminal Activity Detection in Social Network by Text Mining: Comprehensive Analysis

2019 4th International Conference on Information Systems and Computer Networks (ISCON) | Pub Date: 2019-11-01 | DOI: 10.1109/ISCON47742.2019.9036157

Tamanna Siddiqui, A. Amer, N. A. Khan

Citations: 9

Abstract

Criminal activity detection in social networks by text mining is the process of finding criminal activity and helping law enforcement agencies keep control of prevailing crimes. Text mining is a technique with the ability to detect hidden text in corpus documents. It transforms data from unstructured text into structured text; text is easily perceived and processed by humans, but hard for machines to understand without algorithms, tools, and methods designed to process it effectively. Text mining derives high-quality information from raw data by devising patterns and by statistical pattern learning. It is a multidisciplinary field that relies on data mining, information retrieval, statistics, machine learning, and computational linguistics. The main components of the analysis and exploration process are natural language processing, information retrieval, information extraction, content analysis, text clustering, and text classification. All of these processes follow one earlier step, the preprocessing task. The purpose of preprocessing is to reduce the volume of the textual corpus; the tasks involved are text boundary determination, language-specific stemming, stop-word elimination, and tokenization, which together remove unwanted data and handle missing data. Among these, tokenization is the most important. Tokenization divides the text into individual words, and open-source tools such as spaCy, NLTK (with Python), Gensim, and many others are available for it. After preprocessing, a model architecture is defined, the model is fitted on the training data, and it is evaluated on sample test data in order to predict values.
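
The pipeline described in the abstract (tokenization, stop-word elimination, stemming, then fitting a model on training data and evaluating it on test data) can be sketched roughly as follows. This is a minimal illustration only, using the Python standard library rather than the spaCy, NLTK, or Gensim tools the abstract names. The stop-word list, the `naive_stem` suffix stripper, the choice of a multinomial Naive Bayes classifier, and the toy messages and labels are all assumptions made for the example; the abstract does not specify the authors' model or data.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "for",
              "at", "on", "with", "we", "you", "i", "me", "my"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def naive_stem(token):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy data, for illustration only.
train = [
    ("selling stolen cards, message me for prices", "suspicious"),
    ("we hacked the accounts and stole the passwords", "suspicious"),
    ("meeting friends for lunch at the park", "benign"),
    ("watching the football match with my family", "benign"),
]
test = [
    ("stolen passwords for sale", "suspicious"),
    ("lunch with family at the park", "benign"),
]

# Fit on the training split, then evaluate on the held-out test split.
model = NaiveBayes().fit([preprocess(t) for t, _ in train],
                         [label for _, label in train])
correct = sum(model.predict(preprocess(t)) == label for t, label in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

In practice the hand-written `tokenize` and `naive_stem` helpers would likely be replaced by a library tokenizer and stemmer, and the four-message training set by a labeled corpus large enough for the test accuracy to be meaningful.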
