Codewords Detection in Microblogs Focusing on Differences in Word Use Between Two Corpora

2020 International Conference on Computing, Electronics & Communications Engineering (iCCECE) Pub Date : 2020-08-17 DOI:10.1109/iCCECE49321.2020.9231109

Takuro Hada, Y. Sei, Yasuyuki Tahara, Akihiko Ohsuga


Citations: 5

Abstract

In recent years, drug trafficking through microblogs has increased and become a social problem. A common cyber-patrol method for cracking down on crimes such as drug trafficking is to search for crime-related keywords. However, criminals who post crime-inducing messages camouflage their intentions by relying heavily on "codewords" rather than explicit keywords such as enjo kosai, marijuana, and methamphetamine. Research suggests that these codewords change once they become popular; keyword search therefore requires significant effort to keep track of the latest codewords. In this study, we focused on how codewords appear and on the words likely to be included in incriminating posts, with the aim of detecting codewords that have a high likelihood of appearing in such posts. We proposed new methods for detecting codewords based on differences in word usage between two corpora and conducted concealed-word detection experiments to evaluate their effectiveness. The results showed that the proposed method detected concealed words beyond those in the initial list and did so better than the baseline methods. These findings demonstrate that the proposed method can rapidly and automatically detect codewords that change over time, as well as blog posts that induce crimes, thereby potentially reducing the burden of continuously monitoring codewords.
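The abstract does not spell out how differences in word usage are measured, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple smoothed log-ratio of relative word frequencies between a target corpus (posts suspected of being incriminating) and a reference corpus (ordinary posts). The function names, the whitespace tokenization, the placeholder word "snowcone", and the toy posts are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def usage_difference_scores(target_posts, reference_posts, smoothing=1.0):
    """Score each word by the smoothed log-ratio of its relative frequency
    in the target corpus versus the reference corpus.

    Assumption: posts are tokenized by lowercasing and splitting on
    whitespace; real microblog text (for example Japanese) would need a
    proper tokenizer.
    """
    target_counts = Counter(w for post in target_posts for w in post.lower().split())
    reference_counts = Counter(w for post in reference_posts for w in post.lower().split())
    vocab = set(target_counts) | set(reference_counts)

    # Additive smoothing keeps words unseen in one corpus from producing
    # a division by zero or log(0).
    target_total = sum(target_counts.values()) + smoothing * len(vocab)
    reference_total = sum(reference_counts.values()) + smoothing * len(vocab)

    scores = {}
    for word in vocab:
        p_target = (target_counts[word] + smoothing) / target_total
        p_reference = (reference_counts[word] + smoothing) / reference_total
        scores[word] = math.log(p_target / p_reference)
    return scores


def top_candidates(scores, k=5):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical toy corpora; "snowcone" stands in for an unknown codeword.
target_posts = [
    "snowcone available tonight dm me",
    "fresh snowcone dm for price",
    "selling snowcone tonight",
]
reference_posts = [
    "lunch was great tonight",
    "dm me the price of the tickets",
    "fresh bread for lunch",
]

scores = usage_difference_scores(target_posts, reference_posts)
print(top_candidates(scores, k=3))
```

Words that are frequent in the target corpus but rare in the reference corpus receive high scores and become codeword candidates for manual review; words more typical of ordinary posts receive negative scores. A frequency ratio is only one possible measure of usage difference, and the paper's own measure may differ.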


Source journal

2020 International Conference on Computing, Electronics & Communications Engineering (iCCECE)

Self-citation rate

0.00%

Publication count