Heuristic Methods for Filtering Newly Coined Profanities Using Phylogenetic Analysis
T. Yoon, Sun-young Park, Woo-Keun Chung, Hwan-Gue Cho
2010 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2010-10-10
DOI: 10.1109/CYBERC.2010.70
Abstract
We previously proposed a smart filtering system for newly coined profanities, based on approximate string searching and sequence alignment. The number of coined profanities is large, however: the game portal Nexon, for example, maintains a forbidden word list of 60,000 words, so even our system requires too much computation time to be applied to a real-time chat system. The profanity database therefore needs to be managed: redundant entries discarded and the remaining elements divided into groups by priority. In this paper, we propose a management algorithm for a profanity database. We use phylogenetic analysis to build evolution trees and classify the profanities into groups, and we compare each input word with the root of a group. With this method, we reduce the number of elements in the database from 6302 to 2229.
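
The idea of comparing an input word with a group root rather than with every database entry can be illustrated with a minimal sketch. This is not the authors' phylogenetic algorithm: it swaps in a simple greedy grouping by Levenshtein distance, and the function names, the `max_dist` threshold, and the mild placeholder words are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost)) # substitute or match
        prev = curr
    return prev[-1]


def build_groups(words, max_dist=2):
    """Greedy grouping (hypothetical stand-in for the evolution trees).

    Each word joins the first existing root within max_dist edits;
    otherwise it becomes the root of a new group.
    """
    groups = {}
    for w in words:
        for root in groups:
            if edit_distance(w, root) <= max_dist:
                groups[root].append(w)
                break
        else:
            groups[w] = [w]
    return groups


def matches_any_root(word, groups, max_dist=2):
    """Compare the input only with group roots, not with every entry."""
    return any(edit_distance(word, root) <= max_dist for root in groups)


# Mild placeholder words stand in for real database entries.
words = ["darn", "d4rn", "darrn", "heck", "h3ck", "hekk"]
groups = build_groups(words)
print(len(words), "entries ->", len(groups), "roots")
print(matches_any_root("daarn", groups))
print(matches_any_root("hello", groups))
```

Under these assumptions, a filter needs one comparison per root instead of one per entry, which is the kind of reduction the abstract reports for the database size. A real system would also have to choose roots and thresholds carefully to avoid flagging innocent words.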