Mining for Norms in Clouds: Complying to Ethical Communication through Cloud Text Data Mining

2012 IEEE Fifth International Conference on Utility and Cloud Computing. Pub Date: 2012-11-05. DOI: 10.1109/UCC.2012.59

Ahsan Nabi Khan, A. Muhammad, A. Enríquez


Citations: 2

Abstract

As cloud computing becomes widely adopted, communication over clouds needs stronger security and intelligence to filter out unethical data that violates norms. No categorization scheme for such filtering has yet been proposed. Numerous lists of banned, unethical, and objectionable words have been developed, but user satisfaction with them is limited. These lists are usually generated manually, with some programmable extensibility for online forums and public newsgroups. We define a tool and methodology to categorize the data to be censored. We statistically grow the word lists within the categorized data and tag hidden, seemingly neutral words with their meaning in context. Using computational linguistics tools, modified to suit our purposes, we analyze sample text from gigabytes of email newsgroup data hosted on cloud servers. The results present, in broad categories, a sample dataset of the most frequently used norm-breaking words in recent cloud communication. The categories separate cloud-server data found in newsgroups related to internet crime, fraud, theft, anti-state elements, and other material of legal importance. The study thus presents a tag cloud of the most frequent critical words in communications, from a legal and ethical point of view, in the current setting of cloud databases.
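The abstract does not spell out how the word lists are grown or counted, so the following is only a minimal sketch of one plausible reading: start from seed words per category, collect words that co-occur with them across messages, and count seed-word frequencies per category as input for a tag cloud. The seed lists, the function names (`grow_category`, `category_frequencies`), and the `min_count` threshold are all assumptions made for illustration and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical seed lists; the paper's actual categories and words are not given here.
SEEDS = {
    "fraud": {"scam", "phishing"},
    "theft": {"stolen", "steal"},
}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def grow_category(messages, seeds, min_count=2):
    """Return non-seed words that share a message with any seed word
    in at least `min_count` messages (an assumed growth heuristic)."""
    cooccurrence = Counter()
    for message in messages:
        tokens = set(tokenize(message))
        if tokens & seeds:
            cooccurrence.update(tokens - seeds)
    return {word for word, count in cooccurrence.items() if count >= min_count}


def category_frequencies(messages, seeds_by_category):
    """Count how often each category's seed words occur across all messages."""
    freqs = {category: Counter() for category in seeds_by_category}
    for message in messages:
        for token in tokenize(message):
            for category, seeds in seeds_by_category.items():
                if token in seeds:
                    freqs[category][token] += 1
    return freqs
```

Candidates returned by `grow_category` would still need manual or contextual review before being added to a list, since plain co-occurrence also picks up common, harmless words.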
