Mining for Norms in Clouds: Complying to Ethical Communication through Cloud Text Data Mining
Ahsan Nabi Khan, A. Muhammad, A. Enríquez
2012 IEEE Fifth International Conference on Utility and Cloud Computing, 2012-11-05
DOI: 10.1109/UCC.2012.59 (https://doi.org/10.1109/UCC.2012.59)
Citation count: 2
Abstract
As cloud computing becomes widely adopted for its power and efficiency, communication over clouds needs stronger security and intelligence to filter out unethical data that violates norms. No categorization scheme for such filtering has yet been proposed. Numerous lists of banned, unethical, and objectionable words have been developed, but with limited user satisfaction. These lists are usually generated manually, with some programmable extensibility for online forums and public newsgroups. We define a tool and methodology to categorize the censored data. We statistically grow the word lists within the categorized data and tag hidden neutral words with their meaning in context. Using computational linguistics tools, modified to suit our purposes, we analyze sample text from gigabytes of an email newsgroup dataset stored on cloud servers. The results present, in broad categories, a sample dataset of the words most frequently used to break norms in recent cloud communication. The categories separate cloud-server data found in newsgroups related to internet crimes, fraud, theft, anti-state elements, and other material of legal importance. This study thus presents a tag cloud of the most frequent critical words in communications, from a legal and ethical point of view, in the current setting of cloud databases.
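The abstract does not spell out how the word lists are grown, so the following is only a minimal sketch of one plausible reading: start from a small seed list per category, count which other words co-occur with a seed word in the same message, and then rank the resulting vocabulary by frequency for a tag cloud. The seed words, stopword list, sample messages, function names, and the `min_count` threshold are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical seed lists; the paper's actual categories and words are not reproduced here.
SEED_WORDS = {
    "fraud": {"scam", "phishing"},
    "theft": {"stolen", "steal"},
}

# Hypothetical, deliberately tiny stopword list for the sketch.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "was", "for", "this", "my", "by"}

# Hypothetical sample messages standing in for newsgroup posts.
MESSAGES = [
    "This phishing email asks for your bank password",
    "Another scam targeting bank customers",
    "My stolen laptop was never found",
    "Meeting notes for the bank holiday schedule",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def grow_categories(messages, seeds, min_count=2):
    """For each category, count non-seed, non-stopword words that share a
    message with a seed word, and keep those found in at least min_count
    such messages. This is an assumed growth rule, not the paper's."""
    grown = {}
    for category, seed_set in seeds.items():
        counts = Counter()
        for message in messages:
            tokens = set(tokenize(message))
            if tokens & seed_set:
                counts.update(tokens - seed_set - STOPWORDS)
        grown[category] = {w: c for w, c in counts.items() if c >= min_count}
    return grown


def tag_cloud(messages, vocabulary, top_n=5):
    """Return the top_n (word, frequency) pairs for words in vocabulary."""
    counts = Counter(t for m in messages for t in tokenize(m) if t in vocabulary)
    return counts.most_common(top_n)


grown = grow_categories(MESSAGES, SEED_WORDS)
fraud_vocabulary = SEED_WORDS["fraud"] | set(grown["fraud"])
cloud = tag_cloud(MESSAGES, fraud_vocabulary)
```

In this toy data the word "bank" is pulled into the fraud category because it appears alongside two seed words, yet it also occurs in an unrelated message about a holiday schedule. That is the kind of neutral word whose meaning depends on context, which is why plain frequency counts alone would not be enough and some contextual tagging step is needed.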