WebQuality '12 · Pub Date: 2012-04-16 · DOI: 10.1145/2184305.2184314
Dávid Siklósi, B. Daróczy, A. Benczúr
{"title":"Content-based trust and bias classification via biclustering","authors":"Dávid Siklósi, B. Daróczy, A. Benczúr","doi":"10.1145/2184305.2184314","DOIUrl":"https://doi.org/10.1145/2184305.2184314","url":null,"abstract":"In this paper we improve trust, bias and factuality classification over Web data on the domain level. Unlike the majority of literature in this area that aims at extracting opinion and handling short text on the micro level, we aim to aid a researcher or an archivist in obtaining a large collection that, on the high level, originates from unbiased and trustworthy sources. Our method generates features as Jensen-Shannon distances from centers in a host-term biclustering. On top of the distance features, we apply kernel methods and also combine with baseline text classifiers. We test our method on the ECML/PKDD Discovery Challenge data set DC2010. Our method improves over the best achieved text classification NDCG results by over 3--10% for neutrality, bias and trustworthiness. The fact that the ECML/PKDD Discovery Challenge 2010 participants reached an AUC only slightly above 0.5 indicates the hardness of the task.","PeriodicalId":230983,"journal":{"name":"WebQuality '12","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116065133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WebQuality '12 · Pub Date: 2012-04-16 · DOI: 10.1145/2184305.2184309
Maik Anderka, Benno Stein
{"title":"A breakdown of quality flaws in Wikipedia","authors":"Maik Anderka, Benno Stein","doi":"10.1145/2184305.2184309","DOIUrl":"https://doi.org/10.1145/2184305.2184309","url":null,"abstract":"The online encyclopedia Wikipedia is a successful example of the increasing popularity of user generated content on the Web. Despite its success, Wikipedia is often criticized for containing low-quality information, which is mainly attributed to its core policy of being open for editing by everyone. The identification of low-quality information is an important task since Wikipedia has become the primary source of knowledge for a huge number of people around the world. Previous research on quality assessment in Wikipedia either investigates only small samples of articles, or else focuses on single quality aspects, like accuracy or formality. This paper targets the investigation of quality flaws, and presents the first complete breakdown of Wikipedia's quality flaw structure. We conduct an extensive exploratory analysis, which reveals (1) the quality flaws that actually exist, (2) the distribution of flaws in Wikipedia, and (3) the extent of flawed content. An important finding is that more than one in four English Wikipedia articles contains at least one quality flaw, 70% of which concern article verifiability.","PeriodicalId":230983,"journal":{"name":"WebQuality '12","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114682698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WebQuality '12 · Pub Date: 2012-04-16 · DOI: 10.1145/2184305.2184307
R. Baeza-Yates, Luz Rello
{"title":"On measuring the lexical quality of the web","authors":"R. Baeza-Yates, Luz Rello","doi":"10.1145/2184305.2184307","DOIUrl":"https://doi.org/10.1145/2184305.2184307","url":null,"abstract":"In this paper we propose a measure for estimating the lexical quality of the Web, that is, the representational aspect of the textual web content. Our lexical quality measure is based in a small corpus of spelling errors and we apply it to English and Spanish. We first compute the correlation of our measure with web popularity measures to show that gives independent information and then we apply it to different web segments, including social media. Our results shed a light on the lexical quality of the Web and show that authoritative websites have several orders of magnitude less misspellings than the overall Web. We also present an analysis of the geographical distribution of lexical quality throughout English and Spanish speaking countries as well as how this measure changes in about one year.","PeriodicalId":230983,"journal":{"name":"WebQuality '12","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116736561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WebQuality '12 · Pub Date: 2012-04-16 · DOI: 10.1145/2184305.2184318
Tien Le, A. Dua, Wu-chang Feng
{"title":"kaPoW plugins: protecting web applications using reputation-based proof-of-work","authors":"Tien Le, A. Dua, Wu-chang Feng","doi":"10.1145/2184305.2184318","DOIUrl":"https://doi.org/10.1145/2184305.2184318","url":null,"abstract":"Comment spam is a fact of life if you have a blog or forum. Tools like Akismet and CAPTCHA help prevent spam in applications like WordPress or phpBB. However, they are not devoid of shortcomings. CAPTCHAs are getting easier to solve by automated adversaries like bots and pose usability issues. Akismet strives to detect spam, but can't do much to reduce it. This paper presents the kaPoW plugin and reputation service that can complement existing antispam tools. kaPoW creates disincentives for sending spam by slowing down spammers. It uses a web-based proof-of-work approach wherein a client is given a computational puzzle to solve before accessing a service (e.g. comment posting). The idea is to set puzzle difficulties based on a client's reputation, thereby, issuing \"harder\" puzzles to spammers. The more time spammers solve puzzles, the less time they have to send spam. Unlike CAPTCHAs, kaPoW requires no additional user interaction since all the puzzles are issued and solved in software. kaPoW can be used by any web application that supports an extension framework (e.g. plugins) and is concerned about spam.","PeriodicalId":230983,"journal":{"name":"WebQuality '12","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132381234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WebQuality '12 · Pub Date: 2012-04-16 · DOI: 10.1145/2184305.2184310
D. Kitayama, K. Sumiya
{"title":"A deformation analysis method for artificial maps based on geographical accuracy and its applications","authors":"D. Kitayama, K. Sumiya","doi":"10.1145/2184305.2184310","DOIUrl":"https://doi.org/10.1145/2184305.2184310","url":null,"abstract":"Artificial maps are widely used for a variety of purposes, including as tourist guides to help people find geographical objects using simple figures. We aim to develop an editing system and a navigation system for artificial maps. Artificial maps made for tourists show suitable objects for traveling users. Therefore, if the artificial map has a navigation system, users can get geographical information such as object positions and routes without performing any operations. However, artificial maps might contain incorrect or superfluous information, such as some objects on the map being intentionally enlarged or omitted. For developing the system, there are two problems: 1. extraction of geographical information from the raster graphics of the artificial map and 2. revision of inaccurate geographical information on the artificial map. We propose a deformation-analyzing method based on geographical accuracy using optical character recognition techniques and comparing gazetteer information. That is, our proposed method detects the tolerance level for deformation according to the purpose of the artificial map. Then, we detect a certain position on the artificial map using deformation analysis. 
In this paper, we develop a prototype system and we evaluate the accuracy of extracting information from the artificial map and detecting positions.","PeriodicalId":230983,"journal":{"name":"WebQuality '12","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124089540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WebQuality '12 · Pub Date: 2012-04-16 · DOI: 10.1145/2184305.2184312
Thanasis G. Papaioannou, K. Aberer, Katarzyna Abramczuk, P. Adamska, A. Wierzbicki
{"title":"Game-theoretic models of web credibility","authors":"Thanasis G. Papaioannou, K. Aberer, Katarzyna Abramczuk, P. Adamska, A. Wierzbicki","doi":"10.1145/2184305.2184312","DOIUrl":"https://doi.org/10.1145/2184305.2184312","url":null,"abstract":"Research on Web credibility assessment can significantly benefit from new models that are better suited for evaluation and study of adversary strategies. Currently employed models lack several important aspects, such as the explicit modeling of Web content properties (e.g. presentation quality), the user economic incentives and assessment capabilities. In this paper, we introduce a new, game-theoretic model of credibility, referred to as the Credibility Game. We perform equilibrium and stability analysis of a simple variant of the game and then study it as a signaling game against naïve and expert information consumers. By a generic economic model of the player payoffs, we study, via simulation experiments, more complex variants of the Credibility Game and demonstrate the effect of consumer expertise and of the signal for credibility evaluation on the evolutionary stable strategies of the information producers and consumers.","PeriodicalId":230983,"journal":{"name":"WebQuality '12","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134313741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WebQuality '12 · Pub Date: 2012-04-16 · DOI: 10.1145/2184305.2184308
E. Lex, Michael Völske, M. Errecalde, Edgardo Ferretti, L. Cagnina, Christopher Horn, Benno Stein, M. Granitzer
{"title":"Measuring the quality of web content using factual information","authors":"E. Lex, Michael Völske, M. Errecalde, Edgardo Ferretti, L. Cagnina, Christopher Horn, Benno Stein, M. Granitzer","doi":"10.1145/2184305.2184308","DOIUrl":"https://doi.org/10.1145/2184305.2184308","url":null,"abstract":"Nowadays, many decisions are based on information found in the Web. For the most part, the disseminating sources are not certified, and hence an assessment of the quality and credibility of Web content became more important than ever. With factual density we present a simple statistical quality measure that is based on facts extracted from Web content using Open Information Extraction. In a first case study, we use this measure to identify featured/good articles in Wikipedia. We compare the factual density measure with word count, a measure that has successfully been applied to this task in the past. Our evaluation corroborates the good performance of word count in Wikipedia since featured/good articles are often longer than non-featured. However, for articles of similar lengths the word count measure fails while factual density can separate between them with an F-measure of 90.4%. We also investigate the use of relational features for categorizing Wikipedia articles into featured/good versus non-featured ones. If articles have similar lengths, we achieve an F-measure of 86.7% and 84% otherwise.","PeriodicalId":230983,"journal":{"name":"WebQuality '12","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121542481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An information theoretic approach to sentiment polarity classification","authors":"Yuming Lin, Jingwei Zhang, Xiaoling Wang, Aoying Zhou","doi":"10.1145/2184305.2184313","DOIUrl":"https://doi.org/10.1145/2184305.2184313","url":null,"abstract":"Sentiment classification is a task of classifying documents according to their overall sentiment inclination. It is very important and popular in many web applications, such as credibility analysis of news sites on the Web, recommendation system and mining online discussion. Vector space model is widely applied on modeling documents in supervised sentiment classification, in which the feature presentation (including features type and weight function) is crucial for classification accuracy. The traditional feature presentation methods of text categorization do not perform well in sentiment classification, because the expressing manners of sentiment are more subtle. We analyze the relationships of terms with sentiment labels based on information theory, and propose a method by applying information theoretic approach on sentiment classification of documents. In this paper, we adopt mutual information on quantifying the sentiment polarities of terms in a document firstly. Then the terms are weighted in vector space based on both sentiment scores and contribution to the document. 
We perform extensive experiments with SVM on the sets of multiple product reviews, and the experimental results show our approach is more effective than the traditional ones.","PeriodicalId":230983,"journal":{"name":"WebQuality '12","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122283004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
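The mutual information step can be sketched for a single term from a 2x2 term/class contingency table. The counts below are toy values; the paper's full scheme additionally folds in each term's contribution to the document when building the final feature weights.

```python
import math

def mutual_information(n11, n10, n01, n00):
    """Mutual information between a term and the class label, from a 2x2
    contingency table: n11 = positive docs containing the term,
    n10 = negative docs containing it, n01 = positive docs without it,
    n00 = negative docs without it."""
    n = n11 + n10 + n01 + n00
    mi = 0.0
    # Each tuple: (joint count, term-marginal count, class-marginal count)
    for n_tc, n_t, n_c in [(n11, n11 + n10, n11 + n01),
                           (n10, n11 + n10, n10 + n00),
                           (n01, n01 + n00, n11 + n01),
                           (n00, n01 + n00, n10 + n00)]:
        if n_tc > 0:
            mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return mi

# Toy counts: "excellent" appears in 45 of 50 positive reviews
# but only 5 of 50 negative ones, so it carries strong polarity signal.
score = mutual_information(45, 5, 5, 45)
```

A term distributed independently of the class label scores zero, so MI naturally separates polarity-bearing terms from neutral ones before weighting.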
WebQuality '12 · Pub Date: 2012-04-16 · DOI: 10.1145/2184305.2184316
Kyumin Lee, James Caverlee, K. Kamath, Zhiyuan Cheng
{"title":"Detecting collective attention spam","authors":"Kyumin Lee, James Caverlee, K. Kamath, Zhiyuan Cheng","doi":"10.1145/2184305.2184316","DOIUrl":"https://doi.org/10.1145/2184305.2184316","url":null,"abstract":"We examine the problem of collective attention spam, in which spammers target social media where user attention quickly coalesces and then collectively focuses around a phenomenon. Compared to many existing spam types, collective attention spam relies on the users themselves to seek out the content -- like breaking news, viral videos, and popular memes -- where the spam will be encountered, potentially increasing its effectiveness and reach. We study the presence of collective attention spam in one popular service, Twitter, and we develop spam classifiers to detect spam messages generated by collective attention spammers. Since many instances of collective attention are bursty and unexpected, it is difficult to build spam detectors to pre-screen them before they arise; hence, we examine the effectiveness of quickly learning a classifier based on the first moments of a bursting phenomenon. Through initial experiments over a small set of trending topics on Twitter, we find encouraging results, suggesting that collective attention spam may be identified early in its life cycle and shielded from the view of unsuspecting social media users.","PeriodicalId":230983,"journal":{"name":"WebQuality '12","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122097520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WebQuality '12 · Pub Date: 2012-04-16 · DOI: 10.1145/2184305.2184317
Rishi Chandy, Haijie Gu
{"title":"Identifying spam in the iOS app store","authors":"Rishi Chandy, Haijie Gu","doi":"10.1145/2184305.2184317","DOIUrl":"https://doi.org/10.1145/2184305.2184317","url":null,"abstract":"Popular apps on the Apple iOS App Store can generate millions of dollars in profit and collect valuable personal user information. Fraudulent reviews could deceive users into downloading potentially harmful spam apps or unfairly ignoring apps that are victims of review spam. Thus, automatically identifying spam in the App Store is an important problem. This paper aims to introduce and characterize novel datasets acquired through crawling the iOS App Store, compare a baseline Decision Tree model with a novel Latent Class graphical model for classification of app spam, and analyze preliminary results for clustering reviews.","PeriodicalId":230983,"journal":{"name":"WebQuality '12","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115994130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}