J. Web Sci. Pub Date: 2016-04-24 DOI: 10.1561/106.00000006
E. Cradock, D. Millard, Sophie Stalla-Bourdillon
{"title":"An Extended Investigation of the Similarity Between Privacy Policies of Social Networking Sites as a Precursor for Standardization","authors":"E. Cradock, D. Millard, Sophie Stalla-Bourdillon","doi":"10.1561/106.00000006","DOIUrl":"https://doi.org/10.1561/106.00000006","url":null,"abstract":"An Extended Investigation of the Similarity Between Privacy Policies of Social Networking Sites as a Precursor for Standardization","PeriodicalId":405637,"journal":{"name":"J. Web Sci.","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130835529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Web Sci. Pub Date: 2016-02-18 DOI: 10.1561/106.00000004
C. Trattner, Dominik Kowald, Paul Seitlinger, Tobias Ley, Simone Kopeinik
{"title":"Modeling Activation Processes in Human Memory to Predict the Use of Tags in Social Bookmarking Systems","authors":"C. Trattner, Dominik Kowald, Paul Seitlinger, Tobias Ley, Simone Kopeinik","doi":"10.1561/106.00000004","DOIUrl":"https://doi.org/10.1561/106.00000004","url":null,"abstract":"Modeling Activation Processes in Human Memory to Predict the Use of Tags in Social Bookmarking Systems","PeriodicalId":405637,"journal":{"name":"J. Web Sci.","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116451579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Web Sci. Pub Date: 2015-08-04 DOI: 10.1561/106.00000001
G. Gkotsis, Maria Liakata, K. Stepanyan, J. Domingue
{"title":"ACQUA: Automated Community-based Question Answering through the Discretisation of Shallow Linguistic Features","authors":"G. Gkotsis, Maria Liakata, K. Stepanyan, J. Domingue","doi":"10.1561/106.00000001","DOIUrl":"https://doi.org/10.1561/106.00000001","url":null,"abstract":"This paper addresses the problem of determining the best answer in Community-based Question Answering (CQA) websites by focussing on the content. In particular, we present a novel system, ACQUA (http://acqua.kmi.open.ac.uk), that can be installed onto the majority of browsers as a plugin. The service offers a seamless and accurate prediction of the answer to be accepted. Our system is based on a novel approach for processing answers in CQAs. Previous research on this topic relies on the exploitation of community feedback on the answers, which involves rating of either users (e.g., reputation) or answers (e.g. scores manually assigned to answers). We propose a new technique that leverages the content/textual features of answers in a novel way. Our approach delivers better results than related linguistics-based solutions and manages to match rating-based approaches. More specifically, the gain in performance is achieved by rendering the values of these features into a discretised form. We also show how our technique manages to deliver equally good results in real-time settings, as opposed to having to rely on information not always readily available, such as user ratings and answer scores. We ran an evaluation on 21 StackExchange websites covering around 4 million questions and more than 8 million answers. We obtain 84% average precision and 70% recall, which shows that our technique is robust, effective, and widely applicable.","PeriodicalId":405637,"journal":{"name":"J. Web Sci.","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133426075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Web Sci. Pub Date: 2015-07-13 DOI: 10.1561/106.00000003
R. Meusel, S. Vigna, O. Lehmberg, Christian Bizer
{"title":"The Graph Structure in the Web - Analyzed on Different Aggregation Levels","authors":"R. Meusel, S. Vigna, O. Lehmberg, Christian Bizer","doi":"10.1561/106.00000003","DOIUrl":"https://doi.org/10.1561/106.00000003","url":null,"abstract":"Knowledge about the general graph structure of the World Wide Web is important for understanding the social mechanisms that govern its growth, for designing ranking methods, for devising better crawling algorithms, and for creating accurate models of its structure. In this paper, we analyze a large web graph. The graph was extracted from a large publicly accessible web crawl that was gathered by the Common Crawl Foundation in 2012. The graph covers over 3.5 billion web pages and 128.7 billion hyperlinks. We analyze and compare, among other features, degree distributions, connectivity, average distances, and the structure of weakly/strongly connected components. We conduct our analysis on three different levels of aggregation: page, host, and pay-level domain (PLD) (one “dot level” above public suffixes). Our analysis shows that, as evidenced by previous research (Serrano et al., 2007), some of the features previously observed by Broder et al., 2000 are very dependent on artifacts of the crawling process, whereas other appear to be more structural. We confirm the existence of a giant strongly connected component; we however find, as observed by other researchers (Donato et al., 2005; Boldi et al., 2002; Baeza-Yates and Poblete, 2003), very different proportions of nodes that can reach or that can be reached from the giant component, suggesting that the “bow-tie structure” as described by Broder et al. is strongly dependent on the crawling process, and to the best of our current knowledge is not a structural property of the Web. More importantly, statistical testing and visual inspection of size-rank plots show that the distributions of indegree, outdegree and sizes of strongly connected components of the page and host graph are not power laws, contrarily to what was previously reported for much smaller crawls, although they might be heavy tailed. If we aggregate at pay-level domain, however, a power law emerges. We also provide for the first time accurate measurement of distance-based features, using recently introduced algorithms that scale to the size of our crawl (Boldi and Vigna, 2013).","PeriodicalId":405637,"journal":{"name":"J. Web Sci.","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130940362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Web Sci. Pub Date: 2015-02-09 DOI: 10.1561/106.00000002
Jared Lorince, S. Zorowitz, J. Murdock, P. Todd
{"title":"The Wisdom of the Few? \"Supertaggers\" in Collaborative Tagging Systems","authors":"Jared Lorince, S. Zorowitz, J. Murdock, P. Todd","doi":"10.1561/106.00000002","DOIUrl":"https://doi.org/10.1561/106.00000002","url":null,"abstract":"A folksonomy is ostensibly an information structure built up by the \"wisdom of the crowd\", but is the \"crowd\" really doing the work? Tagging is in fact a sharply skewed process in which a small minority of \"supertagger\" users generate an overwhelming majority of the annotations. Using data from three large-scale social tagging platforms, we explore (a) how to best quantify the imbalance in tagging behavior and formally define a supertagger, (b) how supertaggers differ from other users in their tagging patterns, and (c) if effects of motivation and expertise inform our understanding of what makes a supertagger. Our results indicate that such prolific users not only tag more than their counterparts, but in quantifiably different ways. Specifically, we find that supertaggers are more likely to label content in the long tail of less popular items, that they show differences in patterns of content tagged and terms utilized, and are measurably different with respect to tagging expertise and motivation. These findings suggest we should question the extent to which folksonomies achieve crowdsourced classification via the \"wisdom of the crowd\", especially for broad folksonomies like Last.fm as opposed to narrow folksonomies like Flickr.","PeriodicalId":405637,"journal":{"name":"J. Web Sci.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123294378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}