{"title":"A Keyword-Based Method for Measuring Sentence Similarity","authors":"Yuanjun Bi, Kai Deng, Jinxing Cheng","doi":"10.1145/3091478.3098878","DOIUrl":"https://doi.org/10.1145/3091478.3098878","url":null,"abstract":"In this paper, a sentence similarity computing approach based on keywords is presented. First, it extracts the keywords from a sentence. Then the approach computes ranking scores for the keywords. Finally, it applies these ranking scores to the sentence similarity computation using the Jaccard similarity coefficient. Experiments on a real-world chatterbot system dataset demonstrate that the proposed approach significantly improves the relevance of the sentence similarity method by up to 30%.","PeriodicalId":165747,"journal":{"name":"Proceedings of the 2017 ACM on Web Science Conference","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126698450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Stories From Archived Collections","authors":"Yasmin AlNoamany, Michele C. Weigle, Michael L. Nelson","doi":"10.1145/3091478.3091508","DOIUrl":"https://doi.org/10.1145/3091478.3091508","url":null,"abstract":"With the extensive growth of the Web, multiple Web archiving initiatives have been started to archive different aspects of the Web. Services such as Archive-It exist to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge, resulting in the paradox of the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data grows on the Web, \"storytelling\" is becoming a popular technique in social media for selecting Web resources to support a particular narrative or \"story\". We address the problem of understanding archived collections by proposing the Dark and Stormy Archive (DSA) framework, in which we integrate \"storytelling\" social media and Web archives. In the DSA framework, we identify, evaluate, and select candidate Web pages from archived collections that summarize the holdings of these collections, arrange them in chronological order, and then visualize these pages using tools that users already are familiar with, such as Storify. Inspired by the Turing Test, we evaluate the stories automatically generated by the DSA framework against a ground truth dataset of hand-crafted stories, generated by expert archivists from Archive-It collections. Using Amazon's Mechanical Turk, we found that the stories automatically generated by DSA are indistinguishable from those created by human subject domain experts, while at the same time both kinds of stories (automatic and human) are easily distinguished from randomly generated stories.","PeriodicalId":165747,"journal":{"name":"Proceedings of the 2017 ACM on Web Science Conference","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133490868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Breaking Cycles In Noisy Hierarchies","authors":"Jiankai Sun, Deepak Ajwani, Patrick K. Nicholson, A. Sala, S. Parthasarathy","doi":"10.1145/3091478.3091495","DOIUrl":"https://doi.org/10.1145/3091478.3091495","url":null,"abstract":"Taxonomy graphs that capture hyponymy or meronymy relationships through directed edges are expected to be acyclic. However, in practice, they may have thousands of cycles, as they are often created in a crowd-sourced way. Since these cycles represent logical fallacies, they need to be removed for many web applications. In this paper, we address the problem of breaking cycles while preserving the logical structure (hierarchy) of a directed graph as much as possible. Existing approaches for this problem either need manual intervention or use heuristics that can critically alter the taxonomy structure. In contrast, our approach infers graph hierarchy using a range of features, including a Bayesian skill rating system and a social agony metric. We also devise several strategies to leverage the inferred hierarchy for removing a small subset of edges to make the graph acyclic. Extensive experiments demonstrate the effectiveness of our approach.","PeriodicalId":165747,"journal":{"name":"Proceedings of the 2017 ACM on Web Science Conference","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133218764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"InstaCan: Examining Deleted Content on Instagram","authors":"Ramine Tinati, Aastha Madaan, W. Hall","doi":"10.1145/3091478.3091516","DOIUrl":"https://doi.org/10.1145/3091478.3091516","url":null,"abstract":"As the speed, volume, and heterogeneity of data produced on the Web increase, we are faced with developing more intelligent and efficient strategies for storing and archiving data. The archiving of Web data involves many technical, governance, and policy related challenges; however, one of the most prominent and timely challenges that archivists face involves the deletion of data from existing data stores, popularised by the various policy-related movements, such as the 'right to be forgotten'. For social media researchers, organisations, and analysis companies, it is a requirement for them to comply with the removal requests of the streams they consume. However, due to the nature of archiving, this is often difficult to comply with, without becoming a resource intensive exercise. In this paper we investigate deleted content on Instagram, the structure of the Instagram platform, and develop and evaluate a method to identify content which will become deleted. Our work contributes to the archiving community, and the Web Science community, interested in understanding the social factors that contribute to the use of Social Media.","PeriodicalId":165747,"journal":{"name":"Proceedings of the 2017 ACM on Web Science Conference","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132839700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Public Perception of a Country: Exploring Tweets About Qatar","authors":"Shahan Ali Memon, R. Pillai, Susan Dun, Yelena Mejova, Ingmar Weber","doi":"10.1145/3091478.3098872","DOIUrl":"https://doi.org/10.1145/3091478.3098872","url":null,"abstract":"Is it possible to \"hack\" an image of an international entity by driving international and domestic media? Here, we present an image/brand monitoring tool for a country, Qatar, which presents an overview of the contexts and references to media in which it is mentioned on social media. Tracking dozens of languages, this tool allows a global understanding of the perceptions and concerns Twitter users associate with Qatar, and which mainstream media may be driving these sentiments.","PeriodicalId":165747,"journal":{"name":"Proceedings of the 2017 ACM on Web Science Conference","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116225597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Community Photo Tagging: Engagement and Quality Study","authors":"A. Ponomarev","doi":"10.1145/3091478.3098874","DOIUrl":"https://doi.org/10.1145/3091478.3098874","url":null,"abstract":"With today's dissemination of digital cameras, running events are usually well-presented in various photo sharing platforms. The number of photos from one medium-sized race can easily exceed a thousand or two; therefore, it is a tedious task for a participant to find photos where he/she is depicted. Bibtaggers is a crowd-based service allowing volunteers (usually, race participants) to collectively tag photos with race bib numbers, therefore enabling fast photo search. The paper presents results obtained using the service, specifically, the quality of tags collected and the user engagement data. These results are also compared to bib tags collected with the help of Amazon Mechanical Turk using a similar procedure but a different (monetary) incentive.","PeriodicalId":165747,"journal":{"name":"Proceedings of the 2017 ACM on Web Science Conference","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114812927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Young People's Policy Recommendations on Algorithm Fairness","authors":"Elvira Perez, A. Koene, Virginia Portillo, L. Dowthwaite, Monica Cano","doi":"10.1145/3091478.3091512","DOIUrl":"https://doi.org/10.1145/3091478.3091512","url":null,"abstract":"This paper explores the policy recommendations made by young people regarding algorithm fairness. It describes a piece of ongoing research developed to bring children and young people to the front line of the debate regarding children's digital rights. We employed the Youth Juries methodology which was designed to facilitate learning through discussions. The juries capture the deliberation process on a specific digital right, the right to know how algorithms govern and influence the Web and its users. Preliminary results show that young people demand to know more about algorithms, they want more transparency, more options, and more control about the way algorithms use their personal data.","PeriodicalId":165747,"journal":{"name":"Proceedings of the 2017 ACM on Web Science Conference","volume":"1999 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116684735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Deep Study into the History of Web Design","authors":"Bardia Doosti, David J. Crandall, N. Su","doi":"10.1145/3091478.3091503","DOIUrl":"https://doi.org/10.1145/3091478.3091503","url":null,"abstract":"Since its ambitious beginnings to create a hyperlinked information system, the web has evolved over 25 years to become our primary means of expression and communication. No longer limited to text, the evolving visual features of websites are important signals of larger societal shifts in humanity's technologies, aesthetics, cultures, and industries. Just as paintings can be analyzed to study an era's social norms and culture, techniques for systematically analyzing large-scale archives of the web could help unpack global changes in the visual appearance of websites and of modern society itself. In this paper, we propose automated techniques for characterizing the visual \"style\" of websites and use this analysis to discover and visualize shifts over time and across website domains. In particular, we use deep Convolutional Neural Networks to classify websites into 26 subject areas (e.g., technology, news media websites) and 4 design eras. The features produced by this process then allow us to quantitatively characterize the appearance of any given website. We demonstrate how to track changes in these features over time and introduce a technique using Hidden Markov Models (HMMs) to discover sudden, significant changes in these appearances. Finally, we visualize the features learned by our network to help reveal the distinctive visual elements that were discovered by the network.","PeriodicalId":165747,"journal":{"name":"Proceedings of the 2017 ACM on Web Science Conference","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121827122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"'Dark Germany': Temporal Characteristics and Connectivity Patterns in Online Far-Right Protests Against Refugee Housing","authors":"Sebastian Schelter, Jérôme Kunegis","doi":"10.1145/3091478.3098880","DOIUrl":"https://doi.org/10.1145/3091478.3098880","url":null,"abstract":"We present a quantitative study of the social media activities of a contemporary nationwide protest movement against local refugee housing in Germany, which organizes itself via dedicated city-level Facebook pages. We analyse data from 2015, containing more than one million interactions by more than 200,000 users. We investigate the temporal characteristics of the social media activities of this protest movement, as well as the connectedness of the interactions of its participants. We find several activity metrics such as the number of posts issued, negative polarity in comments, and user engagement to peak in late 2015, coinciding with chancellor Angela Merkel's much criticized decision of September 2015 to temporarily admit the entry of Syrian refugees to Germany. Furthermore, our evidence suggests a low degree of direct connectedness of participants in this movement, (i.a., indicated by a lack of geographical collaboration patterns), yet we encounter a strong affiliation of the pages' user base with far-right political parties.","PeriodicalId":165747,"journal":{"name":"Proceedings of the 2017 ACM on Web Science Conference","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131838251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Observing Web Archives: The Case for an Ethnographic Study of Web Archiving","authors":"J. Ogden, S. Halford, L. Carr","doi":"10.1145/3091478.3091506","DOIUrl":"https://doi.org/10.1145/3091478.3091506","url":null,"abstract":"This paper makes the case for studying the work of web archivists, in an effort to explore the ways in which practitioners shape the preservation and maintenance of the archived Web in its various forms. An ethnographic approach is taken through the use of observation, interviews and documentary sources over the course of several weeks in collaboration with web archivists, engineers and managers at the Internet Archive - a private, non-profit digital library that has been archiving the Web since 1996. The concept of web archival labour is proposed to encompass and highlight the ways in which web archivists (as both networked human and non-human agents) shape and maintain the preserved Web through work that is often embedded in and obscured by the complex technical arrangements of collection and access. As a result, this engagement positions web archives as places of knowledge and cultural production in their own right, revealing new insights into the performative nature of web archiving that have implications for how these data are used and understood.","PeriodicalId":165747,"journal":{"name":"Proceedings of the 2017 ACM on Web Science Conference","volume":"574 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127083056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}