{"title":"How to Assess the Exhaustiveness of Longitudinal Web Archives: A Case Study of the German Academic Web","authors":"Michael Paris, R. Jäschke","doi":"10.1145/3372923.3404836","DOIUrl":"https://doi.org/10.1145/3372923.3404836","url":null,"abstract":"Longitudinal web archives can be a foundation for investigating structural and content-based research questions. One prerequisite is that they contain a faithful representation of the relevant subset of the web. Therefore, an assessment of the authority of a given dataset with respect to a research question should precede the actual investigation. Next to proper creation and curation, this requires measures for estimating the potential of a longitudinal web archive to yield information about the central objects the research question aims to investigate. In particular, content-based research questions often lack the ab-initio confidence about the integrity of the data. In this paper we focus on one specifically important aspect, namely the exhaustiveness of the dataset with respect to the central objects. Therefore, we investigate the recall coverage of researcher names in a longitudinal academic web crawl over a seven year period and the influence of our crawl method on the dataset integrity. 
Additionally, we propose a method to estimate the amount of missing information as a means to describe the exhaustiveness of the crawl and motivate a use case for the presented corpus.","PeriodicalId":389616,"journal":{"name":"Proceedings of the 31st ACM Conference on Hypertext and Social Media","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130854743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Augustine as \"Naturalist of the Mind\"","authors":"R. M. Simpson","doi":"10.1145/3372923.3404814","DOIUrl":"https://doi.org/10.1145/3372923.3404814","url":null,"abstract":"This paper, Augustine as \"Naturalist of the Mind\", is a linear portal to its associated graph-structured Tinderbox hypertext. The hypertext is one component of a research project arising out of a Philosophy seminar on Augustine as the preeminent bridge philosopher between the ancient world of Greece and Rome and the subsequent 1000 years of Western philosophy. The research project explores some surprising insights that emerged during this seminar from a deep study of Augustine's Confessions: Book 10-Memory. The purpose of the Augustine as \"Naturalist of the Mind\" Tinderbox hypertext is not only to be a multi-dimensional resource base for the research but also to provide an exploratorium where new materials can be added, new relationships created, and new research directions can be discovered and pursued.","PeriodicalId":389616,"journal":{"name":"Proceedings of the 31st ACM Conference on Hypertext and Social Media","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121415597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"You Do Not Decide for Me! Evaluating Explainable Group Aggregation Strategies for Tourism","authors":"Shabnam Najafian, Daniel Herzog, S. Qiu, O. Inel, N. Tintarev","doi":"10.1145/3372923.3404800","DOIUrl":"https://doi.org/10.1145/3372923.3404800","url":null,"abstract":"Most recommender systems propose items to individual users. However, in domains such as tourism, people often consume items in groups rather than individually. Different individual preferences in such a group can be difficult to resolve, and often compromises need to be made. Social choice strategies can be used to aggregate the preferences of individuals. We evaluated two explainable modified preference aggregation strategies in a between-subject study (n=200), and compared them with two baseline strategies for groups that are also explainable, in two scenarios: high divergence (group members with different travel preferences) and low divergence (group members with similar travel preferences). Generally, all investigated aggregation strategies performed well in terms of perceived individual and group satisfaction and perceived fairness. The results also indicate that participants were sensitive to a dictator-based strategy, which affected both their individual and group satisfaction negatively (compared to the other strategies).","PeriodicalId":389616,"journal":{"name":"Proceedings of the 31st ACM Conference on Hypertext and Social Media","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116024629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wikipedia and Westminster: Quality and Dynamics of Wikipedia Pages about UK Politicians","authors":"Pushkal Agarwal, Miriam Redi, Nishanth R. Sastry, E. Wood, Andrew Blick","doi":"10.1145/3372923.3404817","DOIUrl":"https://doi.org/10.1145/3372923.3404817","url":null,"abstract":"Wikipedia is a major source of information providing a large variety of content online, trusted by readers from around the world. Readers go to Wikipedia to get reliable information about different subjects, one of the most popular being living people, and especially politicians. While a lot is known about the general usage and information consumption on Wikipedia, less is known about the life-cycle and quality of Wikipedia articles in the context of politics. The aim of this study is to quantify and qualify content production and consumption for articles about politicians, with a specific focus on UK Members of Parliament (MPs). First, we analyze spatio-temporal patterns of readers' and editors' engagement with MPs' Wikipedia pages, finding huge peaks of attention during election times, related to signs of engagement on other social media (e.g. Twitter). Second, we quantify editors' polarisation and find that most editors specialize in a specific party and choose specific news outlets as references. 
Finally we observe that the average citation quality is pretty high, with statements on 'Early life and career' missing citations most often (18%).","PeriodicalId":389616,"journal":{"name":"Proceedings of the 31st ACM Conference on Hypertext and Social Media","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127614666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Fake News Detection: A Graph-based Approach","authors":"Siva Charan Reddy Gangireddy, D. P., Cheng Long, Tanmoy Chakraborty","doi":"10.1145/3372923.3404783","DOIUrl":"https://doi.org/10.1145/3372923.3404783","url":null,"abstract":"Fake news has become more prevalent than ever, correlating with the rise of social media that allows every user to rapidly publish their views or hearsay. Today, fake news spans almost every realm of human activity, across diverse fields such as politics and healthcare. Most existing methods for fake news detection leverage supervised learning methods and expect a large labelled corpus of articles and social media user engagement information, which are often hard, time-consuming and costly to procure. In this paper, we consider the task of unsupervised fake news detection, which considers fake news detection in the absence of labelled historical data. We develop GTUT, a graph-based approach for the task which operates in three phases. Starting off with identifying a seed set of fake and legitimate articles exploiting high-level observations on inter-user behavior in fake news propagation, it progressively expands the labelling to all articles in the dataset. Our technique draws upon graph-based methods such as biclique identification, graph-based feature vector learning and label spreading. 
Through an extensive empirical evaluation over multiple real-world datasets, we establish the improved effectiveness of our method over state-of-the-art techniques for the task.","PeriodicalId":389616,"journal":{"name":"Proceedings of the 31st ACM Conference on Hypertext and Social Media","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127558677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic Model of Narratives Over Topical Trends in Social Media: A Discrete Time Model","authors":"Toktam A. Oghaz, Ece C. Mutlu, Jasser Jasser, Niloofar Yousefi, I. Garibay","doi":"10.1145/3372923.3404790","DOIUrl":"https://doi.org/10.1145/3372923.3404790","url":null,"abstract":"Online social media platforms are turning into the prime source of news and narratives about worldwide events. However, a systematic summarization-based narrative extraction that can facilitate communicating the main underlying events is lacking. To address this issue, we propose a novel event-based narrative summary extraction framework. Our proposed framework is designed as a probabilistic topic model, with categorical time distribution, followed by extractive text summarization. Our topic model identifies topics' recurrence over time with a varying time resolution. This framework not only captures the topic distributions from the data, but also approximates the user activity fluctuations over time. Furthermore, we define significance-dispersity trade-off (SDT) as a comparison measure to identify the topic with the highest lifetime attractiveness in a timestamped corpus. We evaluate our model on a large corpus of Twitter data, including more than one million tweets in the domain of the disinformation campaigns conducted against the White Helmets of Syria. 
Our results indicate that the proposed framework is effective in identifying topical trends, as well as extracting narrative summaries from text corpus with timestamped data.","PeriodicalId":389616,"journal":{"name":"Proceedings of the 31st ACM Conference on Hypertext and Social Media","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131508001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What is BitChute?: Characterizing the","authors":"Milo Z. Trujillo, Mauricio G. Gruppi, C. Buntain, Benjamin D. Horne","doi":"10.1145/3372923.3404833","DOIUrl":"https://doi.org/10.1145/3372923.3404833","url":null,"abstract":"In this paper, we characterize the content and discourse on BitChute, a social video-hosting platform. Launched in 2017 as an alternative to YouTube, BitChute joins an ecosystem of alternative, low content moderation platforms, including Gab, Voat, Minds, and 4chan. Uniquely, BitChute is the first of these alternative platforms to focus on video content and is growing in popularity. Our analysis reveals several key characteristics of the platform. We find that only a handful of channels receive any engagement, and almost all of those channels contain conspiracies or hate speech. This high rate of hate speech on the platform as a whole, much of which is anti-Semitic, is particularly concerning. Our results suggest that BitChute has a higher rate of hate speech than Gab but less than 4chan. Lastly, we find that while some BitChute content producers have been banned from other platforms, many maintain profiles on mainstream social media platforms, particularly YouTube. 
This paper contributes a first look at the content and discourse on BitChute and provides a building block for future research on low content moderation platforms.","PeriodicalId":389616,"journal":{"name":"Proceedings of the 31st ACM Conference on Hypertext and Social Media","volume":"114 1 Pt 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128931581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NesTPP: Modeling Thread Dynamics in Online Discussion Forums","authors":"Chen Ling, G. Tong, Mozi Chen","doi":"10.1145/3372923.3404796","DOIUrl":"https://doi.org/10.1145/3372923.3404796","url":null,"abstract":"Online discussion forum creates an asynchronous conversation environment for online users to exchange ideas and share opinions through a unique thread-reply communication mode. Accurately modeling information dynamics under such a mode is important, as it provides a means of mining latent spread patterns and understanding user behaviors. In this paper, we design a novel temporal point process model to characterize information cascades in online discussion forums. The proposed model views the entire event space as a nested structure composed of main thread streams and their linked reply streams, and it explicitly models the correlations between these two types of streams through their intensity functions. Leveraging the Reddit data, we examine the performance of the designed model in different applications and compare it with other popular methods. The experimental results have shown that our model can produce competitive results, and it outperforms state-of-the-art methods in most cases.","PeriodicalId":389616,"journal":{"name":"Proceedings of the 31st ACM Conference on Hypertext and Social Media","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116392506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Heterogeneous Social Network Alignment through Synergistic Graph Partition","authors":"Yuxiang Ren, Lin Meng, Jiawei Zhang","doi":"10.1145/3372923.3404799","DOIUrl":"https://doi.org/10.1145/3372923.3404799","url":null,"abstract":"Social network alignment has been an important research problem for social network analysis in recent years. With the identified shared users across networks, it will provide researchers with the opportunity to achieve a more comprehensive understanding of users' social activities both within and across networks. Social network alignment is a very difficult problem. Besides the challenges introduced by the network heterogeneity, the network alignment can be reduced to a combinatorial optimization problem with an extremely large search space. The learning effectiveness and efficiency of existing alignment models will be degraded significantly as the network size increases. In this paper, we focus on studying the scalable heterogeneous social network alignment problem and propose to address it with a novel two-stage network alignment model, namely Scalable Heterogeneous Network Alignment (SHNA). Based on a group of intra- and inter-network meta diagrams, SHNA first partitions the social networks into a group of sub-networks synergistically. Via the partially known anchor links, SHNA can extract the partitioned sub-network correspondence relationships. Instead of aligning the complete input network, SHNA proposes to identify the anchor links between the matched sub-network pairs, while those between the unmatched sub-networks will be pruned to effectively shrink the search space. Extensive experiments have been done to compare SHNA with the state-of-the-art baseline methods on a real-world aligned social networks dataset. 
The experimental results have demonstrated both the effectiveness and efficiency of SHNA in addressing the problem.","PeriodicalId":389616,"journal":{"name":"Proceedings of the 31st ACM Conference on Hypertext and Social Media","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128199321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Man is to Person as Woman is to Location: Measuring Gender Bias in Named Entity Recognition","authors":"Ninareh Mehrabi, Thamme Gowda, Fred Morstatter, Nanyun Peng, A. Galstyan","doi":"10.1145/3372923.3404804","DOIUrl":"https://doi.org/10.1145/3372923.3404804","url":null,"abstract":"In this paper, we study the bias in named entity recognition (NER) models---specifically, the difference in the ability to recognize male and female names as PERSON entity types. We evaluate NER models on a dataset containing 139 years of U.S. census baby names and find that relatively more female names, as opposed to male names, are not recognized as PERSON entities. The result of this analysis yields a new benchmark for gender bias evaluation in named entity recognition systems. The data and code for the application of this benchmark is publicly available for researchers to use.","PeriodicalId":389616,"journal":{"name":"Proceedings of the 31st ACM Conference on Hypertext and Social Media","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124054504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}