C. Buntain, Richard Bonneau, Jonathan Nagler, Joshua A. Tucker
{"title":"Measuring the Ideology of Audiences for Web Links and Domains Using Differentially Private Engagement Data","authors":"C. Buntain, Richard Bonneau, Jonathan Nagler, Joshua A. Tucker","doi":"10.1609/icwsm.v17i1.22127","DOIUrl":"https://doi.org/10.1609/icwsm.v17i1.22127","url":null,"abstract":"This paper demonstrates the use of differentially private hyperlink-level engagement data for measuring ideologies of audiences for web domains, individual links, or aggregations thereof. We examine a simple metric for measuring this ideological position and assess the conditions under which the metric is robust to injected, privacy-preserving noise. This assessment provides insights into and constraints on the level of activity one should observe when applying this metric to privacy-protected data. Grounding this work is a massive dataset of social media engagement activity where privacy-preserving noise has been injected into the activity data, provided by Facebook and the Social Science One (SS1) consortium. Using this dataset, we validate our ideology measures by comparing to similar, published work on sharing-based, homophily- and content-oriented measures, where we show consistently high correlation (>0.87). We then apply this metric to individual links from several popular news domains and demonstrate how one can assess link-level distributions of ideological audiences. We further show this estimator is robust to selection of engagement types besides sharing, where domain-level audience-ideology assessments based on views and likes show no significant difference compared to sharing-based estimates. Estimates of partisanship, however, suggest the viewing audience is more moderate than the audiences who share and like these domains. Beyond providing thresholds on sufficient activity for measuring audience ideology and comparing three types of engagement, this analysis provides a blueprint for ensuring robustness of future work to differential privacy protections.","PeriodicalId":175641,"journal":{"name":"International Conference on Web and Social Media","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133713503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Duilio Balsamo, P. Bajardi, G. D. F. Morales, Corrado Monti, R. Schifanella
{"title":"The Pursuit of Peer Support for Opioid Use Recovery on Reddit","authors":"Duilio Balsamo, P. Bajardi, G. D. F. Morales, Corrado Monti, R. Schifanella","doi":"10.1609/icwsm.v17i1.22122","DOIUrl":"https://doi.org/10.1609/icwsm.v17i1.22122","url":null,"abstract":"Individuals suffering from Opioid Use Disorder and other socially stigmatized conditions often rely on peer support groups to find comfort and motivation while treating their condition. Many may face barriers in accessing peer support treatment, such as shame and social stigma, seclusion, or mobility restrictions. In this study, we quantitatively characterize the potential of the Reddit community in offering these individuals an online alternative to receiving peer support. By analyzing the social interactions of thousands of users during the start of opioid use recovery, we uncover that a particular Reddit community exhibits many characteristics similar to in-person peer support groups, featuring the exchange of support, trust, status, and similar experiences. We find that the supportive behavior of this community nudges users to change their personal behavior, and promotes abandoning opioid-related communities in favor of recovery-oriented relationships. Finally, we find that recognition, acknowledgment, and knowledge exchange are the most relevant factors in sustained engagement with the recovery community. Given this evidence, we suggest that this online community may constitute a complement or a surrogate to peer support groups when in-person meetings are not desirable or possible. Our work might inspire harm reduction policies and interventions to favor successful rehabilitation and is fundamental for future research about the use of digital media for recovery support.","PeriodicalId":175641,"journal":{"name":"International Conference on Web and Social Media","volume":"71 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123158169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Raiyan Abdul Baten, Yozen Liu, Heinrich Peters, Francesco Barbieri, Neil Shah, Leonardo Neves, M. Bos
{"title":"Predicting Future Location Categories of Users in a Large Social Platform","authors":"Raiyan Abdul Baten, Yozen Liu, Heinrich Peters, Francesco Barbieri, Neil Shah, Leonardo Neves, M. Bos","doi":"10.1609/icwsm.v17i1.22125","DOIUrl":"https://doi.org/10.1609/icwsm.v17i1.22125","url":null,"abstract":"Understanding the users' patterns of visiting various location categories can help online platforms improve content personalization and user experiences. Current literature on predicting future location categories of a user typically employs features that can be traced back to the user, such as spatial geo-coordinates and demographic identities. Moreover, existing approaches commonly suffer from cold-start and generalization problems, and often cannot specify when the user will visit the predicted location category. In a large social platform, it is desirable for prediction models to avoid using user-identifiable data, generalize to unseen and new users, and be able to make predictions for specific times in the future. In this work, we construct a neural model, LocHabits, using data from Snapchat. The model omits user-identifiable inputs, leverages temporal and sequential regularities in the location category histories of Snapchat users and their friends, and predicts the users' next-hour location categories. We evaluate our model on several real-life, large-scale datasets from Snapchat and FourSquare, and find that the model can outperform baselines by 14.94% accuracy. We confirm that the model can (1) generalize to unseen users from different areas and times, and (2) fall back on collective trends in the cold-start scenario. We also study the relative contributions of various factors in making the predictions and find that the users' visitation preferences and most-recent visitation sequences play more important roles than time contexts, same-hour sequences, and social influence features.","PeriodicalId":175641,"journal":{"name":"International Conference on Web and Social Media","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130171848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing Coin-Based Voting Governance in DPoS Blockchains","authors":"Chao Li, Runhua Xu, Li Duan","doi":"10.1609/icwsm.v17i1.22225","DOIUrl":"https://doi.org/10.1609/icwsm.v17i1.22225","url":null,"abstract":"Delegated-Proof-of-Stake (DPoS) blockchains are governed by a committee of dozens of members elected via coin-based voting mechanisms. This paper presents a large-scale empirical study of two critical characteristics, personal impact and participation rate, of three leading DPoS blockchains. Our findings reveal the existence of decisive voters whose votes can alter election outcomes, as well as the fact that almost half of the coins have never been used in committee elections. Our research contributes to demystifying the actual use of coin-based voting governance and offers novel insights into the potential security risks of DPoS blockchains.","PeriodicalId":175641,"journal":{"name":"International Conference on Web and Social Media","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122488314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Samir Abdaljalil, S. Hassanein, Hamdy Mubarak, Ahmed Abdelali
{"title":"Towards Generalization of Machine Learning Models: A Case Study of Arabic Sentiment Analysis","authors":"Samir Abdaljalil, S. Hassanein, Hamdy Mubarak, Ahmed Abdelali","doi":"10.1609/icwsm.v17i1.22204","DOIUrl":"https://doi.org/10.1609/icwsm.v17i1.22204","url":null,"abstract":"The abundance of social media data in the Arab world, specifically on Twitter, enabled companies and entities to exploit such rich and beneficial data that could be mined and used to extract important information, including sentiments and opinions of people towards a topic or a merchandise. However, with this plenitude comes the issue of producing models that are able to deliver consistent outcomes when tested within various contexts. Although model generalization has been thoroughly investigated in many fields, it has not been heavily investigated in the Arabic context. To address this gap, we investigate the generalization of models and data in Arabic with application to sentiment analysis, by performing a battery of experiments and building different models that are tested on five independent test sets to understand their performance when presented with unseen data. In doing so, we detail different techniques that improve the generalization of machine learning models in Arabic sentiment analysis, and share a large versatile dataset consisting of approximately 1.64M Arabic tweets and their corresponding sentiment to be used for future research. Our experiments concluded that the most consistent model is trained using a dataset labelled by a cascaded approach of two models, one that labels neutral tweets and another that identifies positive/negative tweets based on the Arabic emoji lexicon after class balancing. Both the BERT and the SVM models trained using the refined data achieve an average F-1 score of 0.62 and 0.60, and standard deviation of 0.06 and 0.04 respectively, when evaluated on five diverse test sets, outperforming other models by at least 17% relative gain in F-1. Based on our experiments, we share recommendations to improve model generalization for classification tasks.","PeriodicalId":175641,"journal":{"name":"International Conference on Web and Social Media","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126763363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Karim Lasri, Manuel Tonneau, Haaya Naushan, Niyati Malhotra, I. Farouq, Victor Orozco-Olvera, S. Fraiberger
{"title":"Large-Scale Demographic Inference of Social Media Users in a Low-Resource Scenario","authors":"Karim Lasri, Manuel Tonneau, Haaya Naushan, Niyati Malhotra, I. Farouq, Victor Orozco-Olvera, S. Fraiberger","doi":"10.1609/icwsm.v17i1.22165","DOIUrl":"https://doi.org/10.1609/icwsm.v17i1.22165","url":null,"abstract":"Characterizing the demographics of social media users enables a diversity of applications, from better targeting of policy interventions to the derivation of representative population estimates of social phenomena. Achieving high performance with supervised learning, however, can be challenging as labeled data is often scarce. Alternatively, rule-based matching strategies provide well-grounded information but only offer partial coverage over users. It is unclear, therefore, what features and models are best suited to maximize coverage over a large set of users while maintaining high performance. In this paper, we develop a cost-effective strategy for large-scale demographic inference by relying on minimal labeling efforts. We combine a name-matching strategy with graph-based methods to map the demographics of 1.8 million Nigerian Twitter users. Specifically, we compare a purely graph-based propagation model, namely Label Propagation (LP), with Graph Convolutional Networks (GCN), a graph model that also incorporates node features based on user content. We find that both models largely outperform supervised learning approaches based purely on user content that lack graph information. Notably, we find that LP achieves comparable performance to the state-of-the-art GCN while providing greater interpretability at a lower computing cost. Moreover, performance does not significantly improve with the addition of user-specific features, such as textual representations of user tweets and user geolocation. Leveraging our data collection effort, we describe the demographic composition of Nigerian Twitter finding that it is a highly non-uniform sample of the general Nigerian population.","PeriodicalId":175641,"journal":{"name":"International Conference on Web and Social Media","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127097163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exposure to Marginally Abusive Content on Twitter","authors":"J. Bandy, T. Lazovich","doi":"10.2139/ssrn.4175612","DOIUrl":"https://doi.org/10.2139/ssrn.4175612","url":null,"abstract":"Social media platforms can help people find connection and entertainment, but they can also show potentially abusive content such as insults and targeted cursing. While platforms do remove some abusive content for rule violation, some is considered \"margin content\" that does not violate any rules and thus stays on the platform. This paper presents a focused analysis of exposure to such content on Twitter, asking (RQ1) how exposure to marginally abusive content varies across Twitter users, and (RQ2) how algorithmically-ranked timelines impact exposure to marginally abusive content. Based on one month of impression data from November 2021, descriptive analyses (RQ1) show significant variation in exposure, with more active users experiencing higher rates and higher volumes of marginal impressions. Experimental analyses (RQ2) show that users with algorithmically-ranked timelines experience slightly lower rates of marginal impressions. However, they tend to register more total impression activity and thus experience a higher cumulative volume of marginal impressions. The paper concludes by discussing implications of the observed concentration, the multifaceted impact of algorithmically-ranked timelines, and potential directions for future work.","PeriodicalId":175641,"journal":{"name":"International Conference on Web and Social Media","volume":"17 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134376837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Mental Health Classifier Generalization with Pre-diagnosis Data","authors":"Yujian Liu, Laura Biester, Rada Mihalcea","doi":"10.1609/icwsm.v17i1.22169","DOIUrl":"https://doi.org/10.1609/icwsm.v17i1.22169","url":null,"abstract":"Recent work has shown that classifiers for depression detection often fail to generalize to new datasets. Most NLP models for this task are built on datasets that use textual reports of a depression diagnosis (e.g., statements on social media) to identify diagnosed users; this approach allows for collection of large-scale datasets, but leads to poor generalization to out-of-domain data. Notably, models tend to capture features that typify direct discussion of mental health rather than more subtle indications of depression symptoms. In this paper, we explore the hypothesis that building classifiers using exclusively social media posts from before a user's diagnosis will lead to less reliance on shortcuts and better generalization. We test our classifiers on a dataset that is based on an external survey rather than textual self-reports, and find that using pre-diagnosis data for training yields improved performance with many types of classifiers.","PeriodicalId":175641,"journal":{"name":"International Conference on Web and Social Media","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134225477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ke Zhou, Marios Constantinides, D. Quercia, S. Šćepanović
{"title":"How Circadian Rhythms Extracted from Social Media Relate to Physical Activity and Sleep","authors":"Ke Zhou, Marios Constantinides, D. Quercia, S. Šćepanović","doi":"10.1609/icwsm.v17i1.22202","DOIUrl":"https://doi.org/10.1609/icwsm.v17i1.22202","url":null,"abstract":"Circadian rhythm has been linked to both physical and mental health at an individual level in prior research. Such a link at population level has been long hypothesized but has never been tested, largely because of lack of data. To partly fix this literature gap, we need: a dataset on population-level circadian rhythms, a dataset on population-level health conditions, and strong associations between these two partly independent sets. Recent work has shown that affect on social media data relates to population-level circadian rhythms. Building upon that work, we extracted five circadian rhythm metrics from 6M Reddit posts across 18 major cities (for which the number of residents is highly correlated with the number of users), and paired them with three ground-truth health metrics (daily number of steps, sleep quantity, and sleep quality) extracted from 233K wearable users in these cities. We found that rhythms of online activity approximated sleeping patterns rather than, what the literature previously hypothesized, alertness levels. Despite that, we found that these rhythms, when computed in two specific times of the day (i.e., late at night and early morning), were still predictive of the three ground-truth health metrics: in general, healthier cities had morning spikes on social media, night dips, and expressions of positive affect. These results suggest that circadian rhythms on social media, if taken at two specific times of the day and operationalized with literature-driven metrics, can approximate the temporal evolution of people's shared underlying biological rhythm as it relates to physical activity (R2=0.492), sleep quantity (R2=0.765), and sleep quality (R2=0.624).","PeriodicalId":175641,"journal":{"name":"International Conference on Web and Social Media","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129290934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social Influence-Maximizing Group Recommendation","authors":"Yang Sun, Bogdan Cautis, S. Maniu","doi":"10.1609/icwsm.v17i1.22191","DOIUrl":"https://doi.org/10.1609/icwsm.v17i1.22191","url":null,"abstract":"In this paper, we revisit the group recommendation problem, by taking into consideration the information diffusion in a social network, as one of the main criteria that must be maximised. While the well-known influence maximization problem has the objective to select k users (spread seeds) from a social network, so that a piece of information can spread to the largest possible number of people in the network, in our setting the seeds are known (given as a group), and we must decide which k items (pieces of information) should be recommended to them. Therefore, the recommended items should at the same time be the best match for that group's preferences, and have the potential to spread as much as possible in an underlying diffusion network, to which the group members (the seeds) belong. This problem is directly motivated by group recommendation scenarios where social networking is an inherent dimension that must be taken into account when assessing the potential impact of a certain recommendation. We present the model and formulate the problem of influence-aware group recommendation as a multiple objective optimization problem. We then describe a greedy approach for this problem and we design an optimisation approach, by adapting the top-k algorithms NRA and TA. We evaluate all these methods experimentally, in three different recommendation scenarios, for movie, micro-blog and book recommendations, based on real-world datasets from Flixster, Twitter, and Douban respectively. Unsurprisingly, with the introduction of information diffusion as an optimization criterion for group recommendation, the recommendation problem becomes more complex. However, we show that our algorithms enable spread efficiency without loss of recommendation precision, under reasonable latency.","PeriodicalId":175641,"journal":{"name":"International Conference on Web and Social Media","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121674916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}