Journal of Computational Social Science: Latest Articles (Page 6)

A comparison of approaches for imbalanced classification problems in the context of retrieving relevant documents for an analysis

IF 3.2

Journal of Computational Social Science Pub Date : 2023-01-01 DOI: 10.1007/s42001-022-00191-7

Sandra Wankmüller

Abstract: One of the first steps in many text-based social science studies is to retrieve documents that are relevant for an analysis from large corpora of otherwise irrelevant documents. The conventional approach in social science to address this retrieval task is to apply a set of keywords and to consider those documents to be relevant that contain at least one of the keywords. But the application of incomplete keyword lists has a high risk of drawing biased inferences. More complex and costly methods such as query expansion techniques, topic model-based classification rules, and active as well as passive supervised learning could have the potential to more accurately separate relevant from irrelevant documents and thereby reduce the potential size of bias. Yet, whether applying these more expensive approaches increases retrieval performance compared to keyword lists at all, and if so, by how much, is unclear as a comparison of these approaches is lacking. This study closes this gap by comparing these methods across three retrieval tasks associated with a data set of German tweets (Linder in SSRN, 2017. 10.2139/ssrn.3026393), the Social Bias Inference Corpus (SBIC) (Sap et al. in Social bias frames: reasoning about social and power implications of language. In: Jurafsky et al. (eds) Proceedings of the 58th annual meeting of the association for computational linguistics. Association for Computational Linguistics, p 5477-5490, 2020. 10.18653/v1/2020.acl-main.486), and the Reuters-21578 corpus (Lewis in Reuters-21578 (Distribution 1.0). [Data set], 1997. http://www.daviddlewis.com/resources/testcollections/reuters21578/). Results show that query expansion techniques and topic model-based classification rules in most studied settings tend to decrease rather than increase retrieval performance. Active supervised learning, however, if applied on a not too small set of labeled training instances (e.g. 1000 documents), reaches a substantially higher retrieval performance than keyword lists.

Volume 6, issue 1, pp. 91-163. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9762672/pdf/
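
The keyword rule described in the abstract (a document counts as relevant if it contains at least one keyword) can be illustrated with a minimal sketch. The toy documents, the keyword list, and the helper names `keyword_retrieve` and `precision_recall` are assumptions made for illustration; they are not taken from the paper, which evaluates on much larger corpora.

```python
import re

def keyword_retrieve(docs, keywords):
    """Return indices of documents that contain at least one keyword.

    Matching is case-insensitive and on whole words (an assumption of this sketch).
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [i for i, doc in enumerate(docs) if pattern.search(doc)]

def precision_recall(retrieved, relevant):
    """Compute precision and recall of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy corpus: documents 0 and 1 are relevant to the topic.
docs = [
    "The refugee crisis dominates the debate",    # relevant, matched
    "Asylum seekers arrive at the border",        # relevant, missed by the list
    "Football results from the weekend",          # irrelevant
    "A refugee from boredom, he took up chess",   # irrelevant, but matched
]
relevant = [0, 1]

retrieved = keyword_retrieve(docs, ["refugee", "migration"])
precision, recall = precision_recall(retrieved, relevant)
print(retrieved, precision, recall)
```

The incomplete list misses the second document and wrongly matches the fourth, which is the kind of error the abstract attributes to keyword-based retrieval.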

Citations: 1

A scoping review on the use of natural language processing in research on political polarization: trends and research prospects

IF 2

Journal of Computational Social Science Pub Date : 2023-01-01 Epub Date: 2022-12-19 DOI: 10.1007/s42001-022-00196-2

Renáta Németh

Abstract: As part of the "text-as-data" movement, Natural Language Processing (NLP) provides a computational way to examine political polarization. We conducted a methodological scoping review of studies published since 2010 (n = 154) to clarify how NLP research has conceptualized and measured political polarization, and to characterize the degree of integration of the two different research paradigms that meet in this research area. We identified biases toward US context (59%), Twitter data (43%) and machine learning approach (33%). Research covers different layers of the political public sphere (politicians, experts, media, or the lay public); however, very few studies involved more than one layer. Results indicate that only a few studies made use of domain knowledge and a high proportion of the studies were not interdisciplinary. Those studies that made efforts to interpret the results demonstrated that the characteristics of political texts depend not only on the political position of their authors, but also on other often-overlooked factors. Ignoring these factors may lead to overly optimistic performance measures. Also, spurious results may be obtained when causal relations are inferred from textual data. Our paper provides arguments for the integration of explanatory and predictive modeling paradigms, and for a more interdisciplinary approach to polarization research.

Supplementary information: The online version contains supplementary material available at 10.1007/s42001-022-00196-2.

Volume 6, issue 1, pp. 289-313. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9762668/pdf/

Citations: 0

School dropout prediction and feature importance exploration in Malawi using household panel data: machine learning approach

IF 3.2

Journal of Computational Social Science Pub Date : 2022-12-13 DOI: 10.1007/s42001-022-00195-3

Hazal Colak Oz, Çiçek Güven, Gonzalo Nápoles

Citations: 2

Evaluating algorithmic homeless service allocation

IF 3.2

Journal of Computational Social Science Pub Date : 2022-12-10 DOI: 10.1007/s42001-022-00190-8

Wenting Qi, C. Chelmis

Citations: 0

User behaviors in consumer-generated media under monetary reward schemes

IF 3.2

Journal of Computational Social Science Pub Date : 2022-11-12 DOI: 10.1007/s42001-022-00187-3

Yutaro Usui, F. Toriumi, T. Sugawara

Citations: 2

Group-specific behavior change following terror attacks

IF 3.2

Journal of Computational Social Science Pub Date : 2022-11-12 DOI: 10.1007/s42001-022-00188-2

Jonas L. Juul, Laura Alessandretti, J. Dammeyer, Ingo Zettler, Sune Lehmann, J. Mathiesen

Citations: 0

Identification of intimate partner violence from free text descriptions in social media

IF 2

Journal of Computational Social Science Pub Date : 2022-11-01 Epub Date: 2022-05-07 DOI: 10.1007/s42001-022-00166-8

Phan Trinh Ha, Rhea D'Silva, Ethan Chen, Mehmet Koyutürk, Günnur Karakurt

Abstract: Intimate partner violence (IPV) is a significant public health problem that adversely affects the well-being of victims. IPV is often under-reported and non-physical forms of violence may not be recognized as IPV, even by victims. With the increasing popularity of social media and due to the anonymity provided by some of these platforms, people feel comfortable sharing descriptions of their relationship problems in social media. The content generated in these platforms can be useful in identifying IPV and characterizing the prevalence, causes, consequences, and correlates of IPV in broad populations. However, these descriptions are in the form of free text and no corpus of labeled data is available to perform large-scale computational and statistical analyses. Here, we use data from established questionnaires that are used to collect self-report data on IPV to train machine learning models to predict IPV from free text. Using Universal Sentence Encoder (USE) along with multiple machine learning algorithms (random forest, SVM, logistic regression, Naïve Bayes), we develop DetectIPV, a tool for detecting IPV in free text. Using DetectIPV, we comprehensively characterize the predictability of different types of violence (physical abuse, emotional abuse, sexual abuse) from free text. Our results show that a general model that is trained using examples of all violence types can identify IPV from free text with area under the ROC curve (AUROC) 89%. We also train type-specific models and observe that physical abuse can be identified with greatest accuracy (AUROC 98%), while sexual abuse can be identified with high precision but relatively low recall. While our results indicate that the prediction of emotional abuse is the most challenging, DetectIPV can identify emotional abuse with AUROC above 80%. These results establish DetectIPV as a tool that can be used to reliably detect IPV in the context of various applications, ranging from flagging social media posts to detecting IPV in large text corpuses for research purposes. DetectIPV is available as a web service at https://www.ipvlab.case.edu/ipvdetect/.

pp. 1207-1233. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12040337/pdf/
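
The abstract reports performance as area under the ROC curve (AUROC). As a rough illustration of what that number measures, the sketch below computes AUROC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, counting ties as one half. The labels and scores are made-up values, and the function name `auroc` is an assumption of this sketch; it does not reproduce DetectIPV or its data.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth labels.
    scores: iterable of classifier scores (higher means more likely positive).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = flagged text, 0 = not flagged) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(round(auroc(labels, scores), 3))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based computation would be the usual choice on large data sets.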

Citations: 0

The gameability of redistricting criteria

IF 3.2

Journal of Computational Social Science Pub Date : 2022-10-26 DOI: 10.1007/s42001-022-00180-w

Amariah Becker, Dara Gold

Citations: 5

Lingual markers for automating personality profiling: background and road ahead

IF 3.2

Journal of Computational Social Science Pub Date : 2022-09-22 DOI: 10.1007/s42001-022-00184-6

Mohmad Azhar Teli, M. Chachoo

Citations: 1

Using word embedding models to capture changing media discourses: a study on the role of legitimacy, gender and genre in 24,000 music reviews, 1999–2021