{"title":"Data Imbalances in Coincidence Analysis: A Simulation Study","authors":"Martyna Daria Swiatczak, Michael Baumgartner","doi":"10.1177/00491241241227039","DOIUrl":"https://doi.org/10.1177/00491241241227039","url":null,"abstract":"In this paper, we investigate the conditions under which data imbalances, a common data characteristic that occurs when factor values are unevenly distributed, are problematic for the performance of Coincidence Analysis (CNA). We further examine how such imbalances relate to fragmentation and noise in data. We show that even extreme data imbalances, when not combined with fragmentation or noise, do not negatively affect CNA’s performance. However, an extended series of simulation experiments on fuzzy-set data reveals that, when mixed with fragmentation or noise, data imbalances may substantially impair CNA’s performance. Furthermore, we find that the performance impairment is higher when endogenous factors are imbalanced than when exogenous factors are concerned. Our results allow us to quantify these impacts and demarcate degrees at which data imbalances should be considered as problematic. Thus, applied researchers can use our demarcation guidelines to enhance the validity of their studies.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"27 1","pages":""},"PeriodicalIF":6.3,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140165080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Tool Kit for Relation Induction in Text Analysis","authors":"Dustin S. Stoltz, Marshall A. Taylor, Jennifer S. K. Dudley","doi":"10.1177/00491241241233242","DOIUrl":"https://doi.org/10.1177/00491241241233242","url":null,"abstract":"Distances derived from word embeddings can measure a range of gradational relations—similarity, hierarchy, entailment, and stereotype—and can be used at the document- and author-level in ways that overcome some of the limitations of weighted dictionary methods. We provide a comprehensive introduction to using word embeddings for relation induction, and demonstrate how such techniques can complement dictionary methods as unsupervised, deductive methods.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"46 1","pages":""},"PeriodicalIF":6.3,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140015572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Occupational Percentile Rank: A New Method for Constructing a Socioeconomic Index of Occupational Status","authors":"Xi Song, Yu Xie","doi":"10.1177/00491241231207914","DOIUrl":"https://doi.org/10.1177/00491241231207914","url":null,"abstract":"In this paper, we propose a method for constructing an occupation-based socioeconomic index that can easily incorporate changes in occupational structure. The resulting index is the occupational percentile rank for a given cohort, based on contemporaneous information pertaining to educational composition and the number of workers at the occupation level. An occupation may experience an increase or decrease in its occupational rank due to changes in relative sizes and educational compositions across occupations. The method is flexible in dealing with changes in occupational and educational measurements over time. Applying the method to U.S. history from the mid-nineteenth century to the present day, we derive the index using IPUMS U.S. Census microdata from 1850 to 2000 and the American Community Surveys (ACSs) from 2001 to 2018. Compared to previous occupational measures, this new measure takes into account occupational status evolvement caused by long-term secular changes in occupational size and educational composition. The resulting percentile rank measure can be easily merged with social surveys and administrative data that include occupational measures based on the U.S. Census occupation codes and crosswalks.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":" 17","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135286393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Marginal and Conditional Confounding Using Logits","authors":"Kristian Bernt Karlson, Frank Popham, Anders Holm","doi":"10.1177/0049124121995548","DOIUrl":"https://doi.org/10.1177/0049124121995548","url":null,"abstract":"This article presents two ways of quantifying confounding using logistic response models for binary outcomes. Drawing on the distinction between marginal and conditional odds ratios in statistics, we define two corresponding measures of confounding (marginal and conditional) that can be recovered from a simple standardization approach. We investigate when marginal and conditional confounding may differ, outline why the method by Karlson, Holm, and Breen recovers conditional confounding under a \"no interaction\"-assumption, and suggest that researchers may measure marginal confounding by using inverse probability weighting. We provide two empirical examples that illustrate our standardization approach.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"1 1","pages":"1765-1784"},"PeriodicalIF":6.3,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/0049124121995548","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42412945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Dimensional Imputation for the Social Sciences: A Comparison of State-of-The-Art Methods","authors":"Edoardo Costantini, Kyle M. Lang, Tim Reeskens, Klaas Sijtsma","doi":"10.1177/00491241231200194","DOIUrl":"https://doi.org/10.1177/00491241231200194","url":null,"abstract":"Including a large number of predictors in the imputation model underlying a multiple imputation (MI) procedure is one of the most challenging tasks imputers face. A variety of high-dimensional MI techniques can help, but there has been limited research on their relative performance. In this study, we investigated a wide range of extant high-dimensional MI techniques that can handle a large number of predictors in the imputation models and general missing data patterns. We assessed the relative performance of seven high-dimensional MI methods with a Monte Carlo simulation study and a resampling study based on real survey data. The performance of the methods was defined by the degree to which they facilitate unbiased and confidence-valid estimates of the parameters of complete data analysis models. We found that using lasso penalty or forward selection to select the predictors used in the MI model and using principal component analysis to reduce the dimensionality of auxiliary data produce the best results.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135308604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Current and Future Debates in Video Data Analysis","authors":"Nicolas M. Legewie, Anne Nassauer","doi":"10.1177/00491241231178275","DOIUrl":"https://doi.org/10.1177/00491241231178275","url":null,"abstract":"Video-based social science research is thriving. Across disciplines and topic areas, researchers use twenty-first century video data to gain novel insights into how social processes and events unfold on the ground. In recent years, “video data analysis” (VDA) has emerged as a methodological framework to facilitate this type of video-based research. The special issue “The Present and Future of Video-based Social Science Research: Innovations in Video Data Analysis” presents methodological innovations that speak to some of the most pressing debates around VDA. Contributions showcase the range of disciplines and research fields VDA is used in, from social interactions and collective behavior to neighborhoods, policing, and public health. This introductory article outlines two areas of growth in VDA methodology that the articles of this special issue speak to: taking advantage of scale and detail in VDA, and situating VDA in the canon of research methods.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"52 1","pages":"1107 - 1119"},"PeriodicalIF":6.3,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42172410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graphical Causal Models for Survey Inference","authors":"Julian Schuessler, Peter Selb","doi":"10.1177/00491241231176851","DOIUrl":"https://doi.org/10.1177/00491241231176851","url":null,"abstract":"Directed acyclic graphs (DAGs) are now a popular tool to inform causal inferences. We discuss how DAGs can also be used to encode theoretical assumptions about nonprobability samples and survey nonresponse and to determine whether population quantities including conditional distributions and regressions can be identified. We describe sources of bias and assumptions for eliminating it in various selection scenarios. We then introduce and analyze graphical representations of multiple selection stages in the data collection process, and highlight the strong assumptions implicit in using only design weights. Furthermore, we show that the common practice of selecting adjustment variables based on correlations with sample selection and outcome variables of interest is ill-justified and that nonresponse weighting when the interest is in causal inference may come at severe costs. Finally, we identify further areas for survey methodology research that can benefit from advances in causal graph theory.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"294 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135479071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear Probability Model Revisited: Why It Works and How It Should Be Specified","authors":"Myoung-jae Lee, Goeun Lee, Jin-young Choi","doi":"10.1177/00491241231176850","DOIUrl":"https://doi.org/10.1177/00491241231176850","url":null,"abstract":"A linear model is often used to find the effect of a binary treatment [Formula: see text] on a noncontinuous outcome [Formula: see text] with covariates [Formula: see text]. Particularly, a binary [Formula: see text] gives the popular “linear probability model (LPM),” but the linear model is untenable if [Formula: see text] contains a continuous regressor. This raises the question: what kind of treatment effect does the ordinary least squares estimator (OLS) to LPM estimate? This article shows that the OLS estimates a weighted average of the [Formula: see text]-conditional heterogeneous effect plus a bias. Under the condition that [Formula: see text] is equal to the linear projection of [Formula: see text] on [Formula: see text], the bias becomes zero, and the OLS estimates the “overlap-weighted average” of the [Formula: see text]-conditional effect. Although the condition does not hold in general, specifying the [Formula: see text]-part of the LPM such that the [Formula: see text]-part predicts [Formula: see text] well, not [Formula: see text], minimizes the bias counter-intuitively. This article also shows how to estimate the overlap-weighted average without the condition by using the “propensity-score residual” [Formula: see text]. An empirical analysis demonstrates our points.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"203 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135791840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Curating Training Data for Reliable Large-Scale Visual Data Analysis: Lessons from Identifying Trash in Street View Imagery","authors":"Jackelyn Hwang, Nima Dahir, Mayuka Sarukkai, Gabby Wright","doi":"10.1177/00491241231171945","DOIUrl":"https://doi.org/10.1177/00491241231171945","url":null,"abstract":"Visual data have dramatically increased in quantity in the digital age, presenting new opportunities for social science research. However, the extensive time and labor costs to process and analyze these data with existing approaches limit their use. Computer vision methods hold promise but often require large and nonexistent training data to identify sociologically relevant variables. We present a cost-efficient method for curating training data that utilizes simple tasks and pairwise comparisons to interpret and analyze visual data at scale using computer vision. We apply our approach to the detection of trash levels across space and over time in millions of street-level images in three physically distinct US cities. By comparing to ratings produced in a controlled setting and utilizing computational methods, we demonstrate generally high reliability in the method and identify sources that limit it. Altogether, this approach expands how visual data can be used at a large scale in sociology.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"52 1","pages":"1155 - 1200"},"PeriodicalIF":6.3,"publicationDate":"2023-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41859082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Method for Estimating Individual Socioeconomic Status of Twitter Users","authors":"Yuanmo He, Milena Tsvetkova","doi":"10.1177/00491241231168665","DOIUrl":"https://doi.org/10.1177/00491241231168665","url":null,"abstract":"The rise of social media has opened countless opportunities to explore social science questions with new data and methods. However, research on socioeconomic inequality remains constrained by limit...","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"51 27","pages":""},"PeriodicalIF":6.3,"publicationDate":"2023-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50167294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}