{"title":"Birds of a Feather Flock Together and Opposites Attract: The Nonlinear Relationship Between Personality and Friendship","authors":"Haiyan Liu, Z. Zhang","doi":"10.35566/JBDS/V1N1/P3","DOIUrl":"https://doi.org/10.35566/JBDS/V1N1/P3","url":null,"abstract":"Whether birds of a feather flock together or opposites attract is a classical research question in social and personality psychology. In most existing studies, correlation-based techniques are commonly used to study the similarity/dissimilarity among social entities. Social network data comprises two primary components: actors and the possible social relations between them. It, therefore, has observations on both the dyads with and without social relations. Because of the availability of the baseline group (dyads without social relations), it is possible to contrast the two groups of dyads using social network analysis techniques. This study aims to illustrate how to use social network analysis techniques to address psychological research questions. Specifically, we will investigate how the similarity or dissimilarity of actor's characteristics relates to the likelihood for them to build social relations. By analyzing a college friendship network, we found the quadratic relations between personality similarity and friendship. Both very similar and very dissimilar personalities boost friendship among college students.","PeriodicalId":93575,"journal":{"name":"Journal of behavioral data science","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46263839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tree-based Matching on Structural Equation Model Parameters","authors":"Sarfaraz Serang, James Sears","doi":"10.35566/jbds/v1n2/p3","DOIUrl":"https://doi.org/10.35566/jbds/v1n2/p3","url":null,"abstract":"Understanding causal effects of a treatment is often of interest in the social sciences. When treatments cannot be randomly assigned, researchers must ensure that treated and untreated participants are balanced on covariates before estimating treatment effects. Conventional practices are useful in matching such that treated and untreated participants have similar average values on their covariates. However, situations arise in which a researcher may instead want to match on model parameters. We propose an algorithm, Causal Mplus Trees, which uses decision trees to match on structural equation model parameters and estimates conditional average treatment effects in each node. We provide a proof of concept using two small simulation studies and demonstrate its application using COVID-19 data.","PeriodicalId":93575,"journal":{"name":"Journal of behavioral data science","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69890057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Factor or Network Model? Predictions From Neural Networks","authors":"Alexander P. Christensen, Hudson F Golino","doi":"10.31234/osf.io/awkcb","DOIUrl":"https://doi.org/10.31234/osf.io/awkcb","url":null,"abstract":"The nature of associations between variables is important for constructing theory about psychological phenomena. In the last decade, this topic has received renewed interest with the introduction of psychometric network models. In psychology, network models are often contrasted with latent variable (e.g., factor) models. Recent research has shown that differences between the two tend to be more substantive than statistical. One recently developed algorithm called the Loadings Comparison Test (LCT) was developed to predict whether data were generated from a factor or small-world network model. A significant limitation of the current LCT implementation is that it's based on heuristics that were derived from descriptive statistics. In the present study, we used artificial neural networks to replace these heuristics and develop a more robust and generalizable algorithm. We performed a Monte Carlo simulation study that compared neural networks to the original LCT algorithm as well as logistic regression models that were trained on the same data. We found that the neural networks performed as well as or better than both methods for predicting whether data were generated from a factor, small-world network, or random network model. Although the neural networks were trained on small-world networks, we show that they can reliably predict the data-generating model of random networks, demonstrating generalizability beyond the trained data. We echo the call for more formal theories about the relations between variables and discuss the role of the LCT in this process.","PeriodicalId":93575,"journal":{"name":"Journal of behavioral data science","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49536807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lasso and Group Lasso with Categorical Predictors: Impact of Coding Strategy on Variable Selection and Prediction","authors":"Y. Huang, A. Montoya","doi":"10.31234/osf.io/wc45u","DOIUrl":"https://doi.org/10.31234/osf.io/wc45u","url":null,"abstract":"Machine learning methods are being increasingly adopted in psychological research. Lasso performs variable selection and regularization, and is particularly appealing to psychology researchers because of its connection to linear regression. Researchers conflate properties of linear regression with properties of lasso; however, we demonstrate that this is not the case for models with categorical predictors. Specifically, the coding strategy used for categorical predictors impacts lasso’s performance but not linear regression. Group lasso is an alternative to lasso for models with categorical predictors. We demonstrate the inconsistency of lasso and group lasso models using a real data set: lasso performs different variable selection and has different prediction accuracy depending on the coding strategy, and group lasso performs consistent variable selection but has different prediction accuracy. Additionally, group lasso may include many predictors when very few are needed, leading to overfitting. Using Monte Carlo simulation, we show that categorical variables with one group mean differing from all others (one dominant group) are more likely to be included in the model by group lasso than lasso, leading to overfitting. This effect is strongest when the mean difference is large and there are many categories. Researchers primarily focus on the similarity between linear regression and lasso, but pay little attention to their different properties. This project demonstrates that when using lasso and group lasso, the effect of coding strategies should be considered. We conclude with recommended solutions to this issue and future directions of exploration to improve implementation of machine learning approaches in psychological science.","PeriodicalId":93575,"journal":{"name":"Journal of behavioral data science","volume":"58 32","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141206623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-step growth mixture model to examine heterogeneity in nonlinear trajectories","authors":"Jin Liu, Le Kang, R. Sabo, R. Kirkpatrick, R. Perera","doi":"10.35566/jbds/v1n2/p4","DOIUrl":"https://doi.org/10.35566/jbds/v1n2/p4","url":null,"abstract":"Empirical researchers are usually interested in investigating the impacts that baseline covariates have when uncovering sample heterogeneity and separating samples into more homogeneous groups. However, a considerable number of studies in the structural equation modeling (SEM) framework usually start with vague hypotheses in terms of heterogeneity and possible causes. It suggests that (1) the determination and specification of a proper model with covariates is not straightforward, and (2) the exploration process may be computationally intensive given that a model in the SEM framework is usually complicated and the pool of candidate covariates is usually huge in the psychological and educational domain where the SEM framework is widely employed. Following Bakk and Kuha (2017), this article presents a two-step growth mixture model (GMM) that examines the relationship between latent classes of nonlinear trajectories and baseline characteristics. Our simulation studies demonstrate that the proposed model is capable of clustering the nonlinear change patterns, and estimating the parameters of interest unbiasedly, precisely, as well as exhibiting appropriate confidence interval coverage. Considering the pool of candidate covariates is usually huge and highly correlated, this study also proposes implementing exploratory factor analysis (EFA) to reduce the dimension of covariate space. We illustrate how to use the hybrid method, the two-step GMM and EFA, to efficiently explore the heterogeneity of nonlinear trajectories of longitudinal mathematics achievement data.","PeriodicalId":93575,"journal":{"name":"Journal of behavioral data science","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42693903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}