Parameter estimation procedures for exponential-family random graph models on count-valued networks: A comparative simulation study
Authors: Peng Huang, Carter T. Butts
Journal: Social Networks, volume 76, pages 51-67
Published: 2023-07-28
DOI: 10.1016/j.socnet.2023.07.001
URL: https://www.sciencedirect.com/science/article/pii/S0378873323000473
Impact factor: 2.9 (JCR Q1, Anthropology)
Citations: 6
Abstract
Exponential-family random graph models (ERGMs) have emerged as an important framework for modeling social networks across a wide variety of relational types. ERGMs for valued networks are less well developed than their unvalued counterparts and pose particular computational challenges. Network data with edge values on the non-negative integers (count-valued networks) are an important such case, with examples ranging from the magnitude of migration and trade flows between places to the frequency of interactions and encounters between individuals. Here, we propose an efficient, parallelizable subsampled maximum pseudo-likelihood estimation (MPLE) scheme for count-valued ERGMs and compare its performance with the existing Contrastive Divergence (CD) and Monte Carlo Maximum Likelihood Estimation (MCMLE) approaches via a simulation study based on migration flow networks in two U.S. states. Our results suggest that edge value variance is a key factor in method performance, while network size mainly influences the methods' relative merits in computational time. For small-variance networks, all methods perform well in point estimation, but CD greatly overestimates uncertainties and MPLE underestimates them for dependence terms; all methods are fast for small networks, but CD and subsampled multi-core MPLE provide speed advantages as network size increases. For large-variance networks, both MPLE and MCMLE offer high-quality estimates of coefficients and their uncertainty, but MPLE is significantly faster than MCMLE; MPLE is also a better seeding method for MCMLE than CD, as the latter makes MCMLE more prone to convergence failure. The study suggests that MCMLE and MPLE should be the default approaches for estimating ERGMs on small-variance and large-variance valued networks, respectively. We also offer further suggestions regarding the choice of computational method for valued ERGMs based on data structure, available computational resources, and analytical goals.
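To make the idea of subsampled pseudo-likelihood estimation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a Poisson reference measure, a two-term model (an edge-value sum and an illustrative mutuality term `min(y_ij, y_ji)`), a truncation of each dyad's support at a hypothetical `kmax`, and a damped Newton solver; none of these choices is taken from the paper. All function names are hypothetical, and the data are assumed non-degenerate so that the 2x2 information matrix is invertible.

```python
import math
import random


def dyad_stats(y, i, j, k):
    """Statistics for dyad (i, j) if its value were k: a sum term and an
    illustrative mutuality term min(k, y[j][i]) (assumed, not from the paper)."""
    return (k, min(k, y[j][i]))


def conditional_moments(y, i, j, theta, kmax):
    """Mean and covariance of dyad_stats under P(Y_ij = k | rest of y),
    using a Poisson reference measure (1/k!) truncated at kmax."""
    stats = [dyad_stats(y, i, j, k) for k in range(kmax + 1)]
    logw = [theta[0] * s[0] + theta[1] * s[1] - math.lgamma(k + 1)
            for k, s in enumerate(stats)]
    top = max(logw)  # subtract the max for numerical stability
    w = [math.exp(v - top) for v in logw]
    z = sum(w)
    mean = [sum(wk * s[a] for wk, s in zip(w, stats)) / z for a in range(2)]
    cov = [[sum(wk * (s[a] - mean[a]) * (s[b] - mean[b])
                for wk, s in zip(w, stats)) / z
            for b in range(2)] for a in range(2)]
    return mean, cov


def subsampled_mple(y, n_sample, kmax=40, iters=50, seed=0):
    """Maximize the log pseudo-likelihood over a random subsample of
    ordered dyads with damped Newton steps; returns [theta_sum, theta_mut]."""
    n = len(y)
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    sample = random.Random(seed).sample(dyads, min(n_sample, len(dyads)))
    theta = [0.0, 0.0]
    for _ in range(iters):
        grad = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]
        for i, j in sample:
            obs = dyad_stats(y, i, j, y[i][j])
            mean, cov = conditional_moments(y, i, j, theta, kmax)
            for a in range(2):
                grad[a] += obs[a] - mean[a]  # observed minus expected
                for b in range(2):
                    info[a][b] += cov[a][b]  # negative Hessian
        # Solve the 2x2 system info * step = grad in closed form.
        det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
        step = [(info[1][1] * grad[0] - info[0][1] * grad[1]) / det,
                (info[0][0] * grad[1] - info[1][0] * grad[0]) / det]
        norm = math.hypot(step[0], step[1])
        scale = min(1.0, 1.0 / norm) if norm > 0 else 0.0  # cap step length at 1
        theta = [t + scale * s for t, s in zip(theta, step)]
        if norm < 1e-8:
            break
    return theta
```

Here `y` is a square list of lists of non-negative integer edge values, and `kmax` should exceed the largest observed value. Each sampled dyad contributes to the gradient and information matrix independently of the others, so the inner loop could be split across workers; that parallel step is not shown.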
About the journal:
Social Networks is an interdisciplinary and international quarterly. It provides a common forum for representatives of anthropology, sociology, history, social psychology, political science, human geography, biology, economics, communications science and other disciplines who share an interest in the study of the empirical structure of social relations and associations that may be expressed in network form. It publishes both theoretical and substantive papers. Critical reviews of major theoretical or methodological approaches using the notion of networks in the analysis of social behaviour are also included, as are reviews of recent books dealing with social networks and social structure.