A comparison of top-coding strategies for aggregated relational data


Social Networks. Published 2025-06-12. DOI: 10.1016/j.socnet.2025.05.006

Jody Clay-Warner, Hui Yi, Tenshi Kawashima, Jiacheng Li, David Okech, Fred Hassan Konteh

Volume 83, Pages 50-61


Abstract

Aggregated relational data are commonly used in conjunction with scale-up methods to measure network size. In this approach, the numbers of people that respondents report knowing in subpopulations of known size are scaled up to estimate the size of their personal network. Because this method is sensitive to reporting errors, researchers often top-code responses about subpopulations of known size, although there is no consensus on how to select the top-code value. Here, we compare several top-coding methods, including new approaches that use Dunbar’s number, using datasets collected from two aggregated relational data surveys, one from Shanghai and one from Kambia, Sierra Leone. We employ three metrics to evaluate the top-coding strategies: the mean error rate in estimating the sizes of the subpopulations of known size, the error rate in estimating the size of the target population, and the degree mean. We find that the top-coding strategies all perform equally well in estimating the subpopulations of known size in both datasets. The strategies based on Dunbar’s number, however, perform better than the other strategies in estimating the target population in Kambia. In addition, the Dunbar’s number approaches produce substantially smaller degree means in both datasets. We examine these findings holistically and provide suggestions for how researchers should approach top-coding decisions. We ultimately conclude that there is no one-size-fits-all solution for top-coding and that researchers should systematically examine key indicators from the data to determine whether top-coding is necessary and, if so, which top-coding strategy is appropriate.
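To make the moving parts concrete, here is a minimal Python sketch, assuming the standard known-population scale-up estimator: respondent degree is estimated as d_i = N * sum_k y_ik / sum_k N_k, and a target population is estimated as N * sum_i y_iT / sum_i d_i. The helper names (`top_code`, `estimate_degrees`, `estimate_target_size`, `leave_one_out_errors`), the cap of 30, and the toy numbers are all hypothetical illustrations, not the authors' code or data. The abstract does not say how the Dunbar's-number strategies set the cap or exactly how error rates are defined, so this sketch uses a fixed placeholder cap and an absolute relative error under a leave-one-out scheme as assumptions.

```python
import numpy as np


def top_code(y, cap):
    """Replace every reported count above `cap` with `cap` (hypothetical helper)."""
    return np.minimum(y, cap)


def estimate_degrees(y_known, known_sizes, total_pop):
    """Known-population scale-up degree estimate per respondent.

    d_i = N * sum_k y_ik / sum_k N_k, where y_ik is how many people respondent i
    reports knowing in known subpopulation k, N_k is that subpopulation's size,
    and N is the total population size.
    """
    return total_pop * y_known.sum(axis=1) / known_sizes.sum()


def estimate_target_size(y_target, degrees, total_pop):
    """Scale-up estimate of a target population: N * sum_i y_iT / sum_i d_i."""
    return total_pop * y_target.sum() / degrees.sum()


def leave_one_out_errors(y_known, known_sizes, total_pop):
    """For each known subpopulation k: treat it as unknown, re-estimate degrees
    from the remaining subpopulations, estimate N_k, and return the absolute
    relative error |N_hat_k - N_k| / N_k (an assumed error definition)."""
    errors = []
    for k in range(len(known_sizes)):
        keep = np.arange(len(known_sizes)) != k
        d = estimate_degrees(y_known[:, keep], known_sizes[keep], total_pop)
        n_hat = estimate_target_size(y_known[:, k], d, total_pop)
        errors.append(abs(n_hat - known_sizes[k]) / known_sizes[k])
    return np.array(errors)


# Made-up toy data (2 respondents, 2 known subpopulations), NOT survey data.
total_pop = 1000
known_sizes = np.array([100, 50])
y_known = np.array([[5, 2], [300, 1]])  # 300 mimics an implausible report
y_target = np.array([1, 3])

cap = 30  # placeholder top-code value; choosing it is the open question
raw_degrees = estimate_degrees(y_known, known_sizes, total_pop)
capped = top_code(y_known, cap)
capped_degrees = estimate_degrees(capped, known_sizes, total_pop)

print("degree mean, raw:", raw_degrees.mean())
print("degree mean, top-coded:", capped_degrees.mean())
print("target estimate:", estimate_target_size(y_target, capped_degrees, total_pop))
print("mean known-pop error:", leave_one_out_errors(capped, known_sizes, total_pop).mean())
```

The three printed quantities loosely mirror the three evaluation metrics named in the abstract (degree mean, target-population estimate, mean error for known subpopulations); comparing them across different `cap` values is one simple way to see how a top-coding choice propagates through the estimator.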

