Simulation-Based Performance Evaluation of Missing Data Handling in Network Analysis
Authors: Kai Jannik Nehler, Martin Schultze
Journal: Multivariate Behavioral Research, pp. 461-481
Published: 2024-05-01 (Epub 2024-01-21)
DOI: https://doi.org/10.1080/00273171.2023.2283638
Citations: 0
Abstract
Network analysis has gained popularity as an approach to investigating psychological constructs. However, there are currently no guidelines for applied researchers who encounter missing values. In this simulation study, we compared the performance of a two-step EM algorithm with separate steps for missing-data handling and regularization, a combined direct EM algorithm, and pairwise deletion. We investigated conditions with varying network sizes, numbers of observations, missing data mechanisms, and percentages of missing values. The approaches were evaluated on how well they recover population networks, in terms of loss in the precision matrix, edge set identification, and network statistics. The simulation showed adequate performance only in conditions with large samples (n ≥ 500) or small networks (p = 10). Comparing the missing data approaches, direct EM appears to be more sensitive and superior in nearly all chosen conditions. Two-step EM yields better results when the ratio n/p is very large, being less sensitive but more specific. Pairwise deletion failed to converge across numerous conditions and yielded inferior results overall. In sum, direct EM is recommended in most cases, as it mitigates the impact of missing data quite well, while modifications to two-step EM could improve its performance.
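Two ideas in the abstract can be made concrete with a small sketch: what pairwise deletion computes, and how edge set identification might be scored as sensitivity and specificity. The Python below is illustrative only and is not the authors' code. The function names `pairwise_covariance` and `edge_metrics`, the tolerance `tol`, and the convention that `NaN` marks a missing value are all assumptions introduced here.

```python
import numpy as np


def pairwise_covariance(X):
    """Covariance matrix from pairwise-complete observations.

    Hypothetical helper: X is an (n, p) array in which NaN marks a
    missing value. Each entry (j, k) uses only the rows where both
    variable j and variable k are observed.
    """
    n, p = X.shape
    S = np.full((p, p), np.nan)
    for j in range(p):
        for k in range(j, p):
            both = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
            m = both.sum()
            if m > 1:
                xj = X[both, j]
                xk = X[both, k]
                cov = np.sum((xj - xj.mean()) * (xk - xk.mean())) / (m - 1)
                S[j, k] = S[k, j] = cov
    return S


def edge_metrics(theta_true, theta_hat, tol=1e-8):
    """Sensitivity and specificity of edge recovery.

    An edge is counted wherever an off-diagonal entry of the precision
    matrix exceeds `tol` in absolute value (an assumed threshold).
    """
    p = theta_true.shape[0]
    upper = np.triu_indices(p, k=1)  # each undirected edge counted once
    true_edge = np.abs(theta_true[upper]) > tol
    est_edge = np.abs(theta_hat[upper]) > tol
    tp = np.sum(true_edge & est_edge)
    fn = np.sum(true_edge & ~est_edge)
    tn = np.sum(~true_edge & ~est_edge)
    fp = np.sum(~true_edge & est_edge)
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) > 0 else float("nan")
    return float(sensitivity), float(specificity)


if __name__ == "__main__":
    # Toy data with one missing value in the second column.
    X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, np.nan], [4.0, 8.0]])
    S = pairwise_covariance(X)
    print(S)
    # Smallest eigenvalue; a negative value means S is not a valid covariance matrix.
    print(np.linalg.eigvalsh(S).min())
```

Because each entry of the pairwise matrix is computed from a different subset of rows, the result is not guaranteed to be positive semidefinite, which is a well-known difficulty with pairwise deletion. The abstract does not say whether this is what caused the reported convergence failures, so the sketch should not be read as an explanation of that finding.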
About the journal:
Multivariate Behavioral Research (MBR) publishes a variety of substantive, methodological, and theoretical articles in all areas of the social and behavioral sciences. Most MBR articles fall into one of two categories. Substantive articles report on applications of sophisticated multivariate research methods to study topics of substantive interest in personality, health, intelligence, industrial/organizational, and other behavioral science areas. Methodological articles present and/or evaluate new developments in multivariate methods, or address methodological issues in current research. We also encourage submission of integrative articles related to pedagogy involving multivariate research methods, and to historical treatments of interest and relevance to multivariate research methods.