W Holmes Finch, Maria Dolores Hidalgo Montesinos, Brian F French, Maria Hernandez Finch
Educational and Psychological Measurement. Published 2024-11-22. DOI: 10.1177/00131644241293694 (https://doi.org/10.1177/00131644241293694). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11583394/pdf/
Differential Item Functioning Effect Size Use for Validity Information.
There has been an emphasis on effect sizes for differential item functioning (DIF) with the purpose of understanding the magnitude of the differences that are detected through statistical significance testing. Several different effect sizes have been suggested that correspond to the method used for analysis, as have different guidelines for interpretation. The purpose of this simulation study was to compare the performance of the DIF effect size measures described for quantifying and comparing the amount of DIF in two assessments. Several factors were manipulated that were thought to influence the effect sizes or are known to influence DIF detection. This study asked the following two questions. First, do the effect sizes accurately capture aggregate DIF across items? Second, do effect sizes accurately identify which assessment has the least amount of DIF? We highlight effect sizes that had support for performing well across several simulated conditions. We also apply these effect sizes to a real data set to provide an example. Results of the study revealed that the log odds ratio of fixed effects (Ln $\overline{\mathrm{OR}}_{\mathrm{FE}}$) and the variance of the Mantel-Haenszel log odds ratio ($\hat{\tau}^{2}$) were most accurate for identifying which test contains more DIF. We point to future directions with this work to aid the continued focus on effect sizes to understand DIF magnitude.
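The abstract refers to the Mantel-Haenszel log odds ratio without defining it. As a rough illustration only, and not the authors' code, the sketch below computes the standard Mantel-Haenszel common log odds ratio for a single dichotomous item, matching examinees on a stratifying variable such as total score. The function name, the argument names, and the "ref"/"focal" group labels are assumptions made for this example. The aggregate effect sizes compared in the study are not implemented here.

```python
import math
from collections import defaultdict


def mh_log_odds_ratio(responses, groups, strata):
    """Mantel-Haenszel common log odds ratio for one dichotomous item.

    responses: 0/1 item scores, one per examinee
    groups:    "ref" or "focal" per examinee (labels are hypothetical)
    strata:    matching-variable level per examinee (e.g. total score)
    """
    # counts[k] = [A, B, C, D]:
    # ref correct, ref incorrect, focal correct, focal incorrect
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for y, g, k in zip(responses, groups, strata):
        idx = (0 if g == "ref" else 2) + (0 if y == 1 else 1)
        counts[k][idx] += 1

    num = 0.0  # sum over strata of A_k * D_k / N_k
    den = 0.0  # sum over strata of B_k * C_k / N_k
    for a, b, c, d in counts.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n

    if num == 0 or den == 0:
        raise ValueError("log odds ratio undefined: a cross-product sum is zero")
    return math.log(num / den)


# Toy data, one stratum: reference group 3 of 4 correct, focal group 1 of 4.
responses = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["ref"] * 4 + ["focal"] * 4
strata = [5] * 8
print(round(mh_log_odds_ratio(responses, groups, strata), 3))  # ln(9), about 2.197
```

Under this sign convention a positive value means the item favors the reference group after matching, and a value of 0 means no DIF is indicated for that item.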
About the journal:
Educational and Psychological Measurement (EPM) publishes refereed scholarly work from all academic disciplines interested in the study of measurement theory, problems, and issues. Theoretical articles address new developments and techniques, and applied articles deal with innovative applications.