The Method for Comprehensive Quality Evaluation of Tests. Part 2
V. Kukharenko, L. Perkhun, N. M. Tovmachenko
Statistika Ukrayini, published 2018-12-17. DOI: 10.31767/SU.4(83)2018.04.09 (https://doi.org/10.31767/SU.4(83)2018.04.09)
The article describes a comprehensive method for evaluating test quality, drawing on classical methods together with Data Mining and Item Response Theory (IRT). The general method has six steps; this article covers steps 4 to 6.
The fourth step evaluates the reliability of the test. A universal two-step procedure is proposed: the reliability of individual test items is assessed with the Kuder–Richardson internal consistency coefficient, and the reliability of the test as a whole with the generalizability coefficient. The first coefficient is considered acceptable at 0.7 or above, the second at 0.8 or above. The second coefficient was calculated in SPSS using two-way ANOVA without replication.
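The two coefficients can be illustrated with a minimal sketch. The abstract does not reproduce the authors' formulas, so the code assumes the standard KR-20 formula and one common form of the relative generalizability coefficient for a persons-by-items design, (MS_persons - MS_residual) / MS_persons, taken from a two-way ANOVA without replication. The function names and the response matrix are hypothetical, not taken from the paper.

```python
from statistics import mean, pvariance

def kr20(responses):
    """KR-20 for a persons x items matrix of 0/1 scores (assumed standard formula)."""
    k = len(responses[0])                                   # number of items
    p = [mean(row[j] for row in responses) for j in range(k)]
    item_var = sum(pj * (1 - pj) for pj in p)               # sum of p*q over items
    total_var = pvariance([sum(row) for row in responses])  # variance of total scores
    return k / (k - 1) * (1 - item_var / total_var)

def g_coefficient(responses):
    """Relative G coefficient from a two-way ANOVA without replication (assumed form)."""
    n, k = len(responses), len(responses[0])
    grand = sum(map(sum, responses)) / (n * k)
    person_means = [sum(row) / k for row in responses]
    item_means = [sum(row[j] for row in responses) / n for j in range(k)]
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in responses for x in row)
    ss_resid = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return (ms_persons - ms_resid) / ms_persons

# Invented data: 5 students x 4 items, 1 = correct answer.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 3), round(g_coefficient(scores), 3))
```

For dichotomous items and this single-facet design the two quantities coincide numerically, so a difference between the 0.7 and 0.8 thresholds in the paper presumably reflects a different design or definition than the one assumed here.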
The fifth step assesses how well the test under study differentiates students. The tools chosen for this are hierarchical cluster procedures, classification trees, and classification discriminant functions. The calculations were performed in Statistica and SPSS. Three clusters of students, with high, medium, and low academic performance, were identified, showing that the test under study does differentiate students.
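As a rough illustration of the hierarchical cluster step, the sketch below groups students by total test score with average-linkage agglomerative clustering. The authors worked in Statistica and SPSS and the abstract does not state the linkage rule, distance measure, or input variables, so all of those choices, the function name `agglomerate`, and the scores are assumptions.

```python
def agglomerate(scores, n_clusters=3):
    """Average-linkage agglomerative clustering of one-dimensional scores."""
    clusters = [[s] for s in sorted(scores)]  # start with one cluster per student

    def avg_dist(a, b):
        # Mean absolute difference over all cross-cluster pairs.
        return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: avg_dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Invented total scores for nine students.
totals = [3, 4, 5, 11, 12, 13, 18, 19, 20]
print(agglomerate(totals))  # [[3, 4, 5], [11, 12, 13], [18, 19, 20]]
```

With well-separated scores like these, the three groups correspond to low, medium, and high performance; real data would normally need a dendrogram or a validity index to justify the choice of three clusters.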
The sixth and final step studies the quality of the test using the one-parameter Rasch model. The difficulty of each test item and each student's level of mastery of the study material are measured in logits. Analytical expressions for the item characteristic curve and the student's individual characteristic curve are given, together with the auxiliary formulas needed to compute them, and the description is illustrated with a specific example. It is noted that the students' characteristic curves, built from the Rasch model in MathCAD, clearly divide the students into two groups: strong (positive logit) and weak (negative logit). Recommendations are formulated for interpreting the results for individual test items. In particular, if the characteristic curves of different test items overlap, those items must be deleted (norm-referenced test) or reworked (criterion-referenced test). The paper does not consider how to decide which item to delete or correct, but indicates that this can be established with the two-parameter Birnbaum model. If the characteristic curves of the test items are not evenly spaced, it is recommended to add test items (norm-referenced test) or to change the duplicate items (criterion-referenced test) so as to fill the gaps on the abscissa where there are no characteristic curves.
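The Rasch step can be sketched as follows. The abstract does not reproduce the authors' formulas, so the code assumes the standard logistic form of the one-parameter model, P = 1 / (1 + exp(-(theta - beta))), and the usual initial logit estimates ln(p / (1 - p)) for a student and ln((1 - p) / p) for an item. The helper `overlaps_and_gaps` and its thresholds `tol` and `max_gap` are hypothetical illustrations of the overlap and gap checks, not the authors' procedure.

```python
import math

def rasch_probability(theta, beta):
    """P(correct answer) for ability theta and item difficulty beta, both in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def ability_logit(correct, total):
    """Initial ability estimate; undefined when the student got 0 or all items right."""
    p = correct / total
    return math.log(p / (1 - p))

def difficulty_logit(correct, total):
    """Initial difficulty estimate; undefined when nobody or everybody solved the item."""
    p = correct / total
    return math.log((1 - p) / p)

def overlaps_and_gaps(difficulties, tol=0.1, max_gap=1.0):
    """Flag neighbouring items that nearly coincide or leave a gap on the logit axis.

    Rasch item curves share one slope, so two curves overlap only when their
    difficulties are (almost) equal. tol and max_gap are assumed thresholds.
    """
    ordered = sorted(difficulties)
    pairs = list(zip(ordered, ordered[1:]))
    overlaps = [(a, b) for a, b in pairs if b - a <= tol]
    gaps = [(a, b) for a, b in pairs if b - a > max_gap]
    return overlaps, gaps

# A student with 15 of 20 correct has a positive logit ("strong"),
# one with 5 of 20 correct has a negative logit ("weak").
print(ability_logit(15, 20) > 0, ability_logit(5, 20) < 0)
print(overlaps_and_gaps([-1.0, -0.95, 0.0, 2.0]))
```

Fixing beta and varying theta traces an item characteristic curve; fixing theta and varying beta traces a student's individual curve, which falls as items get harder.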
As the practical implementation of this technique, the authors envisage developing a separate plug-in compatible with the Moodle distance learning platform.
As the direction for further theoretical research, the authors identify studying the limits of applicability of the two-parameter and three-parameter Birnbaum models for improving the testing process and students' test results in distance learning systems.