Exploring examinees' responses to constructed response items with a supervised topic model

IF 1.8 | CAS Zone 3, Psychology | Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

British Journal of Mathematical & Statistical Psychology | Pub Date: 2023-09-13 | DOI: 10.1111/bmsp.12319

Seohyun Kim, Zhenqiu Lu, Allan S. Cohen

Volume 77, Issue 1, pages 130-150 (Journal Article). Full text: https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12319

Citations: 0

Abstract

Textual data are increasingly common in test data as many assessments include constructed response (CR) items as indicators of participants' understanding. The development of techniques based on natural language processing has made it possible for researchers to rapidly analyse large sets of textual data. One family of statistical techniques for this purpose is probabilistic topic models. Topic modelling is a technique for detecting the latent topic structure in a collection of documents and has been widely used to analyse texts in a variety of areas. The detected topics can reveal primary themes in the documents, and the relative use of topics can be useful in investigating the variability of the documents. Supervised latent Dirichlet allocation (SLDA) is a popular topic model in that family that jointly models textual data and paired responses, such as participants' textual answers to CR items and their rubric-based scores. SLDA assumes a homogeneous relationship between textual data and paired responses across all documents. This assumption, while useful for some purposes, may not hold when a population contains subgroups with different relationships. In this study, we introduce a new supervised topic model that incorporates finite-mixture modelling into SLDA. This new model can detect latent groups of participants that have different relationships between their textual responses and associated scores. The model is illustrated with an analysis of a set of textual responses and paired scores from a middle grades assessment of science inquiry knowledge. A simulation study is presented to investigate the performance of the proposed model under practical testing conditions.
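To make the structure described in the abstract concrete, the following is a minimal sketch of a generative process for SLDA extended with a finite mixture on the response model. It assumes, as in the original SLDA of Blei and McAuliffe, a Gaussian response regressed on a document's empirical topic frequencies z̄_d, and it assumes the mixture enters by letting the regression weights and noise vary by latent group: c_d ~ Categorical(π), then y_d | c_d = g ~ N(η_gᵀ z̄_d, σ_g²). The function name `simulate_mixture_slda`, its parameters, and this exact specification are illustrative assumptions, not the authors' model (rubric scores are discrete, for instance, and may call for a different link).

```python
import numpy as np


def simulate_mixture_slda(n_docs, n_topics, vocab_size, n_groups,
                          doc_len=50, alpha=0.5, beta=0.5, seed=0):
    """Simulate (words, score, group) triples from a HYPOTHETICAL
    finite-mixture SLDA generative process (illustrative sketch only)."""
    rng = np.random.default_rng(seed)

    # Topic-word distributions phi_k ~ Dirichlet(beta), shared by every group.
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)

    # Assumed mixture part: mixing weights pi, plus per-group regression
    # weights eta_g and residual SD sigma_g linking topic use to scores.
    pi = rng.dirichlet(np.ones(n_groups))
    eta = rng.normal(0.0, 2.0, size=(n_groups, n_topics))
    sigma = np.full(n_groups, 0.5)

    docs = []
    scores = np.empty(n_docs)
    groups = np.empty(n_docs, dtype=int)
    for d in range(n_docs):
        g = rng.choice(n_groups, p=pi)                   # latent group c_d
        theta = rng.dirichlet(np.full(n_topics, alpha))  # topic proportions theta_d
        z = rng.choice(n_topics, size=doc_len, p=theta)  # topic of each word
        words = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
        z_bar = np.bincount(z, minlength=n_topics) / doc_len  # empirical topic frequencies
        # SLDA-style Gaussian response, but with group-specific eta_g and sigma_g.
        scores[d] = rng.normal(eta[g] @ z_bar, sigma[g])
        docs.append(words)
        groups[d] = g
    return docs, scores, groups, eta


# Usage: 200 synthetic "responses" drawn from two latent groups.
docs, scores, groups, eta = simulate_mixture_slda(
    n_docs=200, n_topics=4, vocab_size=100, n_groups=2)
print(len(docs), scores.shape, np.bincount(groups, minlength=2))
```

In this sketch the topics are shared across groups and only the topic-to-score regression differs, which is one way to express "different relationships between textual responses and associated scores". Recovering π, η_g and each participant's group from text and scores alone requires an inference procedure such as MCMC or variational EM, which the sketch does not attempt.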




Source journal

British Journal of Mathematical & Statistical Psychology (Medicine: Mathematics, Interdisciplinary Applications)

CiteScore: 5.00

Self-citation rate: 3.80%

Review time: >12 weeks

Journal description: The British Journal of Mathematical and Statistical Psychology publishes articles relating to areas of psychology which have a greater mathematical or statistical aspect of their argument than is usually acceptable to other journals, including:

• mathematical psychology
• statistics
• psychometrics
• decision making
• psychophysics
• classification
• relevant areas of mathematics, computing and computer software

These include articles that address substantive psychological issues or that develop and extend techniques useful to psychologists. New models for psychological processes, new approaches to existing data, critiques of existing models and improved algorithms for estimating the parameters of a model are examples of articles which may be favoured.