Harnessing Generative AI for Assessment Item Development: Comparing AI-Generated and Human-Authored Items

Jaclyn Martin Kowal, Kenzie Hurley Bryant, Dan Segall, Tracy Kantrowitz

International Journal of Selection and Assessment, Volume 33, Issue 3
Published: 2025-08-27 | DOI: 10.1111/ijsa.70021
Full text: https://onlinelibrary.wiley.com/doi/10.1111/ijsa.70021
PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.70021
The use of generative AI, specifically large language models (LLMs), in test development presents an innovative approach to efficiently creating technical, knowledge-based assessment items. This study evaluates the efficacy of AI-generated items compared to human-authored counterparts within the context of employee selection testing, focusing on data science knowledge areas. Through a paired comparison approach, subject matter experts (SMEs) were asked to evaluate items produced by both LLMs and human item writers. Findings revealed a significant preference for LLM-generated items, particularly in specific knowledge domains such as Statistical Foundations and Scientific Data Analysis. However, despite the promise of generative AI in accelerating item development, human review remains critical. Issues such as multiple correct answers or ineffective distractors in AI-generated items necessitate thorough SME review and revision to ensure quality and validity. The study highlights the potential of integrating AI with human expertise to enhance the efficiency of item generation while maintaining psychometric standards in high-stakes environments. The implications for psychometric practice and the necessity of domain-specific validation are discussed, offering a framework for future research and application of AI in test development.
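The paired comparison design described in the abstract, in which subject matter experts judge which of two matched items (one LLM-generated, one human-authored) they prefer, lends itself to a simple sign-test analysis of preference. The Python sketch below is illustrative only: the counts, the function name, and the choice of an exact binomial sign test are assumptions made for demonstration, not the authors' reported method or data.

from math import comb

def sign_test_p_value(successes: int, trials: int) -> float:
    """Two-sided exact binomial (sign) test against chance preference (p = 0.5)."""
    total = 2 ** trials
    # One-sided tail probabilities under the null of no systematic preference.
    p_upper = sum(comb(trials, k) for k in range(successes, trials + 1)) / total
    p_lower = sum(comb(trials, k) for k in range(0, successes + 1)) / total
    # Doubling the smaller tail gives a conservative two-sided p-value.
    return min(1.0, 2 * min(p_upper, p_lower))

# Hypothetical example: SMEs prefer the LLM-generated item in 70 of 100 pairs.
prefer_llm, total_pairs = 70, 100
p = sign_test_p_value(prefer_llm, total_pairs)
print(f"LLM item preferred in {prefer_llm}/{total_pairs} pairs; two-sided p = {p:.4f}")

Under these assumed counts the test would reject the hypothesis of no preference, mirroring the kind of significant preference for LLM-generated items the abstract reports, though the study's actual analysis may differ.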
Journal introduction:
The International Journal of Selection and Assessment publishes original articles related to all aspects of personnel selection, staffing, and assessment in organizations. Using an effective combination of academic research with professional-led best practice, IJSA aims to develop new knowledge and understanding in these important areas of work psychology and contemporary workforce management.