{"title":"All Your Base Are Belong to Us: The Urgent Reality of Unproctored Testing in the Age of LLMs","authors":"Louis Hickman","doi":"10.1111/ijsa.70005","DOIUrl":"https://doi.org/10.1111/ijsa.70005","url":null,"abstract":"<p>The release of new generative artificial intelligence (AI) tools, including new large language models (LLMs), continues at a rapid pace. Upon the release of OpenAI's new o1 models, I reconducted Hickman et al.'s (2024) analyses examining how well LLMs perform on a quantitative ability (number series) test. GPT-4 scored below the 20th percentile (compared to thousands of human test takers), but o1 scored at the 95th percentile. In response to these updated findings and Lievens and Dunlop's (2025) article about the effects of LLMs on the validity of pre-employment assessments, I make an urgent call to action for selection and assessment researchers and practitioners. A recent survey suggests that a large proportion of applicants are already using generative AI tools to complete high-stakes assessments, and it seems that no current assessments will be safe for long. Thus, I offer possibilities for the future of testing, detail their benefits and drawbacks, and provide recommendations. These possibilities are: increased use of proctoring, adding strict time limits, using LLM detection software, using think-aloud (or similar) protocols, collecting and analyzing trace data, emphasizing samples over signs, and redesigning assessments to allow LLM use during completion. Several of these possibilities inspire future research to modernize assessment. Future research should seek to improve our understanding of how to design valid assessments that allow LLM use, how to effectively use trace test-taker data, and whether think-aloud protocols can help differentiate experts and novices.</p>","PeriodicalId":51465,"journal":{"name":"International Journal of Selection and Assessment","volume":"33 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.70005","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143533255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: Social Desirability Tendency in Personality-Based Job Interviews—A Question of Interview Format?
Authors: Valerie Schröder, Anna Luca Heimann, Pia Ingold, Nicolas Roulin, Marianne Schmid Mast, Manuel Bachmann, Martin Kleinmann
Journal: International Journal of Selection and Assessment, 33(2)
DOI: 10.1111/ijsa.70006
Published: 2025-03-04
Abstract: Today's variety of interview formats raises the question of their interchangeability. For personality interviews, a crucial question is whether different formats are comparably robust against applicants' social desirability tendency (SDT), so as to ensure accurate measurement. Using a within-subjects design in a simulated selection setting with 211 participants, this study examined how SDT affects personality scores in a face-to-face, an asynchronous video, and a written interview, all with similar interview questions designed to measure personality. Relationships between interview scores and SDT were weakest in the face-to-face format, strongest in the written format, and differed depending on which personality trait was assessed. The findings highlight the differing suitability of interview formats for measuring personality, with important implications for interview design and personality assessment.

Title: Attitudes Toward Cybervetting in Germany: Impact on Organizational Attractiveness Depends on Social Media Platform
Authors: Philipp Schäpers, Franz W. Mönke, Chiara-Maria Frieler, Nicolas Roulin, Johannes Basch
Journal: International Journal of Selection and Assessment, 33(1)
DOI: 10.1111/ijsa.70003
Published: 2025-02-17
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.70003
Abstract: Cybervetting, assessing social media in personnel selection, is widely used. However, the individuals concerned often perceive this practice negatively. We propose that attitudes toward cybervetting may depend on the platform used and the cultural context. Thus, we transfer the attitudes-toward-cybervetting scale to a context with strict data regulations: Germany. In an online between-subjects experiment with platform users and non-users (N = 100 working professionals and students), we examined attitudes toward cybervetting on different social media platforms (professional: LinkedIn vs. personal: Instagram) and their relationship with organizational attractiveness. We found that German participants viewed cybervetting on professional platforms with more skepticism than American participants. Hierarchical regression analysis revealed higher perceived fairness, lower invasion of privacy, and higher organizational attractiveness when cybervetting was done on professional platforms.
{"title":"Why Participant Perceptions of Assessment Center Exercises Matter: Justice, Motivation, Self-Efficacy, and Performance","authors":"Sylvia G. Roch, Kathryn Devon","doi":"10.1111/ijsa.70002","DOIUrl":"https://doi.org/10.1111/ijsa.70002","url":null,"abstract":"<div>\u0000 \u0000 <p>Despite expectations, assessment center (AC) participants' performance ratings often are not strongly correlated over AC exercises. Why is a puzzle? Perhaps one piece of the puzzle is that participants view AC exercises with varying levels of motivation, justice, and self-efficacy, which relate to exercise performance, the topic of the current research. Based on 123 participants completing an AC consisting of six exercises (two leaderless group discussions, oral presentation, written case analysis, personality assessment, and cognitive ability exercise), results showed that motivation, self-efficacy, and procedural justice levels differed among exercises, which generally related to exercise performance. Two interventions designed to improve how participants perceive AC exercises (one focusing on self-efficacy and the other on justice) were not successful. Implications are discussed.</p></div>","PeriodicalId":51465,"journal":{"name":"International Journal of Selection and Assessment","volume":"33 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143111799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Are Games Always Fun and Fair? A Comparison of Reactions to Different Game-Based Assessments","authors":"Marie Luise Ohlms, Klaus G. Melchers","doi":"10.1111/ijsa.12520","DOIUrl":"https://doi.org/10.1111/ijsa.12520","url":null,"abstract":"<p>Game-based assessment (GBA) has garnered attention in the personnel selection and assessment context owing to its postulated potential to improve applicant reactions. However, GBAs can differ considerably depending on their specific design. Therefore, we sought to determine whether test taker reactions to GBAs vary owing to the different manifestations that GBAs may take on, and to test takers' individual preferences for such assessments. In an experimental study, each of <i>N</i> = 147 participants was shown six different GBAs and asked to rate several applicant reaction variables concerning these assessments. We found that reactions to GBAs were not inherently positive even though GBAs were generally perceived as enjoyable. However, perceptions of fairness and organizational attractiveness varied considerably between GBAs. Participants' age and experience with video games were related to reactions but had less impact than the different GBAs. Our results suggest that a technology-as-designed approach, which considers GBAs as a combination of multiple components (e.g., game elements), is crucial in GBA research to provide generalizable results for theory and practice.</p>","PeriodicalId":51465,"journal":{"name":"International Journal of Selection and Assessment","volume":"33 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.12520","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143119821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing Proctored and Unproctored Cognitive Ability Testing in High-Stakes Personnel Selection","authors":"Tore Nøttestad Norrøne, Morten Nordmo","doi":"10.1111/ijsa.70001","DOIUrl":"https://doi.org/10.1111/ijsa.70001","url":null,"abstract":"<div>\u0000 \u0000 <p>New advances in computerized adaptive testing (CAT) have increased the feasibility of high-stakes unproctored testing of general mental ability (GMA) in personnel selection contexts. This study presents the results from a within-subject investigation of the convergent validity of unproctored tests. Three batteries of cognitive ability tests were administered during personnel selection in the Norwegian Armed Forces. A total of 537 candidates completed two sets of proctored fixed-length GMA tests before and during the selection process. In addition, an at-home unproctored CAT battery of tests was administered before the selection process began. Differences and similarities between the convergent validity of the tests were evaluated. The convergent validity coefficients did not significantly differ between proctored and unproctored batteries, both on observed GMA scores and the latent factor level. The distribution and standardized residuals of test scores comparing proctored-proctored and proctored-unproctored were overall quite similar and showed no evidence of score inflation or deflation in the unproctored tests. The similarities between proctored and unproctored results also extended to the moderately searchable words similarity test. Although some unlikely individual cases were observed, the overall results suggest that the unproctored tests maintained their convergent validity.</p></div>","PeriodicalId":51465,"journal":{"name":"International Journal of Selection and Assessment","volume":"33 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143119822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: A Meta-Analysis of Accent Bias in Employee Interviews: The Effects of Gender and Accent Stereotypes, Interview Modality, and Other Moderating Features
Authors: Henri T. Maindidze, Jason G. Randall, Michelle P. Martin-Raugh, Katrisha M. Smith
Journal: International Journal of Selection and Assessment, 33(1)
DOI: 10.1111/ijsa.12519
Published: 2025-01-23
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.12519
Abstract: To address concerns of subtle discrimination against stigmatized groups, we meta-analyze the magnitude and moderators of bias against non-standard accents in employment interview evaluations. Results from a multi-level random-effects meta-analysis (unique effects: k = 41, N = 7,596; multi-level effects accounting for dependencies: k = 120, N = 20,873) demonstrate that standard-accented (SA) interviewees are consistently favored over non-standard-accented (NSA) interviewees (d = 0.46). Accent bias is stronger against women than against men, particularly when evaluator samples are predominantly female, and was strongly predicted by interviewers' stereotypes of NSA interviewees as less competent and, to a lesser extent, as less warm. Accent bias was not significantly affected by perceptions of comprehensibility, accentedness, accent type, interview modality, study rigor, or job speaking-skill requirements.
{"title":"Toward Theory-Based Volitional Personality Development Interventions at Work","authors":"Sofie Dupré, Bart Wille","doi":"10.1111/ijsa.70000","DOIUrl":"https://doi.org/10.1111/ijsa.70000","url":null,"abstract":"<div>\u0000 \u0000 <p>In this article, we respond to four commentaries (Li et al., 2024; Hennecke & Ingold, 2025; Perossa & Connelly, 2024; Ones et al., 2024) on our article “Personality development goals at work: A new frontier in personality assessment in organizations.” We start by addressing four overarching considerations from the commentaries, including (a) how to approach PDG assessment, (b) the feasibility of personality development interventions, (c) potential trade-offs involved, and (d) the value of personality development beyond established HR practices. Next, in an attempt to integrate these considerations and stimulate future research in this area, we outline three critical elements of what we believe can be the foundation of theory-based personality development interventions at work. For this purpose, we first posit that personality development at work can be rethought such that the focus shifts from “changing an employee's trait levels” to “expanding that employee's comfort zone across a range of personality states.” Second, to have sustained effects, interventions need to accomplish more than simply “learning new behaviors,” by effectively targeting all layers of personality—behavioral, cognitive, and emotional. Finally, we introduce optimal functioning, encompassing both performance and well-being aspects, as the ultimate criterion for evaluating the success of personality development interventions. We hope these reactions and integrative ideas will inspire future research on personality development goals assessment and personality development interventions in the work context.</p></div>","PeriodicalId":51465,"journal":{"name":"International Journal of Selection and Assessment","volume":"33 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143116278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: Evaluating the Impact of Faking on the Criterion-Related Validity of Personality Assessments
Authors: Andrew B. Speer, Angie Y. Delacruz, Takudzwa Chawota, Lauren J. Wegmeyer, Andrew P. Tenbrink, Carter Gibson, Chris Frost
Journal: International Journal of Selection and Assessment, 33(1)
DOI: 10.1111/ijsa.12518
Published: 2025-01-06
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.12518
Abstract: Personality assessments are commonly used in hiring, but concerns about faking have raised doubts about their effectiveness, and qualitative reviews show mixed and inconsistent effects of faking on criterion-related validity. To address this, we conducted a series of meta-analyses using matched samples of honest and motivated respondents (i.e., respondents instructed to fake, and actual applicants). In 80 paired samples, the average difference in validity coefficients between honest and motivated samples across five-factor model traits ranged from 0.05 to 0.08 (largest for conscientiousness and emotional stability), with validity ratios ranging from 64% to 72%. Validity was attenuated when candidates faked regardless of sample type, trait relevance, or the importance of impression management, though variation existed across criterion types. Both real applicant samples (k = 25) and instructed-response conditions (k = 55) showed a reduction in validity relative to honest conditions, including when managerial ratings of job performance were the criterion; thus, faking affected validity in operational samples. Practitioners should therefore be cautious when relying on concurrent validation evidence for personality inventories and should expect attenuated validity in operational applicant settings, particularly for conscientiousness and emotional stability scales. That said, personality assessments generally maintained useful validity even under motivated conditions.
{"title":"Attention To Detail and Cyber Skill: Associated Beyond General Intelligence in Cyber-Soldier Conscripts","authors":"Pär-Anders Albinsson, Patrik Lif","doi":"10.1111/ijsa.12517","DOIUrl":"https://doi.org/10.1111/ijsa.12517","url":null,"abstract":"<p>We explore the potential of <i>attention to detail</i> as a component in the selection of conscripts for the cyber track in the Swedish Armed Forces. To measure attention to detail, we adapted the embedded figures test and administered it to conscripts as part of the extended mustering. We report results from a conscript selection with 97 test participants of which 56 continued to become cyber soldiers, finishing their training the following year. Attention to detail showed little correlation with the cognitive-ability components of the mustering test battery, suggesting that attention to detail is unlikely to be strongly associated with general intelligence for this population. Attention to detail was the only cognitive-ability component of the mustering test battery that showed a significant predictive relationship with practical post-training cyber skill (<i>R</i><sup>2</sup> = 0.10). Therefore, we believe that it could be a useful additional component in the selection process.</p>","PeriodicalId":51465,"journal":{"name":"International Journal of Selection and Assessment","volume":"33 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.12517","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143118449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}