Effects of a constructed response retest strategy on faking, test perceptions, and criterion-related validity of situational judgment tests
Liyan Xi, Qingxiong Weng, Jan Corstjens, Xiujuan Wang, Lixin Chen
International Journal of Selection and Assessment, 32(4), 561-578. DOI: 10.1111/ijsa.12482. Published 6 June 2024.

Abstract: This research proposes a faking-mitigation strategy for situational judgment tests (SJTs), referred to as the constructed response retest (CR-retest). The CR-retest strategy presents SJT items in a constructed response format first, followed by equivalent closed-ended items with the same situation description. Two field experiments (N1 = 733, N2 = 273) investigated the effects of this strategy and contrasted it with a commonly used pretest warning message. Study 1 revealed that the CR-retest strategy was more effective than the warning message in reducing score inflation and improving criterion-related validity. Study 2 delved deeper by investigating the effects of the CR-retest strategy on applicant reactions in a 2 (with or without CR-retest strategy) × 2 (warning or control message) between-subjects design. Applicants reported positive fairness perceptions of SJT items administered with the CR-retest strategy. The CR-retest strategy reduced faking by evoking threat perceptions, whereas the warning message heightened both threat and fear. Combining the two strategies further decreased faking without undermining fairness perceptions. Overall, our results indicate that the CR-retest strategy could be a valuable method for mitigating faking in real-life selection settings.
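Score inflation of the kind compared in Study 1 is typically quantified as a standardized mean difference between testing conditions. As a minimal illustration only (the scores below are invented, not data from the article), Cohen's d for two independent groups can be computed as:

```python
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Standardized mean difference between two independent groups,
    using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical SJT scores: standard administration vs. a faking-mitigation
# condition; a positive d indicates score inflation under standard conditions.
standard = [4.6, 4.8, 4.9, 4.7, 4.5, 4.8]
mitigated = [4.1, 4.3, 4.0, 4.2, 4.4, 4.1]
inflation = cohens_d(standard, mitigated)
```

A smaller difference between applicant-condition and honest-condition scores indicates less faking, which is the sense in which the article reports the CR-retest strategy outperforming a warning message.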
Validity evidence for personality scores from algorithms trained on low-stakes verbal data and applied to high-stakes interviews
Brent A. Stevenor, Louis Hickman, Michael J. Zickar, Fletcher Wimbush, Weston Beck
International Journal of Selection and Assessment, 32(4), 544-560. DOI: 10.1111/ijsa.12480. Published 31 May 2024.

Abstract: We present multifaceted validity evidence for machine learning models (referred to in this research as automated video interview personality assessments, or AVI-PAs) that were trained on verbal data and interviewer ratings from low-stakes interviews and applied to high-stakes interviews to infer applicant personality. The predictive models used RoBERTa embeddings and binary unigrams as predictors. In Study 1 (N = 107), AVI-PAs more closely reflected interviewer ratings than applicant and reference ratings. AVI-PAs and interviewer ratings also had similar relations with applicants' interview behaviors, biographical information, and hireability. In Study 2 (N = 25), AVI-PAs had weak-to-moderate (nonsignificant) relations with subsequent supervisor ratings of job performance. Empirically, the AVI-PAs were most similar to interviewer ratings. AVI-PAs, interviewer ratings, self-reports, and reference reports all demonstrated weak discriminant validity evidence. LASSO regression provided better (but still weak) discriminant evidence than elastic net regression. Despite using natural language embeddings to operationalize verbal behavior, the AVI-PAs (except emotional stability) exhibited large correlations with interviewee word count. We discuss the implications of these findings for pre-employment personality assessments and effective AVI-PA design.
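Convergent and discriminant validity claims like those above are usually judged from a multitrait-multimethod correlation matrix: same-trait, different-method correlations should exceed different-trait correlations. A minimal sketch with invented scores (none of these numbers come from the article):

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical trait scores from two "methods" (algorithm vs. interviewer).
algo_extraversion = [3.1, 4.0, 2.5, 3.8, 4.4, 2.9]
rater_extraversion = [3.0, 4.2, 2.8, 3.5, 4.6, 3.1]
algo_conscientiousness = [4.1, 3.2, 4.4, 2.9, 3.0, 4.3]

# Convergent evidence: same trait, different methods (should be high).
convergent = pearson_r(algo_extraversion, rater_extraversion)
# Discriminant evidence: different traits, same method (should be low).
discriminant = pearson_r(algo_extraversion, algo_conscientiousness)
```

"Weak discriminant validity" in the abstract means that, unlike in this toy example, different-trait correlations were not much lower than same-trait ones.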
How different backgrounds in video interviews can bias evaluations of applicants
Johannes M. Basch, Nicolas Roulin, Josua Gläsner, Raphael Spengler, Julia Wilhelm
International Journal of Selection and Assessment, 32(4), 535-543. DOI: 10.1111/ijsa.12487. Published 28 May 2024. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.12487

Abstract: Organizations are increasingly using technology-enabled formats such as asynchronous video interviews (AVIs) to evaluate candidates. However, the personal environment of applicants visible in AVI recordings may introduce additional bias into the evaluation of interview performance. This study extends existing research by examining the influence of background cues signaling affiliation with Islam or homosexuality, comparing them with a neutral background in an experimental design with a German sample (N = 222). Visible signs of religious affiliation with Islam led to lower perceived competence, while perceived warmth and interview performance were unaffected. Visual cues of homosexuality had no effect on perceptions of the applicant. In addition, personal characteristics of the raters, such as their intrinsic religious orientation or their attitudes towards homosexuality, influenced their ratings of applicants: a non-Muslim religious orientation was negatively associated with evaluations of the Muslim candidate, and a negative attitude towards homosexuality was negatively associated with evaluations of the homosexual candidate. This study thus contributes to the literature on AVIs and on discrimination against Muslims and members of the 2SLGBTQI+ community in personnel selection contexts.
Departures from linearity as evidence of applicant distortion on personality tests
Neil D. Christiansen, Chet Robie, Ye Ra Jeong, Gary N. Burns, Douglas E. Haaland, Mei-Chuan Kung, Ted B. Kinney
International Journal of Selection and Assessment, 32(4), 521-534. DOI: 10.1111/ijsa.12481. Published 22 May 2024. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.12481

Abstract: Two field studies examined how applicant faking impacts the normally linear construct relationships of personality tests, using segmented regression and by partitioning samples to evaluate effects on validity across different ranges of test scores. Study 1 investigated validity decay across score ranges of applicants to a state police academy (N = 442). Personality test scores had nonlinear construct relations in the applicant sample: scores from the top of the distribution were worse predictors of subsequent performance but more strongly related to social desirability scores; this pattern was not found for the partitioned scores of a cognitive test. Study 2 compared the relationship between personality test scores and job performance ratings of applicants (n = 97) with that of incumbents (n = 318) in a customer service job. Departures from linearity were observed in the applicant sample but not in the incumbent sample. Effects of applicant distortion on the validity of personality tests are especially concerning when validity decay increases toward the top of the distribution of test scores. Observing slope differences across ranges of applicant personality test scores can be an important tool in selection.
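The partitioning approach described above can be sketched simply: split applicants at the median test score and compute criterion correlations within each half. This is only an illustration with invented data; the studies themselves used segmented regression and more careful sample partitioning:

```python
from statistics import mean, median

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def validity_by_range(scores, performance):
    """Criterion validity in the bottom vs. top half of the score range."""
    cut = median(scores)
    low = [(s, p) for s, p in zip(scores, performance) if s <= cut]
    high = [(s, p) for s, p in zip(scores, performance) if s > cut]
    r_low = pearson_r([s for s, _ in low], [p for _, p in low])
    r_high = pearson_r([s for s, _ in high], [p for _, p in high])
    return r_low, r_high

# Hypothetical data in which prediction decays at the top of the score
# distribution, the pattern the studies associate with applicant faking.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
performance = [1.0, 2.1, 2.9, 4.2, 5.0, 5.2, 4.1, 6.0, 4.5, 5.1]
r_low, r_high = validity_by_range(scores, performance)
```

A markedly lower correlation in the top half (here r_high is near zero while r_low is high) is the kind of slope difference the abstract proposes as a distortion signal.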
Improving structured interview acceptance through training
Steve Baumgartner, Lynn Bartels, Julia Levashina
International Journal of Selection and Assessment, 32(4), 512-520. DOI: 10.1111/ijsa.12473. Published 20 May 2024.

Abstract: Despite having predictive validity above other selection methods, structured interviews are not always used. Using the Theory of Planned Behavior as a framework, this study examines the role of interview training in increasing structured interview acceptance (SIA). Based on a survey of 190 practitioners in Human Resources, I-O Psychology, and other fields who conduct employment interviews, our results show that not all interviewer training programs are equally effective in increasing SIA. While participation in formal interviewer training is related to SIA, SIA could be influenced more by incorporating certain training components, including training on how to avoid rating errors (r = .21), learning how to evaluate interview answers (r = .19), interview practice/role playing (r = .17), training on job analysis (r = .15), legal issues (r = .15), the background and purpose of the interview (r = .13), job requirements for the position(s) being filled (r = .13), and a discussion of interview verbal and nonverbal behaviors to avoid (r = .13). Additionally, training components displayed different relationships with SIA across our two subsamples. In the MTurk sample (composed primarily of managers), job analysis, how to evaluate answers, and how to avoid rating errors correlated significantly with SIA. In the non-MTurk sample (composed primarily of HR professionals), interview practice/role playing, rapport building, use of a videotaped interview to guide instructions, and how to make decisions from interview data correlated significantly with SIA. This highlights the importance of a training needs analysis to better understand the audience before training. We suggest that organizations incorporate the identified components into interviewer training to enhance structured interview acceptance and ensure that interviewers are more likely to implement structured interview techniques in practice.
The performance of large language models on quantitative and verbal ability tests: Initial evidence and implications for unproctored high-stakes testing
Louis Hickman, Patrick D. Dunlop, Jasper Leo Wolf
International Journal of Selection and Assessment, 32(4), 499-511. DOI: 10.1111/ijsa.12479. Published 17 May 2024. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.12479

Abstract: Unproctored assessments are widely used in pre-employment assessment. However, widely accessible large language models (LLMs) pose challenges for unproctored personnel assessments, given that applicants may use them to artificially inflate their scores beyond their true abilities. This is particularly concerning for cognitive ability tests, which are widely used and traditionally considered less fakeable by humans than personality tests. This study therefore compares the performance of LLMs on two common types of cognitive tests used in real-world, high-stakes selection: quantitative ability (number series completion) and verbal ability (using a passage of text to determine whether a statement is true). We also examine the performance of the LLMs across test formats (open-ended vs. multiple choice), and contrast two LLMs (Generative Pretrained Transformers, GPT-3.5 and GPT-4) across multiple prompt approaches and "temperature" settings (a parameter that determines the amount of randomness in the model's output). The LLMs performed well on the verbal ability test but extremely poorly on the quantitative ability test, even when accounting for test format. GPT-4 outperformed GPT-3.5 on both types of tests. Notably, although prompt approaches and temperature settings affected LLM test performance, those effects were mostly minor relative to differences across tests and language models. We provide recommendations for securing pre-employment testing against LLM influence. Additionally, we call for rigorous research investigating the prevalence of LLM usage in pre-employment testing, as well as how LLM usage affects selection test validity.
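The "temperature" parameter mentioned above controls sampling randomness by rescaling a model's logits before they are normalized into token probabilities. A generic sketch of that mechanism (not the actual GPT implementation):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to a probability distribution. Low temperature
    concentrates probability mass on the highest-scoring option
    (near-deterministic output); high temperature flattens the
    distribution (more random output)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.2)  # sharply peaked
hot = softmax_with_temperature(logits, 2.0)   # much flatter
```

This is why varying temperature can change an LLM's test answers from run to run, and why the study treats it as an experimental factor.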
Examining the efficacy of inoculation and value-affirmation interventions in improving precandidate reactions among prospective military recruits
Justin R. Feeney, Ben Sylvester, Steve Gooch
International Journal of Selection and Assessment, 32(4), 491-498. DOI: 10.1111/ijsa.12475. Published 3 May 2024. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.12475

Abstract: This study engaged 4848 first-time, English-speaking prospective Canadian Armed Forces applicants to evaluate the efficacy of pre-application interventions on the Practice Canadian Forces Aptitude Test (PCFAT). In a five-level between-subjects design, participants were randomly assigned to one of the following conditions: an inoculation message, a value-affirmation message, a combination of both, a placebo writing intervention, or a no-intervention control group. The interventions were anchored in inoculation theory and value-affirmation theory and aimed to reduce math anxiety and close the gender gap in test performance. Contrary to expectations, the interventions did not significantly reduce math anxiety or improve problem-solving performance. Consistent with the literature, a negative relationship was found between math anxiety and problem-solving scores, and men outscored women in problem solving across all conditions. Despite these outcomes, the study lays a foundation for future research on enhancing pre-applicant experiences in an increasingly competitive labor market. Implications and future directions are discussed.
Promoting equity in hiring: An evaluation of the HireNext Job Posting Assessment
Sumayya Saleem, Linda White, Michal Perlman, Elizabeth Dhuey
International Journal of Selection and Assessment, 32(4), 479-490. DOI: 10.1111/ijsa.12477. Published 2 May 2024. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.12477

Abstract: The language used in job postings can deter applicants and contribute to the employment gap, in which high rates of youth unemployment occur alongside high levels of job vacancies. We tested youth preferences for job postings modified using a free online tool that applies natural language processing to make postings more appealing to young and diverse candidates. Using data from 1050 respondents aged 18-35 with education below a postsecondary degree, we found a consistent and statistically significant preference for modified postings, irrespective of the extent or type of changes made. Traditionally disadvantaged respondents (those with lower education, lower incomes, or disabilities, women, and unemployed youth) displayed a stronger preference for modified postings. These findings suggest that this tool can help employers recruit disadvantaged youth and bridge the employment gap.
Investigating impression management use in asynchronous video interviews across 10 countries
René Arseneault, Nicolas Roulin
International Journal of Selection and Assessment, 32(3), 461-477. DOI: 10.1111/ijsa.12476. Published 29 April 2024. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.12476

Abstract: This cross-cultural study investigates how interviewees from 10 culturally distinct countries differ in their use of impression management (IM) tactics in asynchronous video interviews (AVIs), and the relationships between those tactics and interview performance. A total of 582 participants from 10 countries (India, Canada, South Africa, Poland, Spain, Iran, Germany, Chile, the Philippines, and China) completed an eight-question AVI for a mock position as a manager in a bank. We drew on GLOBE's cultural framework to predict and explain observed differences in self-reported IM use and performance, and used multilevel modeling to test our hypotheses. Interviewees from the 10 countries differed slightly in their use of various IM tactics, but IM use was seldom related to GLOBE cultural dimensions. Partially consistent with previous in-person interview research, honest IM tactics (e.g., self-promotion) were positively associated with interview performance, while deceptive tactics (e.g., extensive image creation) were negatively associated with it. This research is the first to investigate cross-cultural IM differences in AVIs, addressing a critical gap in the selection literature at a time when many organizations conduct interviews virtually to save costs, streamline the hiring process, or simply conduct most of their activities remotely.
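Multilevel modeling is used in studies like this because interviewees are nested within countries, and a common first step is the intraclass correlation, ICC(1), which estimates how much of the rating variance lies between countries. A rough sketch with invented data (the study fit full multilevel models, not this shortcut, and this formula assumes roughly equal group sizes):

```python
from statistics import mean

def icc1(groups):
    """ICC(1): proportion of variance attributable to group membership,
    from one-way random-effects ANOVA mean squares."""
    k = mean(len(g) for g in groups)  # average group size
    grand = mean(x for g in groups for x in g)
    n_groups = len(groups)
    ms_between = sum(len(g) * (mean(g) - grand) ** 2
                     for g in groups) / (n_groups - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (
        sum(len(g) for g in groups) - n_groups)
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical IM-use scores clustered by country: large between-country
# differences relative to within-country spread give a high ICC.
countries = [[1.0, 1.0, 2.0], [5.0, 5.0, 6.0], [9.0, 9.0, 10.0]]
icc = icc1(countries)
```

When the ICC is near zero, as a finding of "slight" country differences would suggest, country membership explains little of the variation in IM use.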
Rating accuracy, leniency, and rater perceptions when using the RPM and BARS
Justin R. Feeney, Kabir N. Daljeet, Richard D. Goffin, Travis J. Schneider
International Journal of Selection and Assessment, 32(3), 451-460. DOI: 10.1111/ijsa.12474. Published 22 April 2024. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ijsa.12474

Abstract: Researchers have argued that social-comparative rating formats hold important psychometric advantages over traditional absolute ratings. We asked 152 participants to observe and assess the videotaped performance of individuals completing a task using a social-comparative format (the Relative Percentile Method; RPM) and an absolute rating format (a Behaviorally Anchored Rating Scale; BARS). After collecting expert ratings on the same set of videos, we calculated accuracy indices and leniency. We also collected rater perceptions of accuracy and fairness for both formats. The BARS was perceived as more accurate and fairer than the RPM; however, the RPM was better at combating rater leniency. We discuss the implications of these findings.
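Two of the indices named in the abstract can be illustrated directly against expert ratings: leniency as the mean signed deviation of a rater from the experts, and one simple form of accuracy as the mean absolute deviation. The numbers below are invented for illustration; the study's accuracy indices may be more elaborate than this distance measure:

```python
from statistics import mean

def leniency(ratings, expert):
    """Mean signed deviation from expert ratings; positive = lenient
    (the rater scores targets higher than the experts do)."""
    return mean(r - e for r, e in zip(ratings, expert))

def distance_accuracy(ratings, expert):
    """Mean absolute deviation from expert ratings; lower = more accurate."""
    return mean(abs(r - e) for r, e in zip(ratings, expert))

# Hypothetical ratings of five videotaped performances.
expert_ratings = [3.0, 4.0, 2.0, 5.0, 3.5]
rater_ratings = [3.5, 4.5, 3.0, 5.0, 4.0]
rater_leniency = leniency(rater_ratings, expert_ratings)
rater_distance = distance_accuracy(rater_ratings, expert_ratings)
```

In this toy case the rater never scores below the experts, so the leniency index is positive; a format that "combats leniency" would pull that index toward zero.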