OPM ADG, Section VI: Annotated References, Part II
Dubois, D., Shalin, V. L., Levi, K. R., & Borman, W.
C. (1993). Job knowledge test design: A cognitively-oriented approach. U.S. Office of Naval Research Report, Institute Report 241, i-47.
This study applied cognitive methods to the measurement of
performance using tests of job knowledge. The research goal was to improve the
usefulness of job knowledge tests as a proxy for hands-on performance. The
land navigation skills of 358 Marines were tested with a written job knowledge
test consisting of multiple-choice questions, as well as with hands-on proficiency tests and a
work-sample performance test. Results indicate cognitively-oriented job
knowledge tests show improved correspondence with hands-on measures of
performance, compared with existing content-oriented test development
procedures.
Dye, D. A., Reck, M., & McDaniel, M. A. (1993). The
validity of job knowledge measures. International Journal of Selection and
Assessment, 1, 153-157.
The results of this study
demonstrated the validity of job knowledge tests for many jobs. Job knowledge
was defined as the "cumulation of facts, principles, concepts, and other pieces
of information considered important in the performance of one's job" (p.
153). In a meta-analysis of 502 validity coefficients based on 363,528
individuals, the authors found high levels of validity for predicting training and job
performance.
Ree, M. J., Carretta, T. R.,
& Teachout, M. S. (1995). Role of ability and prior job knowledge in
complex training performance. Journal of Applied Psychology, 80(6),
721-730.
A causal model of the role of
general cognitive ability and prior job knowledge in subsequent job knowledge
acquisition and work sample performance during training was developed.
Participants were 3,428 U.S. Air Force officers in pilot training. The
measures of ability and prior job knowledge came from the Air Force Officer
Qualifying Test. The measures of job knowledge acquired during training were
derived from classroom grades. Work sample measures came from check flight
ratings. The model showed ability directly influenced the acquisition of
job knowledge. General cognitive ability influenced work samples through job
knowledge. Prior job knowledge had almost no influence on subsequent job
knowledge but directly influenced the early work sample. Early training job
knowledge influenced subsequent job knowledge and work sample performance.
Finally, early work sample performance strongly influenced subsequent work sample
performance.
Roth, P. L., Huffcutt, A. I.,
& Bobko, P. (2003). Ethnic group differences in measures of job
performance: A new meta-analysis. Journal of Applied Psychology, 88(4),
694-706.
The authors conducted a
meta-analysis of ethnic group differences in job performance. Analyses of
Black-White differences within categories of job performance were conducted and
subgroup differences within objective and subjective measurements were
compared. Contrary to one perspective sometimes adopted in the field,
objective measures are associated with standardized ethnic group differences
that are very similar to, and in some cases somewhat larger than, those for
subjective measures across a variety of indicators. This trend was consistent
across quality, quantity, and
absenteeism measures. Further, work samples and job knowledge tests are
associated with larger ethnic group differences than performance ratings or
measures of absenteeism. Analysis of Hispanic-White standardized differences
shows they are generally lower than Black-White differences in several
categories.
Sapitula, L., & Shartzer,
M. C. (2001). Predicting the job performance of maintenance workers using a job
knowledge test and a mechanical aptitude test. Applied H.R.M. Research, 6(1-2),
71-74.
This study examined the
predictive validity of the Job Knowledge Written Test (JKWT) and the Wiesen
Test of Mechanical Aptitude (WTMA, J. P. Wiesen, 1997), and the effects of
race, gender, and age on scores. A total of 782 applicants completed the JKWT
and the WTMA, and 102 maintenance workers were administered the JKWT, the WTMA,
and a job performance appraisal. Results show no significant relationship
between job performance ratings and either the JKWT or WTMA. Male applicants
scored higher than did female applicants and White applicants scored higher
than did minority applicants.
Barrick, M. R., & Mount, M. K. (1991). The Big Five
personality dimensions and job performance: A meta-analysis. Personnel
Psychology, 44, 1-26.
Investigated the relation of the
"Big Five" personality dimensions to three job performance criteria (job
proficiency, training proficiency, and personnel data) for five occupational
groups (professionals, police, managers, sales, and skilled/semi-skilled). A
review of 117 studies yielded 162 samples totaling 23,994 subjects.
Conscientiousness showed consistent relations with all job performance criteria
for all occupational groups. Extraversion was a valid predictor for two
occupations involving social interaction (managers and sales). Also, openness
to experience and extraversion were valid predictors of the training
proficiency criterion across occupations. Overall, results illustrate the
benefits of using the five-factor model of personality to accumulate empirical
findings. Study results have implications for research and practice in
personnel psychology.
Hogan, R., Hogan, J., & Roberts, B. W. (1996). Personality
measurement and employment decisions: Questions and answers. American
Psychologist, 51, 469-477.
Summarizes information needed to answer the most frequent
questions about the use of personality measures in applied contexts.
Conclusions are (1) well-constructed measures of normal personality are valid
predictors of performance in virtually all occupations, (2) they do not result
in adverse impact for job applicants from minority groups, and (3) using
well-developed personality measures for pre-employment screening is a way to
promote social justice and increase organizational productivity.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D.,
& McCloy, R. A. (1990). Criterion-related validities of personality
constructs and the effect of response distortion on those validities. Journal
of Applied Psychology, 75, 581-595.
An inventory of six personality constructs and four response
validity scales measuring accuracy of self-description were administered in
three contexts: a concurrent criterion-related validity study, a faking
experiment, and an applicant setting. Results showed (a) validities were in
the .20s against targeted criterion constructs, (b) respondents successfully
distorted their self-descriptions when instructed to do so, (c) response
validity scales were responsive to different types of distortion, (d)
applicants' responses did not reflect evidence of distortion, and (e) validities
remained stable regardless of possible distortion.
Hough, L. M., & Oswald, F. L. (2000). Personnel
selection: Looking toward the future — Remembering the past. Annual Review of
Psychology, 51, 631-664.
Reviews personnel selection
research from 1995-1999. Areas covered are job analysis; performance criteria;
cognitive ability and personality predictors; interview, assessment center, and
biodata assessment methods; measurement issues; meta-analysis and validity
generalization; evaluation of selection systems in terms of differential
prediction, adverse impact, utility, and applicant reactions; emerging topics
on team selection and cross-cultural issues; and finally professional, legal,
and ethical standards. Three major themes are revealed: (1) better taxonomies
produce better selection decisions; (2) the nature and analyses of work
behavior are changing, influencing personnel selection practices; (3) the field
of personality research is healthy, as new measurement methods, personality
constructs, and compound constructs of well-known traits are being researched
and applied to personnel selection.
Tett, R. P., Jackson, D. N., & Rothstein, M. (1991).
Personality measures as predictors of job performance: A meta-analytic review. Personnel
Psychology, 44, 703-742.
Based on 97 independent samples, a meta-analysis was used to
(a) assess overall validity of personality measures as predictors of job
performance, (b) investigate moderating effects of several study
characteristics on personality scale validity, and (c) appraise predictability
of job performance as a function of eight categories of personality content.
Results indicated studies using confirmatory research strategies produced
corrected mean personality scale validities more than twice as high as those
from studies adopting exploratory strategies. An even higher mean validity was obtained
based on studies using job analysis explicitly in selection of personality
measures.
Aamodt, M. G. (2006). Validity of recommendations and
references. Assessment Council News, February, 4-6.
Reference data are subject to inflation and low reliability
and generally reach only moderate levels of predictive validity. Even so,
organizations are encouraged to check the references of their applicants
because of widespread resume fraud and potential liability in the form of
negligent hiring.
Taylor, P. J., Pajo, K., Cheung, G. W., &
Stringfield, P. (2004). Dimensionality and validity of a structured telephone
reference check procedure. Personnel Psychology, 57, 745-772.
Reports that reference checking, when properly structured,
can prevent defamation litigation and add significant value to the selection
process. Specifically tests the hypothesis that utilizing a structured,
competency-based approach to reference checking can increase the predictive
validity of ratings in much the same way as structuring the employment
interview process. A structured job analysis was used to identify the core
job-related competencies deemed essential to effective performance in a family
of customer-contact jobs within a 10,000-employee service organization. These
competencies (Commitment, Teamwork, and Customer Service) were incorporated
into a structured reference form and contacts were asked to rate applicants on
a number of behavioral indicators within each competency. A structured telephone
interview with contacts was then used to obtain evidence of actual occurrences
to support the ratings. Results indicated using a structured telephone
reference check increased the employer's ability to predict future job
performance. Results also indicated a shorter contact-applicant relationship
does not undermine predictions of future job performance.
U.S. Merit Systems Protection Board. (2005). Reference checking
in federal hiring: Making the call. Washington, DC: Author. Note: Report
available at:
http://www.mspb.gov/netsearch/viewdocs.aspx?docnumber=224106&version=224325&application=ACROBAT
Hiring officials should check references. The quality of reference checking can be improved by insisting job
applicants provide at least three references who have observed their
performance on the job. Supervisors should discuss the performance of
their current and former employees with prospective employers. Some former
supervisors will only provide basic facts about work histories (e.g.,
employment dates and positions held) because they are concerned with protecting
the privacy of former employees. Their concern is understandable but need not
interfere with reference checking. So long as reference checking discussions
focus on job-related issues such as performance, reference giving is
appropriate and legally defensible. Former supervisors who support reference
checking inquiries can reward good employees for their past contributions and
avoid "passing on" a problem employee to another agency. Agency human
resources personnel can work to remove barriers to effective reference
checking. For example, applicants should be required to complete Declaration
of Federal Employment (OF-306) forms early in the application process. This
form explicitly grants permission to check references. It also sets
applicants' expectations appropriately: their performance in previous employment
will be investigated.
Hanson, M. A., Horgen, K. E.,
& Borman W. C. (1998, April). Situational judgment tests (SJT) as measures
of knowledge/expertise. Paper presented at the 13th Annual Conference of the
Society for Industrial and Organizational Psychology, Dallas, TX.
This paper discusses the situational judgment test (SJT)
methodology and reasons for its popularity. This paper also investigates the
nature of the construct(s) measured by these tests, why they are valid, when
they are valid, and why they are sometimes not valid. The authors propose the
SJT methodology is best suited for measuring knowledge or expertise, and
discuss available construct validity evidence consistent with this
perspective. This perspective generates several testable hypotheses, and
additional research is proposed. Finally, the implications of this perspective
for the development of valid and useful SJTs are discussed.
McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion,
M. A., & Braverman, E. P. (2001). Use of situational judgment tests to
predict job performance: A clarification of the literature. Journal of
Applied Psychology, 86, 730-740.
This article reviews the history
of situational judgment tests (SJT) and presents the results of a meta-analysis
on criterion-related and construct validity. SJTs showed useful levels of
validity across all jobs and situations studied. The authors also found a
relatively strong relationship between SJTs and cognitive ability and the
relationship depended on how the test had been developed. On the basis of the
literature review and meta-analytic findings, implications for the continued
use of SJTs are discussed, particularly in terms of recent investigations into
tacit knowledge.
McDaniel, M. A., Whetzel, D. L., & Nguyen, N. T. (2006).
Situational judgment tests for personnel selection. Alexandria, VA: IPMA Assessment Council.
Employers should take into account several factors before
choosing to develop their own in-house situational judgment tests (SJTs). For
example, SJT developers must make a number of decisions about the content of
items, response options, response instructions, and answer key. This monograph
also describes the major steps in building a situational judgment test such as
conducting a critical incident workshop, creating item stems from critical
incidents, generating item responses, developing item response instructions,
and choosing among several scoring key methods.
Motowidlo, S. J., Dunnette,
M. D., & Carter, G. W. (1990). An alternative selection procedure: The
low-fidelity simulation. Journal of Applied Psychology, 75,
640-647.
A low-fidelity simulation was
developed for selecting entry-level managers in the telecommunications
industry. The simulation presents applicants with descriptions of work
situations and five alternative responses for each situation. Applicants
select one response they would most likely make and one they would least likely
make in each situation. Results indicated simulation scores correlated from
.28 to .37 with supervisory ratings of performance. These results show samples
of even hypothetical work behavior can predict performance.
Motowidlo, S. J., &
Tippins, N. (1993). Further studies of the low-fidelity simulation in the form
of a situational inventory. Journal of Occupational and Organizational
Psychology, 66, 337-344.
Authors examined two studies
that extend the results of S. J. Motowidlo et al. (1990) by providing further
evidence about relations between situational inventory scores, job performance,
and demographic factors. Combined results from both studies yield an overall
validity estimate of .20, with small differences between race and sex
subgroups, and confirm the potential usefulness of the low-fidelity simulation
in the form of a situational inventory for employee selection.
Weekley, J. A., & Jones, C. (1999). Further studies
of situational tests. Personnel Psychology, 52(3), 679-700.
Results are reported for two
different situational judgment tests (SJTs). Across the two studies, situational
test scores were significantly related to cognitive ability and experience. In
one study, there was a slight tendency for experience and cognitive ability to
interact in the prediction of situational judgment, such that cognitive ability
became less predictive as experience increased. Situational judgment fully
mediated the effects of cognitive ability in one study, but not in the other.
SJT race effect sizes were consistent with past research and were smaller than
those typically observed for cognitive ability tests. The evidence indicates
situational judgment measures mediate a variety of job-relevant skills.
Campion, M. A., Palmer, D. K., & Campion, J. E.
(1997). A review of structure in the selection interview. Personnel
Psychology, 50(3), 655-702.
Reviews the research literature
and describes and evaluates the many ways selection interviews can be
structured. Fifteen components of structure are identified which may enhance
either the content of or the evaluation process in the interview. Each
component is critiqued in terms of its impact on numerous forms of reliability,
validity, and user reactions. Finally, recommendations for research and
practice are presented. The authors conclude interviews can be easily
enhanced by using some of the many possible components of structure, and the
improvement of this popular selection procedure should be a high priority for
future research and practice.
Conway, J. M., Jako, R. A., & Goodman, D. F. (1995).
A meta-analysis of interrater and internal consistency reliability of selection
interviews. Journal of Applied Psychology, 80(5), 565-579.
A meta-analysis of 111
inter-rater reliability coefficients and 49 coefficient alphas from selection
interviews was conducted. Moderators of inter-rater reliability included study
design, interviewer training, and three dimensions of interview structure
(standardization of questions, of response evaluation, and of combining
multiple ratings). Standardizing questions increased the reliability of
ratings more for individual than for panel interviews, and multiple ratings were
useful when combined mechanically but not when combined subjectively. Both
standardization of questions and the number of ratings made were associated
with higher levels of validity. Upper limits of validity were
estimated to be .67 for highly structured interviews and .34 for unstructured
interviews.
Huffcutt, A. I., & Arthur, W. (1994). Hunter and
Hunter (1984) revisited: Interview validity for entry-level jobs. Journal of
Applied Psychology, 79(2), 184-190.
This meta-analysis revisited Hunter and
Hunter's (1984) conclusions regarding the validity of the employment interview
for entry-level jobs. Interview validity studies were classified by level of
interview structure, and validity was found to increase substantially with
structure: highly structured interviews approached the validity of mental
ability tests, whereas unstructured interviews showed much lower validity. The
authors conclude the low interview validity estimates reported by Hunter and
Hunter largely reflect unstructured interviews.
Huffcutt, A. I., & Roth,
P. L. (1998). Racial group differences in employment interview evaluations. Journal
of Applied Psychology, 83(2), 179-189.
The purpose of this
meta-analysis was to research the various factors that can play a role in
racial group differences resulting from an interview, such as the level of
structure in the interview, job complexity, etc. Results suggest, in general,
employment interviews produce smaller racial group differences than other
assessments (e.g., mental ability tests). Moreover, structured interviews tend to limit or
decrease the influence of bias and stereotypes in ratings. High job complexity
resulted in mean negative effect sizes for Black and Hispanic applicants,
meaning they received higher overall ratings than White applicants. Behavior
description interviews averaged smaller group differences than situational
interviews, and group differences tended to be larger when there was a larger
percentage of a minority (i.e., Black or Hispanic) in the applicant pool.
Huffcutt, A. I., Weekley, J. A., Wiesner, W. H., DeGroot,
T. G., & Jones, C. (2001). Comparison of situational and behavior
description interview questions for higher-level positions. Personnel
Psychology, 54(3), 619-644.
This paper discusses two
structured interview studies involving higher-level positions (military officer
and district manager) in which matching situational interview and behavior
description interview (BDI) questions were written to assess the same job
characteristics. Results confirmed previous findings that
situational interviews are less effective for higher-level positions than
BDIs. Moreover, results indicated very little correspondence between
situational and behavior description questions written to assess the same job
characteristic, and a link between BDI ratings and the personality trait
Extroversion. Possible reasons for the lower situational interview
effectiveness are discussed.
McFarland, L. A., Ryan, A.
M., Sacco, J. M., & Kriska, S. D. (2004). Examination of structured interview
ratings across time: The effects of applicant race, rater race, and panel
composition. Journal of Management, 30(4), 435-452.
This study looked at the effect
of race on interview ratings for structured panel interviews (candidates were
interviewed and rated by three raters of varying races). Results indicated
panel composition produced the largest effect. Specifically, predominantly
White panels provided significantly more favorable ratings (of all candidates,
regardless of race) than panels consisting of predominantly Black
raters. Rater race also interacted with panel composition, such that Black raters
provided higher ratings to Black candidates only when the panel was
predominantly Black. However, the authors caution these effects were rather
small; therefore, the results should be interpreted cautiously.
Taylor, P., & Small, B. (2002). Asking
applicants what they would do versus what they did do: A meta-analytic
comparison of situational and past behavior employment interview questions. Journal
of Occupational & Organizational Psychology, 75(3), 277-294.
Criterion-related validities and
inter-rater reliabilities for structured employment interview studies using
situational interview (SI) questions were compared with those from studies
using behavioral description interview (BDI) questions. Validities and
reliabilities were further analyzed in terms of whether descriptively-anchored
rating scales were used to judge interviewees' answers, and validities for each
question type were also assessed across three levels of job complexity. While
both question formats yielded high validity estimates, among studies using
descriptively-anchored answer rating scales, BDI questions yielded a
substantially higher mean validity estimate than SI questions (.63 vs. .47).
Question type (SI vs. BDI) was found to moderate interview validity. Inter-rater
reliabilities were similar for both SI and BDI questions, provided
descriptively-anchored rating scales were used, although they were slightly
lower for BDI question studies lacking such rating scales.
Lyons, T. J. (1989). Validity
of education and experience measured in traditional rating schedule procedures:
A review of the literature. Office of Personnel Research and Development, U.S. Office of Personnel Management, Washington, DC, OPRD-89-02.
This paper reviews research on
the validity of specific education and experience measures common to
traditional rating schedule procedures used by the Federal Government. The
validity of each measure is discussed and recommendations for rating schedule
use are offered.
Lyons, T. J. (1988). Validity
research on rating schedule methods: Status report. Office of Personnel
Research and Development, U.S. Office of Personnel Management, Washington, DC, OED-88-17.
This report summarizes the
findings from a series of studies conducted on rating schedule validity. The
first objective was to investigate the criterion-related validity of rating
schedules used in the Federal Government and the second was to study the
validity of three rating schedule methodologies. Results indicated little
evidence of validity for a rating schedule method based on training and
experience at either entry-level or full performance level jobs. Findings
supported the validity of a Knowledge, Skills, and Abilities (KSA)-based rating
schedule method for full performance level jobs, but not for entry level jobs.
With the exception of one entry-level study, results indicated the most
promising validity coefficients (in the mid to upper .20s) were obtained for
rating procedures employing behavioral consistency measures at both entry and
full performance levels.
McCauley, D. E. (1987). Task-based
rating schedules: A review. Office of Examination Development, U.S. Office of Personnel Management, Washington, DC, OED 87-15.
This paper reviews the evidence
for the validity and practicality of the task-based rating schedule (TBRS), a
self-report instrument used to assess applicants' training and experience in
relation to job-required tasks. The background of the TBRS and the
assumptions on which it is based are reviewed. In addition, a discussion of
meta-analytic results on the predictive validity of the TBRS is provided.
McDaniel, M. A., Schmidt, F.
L., & Hunter, J. E. (1988). A meta-analysis of the validity of methods for
rating training and experience in personnel selection. Personnel Psychology,
41, 283-309.
This paper discusses a
meta-analysis of validity evidence of the methods (point, task, behavioral
consistency, grouping, and job element) used to evaluate training and
experience (T&E) ratings in personnel selection. Results indicate validity
varied with the type of T&E evaluation procedure used. The job element and
behavioral consistency methods each demonstrated useful levels of validity.
Both the point and task methods yielded low mean validities with larger
variability. Partial support was found for both the point and task methods
being affected by a job experience moderator. Moderator analyses suggested the
point method was most valid when the applicant pool had low mean levels of job
experience and was least valid with an experienced applicant pool.
Schwartz, D. J. (1977). A job sampling approach to merit
system examining. Personnel Psychology, 30(2), 175-185.
A method for collecting content validity
evidence for a merit examining process or rating schedule without violating the
principles of content validity is presented. This technique, called the job
sampling approach, is a task-based, structured system of eliciting the
information necessary to construct the rating schedule from sources most able
to provide that information, and for using the information to construct the
rating schedule and linking it to job performance. The steps include
definition of the performance domain of the job in terms of process statements,
identification of the selection and measurement objectives of the organization,
development of the measurement domain in relation to the performance domain and
to the selection and measurement objectives, and demonstration that a close
match between the performance domain and the measurement domain was in fact
achieved.
Sproule, C. F. (1990). Personnel Assessment
Monographs: Recent Innovations in Public Sector Assessment (Vol. 2, No. 2).
International Personnel Management Association Assessment Council (IPMAAC).
This report reviews selected assessment methods and
procedures (e.g., training and experience measures) used frequently in the
public sector during the 1980s. Many of these, including the rating schedule,
are still used today. Each section on assessments contains a variety of
examples describing public sector methods, as well as a summary of related
research findings. Other sections include discussions on selected Federal
assessment innovations, application of technology to assessment, use of test
scores, legal provisions related to assessment, and employment testing of
persons with disabilities.