A More Optimistic Perspective on Performance Appraisals

The clocks just turned back for daylight saving time, and managers can look forward to the upcoming performance appraisal season. Yes, every manager’s least favorite activity: constrained budgets, rigid rating scales, contrived competencies, the uncomfortable feeling that comes with judging the worth of other human beings, and, of course, the reality of having to give negative reviews to your employees. Not fun.

Performance appraisals (PAs) are consequential to organizations and employees, influencing developmental feedback, pay decisions, and promotional opportunities. Yet, there have long been concerns over the accuracy of PA ratings. When a supervisor marks an employee as a “3 – meets expectations” on a 1-to-5 scale, is that truly reflective of that person’s performance? Did the person who received a rating of 4 actually exhibit superior performance? Think back to times when you were rated and how you felt.

Reliability of Ratings

Reliability is a statistical concept that, in the PA context, loosely reflects how much two raters agree with one another when assessing the same employee. If two managers cannot agree that Joe is a good performer, how can we trust that either manager’s evaluation actually reflects Joe’s performance at work?

For decades, academic research has worked under the assumption that the reliability of PA ratings was low—so low, in fact, that the most commonly used estimate of PA reliability suggested that only approximately 50 percent of a manager’s PA rating was due to the rated employee’s performance behaviors (Viswesvaran et al., 1996). That’s a disappointing value, especially considering how important PA ratings are to an employee’s career (pay increases!).

Recently, my colleagues (Angie Delacruz, Lauren Wegmeyer, James Perrotta) and I conducted a study that addressed some methodological challenges to previous PA reliability estimates (in press at the Journal of Applied Psychology). Specifically, we conducted a meta-analysis (a study that aggregates the results from many studies) that isolated PA reliability studies only from situations where an employee was rated by two direct supervisors—two managers directly overseeing an employee’s work.

Our logic for this is simple. When discussing the reliability of PA ratings, we are generally talking about whether a person’s direct supervisor is reliable. Despite this, many past PA reliability estimates have relied on designs where non-direct supervisors made ratings (e.g., a manager from another team, a more senior manager). In such cases, those managers are less likely to have adequately observed the employee’s job performance. For instance, consider a department divided into two teams, A and B. Each team is led by a manager, with a vice president overseeing the entire department. In this structure, can we expect the team A manager to have an in-depth understanding of the team B employees? While possible in certain cases, it’s improbable that a manager would frequently supervise employees from a different team, nor would they likely possess the same level of insight as the employees’ direct supervisor. Similarly, it’s doubtful that the vice president would have sufficient opportunity to observe the day-to-day activities of a lower-level employee, including their work’s volume and quality, the necessity for rework, or their interactions with colleagues. Typically, such close supervision falls under the purview of the employee’s immediate manager.

Thus, in our research, we restricted our analysis to PA reliability estimates that came from two managers who each directly supervised the employee in question. After winnowing the literature to a more restrictive set of 22 PA reliability estimates, we found that the average reliability of direct supervisor ratings increased, with 65 percent of the variance in PA ratings attributable to employee job performance. That’s a marked increase over the previous estimate.

Sound Performance Appraisal Design

Of course, one could easily hope for even higher reliability given that performance ratings facilitate employment decisions that play a considerable role in employees’ lives. Although no measure is perfect (i.e., there will always be error), 65 percent is still probably not high enough for most people. The good news is that by working from this higher baseline, organizations can likely achieve more reasonable reliability through sound PA design. For example, training raters (Roch et al., 2012), incorporating more sophisticated rating scale formats (Hoffman et al., 2012), implementing rater accountability (Mero & Motowidlo, 1995; Roch, 2006; Tenbrink & Speer, 2022), and requiring calibration meetings (Speer et al., 2019) have all been shown to have positive effects on the quality of performance ratings. Thus, if companies implement best-practice design, that may be enough to achieve more appropriate PA reliability.

Just as we recently turned back the clocks, we also now turn a new page in understanding performance appraisals. The findings from this research offer optimism despite longstanding skepticism regarding PA evaluations, though more research and attention are needed.
