What Is Borderline Regression in OSCE Assessment?

The borderline regression method is a statistical technique used to set pass/fail cutoff scores on clinical exams, most commonly the Objective Structured Clinical Examination (OSCE). It works by combining two pieces of data: a detailed checklist score for each student and an examiner’s overall impression of whether that student performed at a passing level. A regression equation links the two, producing a defensible pass mark grounded in real exam performance rather than an arbitrary number.

Why Clinical Exams Need a Special Pass Mark

Unlike a written test where you might set 70% as the passing grade, clinical exams involve live patient encounters, physical examinations, and communication skills. The difficulty varies from station to station, and what counts as “good enough” depends on the specific task. A fixed percentage cutoff doesn’t account for these differences. Standard setting methods like the borderline regression method exist to calculate a unique, evidence-based pass mark for each station, reflecting how hard that particular task actually was for the group of students who took it.

How the Calculation Works

At each OSCE station, the examiner does two things. First, they mark a detailed checklist as the student performs the task, producing a numerical checklist score. Second, they give a separate overall global rating, typically on a scale such as “clear fail,” “borderline,” “clear pass,” or “good pass.” These global ratings reflect the examiner’s holistic judgment of whether the student is competent at that station.

After the exam, a simple linear regression is fitted with the global rating as the predictor variable and the checklist score as the outcome. This produces a best-fit line showing the relationship between the examiner’s overall impression and the detailed score. The pass mark is then read off this line: it’s the checklist score that corresponds to the “borderline” rating on the global scale. In other words, the method asks, “What checklist score does a student typically get when the examiner considers them right on the edge of passing?” That number becomes the cutoff.
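The regression step can be sketched in plain Python. The student data, the numeric rating codes, and the 20-point checklist below are all invented for illustration; a real analysis would typically use a statistics package, but the least-squares arithmetic is the same.

```python
# Borderline regression sketch on made-up data. Global ratings are coded
# numerically: 1 = clear fail, 2 = borderline, 3 = clear pass, 4 = good pass.

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical station results: one (global rating, checklist score) pair
# per student, checklist scored out of 20.
global_ratings   = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
checklist_scores = [7, 9, 11, 12, 13, 15, 16, 17, 18, 19]

intercept, slope = fit_line(global_ratings, checklist_scores)

BORDERLINE = 2  # numeric code for the "borderline" global rating
pass_mark = intercept + slope * BORDERLINE  # checklist score at "borderline"
print(round(pass_mark, 2))  # → 11.91
```

With this invented cohort the line predicts a checklist score of about 11.9 for a student rated borderline, so roughly 12 out of 20 becomes the station’s cutoff.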

How to Tell If the Result Is Reliable

The key quality check is the R-squared value of the regression, which measures how well the examiner’s global rating actually predicts the checklist score. An R-squared of 1.0 would mean the two are perfectly aligned. In practice, a value between 0.85 and 1.0 is considered strong, meaning the checklist captures roughly the same judgment as the examiner’s overall impression. A value below 0.5 raises a red flag: it suggests the checklist and the examiner’s holistic assessment are measuring somewhat different things.
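As a concrete, entirely made-up example: for a one-predictor regression, R-squared equals the squared correlation between rating and score, which keeps the calculation short.

```python
# Illustrative station data: numeric global rating codes
# (1 = clear fail … 4 = good pass) and checklist scores out of 20.
ratings = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
scores  = [7, 9, 11, 12, 13, 15, 16, 17, 18, 19]

n = len(ratings)
mx, my = sum(ratings) / n, sum(scores) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(ratings, scores))
sxx = sum((x - mx) ** 2 for x in ratings)
syy = sum((y - my) ** 2 for y in scores)

# With a single predictor, R-squared is the squared Pearson correlation.
r_squared = sxy * sxy / (sxx * syy)
print(round(r_squared, 2))  # → 0.94
```

Here the fabricated data were chosen to track the ratings closely, so R-squared comes out high; messier real-world stations land well below this.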

In published studies, R-squared values commonly land around 0.5, which most researchers consider reasonable if not ideal. One study examining multiple OSCE stations found values ranging from 0.44 to 0.79, with the weakest station (a breast examination) falling below the 0.5 threshold. Stations where the checklist closely mirrors what examiners value, like an abdominal examination in that same study, tend to produce stronger R-squared values and more reliable pass marks.

When R-squared is low at a particular station, it’s worth investigating. The checklist may be poorly designed, the global rating scale may be confusing examiners, or the task itself may be difficult to assess consistently. Low values don’t necessarily invalidate the pass mark, but they signal that the station needs attention.

How It Compares to the Borderline Group Method

A closely related approach is the borderline group method. In that technique, examiners identify which students they consider “borderline,” and the pass mark is simply the average checklist score of those borderline students. It’s intuitive and easy to calculate, but it has a significant weakness: it only uses data from the small subset of students rated as borderline. If few students fall in that range, the pass mark rests on a very small sample and can be unreliable.
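A minimal sketch of that calculation, with invented data:

```python
# Borderline group method: the pass mark is simply the mean checklist
# score of the students rated "borderline" (illustrative data only).
ratings = ["clear fail", "clear fail", "borderline", "borderline",
           "borderline", "clear pass", "clear pass", "clear pass",
           "good pass", "good pass"]
scores = [7, 9, 11, 12, 13, 15, 16, 17, 18, 19]

borderline_scores = [s for r, s in zip(ratings, scores) if r == "borderline"]
pass_mark = sum(borderline_scores) / len(borderline_scores)
print(pass_mark)  # → 12.0
```

Note that only three of the ten students contribute to the cutoff here; the rest of the cohort’s data is discarded, which is exactly the small-sample weakness described above.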

The borderline regression method solves this problem by using data from every student who sat the exam, not just those near the cutoff. The regression line is fitted to the full range of performances, from clear fails to strong passes, making the resulting pass mark more statistically stable. This is especially valuable in smaller exams where only a handful of students might be rated borderline. Research comparing both approaches in small-scale OSCEs has found that the regression method generally produces tighter confidence intervals around the pass mark, meaning there’s less uncertainty about where the line should be drawn.
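One way to see the stability comparison is to bootstrap the regression pass mark: resample the cohort with replacement, refit the line each time, and take percentile bounds. Everything below (data, rating codes, resample count) is an illustrative assumption, not a prescribed procedure.

```python
import random

def regression_pass_mark(pairs, borderline=2):
    """Least-squares pass mark at the 'borderline' rating code."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return (my - slope * mx) + slope * borderline

# Hypothetical station data: (global rating code, checklist score) per student.
data = [(1, 7), (1, 9), (2, 11), (2, 12), (2, 13),
        (3, 15), (3, 16), (3, 17), (4, 18), (4, 19)]

random.seed(0)
pass_marks = []
while len(pass_marks) < 2000:
    sample = [random.choice(data) for _ in data]   # resample with replacement
    if len({x for x, _ in sample}) > 1:            # need rating variance to fit
        pass_marks.append(regression_pass_mark(sample))

pass_marks.sort()
ci_low = pass_marks[int(0.025 * len(pass_marks))]
ci_high = pass_marks[int(0.975 * len(pass_marks))]
print(round(ci_low, 2), round(ci_high, 2))  # 95% bootstrap interval
```

Running the same resampling with the borderline group method instead (averaging only the rating-2 students in each resample) typically produces a wider interval, because each resampled cutoff rests on just a handful of scores.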

Why It’s Widely Used in Medical Education

The borderline regression method has become one of the most popular standard-setting approaches for OSCEs across medical schools worldwide for several practical reasons. It requires no extra work from examiners beyond what they’re already doing: marking a checklist and giving a global rating. There are no pre-exam panel meetings to judge hypothetical “borderline” students, no complex multi-round discussions. The pass mark is calculated after the exam using data that was collected naturally during it.

It also produces a different pass mark for each station, which reflects reality. A station involving a straightforward blood pressure measurement should have a higher pass threshold than one requiring a complex psychiatric history. Because the method is grounded in how actual students performed and how examiners judged them, the pass marks adjust automatically to station difficulty.

Limitations to Keep in Mind

The method assumes that examiners use the global rating scale consistently and meaningfully. If examiners avoid extreme ratings, clustering everyone in the middle, the regression loses its predictive power. Training examiners to use the full range of the global scale is essential for the method to work well.
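The cost of rating compression can be demonstrated with fabricated numbers: the same spread of checklist scores paired first with ratings spanning the whole scale, then with ratings squeezed into the middle two categories.

```python
def r_squared(xs, ys):
    """R-squared of a simple linear regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

scores     = [8, 10, 11, 13, 14, 16, 17, 19]   # identical checklist scores
full_range = [1, 1, 2, 2, 3, 3, 4, 4]          # examiners use the whole scale
compressed = [2, 2, 2, 2, 3, 3, 3, 3]          # everyone rated near the middle

print(round(r_squared(full_range, scores), 2))  # → 0.92
print(round(r_squared(compressed, scores), 2))  # → 0.73
```

Same students, same checklist marks, but the compressed ratings explain noticeably less of the score variation, weakening the regression the pass mark depends on.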

It also requires a reasonable number of students. With very small cohorts, the regression line can be heavily influenced by a few unusual performances, making the pass mark less trustworthy. Most institutions find the method works well with cohorts of at least 50 to 100 students per station, though it has been applied in smaller settings with appropriate caution.

Finally, the method is retrospective. The pass mark can only be calculated after the exam has been completed, which means students and faculty won’t know the exact cutoff in advance. For institutions that prefer to communicate a clear standard before the exam, this can feel like a drawback, though it’s also what makes the method responsive to actual exam conditions rather than predictions about them.