The Monthly Review

Achieving the Gold Standard: Launch of IRR Application

Written by Joshua Spofford | 10/3/17 3:42 PM

The Quality, Research, & Measurement Department, in partnership with Information Technology and Accrediting & Client Services, is pleased to announce the release of an Inter-Rater Reliability (IRR) application for use in the fourth quarter of this year.

The application was designed and built in-house. It provides a platform for a bi-annual assessment of how consistent URAC’s reviewers’ ratings of a mock desktop review are with a “gold standard.” Reviewers will be asked to review documentation from a past application submission and provide categorical ratings for compliance with each of the standards. The baseline will then be determined by two technical expert panels (TEPs), each consisting of three seasoned reviewers selected for their subject matter expertise (clinical and pharmacy specialties, respectively). The panelists will independently evaluate the application before convening to discuss the results. Once they reach consensus, the baseline for the review is established.

The reviewers’ results will be compared against the TEP’s results in two ways.

First, an arithmetic mean (percent agreement) is calculated for each reviewer and in aggregate for each specialty: the number of ratings that match the TEP’s baseline divided by the total number of ratings made.
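As an illustration only, the first comparison (matched ratings divided by total ratings) can be sketched in a few lines of Python. The function name `percent_agreement`, the rating labels, and the sample data are hypothetical and are not taken from the IRR application itself.

```python
def percent_agreement(reviewer_ratings, tep_ratings):
    """Share of standards on which the reviewer's rating matches the TEP baseline."""
    if len(reviewer_ratings) != len(tep_ratings):
        raise ValueError("both rating lists must cover the same standards")
    if not reviewer_ratings:
        raise ValueError("at least one rating is required")
    # Count the standards where the two categorical ratings are identical.
    matches = sum(r == t for r, t in zip(reviewer_ratings, tep_ratings))
    return matches / len(reviewer_ratings)


# Hypothetical ratings for five standards (labels are assumptions).
tep = ["met", "met", "not met", "met", "not met"]
reviewer = ["met", "not met", "not met", "met", "not met"]

agreement = percent_agreement(reviewer, tep)  # 4 of 5 ratings match
```

In this made-up example the reviewer matches the baseline on four of five standards, for an agreement of 0.8.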

Second, a kappa statistic is calculated for each individual reviewer and in aggregate for each specialty group. The kappa statistic measures the level of agreement while taking into account the probability of agreeing by chance alone (e.g., by guessing). Like most correlation statistics, kappa can range from -1 to +1, where 0 represents the amount of agreement that can be expected from random chance and 1 represents perfect agreement between raters. A negative value indicates less agreement than would be expected by chance. URAC has classified kappa scores into three categories, based on ranges drawn from academically accepted thresholds: (1) High: above 0.60; (2) Moderate: 0.41 to 0.60; (3) Low: below 0.41. If results at the individual or aggregate level fall below “High,” URAC will intervene with education and re-training, as applicable.
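For readers who want to see the arithmetic, here is a minimal sketch of a kappa calculation. It assumes Cohen's kappa for two raters (one reviewer versus the TEP baseline); the article does not say which kappa variant the application uses. The function names, the rating labels, the sample data, and the handling of scores that fall exactly on a category boundary are all assumptions made for illustration.

```python
from collections import Counter


def cohens_kappa(reviewer_ratings, tep_ratings):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(reviewer_ratings)
    if n == 0 or n != len(tep_ratings):
        raise ValueError("both rating lists must be non-empty and equal in length")
    # Observed agreement: fraction of standards rated identically.
    observed = sum(r == t for r, t in zip(reviewer_ratings, tep_ratings)) / n
    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    reviewer_counts = Counter(reviewer_ratings)
    tep_counts = Counter(tep_ratings)
    expected = sum(reviewer_counts[c] * tep_counts[c] for c in reviewer_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


def classify_kappa(kappa):
    """Map a kappa score to the three categories (boundary handling is an assumption)."""
    if kappa > 0.60:
        return "High"
    if kappa >= 0.41:
        return "Moderate"
    return "Low"


# Hypothetical ratings for five standards (labels are assumptions).
tep = ["met", "met", "not met", "met", "not met"]
reviewer = ["met", "not met", "not met", "met", "not met"]

kappa = cohens_kappa(reviewer, tep)
category = classify_kappa(kappa)
```

With these made-up ratings, observed agreement is 0.8 and chance agreement is 0.48, giving a kappa of about 0.62. The same reviewer therefore scores noticeably lower on kappa than on raw agreement, which is the reason for reporting both measures.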

If you have any questions or would like to learn more, please feel free to contact me at jspofford@urac.org or 202-326-3971.