A few years ago, I had a shocking and enlightening discussion about analytic methods with a group of epidemiologists and biostatisticians from Blue Cross Blue Shield of Michigan.
We were sitting around a table at a conference center in Lansing, where we were debriefing from a meeting of the Physician Group Incentive Program. We were talking about the methods for performance comparison. Everyone knew that we needed to “risk adjust” to take into account differences in patient characteristics when comparing the performance of different physicians, clinic practice units, and physician organizations. If we failed to properly risk adjust, the poorly performing providers would surely argue “my patients are sicker.”
Traditional Risk Adjustment using Standardization
When epidemiologists want to compare the performance of two health care providers on a level playing field, the traditional method is to do risk adjustment using an approach called standardization. The concept is to determine which patient or environmental variables influence the outcome of interest. These are called confounding variables, because differences in the mix of patients based on these variables can confound the performance comparison unless they are taken into consideration. Examples of such confounding variables include age, gender, the presence of co-morbid conditions, etc. If any of the confounding variables are continuous, like age, the epidemiologist must first convert them to discrete categories, or groupings. For example, if age were the confounding variable, the epidemiologist might define categories for “adults” and “children.” Or, the epidemiologist might break age into ten-year ranges. If there is more than one confounding variable, the categories are defined based on the combinations of values, such as “adult females,” “adult males,” etc. These categories are sometimes called “risk strata” or “risk cells.” Then, for each of these categories, for each of the providers being compared, the epidemiologist calculates the outcome measure of interest, such as the mortality rate or the total cost of care. The result is a matrix of measure values for each of the risk cells for each provider. This matrix can be conceptualized as a “model” of the actual performance of each provider.
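The stratification step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production method: the record fields (`provider`, `age`, `sex`, `outcome`) and the helper names are hypothetical, and the only confounders assumed are age, binned into ten-year ranges, and sex. The result is the per-provider matrix of cell-level outcome values described above.

```python
from collections import defaultdict

def age_band(age):
    """Map a continuous age to a ten-year band, e.g. 37 -> '30-39'."""
    lower = int(age) // 10 * 10
    return f"{lower}-{lower + 9}"

def risk_cell(patient):
    """A risk cell is the combination of the categorized confounders."""
    return (age_band(patient["age"]), patient["sex"])

def rate_matrix(patients):
    """Mean outcome per risk cell per provider: {provider: {cell: mean}}."""
    # For each provider and cell, accumulate [patient count, outcome sum].
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for p in patients:
        acc = totals[p["provider"]][risk_cell(p)]
        acc[0] += 1
        acc[1] += p["outcome"]  # 1/0 for death, or a dollar amount for cost
    return {prov: {cell: s / n for cell, (n, s) in cells.items()}
            for prov, cells in totals.items()}

# Hypothetical patient-level records.
patients = [
    {"provider": "A", "age": 34, "sex": "F", "outcome": 0},
    {"provider": "A", "age": 38, "sex": "F", "outcome": 1},
    {"provider": "B", "age": 71, "sex": "M", "outcome": 0},
]
print(rate_matrix(patients))
```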
To create a level playing field for comparisons, the epidemiologist then creates a standardized population. The standardized population is simply the number or percentage of patients in each of the risk cells. Then, the model of each provider’s performance is applied to that standardized population to determine what the outcomes would have been if that provider had cared for the standardized mix of patients. For each of the risk cells, the standardized number of patients is multiplied by the actual performance that provider achieved for such patients. Then, the results for all the risk cells are aggregated to obtain that provider’s risk-adjusted performance. Another way of thinking about this is that the risk-adjusted outcome is the weighted average outcome across all the risk cells, where the weights are the proportions of patients in each risk cell in the standardized, level-playing-field population. If the provider’s actual mix of patients was “sicker” or “higher risk” than the standardized population, then the risk-adjusted outcome will be more favorable than the unadjusted outcome.
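Once the cell-level rates exist, standardization reduces to a weighted average. The sketch below assumes a provider’s cell rates and the standardized population are plain dictionaries keyed by risk cell; the function name and the numbers are hypothetical.

```python
def standardized_rate(provider_rates, standard_counts):
    """Directly standardized outcome for one provider.

    provider_rates:  {risk_cell: that provider's observed rate in the cell}
    standard_counts: {risk_cell: patients in the standardized population}
    """
    total = sum(standard_counts.values())
    # Each cell's standardized patient count times the provider's own rate,
    # summed and divided by the total: a weighted average of the provider's
    # cell rates. A KeyError here means the provider saw no patients in some
    # cell, so its performance there is undefined.
    return sum(provider_rates[cell] * n / total
               for cell, n in standard_counts.items())

# Hypothetical standardized population and cell-specific mortality rates.
standard = {"adult": 1000, "child": 500}
print(standardized_rate({"adult": 0.10, "child": 0.04}, standard))
```

Because the same `standard` weights are used for every provider, differences in the results reflect differences in cell-level performance rather than differences in patient mix.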
“Observed Over Expected” Analysis
In the literature, even in respected journals, I have seen many examples of performance comparisons that used a different analytic approach called “observed over expected” or “O/E,” rather than the traditional standardization approach. A recent example is a paper regarding the mortality-avoidance performance of children’s hospitals. Just as with standardization, the O/E method begins by identifying confounding variables: the patient characteristics that are predictors of the outcome of interest. With O/E analysis, however, continuous confounding variables such as age do not have to be converted to discrete categories or groupings. All the confounding variables are used as independent variables in a regression model. The resulting regression model is then applied to each individual patient observation, inserting the values of the predictor variables for that patient into the regression formula to obtain the “expected” value of the outcome of interest. At that point, you have the actual observed value and the expected value for each patient (or case). Then, you sum the observed values for all the patients of a given provider, and you sum the expected values for all the patients of that provider. Finally, you divide the sum of observed values by the sum of expected values to get the O/E ratio. A ratio greater than one is interpreted to mean that the provider has a higher-than-expected value for that outcome. If the outcome variable is undesirable, such as mortality rate, complication rate or cost, an O/E ratio greater than one is interpreted to mean that the provider performed poorly compared to the other providers. People have been routinely using O/E analysis as if it were a way of doing risk-adjusted performance comparisons, that is, a way of “leveling the playing field” to compare performance while taking into account differences in patient characteristics.
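The O/E arithmetic itself is short once a model’s predictions are available. In this sketch, `expected` stands in for whatever fitted regression produces each patient’s expected outcome; model fitting is not shown, and `toy_logistic` is a hypothetical model with made-up coefficients. Age enters as a continuous value, with no binning.

```python
import math
from collections import defaultdict

def oe_ratios(patients, expected):
    """Sum of observed outcomes over sum of expected outcomes, per provider.

    `expected` maps a patient record to the model's predicted outcome,
    e.g. the prediction of a regression fitted on all providers' patients.
    """
    observed_sum = defaultdict(float)
    expected_sum = defaultdict(float)
    for p in patients:
        observed_sum[p["provider"]] += p["outcome"]
        expected_sum[p["provider"]] += expected(p)
    return {prov: observed_sum[prov] / expected_sum[prov] for prov in observed_sum}

def toy_logistic(p):
    """Hypothetical fitted logistic model of mortality; coefficients are made up."""
    z = -5.0 + 0.05 * p["age"]
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical patient-level records (outcome 1 = died).
patients = [
    {"provider": "A", "age": 80, "outcome": 1},
    {"provider": "A", "age": 40, "outcome": 0},
    {"provider": "B", "age": 8, "outcome": 1},
    {"provider": "B", "age": 12, "outcome": 0},
]
# A ratio above 1 is read as "more deaths than the model expected".
print(oe_ratios(patients, toy_logistic))
```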
But, sitting around the table in Lansing, I was shocked to realize that O/E analysis is not actually applicable for this purpose. Why? Because O/E analysis does not actually create a level playing field.
On the contrary, O/E analysis is conceptually the opposite of the traditional standardization approach. In traditional standardization, a model of each provider’s actual performance is applied to a standardized population. In O/E analysis, the regression model is essentially a model of typical performance. That regression model is applied to the actual population that received care from a particular provider. The problem is that different providers can see a different mix of patients. Consider the following simple calculations.
In this simple example, we are comparing just two providers. We are considering just one confounding variable, age, and we are breaking that variable into just two categories, adults and children. As shown in the example, provider A sees mostly adults, while provider B sees mostly children. Provider B performs poorly in children but actually performs better than A in adults. Because provider B sees more children, its poor performance in children dominates the O/E calculation, so provider B looks bad, with an O/E ratio of 1.09. But, since there are more adults than children in the overall population, which we are using as the “standardized” population, provider B’s superior performance in adults dominates the risk adjustment. So, provider B has a better risk-adjusted mortality rate than provider A. In other words, if you use O/E analysis for level-playing-field performance comparisons, you may get the wrong answer.
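The same kind of reversal can be reproduced with a small sketch. The counts below are made up for illustration and do not reproduce the 1.09 figure from the original example. As a simplifying assumption, each age category’s pooled mortality rate across both providers serves as the “expected” rate (which is what a regression with one indicator per category would predict), and the combined patient population serves as the standardized population.

```python
# Made-up counts: (patients, deaths) for each provider and age category.
DATA = {
    "A": {"adult": (900, 90), "child": (100, 4)},   # mostly adults
    "B": {"adult": (100, 6), "child": (400, 40)},   # mostly children
}

def pooled_rates(data):
    """'Expected' rate per category: the mortality rate in the combined population."""
    categories = {c for cells in data.values() for c in cells}
    return {
        c: sum(cells[c][1] for cells in data.values())
        / sum(cells[c][0] for cells in data.values())
        for c in categories
    }

def oe_ratio(cells, expected):
    """Observed deaths divided by the deaths the pooled rates predict."""
    observed = sum(deaths for _, deaths in cells.values())
    expected_deaths = sum(n * expected[c] for c, (n, _) in cells.items())
    return observed / expected_deaths

def standardized_rate(cells, standard):
    """Provider's own category rates, weighted by the standard population mix."""
    total = sum(standard.values())
    return sum(deaths / n * standard[c] / total for c, (n, deaths) in cells.items())

expected = pooled_rates(DATA)
# Standardized population = both providers' patients combined (1000 adults, 500 children).
standard = {c: sum(cells[c][0] for cells in DATA.values()) for c in expected}

for provider, cells in DATA.items():
    print(f"{provider}: O/E = {oe_ratio(cells, expected):.3f}, "
          f"standardized mortality = {standardized_rate(cells, standard):.2%}")
```

With these numbers, B has an O/E ratio above one (about 1.027, versus 0.987 for A), yet B’s standardized mortality (7.33%) is lower than A’s (8.00%), because B’s excess deaths are concentrated among children, who are the minority of the standardized population.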
Risk adjustment using standardization is not computationally difficult. But, it does require a bit of programming work to convert continuous variables such as age into categorical variables, such as breaking age into ten-year ranges. In my opinion, O/E analysis should not be used for risk-adjusted performance comparisons. O/E analysis does have the advantage of being computationally more convenient, particularly when you have many confounding variables and when some of them are continuous variables. But, it is not that much more work to do it right.