So, is there any good use of O/E analysis? Yes. It’s called Benchmark Opportunity Analysis.

In last week’s post, I argued that observed over expected analysis (O/E) was commonly misused as a method for doing “level playing field” performance comparisons.  I recommend against using it for that purpose.

But, is there some other good use for O/E analysis?

I can’t think of a good use for the O/E ratio itself — the metric derived by dividing observed performance by expected performance.  But, it turns out that the underlying idea of developing a model of performance that has been shown to be achievable is very useful to identify and prioritize opportunities for improvement.  The idea is to apply such a model to a health care provider’s actual population, and then compare those “achievable” results with the actual performance of the provider to see how much room there is for improvement.  I like to call this “opportunity analysis.”

There are two main variations on the “opportunity analysis” theme.  The first approach is to consider the overall average performance achieved by all providers as the goal.  The basic idea is to estimate how much the outcome would improve for each provider if they focused on remediating their performance in each risk cell where they have historically performed worse than average.  The analysis calculates the magnitude of improvement they would achieve if they were able to move their performance up to the level of mediocrity for such risk cells, while maintaining their current level of performance in any risk cells where they have historically performed at or above average.  A good name for this might be “mediocrity opportunity analysis,” to emphasize the uninspiring premise.

The second variation on this approach challenges providers to achieve excellence, rather than just mediocrity. I like to call this “benchmark opportunity analysis.”   The idea is to create a model of the actual performance of the one or more providers that achieve the best overall performance, called the “benchmark providers.” Then, this benchmark performance model is applied to the actual population of each provider, to estimate the results that could be achieved, taking into account differences in the characteristics of the patient populations.  These achievable benchmark results are compared to the actual performance observed.  The difference is interpreted as the opportunity to improve outcomes by emulating the processes that produced the benchmark performance.
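As a sketch, the per-risk-cell mechanics might look like the following. The benchmark rates, patient counts, and event counts here are all hypothetical; the point is how a benchmark performance model is applied to a provider’s own population.

```python
# Benchmark model: outcome rates achieved by the benchmark provider(s),
# per risk cell (hypothetical values)
benchmark_rate = {"adults": 0.050, "children": 0.010}

# One provider's actual population and outcomes, per risk cell (assumed)
provider = {
    "adults":   {"patients": 780, "events": 50},
    "children": {"patients": 220, "events": 3},
}

opportunity = 0.0
for cell, data in provider.items():
    achievable = benchmark_rate[cell] * data["patients"]  # events at benchmark
    excess = data["events"] - achievable
    opportunity += max(excess, 0.0)  # no credit where already at/above benchmark

print(f"Improvement opportunity: {opportunity:.1f} avoidable events per year")
```

The mediocrity variant described above is the same computation with the all-provider average rates substituted for the benchmark rates.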

As shown in this illustrative graphic, a benchmark opportunity analysis compares different improvement opportunities for the same provider.  In the example, Acme Care Partners could achieve the greatest savings by focusing their efforts on improving the appropriateness of high tech radiology services. In contrast, Acme Care Partners is already achieving benchmark performance in appropriateness of low tech radiology services, and therefore has zero opportunity for savings from improving up to the benchmark level.  That does not mean that they can’t improve.  No analysis can predict the opportunity for true innovation.  Benchmark opportunity analysis is just a tool for pointing out the largest opportunities for emulating peers that already perform well, taking into account the differences in the mix of patients between a provider organization and its high-performing peers.

This method is generally consistent with the “achievable benchmarks of care” (ABC) framework proposed more than 10 years ago by the Center for Outcomes and Effectiveness Research and Education at the University of Alabama at Birmingham.  However, that group advises against using the method for financial performance measures, presumably out of fear that it could encourage inappropriate under-utilization.  I consider that a valid concern.   To reduce that risk, I advocate for a stronger test of “achievability” for cost and utilization performance measures.  In the conventional ABC framework for quality measures, “achievability” is defined as the level of performance of the highest-performing set of providers that, together, deliver care to at least 10% of the overall population.  Such a definition is preferable to simply setting the benchmark at the level of performance achieved by the single highest-performing provider because a single provider might have gotten lucky to achieve extremely favorable performance.  When I apply the achievable benchmark concept to utilization or cost measures, I set the benchmark more conservatively than for quality measures.  For such measures, I use 20% rather than 10% so as to avoid setting a standard that encourages extremely low utilization or cost that could represent inappropriate under-utilization.
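The benchmark-selection rule can be expressed compactly. The provider list below is hypothetical, and `achievable_benchmark` is my own name for the logic: rank providers from best to worst, then pool the best-performing set that together covers at least the coverage threshold (10% of patients for quality measures, 20% for cost and utilization measures as proposed above).

```python
def achievable_benchmark(providers, coverage):
    """Pooled rate of the best providers covering >= `coverage` of patients."""
    total_patients = sum(n for _, n, _ in providers)
    ranked = sorted(providers, key=lambda p: p[2] / p[1])  # best rate first
    patients = events = 0
    for _, n, e in ranked:
        patients += n
        events += e
        if patients >= coverage * total_patients:
            break
    return events / patients

# Hypothetical providers: (name, patients, adverse events)
providers = [
    ("P1", 400, 8),     # 2.0%
    ("P2", 900, 27),    # 3.0%
    ("P3", 1500, 60),   # 4.0%
    ("P4", 7200, 360),  # 5.0%
]

print(f"Quality benchmark (10% rule): {achievable_benchmark(providers, 0.10):.2%}")
print(f"Cost benchmark (20% rule):    {achievable_benchmark(providers, 0.20):.2%}")
```

Note that the 20% rule pulls more providers into the benchmark set, yielding a less extreme (more conservative) benchmark rate than the 10% rule.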

Note that one provider may have benchmark care processes that would achieve the best outcomes in a more typical population, but that same provider may have an unusual mix of patients that includes a large portion of patients for whom they don’t perform well, creating a large opportunity for improvement.  The key point is that opportunity analysis is the right method to compare and prioritize alternative improvement initiatives for the same provider.  But the results of opportunity analyses should not be used to compare the performance of providers.

The following graphic summarizes the comparison of traditional risk adjustment, O/E analysis, and benchmark opportunity analysis.

Simple Example Calculations

For those readers interested in a little more detail, the following table uses the same raw data as the calculations in last week’s post to illustrate the approach.

As shown in this example, Provider A has worse performance (higher mortality rate) than Provider B in adults.  So, Provider B is the benchmark performer in the adult risk cell.  If Provider A improved from 6.41% mortality down to the 5.00% mortality level of Provider B, it could save the lives of 11 adults per year.  Provider B has worse performance in children.  If Provider B improved its performance in children up to the level achieved by Provider A, while still achieving its benchmark level of performance in adults, it could save 1 life per year.
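The adult calculation can be checked directly. The panel size of 780 adults is an assumed figure, chosen to be consistent with the stated rates and savings.

```python
adults = 780                           # assumed adult panel size for Provider A
rate_a, rate_bench = 0.0641, 0.0500    # Provider A actual vs. Provider B benchmark

deaths_actual = rate_a * adults        # about 50 deaths per year
deaths_benchmark = rate_bench * adults # 39 deaths per year
print(f"Lives saved per year: {round(deaths_actual - deaths_benchmark)}")  # 11
```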


Observed over expected (O/E) analysis is commonly misapplied to performance comparisons. Please don’t.

A few years ago, I had a shocking and enlightening discussion about analytic methods with a group of epidemiologists and biostatisticians from Blue Cross Blue Shield of Michigan.

PGIP Quarterly Meeting in Lansing

We were sitting around a table at a conference center in Lansing, where we were debriefing from a meeting of the Physician Group Incentive Program. We were talking about the methods for performance comparison. Everyone knew that we needed to “risk adjust” to take into account differences in patient characteristics when comparing the performance of different physicians, clinic practice units, and physician organizations. If we failed to properly risk adjust, the poorly performing providers would surely argue “my patients are sicker.”

Traditional Risk Adjustment using Standardization

When epidemiologists want to compare the performance of two health care providers on a level playing field, the traditional method is to do risk adjustment using an approach called standardization.    The concept is to determine which patient or environmental variables influence the outcome of interest.  These are called confounding variables, because differences in the mix of patients based on these variables can confound the performance comparison unless they are taken into consideration. Examples of such confounding variables include age, gender, the presence of co-morbid conditions, etc.  If any of the confounding variables are continuous numbers, like age, the epidemiologist must first convert them to discrete categories, or groupings.  For example, if age was the confounding variable, the epidemiologist might define categories for “adults” and “children.”  Or, the epidemiologist might break age into ten-year ranges.  If there is more than one confounding variable, the categories are defined based on the combinations of values, such as “adult females,” “adult males,” etc.  These categories are sometimes called “risk strata” or “risk cells.”  Then, for each of these categories, for each of the providers being compared, the epidemiologist calculates the outcome measure of interest, such as the mortality rate or the total cost of care.  The result is a matrix of measure values for each of the risk cells for each provider.  This matrix can be conceptualized as a “model” of the actual performance of each provider.

To create a level playing field for comparisons, the epidemiologist then creates a standardized population.  The standardized population is simply the number or percentage of patients in each of the risk cells.  Then, the model of each provider’s performance is applied to that standardized population to determine what the outcomes would have been if that provider had cared for the standardized mix of patients.  For each of the risk cells, the standardized number of patients is multiplied by the actual performance that provider achieved for such patients.  Then, the results for all the risk cells are aggregated to obtain that provider’s risk-adjusted performance. Another way of thinking about this is that the risk adjusted outcome is the weighted average outcome for all the risk cells, where the weights are the proportion of patients in that risk cell in the standardized, level-playing-field population.  If the provider’s actual mix of patients was “sicker” or “higher risk” than the standardized population, then the risk adjusted outcome will be more favorable than the unadjusted outcome.
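The standardization step is just a weighted average, as described above. In the sketch below, the per-cell rates and the standardized mix are hypothetical.

```python
# Per-cell observed rates for each provider (the "performance model")
rates = {
    "A": {"adults": 0.070, "children": 0.020},
    "B": {"adults": 0.050, "children": 0.040},
}

# Standardized population: share of patients in each risk cell
standard_mix = {"adults": 0.60, "children": 0.40}

# Risk-adjusted outcome = weighted average of per-cell rates,
# weighted by the standardized mix
adjusted = {
    p: sum(standard_mix[c] * r[c] for c in standard_mix)
    for p, r in rates.items()
}
for p, v in adjusted.items():
    print(f"Provider {p}: risk-adjusted rate {v:.2%}")
```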

“Observed Over Expected” Analysis

In the literature, even in respected journals, I have seen many examples of performance comparisons that used a different analytic approach called “observed over expected” or “O/E,” rather than using the traditional standardization approach.  A recent example is a paper regarding the mortality-avoidance performance of children’s hospitals.  Just as with standardization, the O/E method begins by identifying confounding variables — the patient characteristics that are predictors of the outcome of interest.   With O/E analysis, confounding variables that are continuous, like age, do not have to be converted to discrete categories or groupings.  All the confounding variables are used as independent variables in a regression model.  Then, the resulting regression model is applied to each individual patient observation, inserting the values of the predictor variables for that patient into the regression formula to obtain the “expected” value of the outcome of interest.  At that point, you have the actual observed value and the expected value for each patient (or case).  Then, you sum up the observed values for all the patients for a given provider.  And, you sum up the expected values for all the patients for a given provider.  Finally, you divide the sum of observed values by the sum of expected values to get the O/E ratio.  If the ratio is greater than one, that is interpreted to mean that the provider has a higher-than-expected value for that outcome.  If the outcome variable is undesirable, such as mortality rate, complication rate or cost, an O/E ratio of greater than one is interpreted to mean that the provider performed poorly compared to the other providers.  People have been routinely using O/E analysis as if it were a way of doing risk-adjusted performance comparisons — as a way of “leveling the playing field” to do performance comparisons that take into account differences in patient characteristics.
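The mechanics can be sketched as follows. The “fitted” model here is a made-up logistic formula standing in for a real regression, and the patients and coefficients are invented for illustration.

```python
import math

def expected_risk(age, comorbidities):
    # Stand-in for a fitted regression model; coefficients are made up
    logit = -4.0 + 0.03 * age + 0.5 * comorbidities
    return 1.0 / (1.0 + math.exp(-logit))

# Each case: (observed outcome 0/1, age, comorbidity count) -- hypothetical
cases = [(0, 72, 1), (1, 80, 3), (0, 45, 0), (0, 66, 2), (1, 88, 4)]

observed = sum(o for o, _, _ in cases)                       # sum of observed
expected = sum(expected_risk(age, cm) for _, age, cm in cases)  # sum of expected
oe_ratio = observed / expected
print(f"O/E ratio: {oe_ratio:.2f}")  # > 1 means worse than expected here
```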

But, sitting around the table in Lansing, I was shocked to realize that O/E analysis is not actually applicable for this purpose.  Why? Because O/E analysis does not actually create a level playing field.

On the contrary, O/E analysis is conceptually the opposite of the traditional standardization approach.  In traditional standardization, a model of each provider’s actual performance is applied to a standardized population.  In O/E analysis, the regression model is essentially a model of typical performance.  That regression model is applied to the actual population that received care from a particular provider.  The problem is that different providers can see a different mix of patients.  Consider the following simple calculations.

In this simple example, we are comparing just two providers.  We are considering just one confounding variable, age.  And, we are breaking that variable into just two categories, adults and children.  As shown in the example, provider A sees mostly adults, while provider B sees mostly children.  Provider B performs poorly in those children, but actually performs better than A in the adult population.  Because provider B sees more children, the poor performance in children dominates the O/E calculation, so provider B looks bad in terms of an O/E ratio of 1.09.  But, since there are more adults than children in the overall population, which we are using as the “standardized” population, provider B’s superior performance in adults dominates the risk adjustment.  So, provider B has a risk-adjusted mortality rate that is better than provider A.  In other words, if you use O/E analysis for level-playing-field performance comparisons, you may get the wrong answer.
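This kind of disagreement can be reproduced in code. The counts and deaths below are assumed (they are not the original table); they are chosen so that O/E ranks provider B worse while direct standardization to the combined population ranks provider B better.

```python
# (patients, deaths) per risk cell, per provider -- hypothetical
population = {
    "A": {"adults": (1400, 98), "children": (100, 2)},   # mostly adults
    "B": {"adults": (100, 5),   "children": (900, 40)},  # mostly children
}
cells = ["adults", "children"]

# "Expected" model: pooled rate per cell across both providers
pooled = {c: sum(population[p][c][1] for p in population)
             / sum(population[p][c][0] for p in population) for c in cells}

# Standardized mix: the combined population's share in each cell
total = sum(population[p][c][0] for p in population for c in cells)
mix = {c: sum(population[p][c][0] for p in population) / total for c in cells}

results = {}
for p in population:
    observed = sum(population[p][c][1] for c in cells)
    expected = sum(population[p][c][0] * pooled[c] for c in cells)
    adjusted = sum(mix[c] * population[p][c][1] / population[p][c][0]
                   for c in cells)
    results[p] = (observed / expected, adjusted)
    print(f"{p}: O/E = {observed / expected:.3f}, "
          f"risk-adjusted rate = {adjusted:.2%}")
```

With these numbers, provider B’s O/E ratio exceeds provider A’s, yet provider B’s risk-adjusted mortality rate is lower: the two methods rank the providers in opposite directions.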

Risk adjustment using standardization is not computationally difficult.  But, it does require a bit of programming work to convert continuous variables such as age into categorical variables, such as breaking age into ten-year ranges.  O/E analysis does have the advantage of being computationally more convenient, particularly when you have many confounding variables and when some of them are continuous.  But, it is not that much more work to do it right.  In my opinion, O/E analysis should not be used for risk-adjusted performance comparisons.


Dr. Ward to speak at 5th Annual Predictive Modeling Summit, November 8-9, 2011, Washington, DC

The 5th Annual Predictive Modeling Summit is described by conference organizers as the “leading forum on predictive analytics applied to key health care functions, settings and populations.”

Dr. Ward will be giving a presentation entitled “Using Intervention Models and Predictive Models to Optimize Patient Selection for Care Management in ACOs” at 1pm on November 9, 2011.

Conference details:

Hope to see you there!



Klar 3: The necessity of re-qualifying the population to avoid regression-toward-the-mean bias in historical comparison groups

Ron Klar, MD, MPH

Ron Klar, MD, MPH is a health care consultant with a long history of involvement in federal health care policy and health plan innovation. He published a recent series of three posts regarding the draft rules for the Medicare Shared Savings Program (MSSP) in the Health Affairs Blog, an influential forum for debating health policy issues. This is my third in a series of 4 posts describing areas of agreement and disagreement with Dr. Klar. In my first post, I described areas of agreement.  In my second, I covered my disagreements about Dr. Klar’s proposed changes regarding care relationship derivation.  In this post, I will describe my disagreement regarding Klar’s proposed changes to the approach to selecting a comparison group for savings assessment.

In the draft MSSP rules, CMS proposed two “options” for methods of selecting the comparison group for determining savings. The rules, following the lead of the health reform legislation language, mislabel the comparison group as a “benchmark.” CMS is not really trying to determine if an ACO is better than or comparable to the best-performing provider organization, as is implied by using the term “benchmark.”  What they really intend is to compare the actual cost to the cost that would have been expected to occur if the same beneficiaries had been cared for by non-ACO providers. CMS indicates in the draft rule that they prefer option 1, which involves using the same assignment algorithm in the prior time period as is used for the accountability/performance period. This approach is described as “requalification” in the care management evaluation standards published by the Disease Management Association of America (DMAA). Option 2, for which CMS is seeking feedback, involves using historical information for the cohort of beneficiaries that was actually assigned to the ACO.

In Dr. Klar’s first post, he explained that he prefers option 2, arguing that option 1 has no “face validity” because the individual beneficiaries will be different. I strongly disagree.

As I noted in my blog post last week, when claims-based patient selection logic is applied, the selection is determined based not only on unchanging characteristics of the person (like gender), but also on data regarding health care events that happened at particular points in time. The person-months in the years before meeting the assignment criteria do not have the same risk as the person-months after meeting the assignment criteria. There is randomness in the timing of events, as people experience peaks and valleys of individual risk. When you select people based on recent health care events, you are not selecting randomly. You are preferentially picking people who tend to be in a risk peak as evidenced by recent health care utilization. Without any intervention, continuing random variation will cause the risk of the selected population to decrease over time, toward the mean risk of the overall population. This is known as a regression-toward-the-mean bias.  This type of bias is strongest when the patient is being purposefully selected based on being a high risk outlier, such as when a predictive model is used to generate a risk score used to select patients to be targeted for outreach for a care management program.  But, this type of bias exists in a weaker form for any patient selection based on recent health care utilization.  Patients naturally have higher risk in the time periods just before and after health care utilization, since they seek health care in response to illness episodes that drive cost.  To avoid regression-toward-the-mean bias, I prefer option 1, which offers a symmetrical selection process for the ACO intervention population and the historical comparison population.
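A toy simulation (not from the post; all numbers are invented) illustrates the effect: individuals have a stable baseline risk plus random year-to-year noise, and selecting people with high observed utilization in year 1 preferentially picks up positive noise. With no intervention at all, the same group’s year-2 utilization falls back toward the population mean.

```python
import random

random.seed(42)
N = 100_000
people = []
for _ in range(N):
    baseline = random.gammavariate(2.0, 1.0)  # stable individual risk level
    year1 = baseline + random.gauss(0, 1.0)   # observed utilization, year 1
    year2 = baseline + random.gauss(0, 1.0)   # observed utilization, year 2
    people.append((year1, year2))

pop_mean = sum(y2 for _, y2 in people) / N

# Select the top 10% by year-1 utilization, as claims-based logic might
people.sort(key=lambda p: p[0], reverse=True)
selected = people[: N // 10]
sel_y1 = sum(y1 for y1, _ in selected) / len(selected)
sel_y2 = sum(y2 for _, y2 in selected) / len(selected)

print(f"Selected group, year 1: {sel_y1:.2f}")
print(f"Selected group, year 2: {sel_y2:.2f} (population mean {pop_mean:.2f})")
```

The selected group’s year-2 average lands between its year-1 average and the population mean, which is exactly the bias a symmetrical requalification process (option 1) avoids.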

Dr. Klar correctly points out that if no risk adjustment is done, ACOs could be incentivized to preferentially seek care relationships with lower risk patients. I feel this should be solved by doing risk adjustment (as has been proposed in the rule), rather than by using option 2.

Klar goes on to propose a variety of additional modifications to the rules that illustrate the complications of using the option 2 pre-post design, such as having to apply a weighted average scheme to deal with people with different numbers of years of available history and people who died during the performance period.
