How to use and improve predictive models


Three ways to keep it simple — one of which is bad

“Keep it simple, stupid.”   The “K.I.S.S.” principle.  Generally a good idea, but not always.

Types of Simplification

Consider three types of simplification:

  1. Leaning.  This is about getting rid of waste. When simplifying a design, leaning involves getting rid of unnecessary features.  When simplifying communications, leaning involves getting rid of information that is duplicative, unimportant or just decorative.  Edward Tufte, one of my heroes, is a statistician, artist, graphic designer and zen master of simplicity. He famously rails against “chart junk” and advocates for maximizing the “data-ink ratio.”
  2. Summarizing.  This is about dropping one or more layers of detail.  It is accomplished by grouping smaller details into categories or themes and dropping the details from the communication.  Summarization makes the information “blurry” but still tells the truth.  Summarization satisfies some readers.  To others, it serves as an introduction and an invitation to dive deeper.
  3. Glossing.  This requires making the information conform to a desired level of simplicity, even if it means fibbing. For example, a system may have three components that interact with one another.  The interactions may be tedious to explain and may require additional arrows on a diagram.  Glossing involves escaping this annoying complexity by denying it.   Many companies create diagrams describing the components that make up their product or solution.  As described in Ian Gorton’s book on software architecture, such marketing diagrams are colloquially called “marketecture” diagrams.  Designers of such diagrams often take great liberties with their depiction of the solution, portraying it as being made up of components that conveniently correspond to the sources of value to the prospective customer, even when the actual technology components are organized in an entirely different way.  Glossing can sometimes be helpful for communicating some deeper truth, almost like a metaphor or parable.  But, oftentimes, glossing involves intentionally obfuscating the truth, making the solution appear to be better or simpler than it really is.

Einstein Simplification


Conceptualizing “over-treatment” waste: Don’t deny health economics

A Health Policy Brief published in Health Affairs on December 13, 2012 referenced an analysis published last April in JAMA regarding waste in health care.  In this analysis, Don Berwick (one of my health care heroes) and Andrew Hackbarth (from RAND) estimated that waste in health care consumed between $476 billion and $992 billion of the $2.6 trillion annual health care expenditures in the US.  That’s 18-37% waste.  They divided this estimate into six categories of waste.  Their mid-point estimates are as follows:

Berwick and Hackbarth estimates of waste in health care - JAMA 2012

They consider “failures in care delivery” to include failures to execute preventive services or safety best practices, resulting in avoidable adverse events that require expensive remediation.  By “failures of care coordination,” they mean care that is fragmented, such as poorly planned transitions of care, resulting in avoidable hospital readmissions.  They categorize as “overtreatment” care that ignores scientific evidence, that is motivated by a desire to increase income or avoid medical malpractice liability, or that is ordered out of convenience or habit.  They considered “administrative complexity” to be spending resulting from “inefficient or flawed rules” of insurance companies, government agencies or accreditation organizations.  They estimated the magnitude of administrative complexity by comparing administrative expense in the US to that in Canada’s single payer system.  They considered “pricing failures” to be prices that are greater than those justified by the cost of production plus a “reasonable profit,” presumably due to the absence of price transparency or market competition.  Finally, they considered “fraud and abuse” to be the cost of fake medical bills and of the additional inspections and regulations needed to catch such wrongdoing.

Underestimating Over-treatment

These estimates are generally in alignment with other attempts to categorize and assess the magnitude of waste in health care.  But, I think Berwick and Hackbarth’s estimates of “overtreatment” are probably far too low.  That’s because they, like so many other health care leaders, are reluctant to address the issue of cost-effectiveness.  Obviously, the definition of over-treatment depends on one’s philosophy for determining what treatments are necessary in the first place.  Everyone would agree that a service that does more harm than good for the patient is not necessary.  Most would agree that a service that a competent, informed patient does not want is not necessary.  Some argue that, if there is no evidence that a treatment is effective, it should not be considered necessary, while others argue that even unproven treatments should be considered necessary if the patient wants them.   Berwick and Hackbarth are ambiguous about their application of this last category.

But, the big disagreement occurs when evaluating treatments for which there is evidence that the treatment offers some benefit, but the magnitude of the benefit is small in relation to the cost of the treatment.  This is a question about cost-effectiveness.  It is at the heart of medical economics.  In my experience, most health care leaders and an even higher proportion of political leaders choose to deny the principles of medical economics and the concept of cost-effectiveness.  They describe attempts to apply those principles as “rationing” — a term which has taken on a sinister, greedy meaning, rather than connoting the sincere application of rational thought to the allocation of limited resources.   Berwick and Hackbarth implicitly take that view.  They are unwilling to define over-treatment based on cost-ineffectiveness.

The analysis I want to see

For years, I’ve been looking for an analysis that attempted to estimate the magnitude of waste from over-treatment based on the principles of health economics.  The diagram below illustrates the hypothetical results of the type of analysis I’d like to see.

Diagram re Conceptualizing Overtreatment

In this diagram, the horizontal axis represents the total cost of health care to a population.  I don’t want to see the entire US health care system; what is more relevant is the population served by an Accountable Care Organization or an HMO.  To create such a diagram, we would first need to break down health care cost into a large number of specific treatment scenarios.  Each of these scenarios would specify a particular treatment (or diagnostic test) with enough clinical context to permit an assessment of the likely health and economic outcomes.  For each scenario, each of the possible health outcomes would be assigned a probability, a duration, and a quality of life factor.  By multiplying the duration by the quality of life factor, we could calculate the “quality-adjusted life years” (or “QALY”) for the outcome.  Then, by taking the probability-weighted average of all the possible health states for the scenario, and then dividing the cost by the result, we could calculate the “cost-effectiveness ratio” for the scenario, measured in “$/QALY.”   Then, we would sort all the treatment scenarios by their cost-effectiveness ratios, with the treatment scenarios with the most favorable health economic characteristics on the left.
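To make the recipe above concrete, here is a minimal sketch in Python of the scenario-level QALY and cost-effectiveness calculation. The scenario names, probabilities, durations, quality-of-life factors and costs are all hypothetical placeholders, and a real analysis would compute costs and QALYs incrementally against the best alternative for each scenario.

```python
# Minimal sketch of the scenario-level QALY and cost-effectiveness calculation
# described above. All scenario data are hypothetical illustrations.

def scenario_qalys(outcomes):
    """Probability-weighted QALYs across the possible health outcomes.

    outcomes: list of (probability, duration_years, quality_of_life_factor)
    """
    return sum(p * duration * qol for p, duration, qol in outcomes)

def cost_effectiveness_ratio(cost, qalys):
    """Cost per QALY ($/QALY); lower is more favorable."""
    return cost / qalys

# Hypothetical treatment scenarios: (name, cost, possible outcomes)
scenarios = [
    ("statin for high-risk patient", 1_500,
     [(0.9, 10, 0.9), (0.1, 5, 0.7)]),
    ("advanced imaging for low-risk back pain", 2_000,
     [(0.99, 10, 0.80), (0.01, 10, 0.82)]),
]

# Sort scenarios so the most favorable $/QALY appears first (the "left" of the chart)
ranked = sorted(
    ((name, cost_effectiveness_ratio(cost, scenario_qalys(outcomes)))
     for name, cost, outcomes in scenarios),
    key=lambda x: x[1],
)
for name, cer in ranked:
    print(f"{name}: ${cer:,.0f} per QALY")
```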

Some of the scenarios will generate net savings, such as for certain preventive services where the cost of the avoided disease is greater than the initial cost of the preventive service.  These are clearly cost-effective.  On the other end of the spectrum are scenarios that offer net harm to the patient, such as when adverse side-effects are worse than the benefits of the treatment.  These are clearly cost-ineffective.  In the middle of these extremes are scenarios where there is a positive net benefit to the patient and a positive net cost borne by the population.

If a person rejects the principles of health economics, they would consider all of these middle scenarios to be “necessary” or “appropriate” regardless of how small the benefits or how large the costs.  But, among those who accept the principles of health economics, some of these scenarios could be judged to be cost-effective and others to be cost-ineffective.  Such judgments would presumably reveal some threshold cost-effectiveness ratio that generally separated the scenarios into cost-effective and cost-ineffective.  Since different people have different values, their judgments could reveal different cost-effectiveness thresholds.  If we had many people making these judgments, we could find a range of cost-effectiveness ratios that were considered to be reasonable by 90% of the people.    Applying this range to all the treatment scenarios, one could find a group of scenarios that were considered wasteful by most, and another group of scenarios that were considered wasteful only by some.

Variations on this theme have been used throughout the world for decades by various practice guidelines developers, healthcare policy analysts, health services researchers and health economists.  It is complex and time-consuming.  As I’ve discussed before, it is also controversial in the United States.

Right now, in the US, we all recognize that health care costs are too high.  We’re all focusing on merging providers into larger organizations, installing computer software, and exploring new reimbursement arrangements to address the problem.  But, I’m convinced that over-treatment with cost-ineffective services is a major source of waste.  We will inevitably get back to the core issue of having to figure out which treatment scenarios are wasteful.  We will inevitably have to overcome our denial of health economics and our irrational fear of rational allocation.



Don’t pave the cow paths: The challenge of re-conceptualizing health care processes

There is a popular adage for information technology professionals: “Don’t pave the cow paths.”

I recently worked with a client from Texas, and they were fond of this adage. They said it with the most excellent drawl, giving it extra credibility, as if they may have actually learned it the hard way on their own ranches.

In the IT context, I interpret the adage to mean:

When designing an information technology solution for a business area, don’t just learn how they did the process manually and then create a software application to transfer that same process to computers. Rather, try to understand the underlying problem that the business process is trying to solve and the value that the business process is intended to create, and then take the opportunity to design a different process, one that is rendered feasible with enabling information technology and that delivers greater value.

When designing a new process to replace an old one, the starting point is re-conceptualization. The process designer must shed some of the terminology used to describe the old process when that terminology is too strongly tied to the details of the old process. Rather, the designer must dig down to the more essential concepts, and choose labels that are simpler and purer, seeking fresh metaphors to provide cleaner terminology. Then, the new process and the associated data structures can be re-constructed on top of a conceptual foundation that is easier to talk about, simpler to manage, and more stable.

Once we have a strong conceptual foundation, we can then flesh out the details of how the process can be made leaner and more effective, enabled by information technologies. Obviously, the proposed new process design influences the selection and configuration of enabling technologies. But, awareness of the capabilities of various technologies can also help generate ideas about candidate process designs that will be rendered feasible by the technologies. Therefore, this process is inherently iterative. The old-school philosophy of getting sign-off on detailed system requirements before considering the technology solution was a response to a valid concern that people will fall in love with a technology and then inappropriately constrain their process design accordingly. But, applying that philosophy too rigorously causes the opposite problem. If process designers don’t know what’s possible, they naturally stick with their old conceptualization, which also serves to inappropriately constrain their process design. As with most hard things, effectiveness requires finding the right balance between two undesirable extremes.

An example: the case of “registries.”  

A “registry” is a list of patients. The list includes the evidence-based services they need and whether or not they have received them. It is like a tickler file to help members of the clinical team remember what preventive services and chronic condition management services need to be done, so the team can improve their performance on quality of care metrics and provide better care to their patients.

But, if you dig down, the more fundamental purpose of the registry can be conceptualized as enabling care relationship management and care planning processes. Conceptually, health care providers need to know which patients they are taking care of. That’s care relationship management. It involves integrating different sources of information about care relationships, including derived care relationships based on claims data (also called “attribution”) and declared care relationships from health plans, patients and physicians. Part of the function of a registry is to clarify and make explicit those care relationships. This simple function can be considered radical to clinicians who have become accustomed to an environment where such care relationships have been ambiguous and implicit.

If a physician has a care relationship with a patient, then, conceptually, he or she has a professional responsibility to make and execute a plan of care for that patient. Care planning is the process of determining which problems the patient has and what services are needed to address those problems. Conceptually, a good care planning process also includes provisions for multi-disciplinary input by members of the clinical team.  And, a good care planning process also includes decision support, including alerts for necessary things missing from the care plan, and critique of things that have been put on the care plan.  Such critique can be based on clinical grounds, such as drug-drug interactions, drug allergies and drug dosing appropriateness. Or, it can be based on evidence-based medicine or health economic grounds, such as in utilization management processes.
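As a rough illustration of how these concepts might be represented in software, here is a hypothetical sketch of care relationship and care plan data structures with a trivially simple decision-support check. The class and field names are my own inventions for illustration, not a reference to any particular registry or EHR product.

```python
# Hypothetical data structures for the care relationship management and
# care planning concepts described above. Names are illustrative only.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class CareRelationship:
    patient_id: str
    provider_id: str
    source: str        # e.g. "claims-derived (attribution)" or "declared by patient"
    explicit: bool     # has the relationship been confirmed and made explicit?

@dataclass
class PlannedService:
    problem: str       # the problem this service addresses
    service: str       # e.g. "HbA1c test", "retinal exam"
    status: str        # "proposed", "ordered", "completed"

@dataclass
class CarePlan:
    patient_id: str
    problems: List[str] = field(default_factory=list)
    services: List[PlannedService] = field(default_factory=list)
    contributors: List[str] = field(default_factory=list)  # multi-disciplinary input

def missing_service_alerts(plan: CarePlan,
                           evidence_based_needs: Dict[str, List[str]]) -> List[str]:
    """Very simplified decision support: flag evidence-based services
    that are absent from the plan for the patient's documented problems."""
    planned = {(s.problem, s.service) for s in plan.services}
    return [f"{problem}: consider {service}"
            for problem, services in evidence_based_needs.items()
            if problem in plan.problems
            for service in services
            if (problem, service) not in planned]
```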

The name “registry” is tied historically to the word “register” which is a type of paper book used for keeping lists of things. In the health care context, “registries” were used by public health officials to track births, outbreaks of infectious diseases and cancer tumors. So, when people think about chronic disease registries, their mental model of keeping a paper list is a barrier to their willingness to re-conceptualize the underlying function differently.  But, more fundamentally, a “registry” is just one type of tool to facilitate care relationship management and care planning — a tool designed to be used for a narrowly defined list of problems and services, rather than being designed for more general use.

Today, there is no single care plan for most patients.  The function of keeping track of the problems that need to be addressed is either not done or it is done in a haphazard way, peppered across various structured and unstructured encounter notes, scribbled on problem lists, auto-generated in clinical summaries based on diagnosis codes on billing records, checked off on health risk appraisals, etc.   The function of figuring out which services are necessary to address each problem is peppered across numerous clinical notes, requisition forms, e-prescribing systems, order entry systems, care management “action lists” and in the fields of registry systems.  The function of facilitating interdisciplinary input to a patient’s care occurs informally in hallway conversations, morning rounds, tumor board meetings, or, most commonly, not at all.  These are all care planning functions, but most clinicians have no familiarity with the concept of linking these diverse bits of data and process in a cleaner, simpler notion of managing a single care plan to be used and updated over time by the entire care team.  As far as they are concerned, such a notion is probably infeasible and unrealistic.  They’ve never seen a technology that can enable it to become reality.

Choosing the right leap distance.

Of course, when re-conceptualizing processes, it is possible to go too far.  Old habits, mental models, terminology, and processes die hard.  If your re-conceptualization is a great leap to a distant future state of elegantly conceptualized processes, it might end up being too difficult to convince people to take the leap with you.  Other adages come to mind:  “Don’t get in front of your headlights.” Then there is President Obama’s version: Don’t get “out over your skis.”  And my favorite, often quoted by Tom Durel, my old boss at Oceania, “Never confuse a clear vision for a short distance.”

The optimal “leap distance” is a function of motivation to change.  If people start to experience great pain in their current state and begin to fear the consequences of sticking to their old ways, change happens.  As we move forward to a world where providers are taking more economic risk and facing more severe consequences for failing to improve quality of care, we will be able to pursue bolder innovation and leap greater distances in our process and technology improvements.


To resolve conflicts, re-frame polar positions as optimization between undesirable extremes. But, sometimes there is no way to win.

In politics and professional life, achieving success requires the ability to resolve conflicts.  I’ve noticed that conflicts often become entrenched because the opposing parties both simplify the conflict in black and white terms. They conceptualize their own interest as moving in one direction. And, they conceptualize their opponent as wanting to move in the opposite, wrong direction. As a result, the argument between the parties generates no light, only heat.  Each side only acknowledges the advantages of their direction and the disadvantages of the opposing direction. Neither side seeks to really understand and learn from the arguments offered by the other side.

When I’ve had the opportunity to mediate such conflicts, I almost always used the same strategies.

  • Step One.  I try to move both parties to some common ground, moving back to some basic statement that seemingly nobody could disagree with.  This generates a tiny bit of agreement momentum.
  • Step Two. Apply the momentum generated in step one to getting the parties to agree that, if you took each party’s position to an extreme, the result would be undesirable. The parties are inherently agreeing to re-conceptualize the disagreement from being a choice between polar opposite positions to an optimization problem. The idea is to choose an agreeable value on some spectrum between undesirable extremes.  If the parties make this leap, we are halfway to sustainable agreement.
  • Step Three.  Get the parties to agree to avoid talking about the answer, and focus on reaching consensus on the factors and assumptions that influence the selection of the optimal answer.  Sometimes, this can be done subjectively, simply listing the factors.  Other times, it is worthwhile to get quantitative, working together on assumptions and calculations to estimate the magnitude and uncertainty of the outcomes at various points along the spectrum of alternative answers.  This quantitative approach has been described as the “explicit method,” and an example of applying it to resolve fierce disagreements about mammography guidelines is described in an earlier post.
  • Step Four.  Finally, ask the parties to apply their values to propose and explain an optimum answer, from their point of view.  In this step, the important point is to insist that the parties acknowledge that they are no longer arguing about facts or assumptions, since consensus has already been achieved on those.  If not, then go back to step three. The objective is to untangle and separate factual, logical, scientific debates from the discussion of differences in values.  If those remain tangled up, the parties inevitably resort to talking at each other, rather than engaging in productive dialog.
  • Step Five.  Try to achieve a compromise answer.  In my experience, if you’ve really completed steps 1-3, this ends up being fairly easy.
  • Step Six.  Work to sustain the compromise.  Celebrating the agreement, praising the participants for the difficult work of compromise, documenting the process and assumptions, and appealing to people to not disown their participation in the process are all part of this ongoing work.   Passive aggressiveness is the standard operating model in many settings, part of the culture of many organizations.  And, it is a very difficult habit to break.

Of course, in the routine work of mediating conflicts, I don’t really explicitly go through these six steps. This conflict resolution approach is in the back of my mind. They are really more like habits than steps.

Sometimes this approach works. Sometimes, it does not.  It can break at any step.

Notice that breakdowns in most of the steps are basically people issues. People won’t change their conceptualization. They are unwilling to make their assumptions explicit. They are unwilling to acknowledge differences in values. They are unwilling to compromise.

But, sometimes, the process breaks because of the nature of the issue being debated. Sometimes, conceptualizing the debate as an optimization problem between two undesirable extremes fails because there are really not good choices along the spectrum.

For example, when debating the design of a program or policy, I have often encountered a no-win trade-off between keeping it simple vs. addressing each party’s unique circumstances.  If I keep it too simple, people complain that it is a “hammer,” failing to deal with their circumstances.  If I add complexity to deal with all the circumstances, people complain that it is a maze or a contraption.  If I select some middle level of complexity, the complaints are even worse because the pain of complexity kicks in before the value of complexity is achieved.

I’ve seen this no-way-to-win scenario in my own work, in the design of information systems, wellness and care management protocols, practice guidelines and protocols, analytic models, organizational structures, governance processes, contractual terms, and provider incentive programs.  And, I’ve seen this scenario in many public policy debates, such as debates about tax policy, tariffs, banking regulations, immigration, education, and health care reform.  In cases when the extremes are more desirable than the middle ground, the only approach I can think of is to bundle multiple issues together so that one party wins some and the other party wins others, to facilitate compromise.


Telling a 46 year health care cost growth story in one graph

In a recent post to the Health Affairs Blog, Charles Roehrig, an economist who serves as VP and director of Altarum’s Center for Sustainable Health Spending, presented some very interesting graphics of long term health care cost growth in the U.S.  He shows the often-presented statistic of health care cost as a percent of Gross Domestic Product (GDP) over the 46-year period since Medicare was established in 1965.  The climbing graph is bumpy due to the effect of recessions and recoveries on the GDP measure in the denominator.  To see the underlying health care cost growth more clearly, Roehrig calculates what the GDP would have been during each time period if the economy had been at full employment, called the “potential GDP.”  He then calculates health care cost as a percentage of potential GDP.  This creates a nice, steady upward ramp from 6% to almost 17%.

Then, using the potential GDP as a base, Roehrig created a great graphic showing how fast hospital, physician, prescription drug and other cost components grew in excess of the potential GDP.  In his blog post, Roehrig tells the story in graphs and words.  I created the following version of Roehrig’s graph to try to incorporate more of the story into the graph itself.

Roehrig concluded that the “policy responses to excess growth for hospital services, physician services, and prescription drugs seem to have been fairly successful.”  But, he referenced Tom Getzen, who warns against assuming that the recent lower growth rate is the “new normal.”  Rather, it may be temporarily low due to the lagged effects of the recent recession.  So, it may be too early to break out the champagne.

I really like showing such a long time horizon and breaking down health care costs into these five categories.  And, I am convinced that using potential GDP represents an improvement over the conventional GDP measure as a denominator for cost growth statistics.  But, I’ve never understood the popularity of expressing costs as a percentage of GDP in the first place.  In my opinion, it is more relevant to just show growth in real (inflation-adjusted) per capita costs, or the insurance industry variant of that, “per member per month” (PMPM) costs. Using GDP or potential GDP in the denominator seems to imply that, as our country gets more populous and richer, we should increase health care costs accordingly.  I agree with the idea that population growth should logically lead to increasing health care expenditures.  Expressing costs on a per capita basis handles that issue.  But, if we are prioritizing health care services as essential services, we would not necessarily need to spend that much more on health care if we got richer.
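For what it’s worth, here is a small sketch of the two denominators being compared, using made-up placeholder figures rather than Roehrig’s or CMS data:

```python
# Two ways of expressing health care cost growth, as discussed above.
# All figures below are made-up placeholders, not Roehrig's or CMS data.

def share_of_potential_gdp(health_spending, potential_gdp):
    """Health spending as a share of full-employment (potential) GDP."""
    return health_spending / potential_gdp

def real_per_capita_cost(health_spending, population, price_index, base_index):
    """Inflation-adjusted (real) per capita cost, in base-year dollars."""
    return (health_spending / population) * (base_index / price_index)

spending = 2.6e12          # hypothetical annual national health spending
potential_gdp = 15.8e12    # hypothetical full-employment GDP
population = 312e6         # hypothetical population
cpi_now, cpi_base = 224.9, 172.2   # hypothetical price index values

print(f"{share_of_potential_gdp(spending, potential_gdp):.1%} of potential GDP")
print(f"${real_per_capita_cost(spending, population, cpi_now, cpi_base):,.0f} "
      f"real per capita (base-year dollars)")
```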


So, is there any good use of O/E analysis? Yes. It’s called Benchmark Opportunity Analysis.

In last week’s post, I argued that observed over expected analysis (O/E) was commonly misused as a method for doing “level playing field” performance comparisons.  I recommend against using it for that purpose.

But, is there some other good use for O/E analysis?

I can’t think of a good use for the O/E ratio itself — the metric derived by dividing observed performance by expected performance.  But, it turns out that the underlying idea of developing a model of performance that has been shown to be achievable is very useful for identifying and prioritizing opportunities for improvement.  The idea is to apply such a model to a health care provider’s actual population, and then compare those “achievable” results with the actual performance of the provider to see how much room there is for improvement.  I like to call this “opportunity analysis.”

There are two main variations on the “opportunity analysis” theme.  The first approach is to consider the overall average performance achieved by all providers as the goal.  The basic idea is to estimate how much the outcome would improve for each provider if they focused on remediating their performance in each risk cell where they have historically performed worse than average.  The analysis calculates the magnitude of improvement they would achieve if they were able to move their performance up to the level of mediocrity for such risk cells, while maintaining their current level of performance in any risk cells where they have historically performed at or above average.  A good name for this might be “mediocrity opportunity analysis,” to emphasize the uninspiring premise.

The second variation on this approach challenges providers to achieve excellence, rather than just mediocrity. I like to call this “benchmark opportunity analysis.”   The idea is to create a model of the actual performance of the one or more providers that achieve the best overall performance, called the “benchmark providers.” Then, this benchmark performance model is applied to the actual population of each provider, to estimate the results that could be achieved, taking into account differences in the characteristics of the patient populations.  These achievable benchmark results are compared to the actual performance observed.  The difference is interpreted as the opportunity to improve outcomes by emulating the processes that produced the benchmark performance.
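A minimal sketch of the benchmark opportunity calculation might look like the following, where the risk cells, population counts and rates are hypothetical and lower rates are assumed to be better. The same function covers the “mediocrity opportunity analysis” variant if you swap in the all-provider average rates as the benchmark.

```python
# Sketch of a benchmark opportunity analysis as described above: apply a
# benchmark performance model to a provider's actual population and compare
# with the provider's actual results. Data are hypothetical.

def opportunity(provider_population, provider_rates, benchmark_rates):
    """Events avoidable if the provider matched benchmark performance in every
    risk cell where it currently performs worse (cells where it already beats
    the benchmark contribute nothing)."""
    total = 0.0
    for cell, n in provider_population.items():
        gap = provider_rates[cell] - benchmark_rates[cell]
        total += n * max(gap, 0.0)   # only count cells with room to improve
    return total

population = {"adults": 780, "children": 150}        # provider's actual mix
actual     = {"adults": 0.0641, "children": 0.020}   # provider's actual rates
benchmark  = {"adults": 0.0500, "children": 0.025}   # benchmark model rates

print(f"Avoidable events per year: {opportunity(population, actual, benchmark):.1f}")
```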

As shown in this illustrative graphic, a benchmark opportunity analysis compares different improvement opportunities for the same provider.  In the example, Acme Care Partners could achieve the greatest savings by focusing their efforts on improving the appropriateness of high tech radiology services. In contrast, Acme Care Partners is already achieving benchmark performance in appropriateness of low tech radiology services, and therefore has zero opportunity for savings from improving up to the benchmark level.  That does not mean that they can’t improve.  No analysis can predict the opportunity for true innovation.  Benchmark opportunity analysis is just a tool for pointing out the largest opportunities for emulating peers that already perform well, taking into account the differences in the mix of patients between a provider organization and its high-performing peers.

This method is generally consistent with the “achievable benchmarks of care” (ABC) framework proposed more than 10 years ago by the Center for Outcomes and Effectiveness Research and Education at the University of Alabama at Birmingham.  However, that group advises against using the method for financial performance measures, presumably out of fear that it could encourage inappropriate under-utilization.  I consider that a valid concern.   To reduce that risk, I advocate for a stronger test of “achievability” for cost and utilization performance measures.  In the conventional ABC framework for quality measures, “achievability” is defined as the level of performance of the highest-performing set of providers that, together, deliver care to at least 10% of the overall population.  Such a definition is preferable to simply setting the benchmark at the level of performance achieved by the single highest-performing provider because a single provider might have gotten lucky to achieve extremely favorable performance.  When I apply the achievable benchmark concept to utilization or cost measures, I set the benchmark more conservatively than for quality measures.  For such measures, I use 20% rather than 10% so as to avoid setting a standard that encourages extremely low utilization or cost that could represent inappropriate under-utilization.
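Here is a rough sketch of that benchmark-selection rule, under the assumption that lower rates are better and that the benchmark is the pooled rate of the best-performing providers covering the minimum share of the population. The provider names and numbers are invented for illustration.

```python
# Sketch of the "achievable benchmark" selection rule described above:
# take the best-performing providers that together cover at least
# `min_coverage` of the population (10% for quality measures, 20% for
# cost/utilization measures under the more conservative rule). Data are
# hypothetical.

def achievable_benchmark(providers, min_coverage=0.10):
    """providers: list of (name, patients, rate), where lower rate = better.
    Returns the pooled rate of the best providers covering min_coverage."""
    total_patients = sum(patients for _, patients, _ in providers)
    covered = events = 0.0
    for name, patients, rate in sorted(providers, key=lambda x: x[2]):
        covered += patients
        events += patients * rate
        if covered >= min_coverage * total_patients:
            break
    return events / covered

providers = [
    ("Provider A", 2_000, 0.040),
    ("Provider B", 1_200, 0.055),
    ("Provider C", 9_000, 0.062),
    ("Provider D", 6_500, 0.071),
]
print(f"Quality benchmark (10% coverage):          {achievable_benchmark(providers, 0.10):.3f}")
print(f"Cost/utilization benchmark (20% coverage): {achievable_benchmark(providers, 0.20):.3f}")
```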

Note that one provider may have benchmark care processes that would achieve the best outcomes in a more typical population, but that same provider may have an unusual mix of patients that includes a large portion of patients for whom they don’t perform well, creating a large opportunity for improvement.  The key point is that opportunity analysis is the right method to compare and prioritize alternative improvement initiatives for the same provider.  But the results of opportunity analyses should not be used to compare the performance of providers.

The following graphic summarizes the comparison of traditional risk adjustment, O/E analysis, and benchmark opportunity analysis.

Simple Example Calculations

For those readers interested in a little more detail, the following table uses the same raw data from the calculations from last week’s post to illustrate the approach.

As shown in this example, Provider A has worse performance (higher mortality rate) than Provider B in adults.  So, Provider B is the benchmark performer in the adult risk cell.  If Provider A improved from 6.41% mortality down to the 5.00% mortality level of Provider B, it could save the lives of 11 adults per year.  Provider B has worse performance in children.  If Provider B improved its performance in children up to the level achieved by Provider A, while still achieving its benchmark level of performance in adults, it could save 1 life per year.
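The arithmetic behind those figures is just the risk-cell gap multiplied by the provider’s own patient count in that cell:

```latex
\text{lives saved per year}
  \;=\; n^{A}_{\text{adult}} \times \left( r^{A}_{\text{adult}} - r^{B}_{\text{adult}} \right)
  \;=\; n^{A}_{\text{adult}} \times (0.0641 - 0.0500)
```

Working backward from the numbers quoted above, saving 11 lives per year at a gap of 1.41 percentage points implies roughly 780 adult patients in Provider A’s panel; that back-calculation is my inference, not a figure from the original table.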


Observed over expected (O/E) analysis is commonly misapplied to performance comparisons. Please don’t.

A few years ago, I had a shocking and enlightening discussion about analytic methods with a group of epidemiologists and biostatisticians from Blue Cross Blue Shield of Michigan.

PGIP Quarterly Meeting in Lansing

We were sitting around a table at a conference center in Lansing, where we were debriefing from a meeting of the Physician Group Incentive Program. We were talking about the methods for performance comparison. Everyone knew that we needed to “risk adjust” to take into account differences in patient characteristics when comparing the performance of different physicians, clinic practice units, and physician organizations. If we failed to properly risk adjust, the poorly performing providers would surely argue “my patients are sicker.”

Traditional Risk Adjustment using Standardization

When epidemiologists want to compare the performance of two health care providers on a level playing field, the traditional method is to do risk adjustment using an approach called standardization.    The concept is to determine which patient or environmental variables influence the outcome of interest.  These are called confounding variables, because differences in the mix of patients based on these variables can confound the performance comparison unless they are taken into consideration. Examples of such confounding variables include age, gender, the presence of co-morbid conditions, etc.  If any of the confounding variables are continuous numbers, like age, the epidemiologist must first convert them to discrete categories, or groupings.  For example, if age was the confounding variable, the epidemiologist might define categories for “adults” and “children.”  Or, the epidemiologist might break age into ten-year ranges.  If there is more than one confounding variable, the categories are defined based on the combinations of values, such as “adult females,” “adult males,” etc.  These categories are sometimes called “risk strata” or “risk cells.”  Then, for each of these categories, for each of the providers being compared, the epidemiologist calculates the outcome measure of interest, such as the mortality rate or the total cost of care.  The result is a matrix of measure values for each of the risk cells for each provider.  This matrix can be conceptualized as a “model” of the actual performance of each provider.

To create a level playing field for comparisons, the epidemiologist then creates a standardized population.  The standardized population is simply the number or percentage of patients in each of the risk cells.  Then, the model of each provider’s performance is applied to that standardized population to determine what the outcomes would have been if that provider had cared for the standardized mix of patients.  For each of the risk cells, the standardized number of patients is multiplied by the actual performance that provider achieved for such patients.  Then, the results for all the risk cells are aggregated to obtain that provider’s risk-adjusted performance. Another way of thinking about this is that the risk adjusted outcome is the weighted average outcome for all the risk cells, where the weights are the proportion of patients in that risk cell in the standardized, level-playing-field population.  If the provider’s actual mix of patients was “sicker” or “higher risk” than the standardized population, then the risk adjusted outcome will be more favorable than the unadjusted outcome.
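Here is a minimal sketch of that direct standardization calculation. The risk cells, rates and standardized population mix are hypothetical:

```python
# Sketch of risk adjustment by direct standardization as described above.
# Each provider's actual rate per risk cell is applied to a common
# ("standardized") population mix. Data are hypothetical.

def standardized_rate(provider_cell_rates, standard_population):
    """Weighted average of the provider's cell-level rates, weighted by the
    standardized mix of patients rather than the provider's own mix."""
    total = sum(standard_population.values())
    return sum(standard_population[cell] * rate
               for cell, rate in provider_cell_rates.items()) / total

standard_population = {"adults": 700, "children": 300}   # level playing field

provider_rates = {
    "Provider A": {"adults": 0.0641, "children": 0.0100},
    "Provider B": {"adults": 0.0500, "children": 0.0150},
}
for name, rates in provider_rates.items():
    print(f"{name}: risk-adjusted rate {standardized_rate(rates, standard_population):.4f}")
```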

“Observed Over Expected” Analysis

In the literature, even in respected journals, I have seen many examples of performance comparisons that used a different analytic approach called “observed over expected” or “O/E,” rather than using the traditional standardization approach.  A recent example is a paper regarding the mortality-avoidance performance of children’s hospitals.  Just as with standardization, the O/E method begins by identifying confounding variables — the patient characteristics that are predictors of the outcome of interest.   With O/E analysis, confounding variables that are continuous, like age, do not have to be converted to discrete categories or groupings.  All the confounding variables are used as independent variables in a regression model.  Then, the resulting regression model is applied to each individual patient observation, inserting the values of the predictor variables for that patient into the regression formula to obtain the “expected” value of the outcome of interest.  At that point, you have the actual observed value and the expected value for each patient (or case).  Then, you sum up the observed values for all the patients for a given provider.  And, you sum up the expected values for all the patients for a given provider.  Finally, you divide the sum of observed values by the sum of expected values to get the O/E ratio.  If the ratio is greater than one, that is interpreted to mean that the provider has a higher-than-expected value for that outcome.  If the outcome variable is undesirable, such as mortality rate, complication rate or cost, an O/E ratio of greater than one is interpreted to mean that the provider performed poorly compared to the other providers.  People have been routinely using O/E analysis as if it was a way of doing risk-adjusted performance comparisons — as a way of “leveling the playing field” to do performance comparisons that take into account differences in patient characteristics.
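A sketch of that O/E recipe, using a logistic regression as the “expected” model, might look like the following. The simulated patients, the age-only risk model, and the use of scikit-learn are all illustrative assumptions, not the method used in the paper cited above.

```python
# Sketch of the O/E recipe described above: fit a model of "typical"
# performance on the pooled data, apply it to each provider's own patients,
# and divide the sum of observed outcomes by the sum of expected outcomes.
# All patient data are simulated placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_patients(n, mean_age, base_risk):
    age = rng.normal(mean_age, 15, n).clip(0, 95)
    p = base_risk + 0.0008 * age            # toy model: risk rises with age
    died = rng.random(n) < p
    return age, died

# Provider A sees mostly older patients, Provider B mostly younger ones
age_a, died_a = make_patients(2_000, mean_age=55, base_risk=0.01)
age_b, died_b = make_patients(2_000, mean_age=25, base_risk=0.01)

# "Expected" model fit on the pooled population (age as a continuous predictor)
X = np.concatenate([age_a, age_b]).reshape(-1, 1)
y = np.concatenate([died_a, died_b])
model = LogisticRegression().fit(X, y)

for name, age, died in [("Provider A", age_a, died_a), ("Provider B", age_b, died_b)]:
    expected = model.predict_proba(age.reshape(-1, 1))[:, 1].sum()
    print(f"{name}: O/E = {died.sum() / expected:.2f}")
```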

But, sitting around the table in Lansing, I was shocked to realize that O/E analysis is not actually applicable for this purpose.  Why? Because O/E analysis does not actually create a level playing field.

On the contrary, O/E analysis is conceptually the opposite of the traditional standardization approach.  In traditional standardization, a model of each provider’s actual performance is applied to a standardized population.  In O/E analysis, the regression model is essentially a model of typical performance.  That regression model is applied to the actual population that received care from a particular provider.  The problem is that different providers can see a different mix of patients.  Consider the following simple calculations.

In this simple example, we are comparing just two providers.  We are considering just one confounding variable, age.  And, we are breaking that variable into just two categories, adults and children.  As shown in the example, provider A sees mostly adults, while provider B sees mostly children.  Provider B performs poorly in those children, but actually performs better than A in the adult population.  Because provider B sees more children, the poor performance in children dominates the O/E calculation, so provider B looks bad in terms of an O/E ratio of 1.09.  But, since there are more adults than children in the overall population, which we are using as the “standardized” population, provider B’s superior performance in adults dominates the risk adjustment.  So, provider B has a risk-adjusted mortality rate that is better than provider A’s.  In other words, if you use O/E analysis for level-playing-field performance comparisons, you may get the wrong answer.
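Since the original table isn’t reproduced here, the following sketch uses my own toy numbers to show the same pattern: Provider B looks worse on the O/E ratio even though its standardized (risk-adjusted) mortality rate is better.

```python
# Toy numbers (mine, not the post's table) reproducing the pattern described
# above: Provider B looks worse on the O/E ratio but better after
# risk adjustment by standardization.

providers = {
    # (adults, adult mortality rate, children, child mortality rate)
    "Provider A": (1400, 0.0641, 100, 0.010),
    "Provider B": (100, 0.0500, 400, 0.040),
}

# Pooled ("expected") rates per risk cell across both providers
tot_ad = sum(n for n, _, _, _ in providers.values())
tot_ch = sum(n for _, _, n, _ in providers.values())
pool_ad = sum(n * r for n, r, _, _ in providers.values()) / tot_ad
pool_ch = sum(n * r for _, _, n, r in providers.values()) / tot_ch

for name, (n_ad, r_ad, n_ch, r_ch) in providers.items():
    observed = n_ad * r_ad + n_ch * r_ch
    expected = n_ad * pool_ad + n_ch * pool_ch
    # Standardized rate: apply the provider's own cell rates to the overall mix
    standardized = (tot_ad * r_ad + tot_ch * r_ch) / (tot_ad + tot_ch)
    print(f"{name}: O/E = {observed / expected:.2f}, "
          f"standardized mortality = {standardized:.3%}")
```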

Risk adjustment using standardization is not computationally difficult.  But, it does require a bit of programming work to convert continuous variables such as age into categorical variables, such as breaking age into ten-year ranges.  In my opinion, O/E analysis should not be used for risk-adjusted performance comparisons.  O/E analysis does have the advantage of being computationally more convenient, particularly when you have many confounding variables and when some of them are continuous variables.  But, it is not that much more work to do it right.


How do we reduce errors in software and data analysis? Culture of Accountability vs. Culture of Learning

A young colleague recently wrote to me complaining of frustration from having to deal with a high rate of errors in software development and data analysis.  Any time you are innovating in a knowledge-intensive field such as health care, you will need to develop new software and analyze data in new ways.  Errors will inevitably result.  There’s no easy way to avoid them. Therefore, reducing errors in software development and analysis is a lifelong battle for healthcare innovators.

The conventional philosophy of reducing errors is the following:

  1. Make sure everyone clearly knows what they are responsible for
  2. Make sure you use a tightly controlled development process with clear steps, checkpoints, milestones and gates
  3. Make sure you have everything well documented, using documents created from highly detailed templates designed to assure that nothing is forgotten
  4. Make sure you have detailed testing scenarios designed in advance, and that you do “regression testing” to assure that changes to one part of a system or analysis do not cause the testing scenarios to fail
  5. Make sure everyone understands the consequences of errors, both to the organization and to them personally

These are the pillars of rigorous project management.

But, unfortunately, experience teaches that sometimes this philosophy can have some unintended consequences.  Sometimes, errors still occur. Little errors, like bugs.  And big errors, like creating something that nobody needs or wants.   For example, when you have a tightly controlled process, sometimes that communicates to people that you intend for the process to be linear, rather than iterative.  Even when you say “let’s do this iteratively,” all the steps, milestones and gates tell people that you really mean the opposite.  When you create a highly detailed template, intended to assure that nothing is forgotten, you unintentionally switch people into a mode of “filling out the form,” rather than the much harder and more valuable work of figuring out how to effectively teach the most important concepts to the reader.  And, you unintentionally convert your quality assurance process to one that emphasizes adherence to the template, rather than the quality of the underlying ideas being taught.  When you create detailed testing scenarios, you unintentionally encourage the team to treat “passing the tests” as quality, rather than challenging the software or the analytic results with tests designed based on insights about how the software or the analytic calculations are actually structured and what types of errors might be more likely.  A software developer I know describes that as “testing smarter.” Finally, when you communicate to people the consequences to them personally of messing up, intending to increase their motivation to do error-free work, you unintentionally tell them to allocate more of their time to avoiding blame and documenting plausible deniability.  And, you unwittingly tell them to bury the problems that could provide the insights needed to drive real improvement.

W. Edwards Deming

W. Edwards Deming famously advocated for “driving out fear.”  In his landmark book, “Out of the Crisis,” published back in 1982, Deming explains that people fear new knowledge because it could reveal their failings and because it could lead to changes that could threaten their security.  Focusing on motivating people might be a good idea if the problem is inadequate motivation.  But, in my experience, poor performance is usually not an issue of motivation, especially in professional settings. More likely, poor performance is an issue of poor tools, poor training (leading to inadequate knowledge or skills), or having the wrong talent mix for the job.

That last one — talent — is a tricky one.  We consider it enlightened to assume that everyone could do a great job if only they received the right tools and training. Saying someone lacks the necessary talent for a particular job can be considered arrogant and wrong.  I think this may be because talent is an unchangeable characteristic, and we are taught that it is wrong to judge people based on other unchangeable characteristics such as gender or race.  But, each person was given their own unique mix of talents.  They will make their best contribution and achieve their highest satisfaction if they are in a role that is a good fit for their talents.  On the other hand, it is devilishly hard to tell the difference between unchangeable talents and changeable skills and knowledge.  And, developing people’s skills and knowledge is hard work and requires patience. As a result, it is too easy for leaders to get lazy and waste real talent.  Finding the right balance between optimism and realism about people’s potential requires maturity, effort and some intuition.  If in doubt, take Deming’s advice and err on the side of optimism.

I’m not arguing against processes, documentation, test scenarios or accountability.  But, I am suggesting to be careful about the unintended consequences of taking those things too far and relying on them too much.

My advice to my colleague was to focus more on the following:

  1. Make sure you hire really talented people, and then invest heavily in developing their knowledge and skills
  2. Make sure you are actually analyzing data the moment you start capturing it, rather than waiting a long time to accumulate lots of data only to discover later that it was messed up all along
  3. Make sure you do analysis and develop software iteratively, with early iterations focused on the hardest and most complex part of the work to make sure you don’t discover late in the game that your approach can’t handle the difficulty and complexity
  4. Most importantly, create a culture of learning, where people feel comfortable sharing their best ideas, talking about errors and problems, taking risks, and making improvements

The debate about what to maximize when selecting candidates for care management programs: Accuracy? ROI? Net Savings? Or Cost-Effectiveness?

When doing analysis, it is really important to clarify up front what it is you are actually trying to figure out.  This sounds so obvious.  But, I am always amazed at how often sophisticated, eager analysts can zip past this important first step.

Health plans, vendors and health care providers are all involved in the design and execution of wellness and care management programs.  Each program can be conceptualized as some intervention process applied to some target population.  A program is going to add more value if the target population is composed of the people for whom the intervention process can have the greatest beneficial impact.

I have found it useful to conceptualize the process of selecting the final target population as having two parts.  The first part is the process of identification of the population for whom the intervention process is deemed relevant.  For example, a diabetes disease management program is only relevant to patients with diabetes.  An acute-care-to-ambulatory-care transitions program is only relevant to people in an acute care hospital. A smoking cessation program is only relevant to smokers.  The second part is to determine which members of the relevant population are to actually be targeted.  To do this, a program designer must figure out what characteristics of the relevant candidates are associated with having a higher opportunity to benefit from the intervention.  For example, in disease management programs, program designers often use scores from predictive models designed to predict expected cost or probability of disease-related hospitalization.  They are using cost or likelihood of hospitalization as a proxy for the opportunity of the disease management program to be beneficial.  Program designers figure that higher utilization or cost means that there is more to be saved.  This is illustrated in the graph below, where a predictive model is used to sort patients into percentile buckets.  The highest-opportunity 1% have an expected annual admit rate of almost 4000 admits per 1000 members, while the lowest 1% have less than 200 admits per 1000 members.  A predictive model is doing a better job when more of the area under this curve is shifted to the left.
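Mechanically, building that percentile view is just a matter of sorting by the model score and bucketing, as in the sketch below. The simulated scores and admission counts are placeholders and will not reproduce the specific numbers from the graph.

```python
# Sketch of the percentile view described above: sort members by predictive
# model score, bucket into percentiles, and compute admits per 1,000 members
# in each bucket. Scores and admit counts are simulated placeholders.
import numpy as np

rng = np.random.default_rng(1)
n_members = 100_000
score = rng.gamma(shape=2.0, scale=1.0, size=n_members)   # predicted risk score
admits = rng.poisson(lam=0.05 * score)                    # simulated annual admits

order = np.argsort(-score)                                # highest risk first
percentile = np.arange(n_members) // (n_members // 100)   # bucket 0 = top 1%

admits_per_1000 = [admits[order][percentile == p].mean() * 1000 for p in range(100)]
print(f"Top 1%: {admits_per_1000[0]:.0f} admits/1,000 members; "
      f"bottom 1%: {admits_per_1000[99]:.0f} admits/1,000 members")
```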

Although it is common to use expected cost or use as a proxy for opportunity, what a program designer would really like to know is how much good the intervention process is likely to do.  Other variables besides expected cost or use can contribute to a higher opportunity.  For example, in a disease management program, the program might be more worthwhile for a patient that is already motivated to change their self-management behaviors or one that had known gaps in care or treatment non-compliance issues that the intervention process is designed to address.

Applying micro-economics to care management targeting

Once the definition of “opportunity” is determined and operationally defined to create an “opportunity score” for each member of the relevant population, we can figure out which members of the relevant population to actually target for outreach.  Conceptually, we would sort all the people in the relevant population by their opportunity score.  Then, we would start by doing outreach to the person at the top of the list and work our way down the list.  But, the question then becomes: how far down the list do we go?  As we go down the list, we are investing program resources in outreach and intervention effort directed at patients for whom the program is accomplishing less and less.   Economists call this “diminishing returns.”

As illustrated by the red line in the graph above, there is some fixed cost to operating the program, regardless of the target rate.  For example, there are data processing costs.  Then, if the program does outreach to a greater and greater portion of the relevant population, more and more people say “yes” and the costs for the intervention go up in a more or less linear manner.  As shown by the green line, the savings increase rapidly at first, when the program is targeting the candidates with the greatest opportunity.  But, as the threshold for targeting is shifted to the right, the additional candidates being targeted have lower opportunity, and the green line begins to flatten. The blue line shows the result of subtracting the costs from the savings to get net savings.  It shows that net savings increases for a while and then begins to decrease, as the cost of intervening with additional patients begins to outweigh the savings expected to accrue from those patients.  In this analysis, net savings is highest when 41% of the relevant population of diabetic patients is targeted for the diabetes disease management program.  The black dotted line shows the result of dividing savings by cost to get the return on investment, or ROI.   With very low target rates, too few patients are accumulating savings to overcome the fixed cost.  So the ROI is less than 1.  Then, the ROI hits a peak at a target rate of 18%, and declines thereafter.  This decline is expected, since we are starting with the highest opportunity patients and working down to lower opportunity patients.   Note that in this analysis, increasing the target penetration rate from 18% to 41% leads to a lower ROI, but the net savings increases by 24%.  So, if the goal is to reduce overall cost, that is achieved by maximizing net savings, not by maximizing ROI.
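The shape of that argument can be sketched with a few lines of code. The cost and savings curves below are invented placeholders, so the peaks land at different target rates than the 18% and 41% in the analysis above, but the qualitative point, that the ROI-maximizing target rate is well below the net-savings-maximizing rate, comes through.

```python
# Sketch of the targeting economics described above: fixed program cost,
# roughly linear intervention cost as outreach expands, and diminishing
# savings as lower-opportunity candidates are added. All parameters are
# made-up placeholders, not the analysis behind the chart in the post.
import numpy as np

target_rate = np.linspace(0.01, 1.0, 100)        # share of relevant population targeted
fixed_cost = 100_000                              # e.g., data processing
cost = fixed_cost + 900_000 * target_rate         # roughly linear intervention cost
savings = 1_600_000 * (1 - np.exp(-4 * target_rate))  # diminishing returns

net_savings = savings - cost
roi = savings / cost

best_net = target_rate[np.argmax(net_savings)]
best_roi = target_rate[np.argmax(roi)]
print(f"Net savings peaks at a target rate of {best_net:.0%}; "
      f"ROI peaks at {best_roi:.0%}")
```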

Should we try to maximize accuracy?

In a recent paper published in the journal Population Health Management by Shannon Murphy, Heather Castro and Martha Sylvia from Johns Hopkins HealthCare, the authors describe their sophisticated methodology for targeting for disease management programs using “condition-specific cut-points.”  A close examination of the method reveals that it is fundamentally designed to maximize the “accuracy” of the targeting process in terms of correctly identifying in advance the members of the relevant disease-specific population that will end up being among the 5% of members with the highest actual cost.   In this context, the word accuracy is a technical term used by epidemiologists.  It means the percentage of time that the predictive model, at the selected cut-point, correctly categorized patients.  In this application, the categorization is attempting to correctly sort patients into a group that would end up among the 5% with highest cost vs. a group that would not.  By selecting the cut point based on accuracy, the Hopkins methodology is implicitly equating the value of the two types of inaccuracy: false positives, where the patient would be targeted but would not have been in the high cost group, and false negatives, where the patient would not be targeted but would have been in the high cost group. But, there is no reason to think that, in the selection of targets for care management interventions, false negatives and false positives would have the same value. The value of avoiding a false negative includes the health benefits and health care cost savings that would be expected by offering the intervention. The value of avoiding a false positive includes the program cost of the intervention.  There is no reason to think that these values are equivalent.  If it is more important to avoid a false positive, then a higher cut-point (targeting fewer patients) is optimal.  If it is more valuable to avoid a false negative, then a lower cut-point (targeting more patients) is optimal.  Furthermore, the 5% cost threshold used in the Hopkins methodology is completely arbitrary, selected without regard to the marginal costs or benefits of the intervention process at that threshold.  Therefore, I don’t advise adopting the methodology proposed by the Hopkins team.
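A small sketch of the difference between an accuracy-maximizing cut-point and a cost-weighted cut-point, using hypothetical scores, outcomes and dollar values (not the Hopkins data or method):

```python
# Sketch contrasting an accuracy-maximizing cut-point with one chosen by
# weighing the costs of false positives and false negatives, as argued above.
# Scores, outcomes, and dollar values are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
score = rng.random(n)                      # predicted risk score, 0-1
high_cost = rng.random(n) < score ** 2     # who actually ends up high cost (toy model)

fp_cost = 500       # intervention cost wasted on someone who wasn't high cost
fn_cost = 4_000     # expected net benefit forgone by missing a high-cost member

def evaluate(cut):
    targeted = score >= cut
    accuracy = np.mean(targeted == high_cost)
    fp = np.sum(targeted & ~high_cost)
    fn = np.sum(~targeted & high_cost)
    expected_loss = fp * fp_cost + fn * fn_cost
    return accuracy, expected_loss

cuts = np.linspace(0.05, 0.95, 19)
best_accuracy_cut = max(cuts, key=lambda c: evaluate(c)[0])
best_value_cut = min(cuts, key=lambda c: evaluate(c)[1])
print(f"Accuracy-maximizing cut-point: {best_accuracy_cut:.2f}")
print(f"Cost-weighted cut-point:       {best_value_cut:.2f}")
```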

What about cost-effectiveness?

The concept of maximizing ROI or net savings is based on the idea that the reason a health plan invests in these programs is to save money.  But, the whole purpose of a health plan is to cover expenses for necessary health care services for the beneficiaries.  A health plan does not determine whether to cover hip replacement surgery based on whether it will save money.  They cover hip replacement surgery based on whether it is considered a “standard practice,” or whether there is adequate evidence proving that the surgery is efficacious.  Ideally, health care services are determined based on whether they are worthwhile — whether the entire collection of health and economic outcomes is deemed to be favorable to available alternatives.  In the case of hip replacement surgery, the health outcomes include pain reduction, physical function improvement, and various possible complications such as surgical mortality, stroke during recovery, etc.  Economic outcomes include the cost of the surgery, the cost of dealing with complications, rehabilitation and follow-up, and the savings from avoiding whatever health care would have been required to deal with ongoing pain and disability.  When attempting to compare alternatives with diverse outcomes, it is helpful to reduce all health outcomes into a single summary measure, such as the Quality-Adjusted Life Year (QALY).  Then, the incremental net cost is divided by the incremental QALYs to calculate the cost-effectiveness ratio, which plays a role analogous to the business concept of return on investment.  If the cost-effectiveness ratio is sufficiently low (that is, below an accepted willingness-to-pay threshold), the health service is deemed worth doing.  There is no reason why wellness and care management interventions should not be considered along with other health care services based on cost effectiveness criteria.
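Written out, the standard form of that calculation is the incremental cost-effectiveness ratio:

```latex
\text{ICER} = \frac{C_{\text{intervention}} - C_{\text{comparator}}}
                   {\mathrm{QALY}_{\text{intervention}} - \mathrm{QALY}_{\text{comparator}}}
\qquad \text{(dollars per QALY gained)}
```

Here the comparator is the best available alternative, and the willingness-to-pay threshold reflects how much the payer or society is willing to spend per QALY gained.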

The idea that wellness and care management interventions should only be done if they save money is really just a consequence of the approach having been primarily initiated by health plans in the last decade.  I suspect that as care management services shift from health plans to health care providers over the next few years, there will be increased pressure to use the same decision criteria as are used for other provider-delivered health care services.
