How to use and improve predictive models


Criticism of ProPublica’s Surgical Scorecard fails to consider the possibility of real, useful analytics.


Last week, ProPublica published a scorecard of surgical death and complication rates for more than 17,000 surgeons performing 8 elective procedures, based on Medicare data.  As with prior releases of health care performance metrics, the response against such “transparency” was swift and bitter.  Among the many responses is a thoughtful blog post entitled “After Transparency: Morbidity Hunter MD joins Cherry Picker MD” by Saurabh Jha, MD in The Health Care Blog.  It is definitely worth your time to read.

But, although it is a clever bit of commentary, it implicitly presents a false choice between using data and not using data.

In my opinion, the decision should be conceptualized as:
  • Option 1: Not using data (and relying instead on subjective assessment or chance)
  • Option 2: Using reported metrics, interpreted by people who lack the talent and training to understand the limitations of the metrics and the methods that can address some of those limitations, and
  • Option 3: Using data, interpreted through analysis, conducted by and interpreted with the aid of people with such analytic talent and training.
By talent and training, I don’t mean technology mavens.  Keep your business intelligence professionals, data miners, “big data” experts, and most of those who claim the fashionable title of “data scientist.” I mean people who have training in epidemiology, biostatistics, health economics and other social science disciplines, and who have sufficient knowledge of health care.  People who can conceptualize theories of cause and effect. People who understand bias and variation.  People who can tell an interesting and actionable story supported by data, rather than just generate a “dashboard” or “score card.”  And, they must be people who have integrity and who are free of conflicts of interest that could prevent them from telling stories that are true.

Before anyone writes off option #3 as idealistic and infeasible, we should at least take the time to think through how we might make it work.


Al Lewis calls workplace wellness programs “get well quick schemes”

Al Lewis is an actuarial consultant who has long focused on challenging wellness and care management vendors to prove their value.  He founded the “Disease Management Purchasing Consortium” and established a training and certification program for “critical outcomes report analysis.”

Al has been calling out the methodological carelessness and dirty tricks of wellness and care management vendors and health plans for years.   These shoddy and unethical methods produce deceptively optimistic results, often to the delight of the customers of the programs, who crave evidence that they made a wise choice. Many of the methods were discredited long ago, but like cockroaches and ants, they just keep coming back. Faced with this unsavory state of affairs over many years, poor Al has resorted to sarcasm — probably partly to avoid getting bitter, and partly to keep his audience awake long enough to absorb the otherwise dry, tedious concepts.

He recently collaborated with Vik Khanna on a blog post in Health Affairs that focused on a particular type of wellness and care management program — workplace wellness — now a $6B industry.  Such programs typically are funded and sponsored by employers, and involve incentivizing employees to complete a health risk assessment and then, hopefully, pursue healthier lifestyle behaviors. Employers purchasing these programs typically believe they will lead to substantial, short-term increases in worker productivity and decreases in health care costs. The blog post is definitely worth reading.

To summarize:

  1. Both workplace wellness program vendors and the benefit consultants who advocate for them have conflicts of interest which lead them to use deceptive methods and publish papers and marketing material which claim implausible levels of savings and return-on-investment.
  2. Although health plans often sell workplace wellness programs to self-insured employers (for a profit), virtually none of them believes they really produce savings, so they don’t spend the money on such programs for the fully-insured business for which the health plan itself bears the risk.  Health plans don’t eat their own dog food.
  3. The most common trick is to compare the outcomes for highly motivated employees who choose to complete the health risk assessments and participate in wellness interventions to the outcomes for poorly motivated employees who do not.  Epidemiologists call this “volunteer bias.”  It is a problem in evaluation studies of all types of member/patient-facing programs, but is obviously an even bigger problem with workplace wellness, where motivation to change behavior is the whole point of the program.
  4. Other common tricks include taking credit for improvements that occur due to random variation (“regression to the mean”), or taking credit for improvements that occurred before the program actually started — as was the case with the widely-touted results from Safeway’s famous workplace wellness program.
  5. They recommend that employers should avoid these “get well quick” schemes and, instead, do the harder work of creating a deep culture change promoting wellness.  If employers want to try workplace wellness programs, they should at least commit to identifying and then counting the events that the wellness program is intended to reduce to see whether they really decrease across the entire work force after the program is implemented.
  6. Lastly, they point out that the workplace wellness industry convinced the federal government to include taxpayer-financed wellness incentives in the Affordable Care Act.   The Federal Employee Plan is in the process of picking a wellness vendor.  They recommended dropping federally-funded wellness programs until valid evaluations show they work.
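The volunteer-bias trick described in point 3 is easy to demonstrate with a toy simulation. In the hypothetical population below, motivation drives both participation and health care costs, and the wellness program itself does nothing at all, yet a participant-versus-non-participant comparison still “finds” large savings. All numbers are invented for illustration.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: motivation drives BOTH participation and
# costs; the wellness program itself has ZERO effect on anyone.
N = 10_000
people = []
for _ in range(N):
    motivation = random.random()              # 0 = unmotivated, 1 = highly motivated
    participates = motivation > 0.6           # volunteers are the motivated ones
    cost = 5000 - 2000 * motivation + random.gauss(0, 500)
    people.append((participates, cost))

participants = [c for p, c in people if p]
non_participants = [c for p, c in people if not p]

# The deceptive comparison: participants vs. non-participants
naive_savings = statistics.mean(non_participants) - statistics.mean(participants)
print(f"'Savings' from participant vs non-participant comparison: ${naive_savings:,.0f}")
```

The comparison reports roughly $1,000 per person in “savings” that is entirely an artifact of who chose to volunteer, which is why the authors insist on counting events across the entire work force instead.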

Dr. Ward to present at World Congress 3rd Annual Data Analytics Summit for Health Plans

For more information:

Phone: 800-767-9499


Coconut oil as an Alzheimer’s treatment? Please don’t short-circuit science.

My father recently forwarded an e-mail he received from a friend with a link to a TV news story about a physician who treated her own husband’s worsening Alzheimer’s disease with coconut oil.  My father is interested in the topic, particularly since he knows someone who suffered and died from the disease.  He forwarded the e-mail to me, asking my opinion.

The physician, Mary Newport, MD, is a neonatologist.  She explains that Alzheimer’s is thought to be similar to diabetes in that it involves an impairment in the ability of brain cells to respond to insulin and take in the glucose needed to provide energy.  As a result, brain cells die and eventually brain function is reduced.  She reasoned that the brain cells may avoid death by relying on an alternative fuel, ketones.  She identified coconut oil as a good dietary source of ketones.  So, she introduced coconut oil into her husband’s diet and noted improvement in his brain function.  She documented this improvement with a “clock test,” showing how a hand drawing of the face of a clock done after initiation of coconut oil treatment was more coherent and detailed than a drawing done before the treatment. Excited by the promising results, she wrote a book, started a web site, and began doing radio and TV interviews to disseminate information about her new treatment.

From the simple explanation, it seems biologically plausible. And, I’m sure that Dr. Newport had nothing but the best intentions, motivated by love for her husband and a desire to help millions of people suffering from Alzheimer’s. And, it is possible that she is absolutely right. Coconut oil may be a simple, inexpensive, non-invasive, effective treatment for the disease.

But, obviously, we would not want to make decisions about treatments from a single data point, where the main outcomes measurement was a subjective assessment about how coherent a hand drawing of a clock was.

It would have been more appropriate for this physician to do the work of scientific research before disseminating results.  That would start with writing a study proposal and convincing peers on the study committee of a research granting agency that it was a plausible and promising idea.  Then, she would conduct a randomized study, making objective measurements or collecting careful observations by impartial observers.  Then, she would analyze the results to see if there is a statistically significant difference in outcomes between the treatment group and the control group.  The purpose of the statistical significance test is to ensure that there is a low probability that any observed differences are due to chance alone.  Finally, she would do the work of writing up a paper and submitting it to a peer-reviewed journal to convince expert reviewers that there were no obvious flaws in the methodology.  Only then should she consider broader dissemination of the information, such as by writing a book, starting her own web site, and doing TV and radio interviews.
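The significance-testing step can be sketched with a simple permutation test, which directly asks how often a difference as large as the observed one would arise by chance if the treatment did nothing. The cognition scores below are invented purely to illustrate the mechanics, not real Alzheimer’s data.

```python
import random
import statistics

random.seed(1)

# Hypothetical cognition scores (higher = better) from a randomized trial.
# These numbers are invented for illustration only.
treatment = [24, 27, 22, 29, 25, 28, 26, 23, 30, 27]
control   = [21, 24, 20, 23, 22, 25, 19, 22, 24, 21]

observed_diff = statistics.mean(treatment) - statistics.mean(control)

# Permutation test: repeatedly shuffle the group labels and count how
# often a chance difference is at least as large as the observed one.
pooled = treatment + control
n_treat = len(treatment)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:n_treat]) - statistics.mean(pooled[n_treat:])
    if diff >= observed_diff:
        extreme += 1

p_value = extreme / trials
print(f"Observed difference: {observed_diff:.1f} points, one-sided p = {p_value:.4f}")
```

A small p-value means chance is an unlikely explanation; it says nothing about whether the measurements themselves were objective, which is why blinded, impartial observers matter just as much as the arithmetic.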

The scientific approach to medical innovation has served our society well.  When this physician went straight from one observation to TV interviews, she short-circuited the scientific approach.  She may be helping people with Alzheimer’s.  But, she may be distracting Alzheimer’s patients from seeking proven treatments or diverting funding away from competing innovative treatment ideas that have gone through the scientific “front door.”  More generally, she may be harming our society’s commitment to a scientific approach.

The fact that the treatment is a type of food, rather than a patentable drug, admittedly changes the situation.  No drug company wants to fund research on coconut oil.  And, the coconut oil industry is not familiar with clinical research, even if they could benefit from increased demand for treatment of Alzheimer’s.  This is a good argument for why the National Institutes of Health and private research foundations should fund more research related to diet and natural remedies.   It should not be an argument for short-circuiting the scientific approach to health care innovation.

Fortunately, a research team from Oxford is pursuing a randomized clinical trial to test the use of dietary ketones as a treatment for Alzheimer’s.  The Oxford team is testing a specialized ketone ester that is thought to be ten times better than coconut oil in terms of delivering ketones to the interior of brain cells.  Results should be available later this year.  Hopefully, they will show meaningful improvement.


Slides for Dr. Ward’s presentation at 3rd Annual Predictive Modeling Congress for Health Plans, Orlando, Florida, January 31, 2012


Click here for PDF copy of slides.


Congressional Budget Office: Care management programs only work if care managers have face to face contact with patients and substantial interaction with physicians

This month, Lyle Nelson of the Congressional Budget Office (CBO) released a “working paper” summarizing the results of a decade of experience with 6 care management demonstration projects in the Medicare population.  These demonstrations included a total of 34 disease management or care coordination programs. Nelson briefly summarized the working paper in a recent blog post.

All of the 34 care management programs were designed to reduce Medicare costs primarily by maintaining or improving the health of the Medicare beneficiaries, and thereby reducing the need for expensive inpatient hospital stays.  As shown in the graph below, different programs showed different effects on the rate of hospital admissions.  On average, the programs showed no effect.

Effects of 34 Disease Management and Care Coordination Programs on Hospital Admissions (Percentage Change in Hospital Admissions)


The CBO analyzed whether specific characteristics of programs led to better or worse results. They found that programs in which the care management provider’s fees were at risk performed no better or worse than those with fees not at risk.  However, they did find two things that worked.  Programs in which care managers had substantial direct interaction with physicians and those with significant in-person interaction with patients reduced hospital admissions by an average of 7%, while programs without these features had no impact on hospital admissions.

But, after subtracting the cost of the programs themselves, almost none of the programs achieved net savings.

The programs with the most compelling performance included:

  • Massachusetts General Hospital and its affiliated physician group reduced hospital admissions by 19-24% among patients selected as “high risk” using a program that was far more tightly integrated with the health care delivery system.  Physicians in the group were involved in the design of the intervention, and care managers were staff members in primary care physicians’ practices.  The patients received the vast majority of their care within the integrated delivery system, so almost all of their health information was available and up-to-date in an electronic medical records system.  Care managers were notified immediately when a patient was seen in the emergency room or admitted to the hospital.  They had opportunities for face-to-face interaction with patients in the clinic.  And, they had access to a pharmacist to address medication issues.
  • Two multi-specialty group practices in the Northwest reduced hospital admissions by 12-26% among high risk patients using a program that involved telemonitoring with the “Health Buddy” device, which transmitted symptoms and physiologic measurements to a care manager.
  • Mercy Medical Center in rural Iowa reduced hospital admissions by 17% among patients hospitalized or treated in the ER in the prior year for CHF, COPD, liver disease, stroke, vascular disease, and renal failure, using a program that involved care managers, many of whom were located in physician offices and/or accompanied patients on their physician visits.

The methods used for these evaluations were far stronger than those used in the self-evaluations typically advertised by vendors of care management services.  In the CBO report, 30 of the 34 programs were evaluated based on a comparison to a randomly selected comparison group.  The remaining 4 programs were evaluated using a concurrent comparison group selected using the same selection criteria.  In all cases, the programs were evaluated on an “intent to treat” basis, where study subjects were included in the evaluation regardless of whether they participated in the voluntary programs, thereby removing a source of bias that causes mischief in less rigorous evaluations.
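The intent-to-treat principle can be illustrated with a small simulation. In the hypothetical trial below, only the motivated half of the treatment arm actually engages with the care manager, but the evaluation compares everyone randomized to the program against everyone randomized to the control group, so selective participation cannot bias the estimate. All rates and effect sizes are assumptions for illustration.

```python
import random

random.seed(7)

# Hypothetical randomized care-management evaluation. The program cuts
# admission risk by 10% for patients who actually engage, but only the
# motivated half of the treatment arm engages with the care manager.
def simulate_arm(offered, n=50_000):
    admissions = 0
    for _ in range(n):
        motivated = random.random() < 0.5
        base_risk = 0.20 if motivated else 0.40   # motivation alone lowers risk
        engaged = offered and motivated
        risk = base_risk * (0.90 if engaged else 1.0)
        admissions += random.random() < risk
    return admissions / n

treatment_rate = simulate_arm(offered=True)
control_rate = simulate_arm(offered=False)

# Intent-to-treat: everyone randomized is counted, engaged or not.
itt_effect = (control_rate - treatment_rate) / control_rate
print(f"ITT admission reduction: {itt_effect:.1%}")
```

The intent-to-treat estimate comes out modest (a few percent) because non-engagers dilute the effect, which honestly reflects what the program achieved across the whole population offered it; comparing engaged patients to non-engagers would have wildly overstated it.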

To me, the take-away message is that provider-based care management is promising, but health-plan-style telephonic care management has not been successful, even in a senior population, where finding high risk targets is far easier and even when the care management services provider is highly motivated to succeed.


AHRQ guidance: Forget about proving Medical Home effectiveness with small pilot studies attempting to measure all-patient cost savings

The vast majority of already-published and currently-underway studies of the effectiveness of the Patient Centered Medical Home (PCMH) model of care are pilot studies with fewer than 50 PCMH practices.  Most of these studies report or intend to report reductions in hospitalization rates and savings across the entire population of patients served by the PCMH practices.  New guidance from the Federal Government calls the value of such reports into question.

The AHRQ recently released an excellent pair of white papers offering guidance on the proper evaluation of PCMH initiatives.  The first is a four-page overview intended for decisionmakers, entitled “Improving Evaluations of the Medical Home.”   The second is a 56-page document that goes into more detail, entitled “Building the Evidence Base for the Medical Home: What Sample and Sample Size Do Studies Need?”  The white papers were prepared by Deborah Peikes, Stacy Dale and Eric Lundquist of Mathematica Policy Research and Janice Genevro and David Meyers from the AHRQ.

The white papers emphasize a number of key points:

  • Base evaluation plans on plausible estimates of the effects of PCMH.  Based on a review of the evidence so far, the white paper suggested that a successful program could plausibly hope to reduce cost or hospitalizations, on average, by 15 percent for chronically ill patients and 5 percent for all patients.
  • Use a proper concurrent comparison group, rather than doing a “pre-post” analysis. Pre-post analyses, although common, are inconclusive because they can easily be confounded by other factors changing during the same time period, such as economic conditions, health care policy changes, advances in technology, etc.
  • Focus on evaluating a large number of practices, rather than a large number of patients per practice.  The authors point out that “a study with 100 practices and 20 patients per practice has much greater power than a study of 20 practices with 100 patients each.”  They warn that small pilot studies with 20 practices or fewer are unlikely to produce rigorous results without combining the results with many other small studies conducted using the same metrics.  Such pilot studies, which unfortunately are very common, are really only useful for generating hypotheses, not for drawing conclusions.  The authors note that neither positive nor negative results of such small studies should be relied upon.  Small PCMH studies can show no significant impact simply because they did not have the power to detect such an impact.
  • Focus on evaluating health and economic outcomes in subsets of patients such as those with chronic disease.  Satisfaction can be evaluated across the entire population, but if you use data for the entire population to measure hospitalizations, emergency department visits, inpatient days, or health care costs, the lower risk portions of the population contribute noise that obscures the measurement of the effect that is occurring primarily among those most likely to experience such events in the first place.
  • Use statistical methods that account for “clustering” at the practice level, rather than treating individual patients as the unit of analysis.  Since the intervention is intended to change processes at the practice level, the individual patients within a practice are not independent of one another.  Clustering must be taken into account not only at the end of the study, when evaluating the data.  It must also be taken into account at the beginning, when determining the number of practices and patients to sample.  For example, if a study includes a total of 20,000 patients, but the patients are clustered within 20 practices, then the effective sample size is only 1,820, assuming patient outcomes are moderately clustered within practices.  When statistical methods treat such patients as independent, they are implicitly treating the sample size in such a situation as 20,000 rather than 1,820.   As a result, evaluators making such an assumption are dramatically over-estimating their power to detect the effect of the PCMH transformation.  If they adjust for clustering at the end, their findings are likely to show a lack of a significant effect, even if the PCMH program really worked.  On the other hand, if they don’t adjust for clustering at the end, there is a great risk of reporting false positive findings.  For example, in a PCMH study with 20 practices and 1,500 patients per practice, where the analysis was done without adjusting for clustering and found a positive result, there is a 60% chance that the positive result is false, based on typical assumptions.
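The 20,000-to-1,820 effective sample size example can be reproduced with the standard design-effect formula for cluster sampling. The intraclass correlation (ICC) of 0.01 below is my assumption, chosen because it matches the white paper’s figure for “moderately clustered” outcomes.

```python
# Effective sample size under cluster sampling, using the standard
# design-effect formula: DEFF = 1 + (m - 1) * ICC, where m is the number
# of patients per practice and ICC is the intraclass correlation.
def effective_sample_size(n_practices, patients_per_practice, icc):
    total = n_practices * patients_per_practice
    deff = 1 + (patients_per_practice - 1) * icc
    return total / deff

# 20 practices x 1,000 patients, assumed ICC of 0.01
ess = effective_sample_size(n_practices=20, patients_per_practice=1000, icc=0.01)
print(f"Effective sample size: {ess:,.0f}")   # far below the 20,000 nominal count
```

Note how sensitive the result is to patients per practice: with the same 20,000 patients spread across 100 practices of 200 patients each, the same formula gives an effective sample size of several thousand, which is the arithmetic behind the advice to recruit many practices rather than many patients per practice.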
These recommendations are based not only on the experience of the authors, but on modeling that they did to explore the implications of different study scenarios with different numbers of patients, intervention practices, control practices and measures of interest.  These models calculate the minimum detectable effect (MDE) based on assumptions regarding typical characteristics of the patient populations, practices, and plausible effects of the PCMH program, based on a review of prior studies and the authors’ experience.  The models illustrate that, when measuring the impact of PCMH on costs or hospitalization rates for all the patients receiving PCMH care, the MDE drops as the number of practices in the PCMH intervention group increases.  But, even with 500 PCMH practices, the studies cannot detect the 5% cost or hospitalization reduction that the authors consider to be the plausible impact of PCMH on the entire population.

The authors re-ran the models, assuming that the measure of cost and hospitalization would consider only the sub-population of patients with chronic diseases.

The model showed that, based on reasonable assumptions, at least 35 PCMH practices, plus an equivalent number of concurrent comparison practices, would be required to detect the 15% effect that the literature suggests is the plausible effect of PCMH on cost and hospitalizations among patients with chronic diseases.  Even when focusing on the chronic disease sub-population, a pilot evaluation with only 10 PCMH practices and 10 comparison practices could not detect an effect smaller than 30%, an effect size they considered implausible.
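As a rough consistency check on these numbers, the minimum detectable effect shrinks roughly with the inverse square root of the number of practices when patients per practice is held fixed. The sketch below is a back-of-envelope scaling calibrated to the white paper’s 35-practice, 15% figure; it is not the authors’ full model, which accounts for many more parameters.

```python
import math

# Back-of-envelope MDE scaling: MDE is proportional to 1/sqrt(k) for k
# practices per arm, anchored to the reference point from the white paper
# (35 PCMH practices plus 35 comparison practices detect a 15% effect).
def mde(n_practices, reference_practices=35, reference_mde=0.15):
    return reference_mde * math.sqrt(reference_practices / n_practices)

for k in (10, 20, 35, 100):
    print(f"{k:>3} practices per arm -> MDE ~ {mde(k):.0%}")
```

The 10-practice case comes out near 30%, consistent with the paper’s conclusion that a 10-versus-10 pilot cannot detect anything smaller than an implausibly large effect.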

I found this modeling exercise to be very informative and very worrisome, given the large number of pilot studies underway that are unlikely to provide conclusive results and the risk that people will try to draw incorrect conclusions when those results become available.  Often, health care leaders find these calculations inconvenient and frustrating, as if the bearers of this mathematical news are being overly rigorous and “academic.”

Note that these concepts and conclusions are applicable not only to evaluations of PCMH, but also of other programs intended to improve processes or capabilities at the level of a physician’s practice, a clinic or a physician organization such as health information technology investments, training staff in Lean methods, or implementing gain-sharing or other performance-based incentives.


Telling a 46 year health care cost growth story in one graph

In a recent post to the Health Affairs Blog, Charles Roehrig, an economist who serves as VP and director of Altarum’s Center for Sustainable Health Spending, presented some very interesting graphics of long term health care cost growth in the U.S.  He shows the often-presented statistic of health care cost as a percent of Gross Domestic Product (GDP) over the 46 year period since Medicare was established in 1965.  The climbing graph is bumpy due to the effect of recessions and recoveries on the GDP measure in the denominator.  To see the underlying health care cost growth more clearly, Roehrig calculates what the GDP would have been during each time period if the economy were at full employment, called the “potential GDP.”  He then calculates health care cost as a percentage of potential GDP.  This creates a nice, steady upward ramp from 6% to almost 17%.

Then, using the potential GDP as a base, Roehrig created a great graphic showing how fast hospital, physician, prescription drug and other cost components grew in excess of the potential GDP.  In his blog post, Roehrig tells the story in graphs and words.  I created the following version of Roehrig’s graph to try to incorporate more of the story into the graph itself.

Roehrig concluded that the “policy responses to excess growth for hospital services, physician services, and prescription drugs seem to have been fairly successful.”  But, he referenced Tom Getzen, who warns against assuming that the recent lower growth rate is the “new normal.”  Rather, it may be temporarily low due to the lagged effects of the recent recession.  So, it may be too early to break out the champagne.

I really like showing such a long time horizon and breaking down health care costs into these five categories.  And, I am convinced that using potential GDP represents an improvement over the conventional GDP measure as a denominator for cost growth statistics.  But, I’ve never understood the popularity of expressing costs as a percentage of GDP in the first place.  In my opinion, it is more relevant to just show growth in real (inflation-adjusted) per capita costs, or the insurance industry variant of that, “per member per month” (PMPM) costs. Using GDP or potential GDP in the denominator seems to imply that, as our country gets more populous and richer, we should increase health care costs accordingly.  I agree with the idea that population growth should logically lead to increasing health care expenditures.  Expressing costs on a per capita basis handles that issue.  But, if we are prioritizing health care services as essential services, we would not necessarily need to spend that much more on health care if we got richer.


So, is there any good use of O/E analysis? Yes. It’s called Benchmark Opportunity Analysis.

In last week’s post, I argued that observed-over-expected (O/E) analysis is commonly misused as a method for doing “level playing field” performance comparisons, and I recommended against using it for that purpose.

But, is there some other good use for O/E analysis?

I can’t think of a good use for the O/E ratio itself — the metric derived by dividing observed performance by expected performance.  But, it turns out that the underlying idea of developing a model of performance that has been shown to be achievable is very useful to identify and prioritize opportunities for improvement.  The idea is to apply such a model to a health care provider’s actual population, and then compare those “achievable” results with the actual performance of the provider to see how much room there is for improvement.  I like to call this “opportunity analysis.”

There are two main variations on the “opportunity analysis” theme.  The first approach is to consider the overall average performance achieved by all providers as the goal.  The basic idea is to estimate how much the outcome will improve for each provider if they focused on remediating their performance for each risk cell where they have historically performed worse than average.  The analysis calculates the magnitude of improvement they would achieve if they were able to move their performance up to the level of mediocrity for such risk cells, while maintaining their current level of performance in any risk cells where they have historically performed at or above average.  A good name for this might be “mediocrity opportunity analysis,” to emphasize the uninspiring premise.

The second variation on this approach challenges providers to achieve excellence, rather than just mediocrity. I like to call this “benchmark opportunity analysis.”   The idea is to create a model of the actual performance of the one or more providers that achieve the best overall performance, called the “benchmark providers.” Then, this benchmark performance model is applied to the actual population of each provider to estimate the results that could be achieved, taking into account differences in the characteristics of the patient populations.  These achievable benchmark results are compared to the actual performance observed.  The difference is interpreted as the opportunity to improve outcomes by emulating the processes that produced the benchmark performance.

As shown in this illustrative graphic, a benchmark opportunity analysis compares different improvement opportunities for the same provider.  In the example, Acme Care Partners could achieve the greatest savings by focusing their efforts on improving the appropriateness of high tech radiology services. In contrast, Acme Care Partners is already achieving benchmark performance in appropriateness of low tech radiology services, and therefore has zero opportunity for savings from improving up to the benchmark level.  That does not mean that they can’t improve.  No analysis can predict the opportunity for true innovation.  Benchmark opportunity analysis is just a tool for pointing out the largest opportunities for emulating peers that already perform well, taking into account the differences in the mix of patients between a provider organization and its high-performing peers.

This method is generally consistent with the “achievable benchmarks of care” (ABC) framework proposed more than 10 years ago by the Center for Outcomes and Effectiveness Research and Education at the University of Alabama at Birmingham.  However, that group advises against using the method for financial performance measures, presumably out of fear that it could encourage inappropriate under-utilization.  I consider that a valid concern.   To reduce that risk, I advocate for a stronger test of “achievability” for cost and utilization performance measures.  In the conventional ABC framework for quality measures, “achievability” is defined as the level of performance of the highest-performing set of providers that, together, deliver care to at least 10% of the overall population.  Such a definition is preferable to simply setting the benchmark at the level of performance achieved by the single highest-performing provider because a single provider might have gotten lucky to achieve extremely favorable performance.  When I apply the achievable benchmark concept to utilization or cost measures, I set the benchmark more conservatively than for quality measures.  For such measures, I use 20% rather than 10% so as to avoid setting a standard that encourages extremely low utilization or cost that could represent inappropriate under-utilization.
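The ABC benchmark construction described above can be sketched in a few lines: rank providers from best to worst, then pool the best performers until they cover the required share of the population, and take the pooled rate as the benchmark. The provider counts below are invented for illustration.

```python
# Sketch of the "achievable benchmark of care" (ABC) idea for an adverse
# event rate (lower = better). Rank providers by rate, pool the best
# performers until they cover at least `coverage` of the total population,
# and use the pooled event rate as the benchmark.
def achievable_benchmark(providers, coverage=0.10):
    # providers: list of (events, population) tuples, one per provider
    total_pop = sum(pop for _, pop in providers)
    ranked = sorted(providers, key=lambda ep: ep[0] / ep[1])
    events = pop = 0
    for e, p in ranked:
        events += e
        pop += p
        if pop >= coverage * total_pop:
            break
    return events / pop

# Invented example data: (events, population) per provider
providers = [(12, 400), (30, 500), (8, 150), (45, 600), (20, 350)]
print(f"ABC benchmark rate at 10% coverage: {achievable_benchmark(providers):.3f}")
```

Raising `coverage` from 0.10 to 0.20 implements the more conservative standard suggested above for cost and utilization measures, since requiring more of the population to already achieve the benchmark guards against enshrining inappropriate under-utilization.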

Note that one provider may have benchmark care processes that would achieve the best outcomes in a more typical population, but that same provider may have an unusual mix of patients that includes a large portion of patients for whom they don’t perform well, creating a large opportunity for improvement.  The key point is that opportunity analysis is the right method to compare and prioritize alternative improvement initiatives for the same provider.  But the results of opportunity analyses should not be used to compare the performance of providers.

The following graphic summarizes the comparison of traditional risk adjustment, O/E analysis, and benchmark opportunity analysis.

Simple Example Calculations

For those readers interested in a little more detail, the following table uses the same raw data from the calculations from last week’s post to illustrate the approach.

As shown in this example, Provider A has worse performance (higher mortality rate) than Provider B in adults.  So, Provider B is the benchmark performer in the adult risk cell.  If Provider A improved from 6.41% mortality down to the 5.00% mortality level of Provider B, it could save the lives of 11 adults per year.  Provider B has worse performance in children.  If Provider B improved its performance in children up to the level achieved by Provider A, while still achieving its benchmark level of performance in adults, it could save 1 life per year.
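The adult risk-cell arithmetic can be reproduced directly: apply the benchmark rate to the provider’s own population in each cell and compare with actual events. The population of 780 adults for Provider A is my assumption, chosen to be consistent with the 11-lives figure in the example.

```python
# Benchmark opportunity by risk cell: for each cell, the opportunity is
# population * (actual rate - benchmark rate), summed across cells.
def opportunity(cells):
    # cells: list of (population, actual_rate, benchmark_rate) tuples
    return sum(pop * (actual - benchmark) for pop, actual, benchmark in cells)

# Provider A's adult cell: actual 6.41% mortality vs. Provider B's
# benchmark 5.00%. The 780-adult population is an assumption chosen to
# reproduce the ~11 lives per year cited in the example.
provider_a_adults = [(780, 0.0641, 0.0500)]
lives = opportunity(provider_a_adults)
print(f"Lives saved per year if A matched B's adult rate: {lives:.0f}")
```

The same function handles multiple cells at once, so Provider B’s smaller opportunity in the children cell would simply be another tuple in its own list, with Provider A supplying the benchmark rate for that cell.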
