AHRQ guidance: Forget about proving Medical Home effectiveness with small pilot studies attempting to measure all-patient cost savings

The vast majority of published and ongoing studies of the effectiveness of the Patient-Centered Medical Home (PCMH) model of care are pilot studies with fewer than 50 PCMH practices. Most of these studies report, or intend to report, reductions in hospitalization rates and savings across the entire population of patients served by the PCMH practices. New guidance from the federal government calls the value of such reports into question.

The AHRQ recently released an excellent pair of white papers offering guidance on the proper evaluation of PCMH initiatives. The first is a four-page overview intended for decisionmakers, entitled “Improving Evaluations of the Medical Home.” The second is a 56-page document that goes into more detail, entitled “Building the Evidence Base for the Medical Home: What Sample and Sample Size Do Studies Need?” The white papers were prepared by Deborah Peikes, Stacy Dale and Eric Lundquist of Mathematica Policy Research and Janice Genevro and David Meyers from the AHRQ.

The white papers emphasize a number of key points:

Base evaluation plans on plausible estimates of the effects of PCMH. Based on a review of the evidence so far, the white paper suggested that a successful program could plausibly hope to reduce cost or hospitalizations, on average, by 15 percent for chronically ill patients and 5 percent for all patients.
Use a proper concurrent comparison group, rather than doing a “pre-post” analysis. Pre-post analyses, although common, are inconclusive because they can easily be confounded by other factors changing during the same time period, such as economic conditions, health care policy changes, and advances in technology.
Focus on evaluating a large number of practices, rather than a large number of patients per practice. The authors point out that “a study with 100 practices and 20 patients per practice has much greater power than a study of 20 practices with 100 patients each.” They warn that small pilot studies with 20 or fewer practices are unlikely to produce rigorous results unless their results are combined with those of many other small studies conducted using the same metrics. Such pilot studies, which unfortunately are very common, are really only useful for generating hypotheses, not for drawing conclusions. The authors note that neither positive nor negative results of such small studies should be relied upon. A small PCMH study can show no significant impact simply because it lacked the power to detect one.
Focus on evaluating health and economic outcomes in subsets of patients, such as those with chronic disease. Satisfaction can be evaluated across the entire population. But if you use data for the entire population to measure hospitalizations, emergency department visits, inpatient days, or health care costs, the lower-risk portions of the population contribute noise that obscures the effect, which occurs primarily among the patients most likely to experience such events in the first place.
Use statistical methods that account for “clustering” at the practice level, rather than treating individual patients as the unit of analysis. Since the intervention is intended to change processes at the practice level, the individual patients within a practice are not independent of one another. Clustering must be taken into account not only at the end of the study, when evaluating the data, but also at the beginning, when determining the number of practices and patients to sample. For example, if a study includes a total of 20,000 patients, but the patients are clustered within 20 practices, then the effective sample size is only 1,820, assuming patient outcomes are moderately clustered within practices. When statistical methods treat such patients as independent, they are implicitly treating the sample size in such a situation as 20,000 rather than 1,820. As a result, evaluators making such an assumption are dramatically overestimating their power to detect the effect of the PCMH transformation. If they adjust for clustering at the end, their findings are likely to show a lack of a significant effect, even if the PCMH program really worked. On the other hand, if they don’t adjust for clustering at the end, there is a great risk of reporting false positive findings. For example, in a PCMH study with 20 practices and 1,500 patients per practice, where the analysis was done without adjusting for clustering and found a positive result, there is a 60% chance that the positive result is false, based on typical assumptions.
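
The effective sample size in the clustering example above is consistent with the standard design-effect formula, 1 + (m - 1) × ICC, where m is the number of patients per practice and ICC is the intraclass correlation. The Python sketch below is only illustrative: the function name is hypothetical, and the ICC of 0.01 is an assumption chosen because it reproduces the 1,820 figure, not a value quoted here from the white papers.

```python
def effective_sample_size(total_patients: int, practices: int, icc: float) -> float:
    """Number of independent patients the clustered sample is 'worth'.

    Assumes every practice contributes the same number of patients.
    """
    patients_per_practice = total_patients / practices
    # Design effect: how much clustering inflates the variance of an estimate
    design_effect = 1 + (patients_per_practice - 1) * icc
    return total_patients / design_effect


# ASSUMPTION: an ICC of 0.01, picked because it matches the example in the text
ASSUMED_ICC = 0.01

n_eff = effective_sample_size(total_patients=20_000, practices=20, icc=ASSUMED_ICC)
print(round(n_eff))  # 1820
```

With an ICC of zero (no clustering at all), the function returns the full 20,000, which is the sample size an unadjusted analysis implicitly assumes.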

These recommendations are based not only on the experience of the authors, but also on modeling that they did to explore the implications of different study scenarios with different numbers of patients, intervention practices, control practices and measures of interest. These models calculate the minimum detectable effect (MDE) under assumptions about the typical characteristics of patient populations and practices and the plausible effects of the PCMH program, drawn from a review of prior studies and the authors’ experience. The models illustrate that, when measuring the impact of PCMH on costs or hospitalization rates for all the patients receiving PCMH care, the MDE drops as the number of practices in the PCMH intervention group increases. But, even with 500 PCMH practices, the studies cannot detect the 5% cost or hospitalization reduction that the authors consider to be the plausible impact of PCMH on the entire population.
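
To give a feel for how an MDE responds to the number of practices, here is a minimal Python sketch using a textbook normal-approximation formula for a two-arm clustered comparison. It is not the authors’ model and will not reproduce their figures: the function name, the ICC and the coefficient of variation (standard deviation divided by mean) are all hypothetical inputs chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(practices_per_arm, patients_per_practice,
                              icc, cv, alpha=0.05, power=0.80):
    """Approximate MDE as a fraction of the comparison-group mean.

    Assumes equal numbers of intervention and comparison practices,
    equal practice sizes, a two-sided test and a normal approximation.
    """
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Clustering shrinks the effective number of patients in each arm
    design_effect = 1 + (patients_per_practice - 1) * icc
    effective_n_per_arm = practices_per_arm * patients_per_practice / design_effect
    # Standard error of the difference between two arm means, relative to the mean
    standard_error = cv * sqrt(2 / effective_n_per_arm)
    return z_total * standard_error


# HYPOTHETICAL inputs, not taken from the white papers
ASSUMED_ICC = 0.01
ASSUMED_CV = 2.0

for practices in (10, 20, 50, 100, 500):
    mde = minimum_detectable_effect(practices, 1_500, ASSUMED_ICC, ASSUMED_CV)
    print(f"{practices:>4} practices per arm: MDE ~ {mde:.1%}")
```

In this simplified formula, adding patients to already large practices barely changes the MDE, because the design effect grows almost in step with practice size. Adding practices is what moves it.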

The authors re-ran the models, assuming that the measure of cost and hospitalization would consider only the sub-population of patients with chronic diseases.

The model showed that, based on reasonable assumptions, at least 35 PCMH practices, plus an equivalent number of concurrent comparison practices, would be required to detect the 15% effect that the literature suggests is the plausible effect of PCMH on cost and hospitalizations among patients with chronic diseases. Even when focusing on the chronic disease sub-population, a pilot evaluation with only 10 PCMH practices and 10 comparison practices could not detect an effect smaller than 30%, an effect size they considered implausible.

I found this modeling exercise to be very informative and very worrisome, given the large number of pilot studies underway that are unlikely to provide conclusive results and the risk that people will try to draw incorrect conclusions when those results become available. Often, health care leaders find these calculations inconvenient and frustrating, as if the bearers of this mathematical news are being overly rigorous and “academic.”

Note that these concepts and conclusions apply not only to evaluations of PCMH, but also to evaluations of other programs intended to improve processes or capabilities at the level of a physician’s practice, a clinic or a physician organization. Examples include health information technology investments, training staff in Lean methods, and implementing gain-sharing or other performance-based incentives.

Dr. Ward

Richard E. Ward, MD, MBA, CEO of Reward Health. Also physician, health care analytics and informatics innovator, husband, father, tenor & gradually improving swimmer
