Anthem and other Blue MA Plans are worse than Traditional Medicare in use of Low Value Services

In the November 1, 2024 issue of JAMA, Ciara Duggan and colleagues from Harvard and Brigham published a paper evaluating the utilization of “low value services” across major Medicare Advantage insurers and Traditional Medicare. Low value services were defined based on the Choosing Wisely campaign and the US Preventive Services Task Force, as operationalized by the actuarial firm Milliman in its MedInsight Health Waste Calculator software. The authors used a dataset from 2018, so the results are already a little stale, but the scale of the dataset, with 2.3 million Medicare beneficiaries, provided the power to dive into plan-level performance and to break out performance by service category. They found that some plans, such as Centene, Humana and United Healthcare, outperform Traditional Medicare, while other plans, such as CVS Aetna, Cigna and the Blue Cross Blue Shield affiliates, actually perform worse. By far the worst performance came from Anthem, the largest of the Blue Cross Blue Shield affiliates, with total utilization of low value services 9% higher than Traditional Medicare.

Figure: 2018 Low Value Services Utilization in MA PPO vs MA HMO, shown as % difference from Traditional Medicare

The study also broke out the low value services into categories, comparing, as shown above, the performance of Medicare Advantage PPO and HMO plans against Traditional Medicare. In every category except diagnostic and preventive testing, the MA HMO plans outperformed the MA PPO plans. MA HMO plans, with their narrower networks and stronger plan influence on beneficiaries’ relationships with their primary care physicians, performed particularly well in reducing those physicians’ prescribing of low value common medical treatments, such as antibiotics for upper respiratory infections or ear infections, achieving a 29% reduction compared to Traditional Medicare. MA HMO plans also achieved a 24% reduction compared to Traditional Medicare in the utilization of low value procedures and surgeries, which could be a function of gatekeeping or value-based contract incentives for referring primary care physicians, network selection of specialists with prudent, evidence-based practice patterns, and/or prior authorizations and associated coverage denials.
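For readers who want to reproduce this kind of chart from the paper’s tables, here is a minimal matplotlib sketch. It uses only the percentage figures quoted in this post; the remaining categories and plan types would be filled in from the paper, so treat the data below as illustrative of the chart format rather than the full dataset.

```python
# Minimal sketch of the chart above: percent difference from Traditional
# Medicare by plan type and service category. Only the figures quoted in
# this post are included; other cells would come from the paper's tables.
import matplotlib.pyplot as plt

labels = [
    "HMO: medical treatments",
    "HMO: procedures & surgeries",
    "PPO: diagnostic/preventive testing",
    "HMO: diagnostic/preventive testing",
]
pct_diff_vs_tm = [-29, -24, +37, +50]  # % difference vs Traditional Medicare

fig, ax = plt.subplots(figsize=(7, 3))
colors = ["tab:green" if v < 0 else "tab:red" for v in pct_diff_vs_tm]
ax.barh(labels, pct_diff_vs_tm, color=colors)
ax.axvline(0, color="black", linewidth=0.8)  # Traditional Medicare baseline
ax.set_xlabel("% difference from Traditional Medicare (2018)")
ax.set_title("Low value services utilization, MA vs Traditional Medicare")
plt.tight_layout()
plt.show()
```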

The authors made an important point: MA plans have been treated as if they were homogeneous, when in fact they are heterogeneous. The implication, I think, is that we need not only to routinely update this type of analysis, but also to link low value services utilization metrics to the programs, incentives and network characteristics that are effective in avoiding low value services utilization.

Critique of the presentation of the results

I usually do not write blog posts just reporting the results of papers I read. This humble blog is not going to meaningfully extend the reach of JAMA! I write only when there is something worthy of critique and commentary. In this case, my critique is about the way our scientific publications present data. I did the graphical analysis above to present the data based on what I think the “real” question is.

Journals like JAMA present themselves as “scientific” even when the subject matter is policy decision-making. Science is about proposing a theory regarding cause-effect relationships, and then making observations and conducting analysis intended to support or refute that theory (expressed as hypotheses in the context of particular studies). The premise of this scientific process is that a theory that was plausible before the study is more likely to be true if an observed association between some cause and some effect is stronger than some accepted threshold. More precisely, the hypothesis is supported if the association is “statistically significant.” Science is a fantastic tool when applied for its intended purpose.

But scientific methods were not designed for policy decision support. Policy decision-making is based on comparing outcomes and the associated uncertainty across decision alternatives. In policy decision-making, uncertainty remains relevant, but the particular method of statistical significance testing is not. In my opinion, the subject matter of the report by Duggan and colleagues tells me that its real purpose is to support policy decision-making. But to publish their analysis in JAMA, the authors needed to frame it as a scientific paper: state a hypothesis before conducting the analysis, conduct significance testing, and report conclusions based on that testing. When an observed relationship is found to be not statistically significant, the expected virtuous behavior in the scientific tradition is to report it as “we observed no differences,” which most readers interpret as “there was no difference.” That is a fine tradition when the purpose is science, because it avoids wasting the limited pages of journals and the limited time of abstract presentation attendees and journal readers on associations too likely to have been observed by chance. But policy decisions are made on a best-information-available basis. Confidence intervals are informative for policy decision-making; the “no differences” tradition is just wrong for that purpose.
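As a concrete illustration of the decision-oriented alternative, here is a minimal sketch of reporting an estimated difference with a 95% confidence interval rather than a yes/no significance verdict. The numbers are hypothetical, not figures from the paper, and treating a per-100-beneficiary rate as a simple proportion is a simplification (a beneficiary can receive multiple services, and the paper’s actual models adjust for beneficiary characteristics).

```python
# Sketch: report the estimated difference and its 95% CI, which is what a
# policy decision-maker needs, instead of only a significance verdict.
# All numbers below are hypothetical.
import math

def diff_ci(rate_a, n_a, rate_b, n_b, z=1.96):
    """95% CI for the difference of two utilization proportions
    (normal approximation). Rates are per-beneficiary proportions."""
    diff = rate_a - rate_b
    se = math.sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    return diff, (diff - z * se, diff + z * se)

# Hypothetical: 25.3 low value services per 100 MA beneficiaries vs
# 24.1 per 100 Traditional Medicare beneficiaries.
diff, (lo, hi) = diff_ci(0.253, 500_000, 0.241, 1_800_000)
print(f"difference: {diff*100:+.2f} per 100 "
      f"(95% CI {lo*100:+.2f} to {hi*100:+.2f})")
```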

The authors knew that to be accepted by JAMA, they needed to frame their decision-oriented analysis as a proper scientific study. They presumably were rightly worried that, even with their enormous sample, they might lack the power to achieve statistical significance in subanalyses. They therefore chose the safer route, framing the “key questions” of their analysis as broad questions for which they had sufficient sample size to have a good chance of achieving statistical significance:

“First, are there significant differences in the use of low value services between Medicare beneficiaries enrolled in MA vs Traditional Medicare at a national level?”

“Second, are these differences consistent across major MA parent insurer categories?”

Based on reading the paper, it is clear that the authors also wanted to understand whether there were differences between MA PPO and MA HMO plans, and whether those differences varied in direction or magnitude across the categories of low value services. But they presumably did not want more than two “key questions,” so those questions were relegated to secondary status and were therefore unwelcome in the “findings” or “abstract” sections. Presumably to conform to the requirements of the particular statistical methods they used for significance testing, they chose to report “adjusted absolute differences” in utilization rates per 100 beneficiaries, even though those absolute rates have no intuitive meaning for virtually all readers. Percentage differences are far more intuitive when the reader is just trying to inform policy decision-making by getting a sense of the magnitude of the effect that MA PPO and MA HMO plans might have on reducing waste on low value services of different types. The conversion is simple arithmetic, as the sketch below shows.
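A back-of-the-envelope conversion from an “adjusted absolute difference” to a percent difference just divides by the Traditional Medicare baseline rate. The baseline here is a hypothetical placeholder chosen only to show the arithmetic:

```python
# Convert the paper's "adjusted absolute difference" (services per 100
# beneficiaries) into the more intuitive percent difference vs the
# Traditional Medicare baseline. Numbers are hypothetical placeholders.
def pct_difference(abs_diff_per_100, tm_rate_per_100):
    """Percent difference vs the Traditional Medicare baseline rate."""
    return 100 * abs_diff_per_100 / tm_rate_per_100

# Hypothetical: a baseline of 24.0 services per 100 beneficiaries and an
# adjusted absolute difference of +2.2 per 100 yields roughly the +9%
# figure quoted for Anthem in this post.
print(f"{pct_difference(2.2, 24.0):+.1f}% vs Traditional Medicare")  # +9.2%
```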

One interesting finding is that both PPO and HMO Medicare Advantage plans perform poorly compared to Traditional Medicare on use of low value diagnostic and preventive testing, for which the authors offered no explanatory theories. MA PPO plans had 37% higher use and MA HMO plans had 50% higher use compared to Traditional Medicare. Perhaps these low value services are a side effect of increased primary care vigilance and visit frequency, leading PCPs to order low value tests that are not economically consequential enough to justify utilization management attention by the plans. The study was not designed to answer that question. But perhaps more importantly, when one of the service categories that makes up an overall metric behaves so differently from the others, to the point that it partially cancels out observed differences in the overall measure, curious minds must wonder whether the services are really alike enough to be treated as part of the same “population” of services within the aggregated “total low value services” metric. The rigors of JAMA’s science-based framing preclude the authors from presenting information from any less formal information gathering they might have done to shed light on that issue, or from offering comments that might come off to JAMA reviewers as inappropriately speculative.

Therefore, the purpose of this blog post is to share my re-do of the results presentation, as I think it should be presented to be most useful for supporting policy decision-making, without the misapplied constraints of a publisher that demands the analysis pretend to be science.
