Primary care physicians acknowledge over-utilization and blame it on the lawyers.

Catching up on some reading, I came across this article in Medical News Today, describing the results of survey research conducted by Brenda E. Sirovich, MD, MS, and colleagues from the VA Outcomes Group (White River Junction, Vermont) and the Dartmouth Institute for Health Policy and Clinical Practice. They surveyed primary care physicians and published their results in the Archives of Internal Medicine. They documented that primary care physicians acknowledge over-utilization of services received by their patients.

Their #1 theory of causation?  “It’s because of malpractice lawyers!” That is not surprising to me, and is consistent with many conversations with both front line PCPs and leaders of primary care physician organizations.

However, I personally believe that this is really the #1 rationalization of the over-utilization.  I feel that there are two main causes:

  1. Low fee-for-service reimbursement, creating the need for many short visits each day to generate enough revenue to make a good living (i.e. the “hamster wheel”).  When visits need to be short, prescriptions and referrals are important to make the patient feel satisfied that their problem is really being addressed.
  2. Lack of effective clinical leadership or even peer interaction over the actual clinical decision-making (i.e. “care-planning”) done on a day-to-day basis by the vast majority of primary care physicians.

Beyond the medical school and residency stage, physicians’ care planning occurs all alone, with no one looking over their shoulder — at least no one with sufficient quantity and quality of information to make any real assessment of clinical decision-making.  Health plans have tried to do so with utilization management programs, but the poor quality of information and the relationship distance between the physician and the health plan are too great to generate much more than antipathy.

If you eliminated malpractice worries and paid primary care physicians a monthly per-capita fixed fee, would wasteful over-utilization go down without also providing deeper clinical leadership and peer review enabled by better care planning data?  Perhaps.  But I would worry that, in that scenario, physicians would still reflexively hit the order & referral button to please patients who have been habituated to think of “lots of orders and referrals” as good primary care.

The “mindfulness” thing in the invited commentary by Calvin Chou, MD, PhD, from the University of California, San Francisco, is a bit much — trying too hard to coin a term.  I’ve heard that presented before, and I categorized it with “stages of change,” “empowerment,” “self-actualization,” “motivational interviewing,” and “patient activation.”   I’m not saying that such popular psychological/sociological concepts have no merit.  I’m just a Mid-Westerner who starts with more conventional theories of behavior.

Read More

The debate about what to maximize when selecting candidates for care management programs: Accuracy? ROI? Net Savings? Or Cost-Effectiveness?

When doing analysis, it is really important to clarify up front what it is you are actually trying to figure out.  This sounds so obvious.  But, I am always amazed at how often sophisticated, eager analysts can zip past this important first step.

Health plans, vendors and health care providers are all involved in the design and execution of wellness and care management programs.  Each program can be conceptualized as some intervention process applied to some target population.  A program is going to add more value if the target population is comprised of the people for whom the intervention process can have the greatest beneficial impact.

I have found it useful to conceptualize the process of selecting the final target population as having two parts.  The first part is the process of identification of the population for whom the intervention process is deemed relevant.  For example, a diabetes disease management program is only relevant to patients with diabetes.  An acute-care-to-ambulatory-care transitions program is only relevant to people in an acute care hospital.  A smoking cessation program is only relevant to smokers.

The second part is to determine which members of the relevant population are to actually be targeted.  To do this, a program designer must figure out what characteristics of the relevant candidates are associated with having a higher opportunity to benefit from the intervention.  For example, in disease management programs, program designers often use scores from predictive models designed to predict expected cost or probability of disease-related hospitalization.  They are using cost or likelihood of hospitalization as a proxy for the opportunity of the disease management program to be beneficial.  Program designers figure that higher utilization or cost means that there is more to be saved.  This is illustrated in the graph below, where a predictive model is used to sort patients into percentile buckets.  The highest-opportunity 1% have an expected annual admit rate of almost 4000 admits per 1000 members, while the lowest 1% have less than 200 admits per 1000 members.  A predictive model is doing a better job when more of the area under this curve is shifted to the left.

Although it is common to use expected cost or use as a proxy for opportunity, what a program designer would really like to know is how much good the intervention process is likely to do.  Other variables besides expected cost or use can contribute to a higher opportunity.  For example, in a disease management program, the program might be more worthwhile for a patient who is already motivated to change their self-management behaviors, or one with known gaps in care or treatment non-compliance issues that the intervention process is designed to address.

Applying micro-economics to care management targeting

Once the definition of “opportunity” is determined and operationally defined to create an “opportunity score” for each member of the relevant population, we can figure out which members of the relevant population to actually target for outreach for the program.  Conceptually, we would sort all the people in the relevant population by their opportunity score.  Then, we would start by doing outreach to the person at the top of the list and work our way down the list.  But, the question then becomes how far down the list do we go?  As we go down the list, we are investing program resources to outreach and intervention effort directed at patients for which the program is accomplishing less and less.   Economists call this “diminishing returns.”

As illustrated by the red line in the graph above, there is some fixed cost to operating the program, regardless of the target rate.  For example, there are data processing costs.  Then, if the program does outreach to a greater and greater portion of the relevant population, more and more people say “yes” and the costs for the intervention go up in a more or less linear manner.  As shown by the green line, the savings increase rapidly at first, when the program is targeting the candidates with the greatest opportunity.  But, as the threshold for targeting is shifted to the right, the additional candidates being targeted have lower opportunity, and the green line begins to flatten.  The blue line shows the result of subtracting the costs from the savings to get net savings.  It shows that net savings increases for a while and then begins to decrease, as the cost of intervening with additional patients begins to outweigh the savings expected to accrue from those patients.  In this analysis, net savings is highest when 41% of the relevant population of diabetic patients is targeted for the diabetes disease management program.

The black dotted line shows the result of dividing savings by cost to get the return on investment, or ROI.  With very low target rates, too few patients are accumulating savings to overcome the fixed cost, so the ROI is less than 1.  Then, the ROI hits a peak at a target rate of 18% and declines thereafter.  This decline is expected, since we are starting with the highest opportunity patients and working down to lower opportunity patients.  Note that in this analysis, increasing the target penetration rate from 18% to 41% leads to a lower ROI, but the net savings increases by 24%.  So, if the goal is to reduce overall cost, that is achieved by maximizing net savings, not by maximizing ROI.
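The net-savings-vs-ROI trade-off described above can be sketched in a few lines of code. All of the numbers below (the distribution of per-patient savings, the fixed and variable program costs) are hypothetical assumptions for illustration, not figures from the graph's actual analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-patient expected savings, sorted so the highest-opportunity
# candidates come first. Diminishing returns: each additional patient targeted
# contributes less than the one before.
n = 10_000
savings_per_patient = np.sort(rng.exponential(scale=600.0, size=n))[::-1]

fixed_cost = 100_000.0             # e.g. data processing, program overhead
variable_cost_per_patient = 250.0  # outreach + intervention cost per target

targeted = np.arange(1, n + 1)                # number of patients targeted
cum_savings = np.cumsum(savings_per_patient)  # total savings at each target count
cost = fixed_cost + variable_cost_per_patient * targeted
net_savings = cum_savings - cost
roi = cum_savings / cost

best_roi = int(np.argmax(roi))
best_net = int(np.argmax(net_savings))

print(f"ROI peaks at a target rate of {100 * (best_roi + 1) / n:.0f}%")
print(f"Net savings peaks at a target rate of {100 * (best_net + 1) / n:.0f}%")
```

The key point survives any reasonable choice of numbers: as long as per-patient savings diminish down the sorted list and the program's ROI exceeds 1 at its peak, the net-savings-maximizing target rate is at least as high as the ROI-maximizing one, so maximizing ROI leaves savings on the table.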

Should we try to maximize accuracy?

In a recent paper published in the journal Population Health Management by Shannon Murphy, Heather Castro and Martha Sylvia from Johns Hopkins HealthCare, the authors describe their sophisticated methodology for targeting for disease management programs using “condition-specific cut-points.”  A close examination of the method reveals that it is fundamentally designed to maximize the “accuracy” of the targeting process in terms of correctly identifying in advance the members of the relevant disease-specific population that will end up being among the 5% of members with the highest actual cost.  In this context, the word accuracy is a technical term used by epidemiologists.  It means the percentage of time that the predictive model, at the selected cut-point, correctly categorized patients.  In this application, the categorization is attempting to correctly sort patients into a group that would end up among the 5% with highest cost vs. a group that would not.

By selecting the cut-point based on accuracy, the Hopkins methodology is implicitly equating the value of the two types of inaccuracy: false positives, where the patient would be targeted but would not have been in the high cost group, and false negatives, where the patient would not be targeted but would have been in the high cost group.  But, there is no reason to think that, in the selection of targets for care management interventions, false negatives and false positives would have the same value.  The value of avoiding a false negative includes the health benefits and health care cost savings that would be expected by offering the intervention.  The value of avoiding a false positive includes the program cost of the intervention.  There is no reason to think that these values are equivalent.  If it is more important to avoid a false positive, then a higher cut-point (targeting fewer patients) is optimal.  If it is more valuable to avoid a false negative, then a lower cut-point (targeting more patients) is optimal.
Furthermore, the 5% cost threshold used in the Hopkins methodology is completely arbitrary, selected without regard to the marginal costs or benefits of the intervention process at that threshold.  Therefore, I don’t advise adopting the methodology proposed by the Hopkins team.
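To make the asymmetry concrete, here is a sketch using synthetic data. The $2,000 value of avoiding a false negative and the $300 per-target program cost are purely illustrative assumptions (the Hopkins paper specifies neither), as is the idealized relationship between the risk score and actual high-cost status:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Simulated predictive model: a member's risk score equals their true
# probability of ending up in the high-cost group (an idealized assumption).
risk_score = rng.uniform(0.0, 1.0, n)
truly_high_cost = rng.uniform(0.0, 1.0, n) < risk_score

def evaluate(cut, savings_per_hit, cost_per_target):
    """Accuracy and net value of targeting everyone with score >= cut."""
    flagged = risk_score >= cut
    tp = np.sum(flagged & truly_high_cost)    # correctly targeted
    tn = np.sum(~flagged & ~truly_high_cost)  # correctly left alone
    accuracy = (tp + tn) / n
    # Savings accrue only for true positives; intervention cost is paid
    # for every targeted member, false positives included.
    net_value = tp * savings_per_hit - flagged.sum() * cost_per_target
    return accuracy, net_value

cuts = np.linspace(0.05, 0.95, 91)
results = [evaluate(c, savings_per_hit=2_000.0, cost_per_target=300.0) for c in cuts]
acc = [r[0] for r in results]
val = [r[1] for r in results]

best_acc_cut = cuts[int(np.argmax(acc))]
best_val_cut = cuts[int(np.argmax(val))]
print(f"accuracy-maximizing cut-point: {best_acc_cut:.2f}")
print(f"value-maximizing cut-point:    {best_val_cut:.2f}")
```

Under these assumptions, a true positive is worth far more than the cost of intervening with anyone, so the value-maximizing cut-point sits well below the accuracy-maximizing one: it tolerates many false positives to avoid false negatives. Flip the assumed values and the optimal cut-point moves the other way.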

What about cost-effectiveness?

The concept of maximizing ROI or net savings is based on the idea that the reason a health plan invests in these programs is to save money.  But, the whole purpose of a health plan is to cover expenses for necessary health care services for the beneficiaries.  A health plan does not determine whether to cover hip replacement surgery based on whether it will save money.  They cover hip replacement surgery based on whether it is considered a “standard practice,” or whether there is adequate evidence proving that the surgery is efficacious.  Ideally, health care services are determined based on whether they are worthwhile — whether the entire collection of health and economic outcomes is deemed to be favorable to available alternatives.  In the case of hip replacement surgery, the health outcomes include pain reduction, physical function improvement, and various possible complications such as surgical mortality, stroke during recovery, etc.  Economic outcomes include the cost of the surgery, the cost of dealing with complications, rehabilitation and follow-up, and the savings from avoiding whatever health care would have been required to deal with ongoing pain and disability.  When attempting to compare alternatives with diverse outcomes, it is helpful to reduce all health outcomes into a single summary measure, such as the Quality-Adjusted Life Year (QALY).  Then, the incremental net cost is divided by the incremental QALYs to calculate the cost-effectiveness ratio, which is analogous to the business concept of return on investment.  If the cost-effectiveness ratio is sufficiently low (that is, if the cost per QALY gained falls below an accepted threshold), the health service is deemed worth doing.  There is no reason why wellness and care management interventions should not be considered along with other health care services based on cost-effectiveness criteria.
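The arithmetic of the cost-effectiveness ratio is simple enough to show directly. All numbers below are hypothetical, chosen only to illustrate the calculation:

```python
def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Incremental cost-effectiveness ratio: net dollars per QALY gained."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

# Hypothetical care management program vs. usual care, per patient:
usual_cost, usual_qaly = 12_000.0, 6.0
program_cost, program_qaly = 15_000.0, 6.2

ratio = icer(program_cost, program_qaly, usual_cost, usual_qaly)
threshold = 50_000.0  # an often-cited (and much-debated) $/QALY benchmark

print(f"ICER: ${ratio:,.0f} per QALY gained")  # → ICER: $15,000 per QALY gained
print("worth doing at this threshold" if ratio <= threshold
      else "not cost-effective at this threshold")
```

Note that unlike ROI, a lower ratio is better here: the program spends $3,000 more per patient but buys 0.2 additional QALYs, which compares favorably to the assumed threshold.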

The idea that wellness and care management interventions should only be done if they save money is really just a consequence of the approach being primarily initiated by health plans in the last decade.  I suspect that as care management services shift from health plans to health care providers over the next few years, there will be increased pressure to use the same decision criteria as are used for other provider-delivered health care services.

Read More

Hospitalists have been focused on reducing hospital length of stay, but not so much on smooth transitions to ambulatory care

A new study published in the  Annals of Internal Medicine compared the economic outcomes of hospital episodes managed by hospitalists to those managed by the patients’ primary care physicians in a Medicare population. Previous studies focused only on the cost of the hospital stay itself, and showed that hospitalists were able to reduce length of stay and hospital cost. These economic savings accrue primarily to hospitals who are reimbursed with a fixed DRG-based payment for most hospital stays. These hospital savings have motivated hospitals to hire more hospitalist physicians. According to the Society of Hospital Medicine, 80 percent of hospitals with more than 200 beds now have hospitalists.  There are now 30,000 hospitalist physicians, and the specialty continues to grow more rapidly than any other.

But, the new study measures the economic outcomes of the entire hospital episode, including care received after the patient is discharged from the initial hospital stay.   The study shows that hospital stays managed by hospitalists had an average length of stay that was 0.64 days shorter, saving an average of $282. But, those patients were more likely to return to the emergency department and more likely to be readmitted to the hospital, leading to post-discharge costs that averaged $332 higher than for hospital episodes managed by the patients’ own primary care physicians. Thus, the use of hospitalists ends up costing Medicare $50 more per episode, increasing overall costs by $1.1 billion annually.

The study authors, Yong-Fang Kuo, PhD and James S. Goodwin, MD from the University of Texas Medical Branch in Galveston, hypothesized that “hospitalists, who typically are employed or subsidized by hospitals, may be more susceptible to behaviors that promote cost shifting.” The implication is that, if hospitalists were employed by primary care-based Accountable Care Organizations (ACOs) that were being held responsible for the total cost of care for a defined population of patients, they might be more strongly encouraged to focus on improving care transitions to reduce downstream complications and associated emergency department visits and hospital re-admissions.

Even without ACOs, there has been a great deal of effort over the last few years to improve transitions of care for patients discharged from acute care hospitals.  Most of these efforts attempt to improve both the “pitch” by the hospital-based team and the “catch” by ambulatory care providers.  But, some efforts, such as the BOOST program funded by the Hartford Foundation, have a primary emphasis on the pitch.  Other efforts, such as the STAAR program of the Commonwealth Fund and the Institute for Healthcare Improvement (IHI), tend to emphasize the catch.  Hopefully such programs will lead to widespread improvements in quality of care and reductions in total cost of hospital episodes.  ACOs could catalyze and accelerate those improvements by linking hospital care more tightly to primary care and supporting this linkage through investments in health information exchange (HIE) capabilities designed to foster thoughtful, smooth transitions of care.

Read More

Health Care Heroes: Wilmer Rutt, MD – Adapting the R&D Concept to Health Care Provider Organizations

Wilmer Rutt, MD - Director of Henry Ford Health System Center for Clinical Effectiveness, in his office at New Center Pavillion, 1993

This morning, I read the results from a clinical trial of ovarian cancer screening in JAMA.  The trial showed that ovarian cancer screening was not effective in saving lives.  I was interested in the article because I was one of the investigators in that trial, which began in the early 1990s.  Henry Ford Health System was the largest of many recruitment sites for the Prostate, Lung, Colorectal and Ovarian (PLCO) trial, one of the largest clinical trials ever done.  I’m not surprised by the ovarian cancer results, since our models long ago suggested it was unlikely to work.  But, it is amazing to me how long it takes to figure out whether something works in health care, particularly for interventions that are preventive services or that attempt to change the delivery system.  It is unfortunate that the “learning loop” — from innovation to implementation to evaluation and back to innovation — is often far longer than our collective attention span.

But, the back story of how Henry Ford got involved in the PLCO trial is the most interesting aspect of the PLCO story for me.  It is the story that best illustrates why Wil Rutt, MD is one of my health care heroes.  When I was fresh out of University of Chicago medical school in 1990, I moved to Detroit to work with Dr. Rutt, who had recently founded the Center for Clinical Effectiveness (CCE) at the Henry Ford Health System.

In other industries — particularly product manufacturing industries — it is typical for companies to invest in internal capacity for research and development (R&D).  Universities and governments do basic research, figuring out how nature works.  But, it is companies that do R&D to apply basic knowledge to the development of successful products.  They generate ideas for product innovations.  Then they use rigorous methods of scientific research and engineering to figure out whether those innovations are successful and to develop ways of manufacturing the product.  Separate from such R&D efforts, manufacturers also have engineers in the product manufacturing area that try to improve manufacturing processes.  To do so, these engineers use methods variously described as statistical process control, continuous quality improvement, total quality management, six sigma, and lean.  Drug and biomedical device companies are product manufacturers, and share this tradition of investing in both R&D and manufacturing process improvement.

In the field of health care delivery, there has been great progress over the last few decades in adapting the process improvement methods from manufacturing for use in health care.  Drs. Don Berwick, Paul Batalden, Brent James, and Jack Billi come to mind as zealous advocates for this advancement.  And, certainly there have also been plenty of health services researchers, mostly in universities and government-sponsored think tanks, who have done research on health care delivery organizations, studying such organizations as anthropologists might study gorillas in the mist.

Mark Muller, Wei Chang, Kim Sadlocha and Rick Ward in the offices of the Center for Clinical Effectiveness, 1993

But, as of 1990, there was little or no precedent for non-academic health care provider organizations to do R&D, the kind of practical work applying rigorous scientific and engineering methodologies to improving the design of a company’s own product or service.  Wil Rutt’s CCE was one of the first attempts to apply R&D to health care delivery. He assembled a team of doctors, PhDs, IT professionals and others to design better ways for Henry Ford Health System to deliver health care.  The CCE did extra-murally funded research intended to be generalizable to the world.  But the focus was on R&D for Henry Ford, and the grants and papers were merely a means to that end.

One of Dr. Rutt’s many innovation concepts during the early 1990s was the idea to design a care process that resembled the Jiffy Lube oil-change process to deliver clinical preventive services.  At that time, there were upwards of 50 different preventive services recommended in the U.S. Preventive Services Task Force guidelines.  Dr. Rutt’s CCE developed pocket-size guideline manuals, age- and gender-specific flow sheets, and preventive services quality feedback in an effort to promote adherence to preventive services guidelines by primary care physicians.  But, he concluded that it would be better to cross-train non-physician staff to efficiently deliver a whole set of preventive services to patients during a single ambulatory encounter.  He wanted these services to be delivered in a convenient setting such as a shopping mall rather than on a clinical campus.  He called these “Health Assessment Labs” or “HALs.”

However, Dr. Rutt needed funding to implement and rigorously evaluate the HAL concept.  Along came the National Institutes of Health (NIH), which was sponsoring the PLCO trial.  Dr. Rutt saw the opportunity for Henry Ford to be a clinical site for the PLCO.  We won a grant to do so, and became the largest of the many PLCO clinical sites.  That grant was one of the largest research grants ever received by Henry Ford Health System, which is no slouch in clinical and basic science research.  But, Dr. Rutt’s thrill was not the research fame.  It was the opportunity to do R&D on the HAL concept.

Two decades later, we are still, as a field, at the infancy of our journey to adapt the R&D concept to health care delivery.  Certain delivery systems, such as Kaiser Permanente, Mayo Clinic, Cleveland Clinic, and Novant have discussed an R&D-like concept of developing proprietary science, technology and methods for care delivery.  But, the R&D concept has not really taken hold.  Health care provider organizations do not yet consider R&D to be a core competency.  Hardly any provider organizations have an internal department dedicated to R&D.  They don’t yet see R&D as a necessary investment required to maintain organizational competitiveness.  I feel strongly that we need to finish making that advancement.  And when we do, we’ll owe a debt of gratitude to Dr. Rutt for being the pioneer.

Read More

Google engineering too slow? Facebook too invested in the wrong data model to adapt? Are you kidding me?

Few things in my work life are better than finding mind-blowing information from other industries and figuring out the implications for healthcare.

I recently read a set of slides by Paul Adams, a user experience designer who worked at Google and Facebook.  Although Adams’ presentation had 224 slides, the main thesis was relatively simple and obvious.  The best insights usually are.  Adams pointed out that online social media applications create a single category of relationships, called “friends.”  They put every social relationship in that one bucket.  Wife, college sweetheart, boss, party friends, kids … all just “friends.”  In contrast, real-world social networks — the kind that humans have cultivated for millions of years — are characterized by various groupings of people representing different roles, life-stages and social contexts, with different levels of strength and trust.

Diagram from Paul Adams' presentation on Real World Social Networks

He described the research that shows that people typically have 4-6 different groupings of friends.  People typically have fewer than 10 strong ties that consume most of their communication attention.  They typically don’t have more than 150 weak ties.  They have many “temporary ties” that may influence their behaviors for relatively short periods of time.  He points out that existing social media applications create problems for their users because the users publish information intended for one group of people that ends up being received by others.  Like wild party pictures being viewed by your prospective employer.

I came across Adams’ presentation through a link from a CNN article by Dhanji Prasanna that tells the story of how Adams developed these ideas when he was part of a team at Google that was developing Google’s response to Facebook.  The CNN article explains that Google had an engineering culture and a technology infrastructure that made them too slow to develop an application that took Adams’ insights to heart.  Adams then left Google to join Facebook.  But, Facebook was deeply invested in the simplified one-big-bucket social graph at the heart of the system that now has 750 million users.  So, despite Facebook’s “hacker” engineering culture that allows it to develop applications rapidly, they were unable to solve their fundamental problem.  They eventually launched Facebook Groups, which is a superficial answer to the insight that people have multiple groups of relationships.  But, Facebook’s central “one-big-bucket” friends model was apparently deemed too risky to touch.

My eyes rolled.  Google’s culture makes them too slow?  Facebook can’t innovate?  Are you kidding me?  If only we could experience a tenth of the agility shown by those two companies in health care, which has long suffered from a powerful aversion to risk and change in both care delivery and information technology.

But, there are deeper connections between Adams’ insights about social networks and our challenges in transforming our health care system.

First, the health behaviors of patients are strongly influenced by their social networks.  For years, health care providers, health plans and vendors of wellness and care management services have attempted to promote smoking cessation, exercise, healthy diet, compliance with medication orders, and other health and lifestyle behaviors by designing “interventions” that target individual patients.  A whole industry of “health risk assessment” and “predictive modeling” was built up to try to identify which individual patients to target.  But, such an approach has produced unimpressive results.  That should not have been surprising.  Decades-old research about the diffusion of innovations has shown that lifestyle behaviors in a population change through social networks.  People follow the lead of the people around them.  Therefore, to be effective, wellness and care management programs need to be designed to work through those existing social networks.  We need to be targeting groups of people that are already connected, rather than just reaching out to individuals.  We need to be designing our communications and incentive approaches so as to augment and leverage our patients’ social networks.

To support such social-network-oriented clinical programs, we need information systems that capture information about those social networks and that are designed to interact with them.  But, when we examine the fundamental data model and features of the market-leading electronic health record (EHR) systems, such capabilities are nowhere to be found.  Those vendors, blessed with a large installed base, may be unable to make such fundamental changes to their systems.  Like Google and Facebook, the leading EHR vendors may not be agile enough to address our emerging understanding of the importance of social networks that exist among our patients.

Second, the relationships between patients and care providers are types of social network relationships.  I call these care relationships.  When we talk about “accountable care,” we mean that some provider organization is taking responsibility for the quality and cost of care for a population of patients. When we talk about a “patient-centered medical home,” we mean a team of primary care physicians, nurses and other care providers proactively taking care of a group of patients. But, who exactly are those patients? We have developed some very crude primary care “attribution” logic that tries to derive care relationships from claims data.  But, we do a very poor job of validating such derived care relationships or proactively declaring new care relationships.  And we don’t keep track of changes in care relationships.  We don’t have established processes to inform the participants in those relationships when one of the parties determines that they don’t intend for the relationship to exist.  We don’t distinguish between different types of care relationships.  If a patient has heart failure and sees both a primary care physician and a cardiologist, we don’t explicitly declare which physician has the care responsibility for that patient problem.

Furthermore, the referral relationships among providers are also types of social network relationships.  As with Adams’ real-world social networks, these relationships among patients, primary care doctors, specialists, hospitals, home health care nurses, pharmacists, and others are complex and dynamic.  Yet, when you examine the systems we use to keep track of these relationships, they are primitive or non-existent.  Just as over-simplification of social network relationships has wreaked havoc for social media users, so has over-simplification of care relationships, care responsibilities and referral relationships harmed clinical communications and accountabilities.  This deficiency ultimately reduces the effectiveness of care.  As a result, patients are harmed.

Read More

Identifying and Understanding Analysis Tricks: Regression Toward the Mean

Imagine that you are a new homeowner, shopping for insurance for your new house.  You live in an area prone to earthquakes, and you are not a big risk-taker.  You decide that you should have earthquake insurance.  You are on the web researching earthquake insurance policies. You come across the web site of Acme Insurance, an international leader in earthquake damage coverage.  The web site says they are the best earthquake insurance company because they not only pay for earthquake damage, they have an innovative program to actually prevent earthquakes experienced by their beneficiaries. The program involves assigning an earthquake prevention coordinator (EPC) to each homeowner.  The EPC does one session of telephonic earthquake prevention coaching, sends some earthquake prevention educational materials by e-mail, and makes a follow-up call to assure that the homeowner is exhibiting good earthquake prevention behaviors.   This is a proprietary program, so more details are only available to Acme Insurance beneficiaries.  The program is proven to reduce earthquakes by 99%.  You click on the link to view the research study with the documented proof.

The study was conducted by Acme Analysis, a wholly-owned earthquake informatics subsidiary of Acme Insurance.  The study begins by noting an amazing discovery.  When Acme analyzed its earthquake claims for 2010, it noted that 90% of its earthquake damage cost occurred in only 10% of its beneficiaries.  It noted that these high cost beneficiaries were living in particular cities.  For example, it noted high earthquake claims cost in Port au Prince, Haiti for damage incurred during the January 12, 2010 earthquake there.  It developed an innovative high risk classification approach based on the zodiac sign of the homeowners’ birth date and the total earthquake claims cost for damage incurred in the prior month.  On February 1, 2010, they applied this risk classification to identify high risk homeowners, most of whom were Libras or Geminis living in Port au Prince.  They targeted 100 of those high risk homeowners for their earthquake prevention program.  The EPCs sprang into action, making as many earthquake prevention telephone coaching calls and sending as many earthquake prevention e-mails as they could, considering the devastated telecommunications infrastructure in Port au Prince.

The program evaluation team then compared the rate of earthquakes exceeding 6.0 on the Richter scale and average earthquake damage claims for those 100 people for the pre-intervention period vs. the post-intervention period.  Among the 100 beneficiaries targeted by the program, the average number of major earthquakes plummeted from 1 in the pre-intervention period (January, 2010) to 0 in the post-intervention period (March, 2010), and the number of minor earthquakes (including aftershocks) dropped from 20 down to just 10.  But the program was not just good for the beneficiaries wanting to avoid earthquakes.  It was a win-win for Acme Insurance.  Earthquake damage claims had dropped from an average of $20,000 per beneficiary during the January, 2010 pre-intervention period to an average of just $200 for damage incurred during the post-intervention period in March, 2010, when two of the targeted beneficiaries experienced damage from an aftershock.  The program effectiveness was therefore 1 – (200/20,000) = 0.99.  That means the innovative program was 99% effective in preventing earthquake damage claims cost.  After considering the cost of the earthquake prevention coordinators and their long-distance telephone bills, the program return on investment (ROI) was calculated to be 52-to-1.  The program was a smashing success, proving that Acme Insurance is the right choice for earthquake coverage.

Can you spot the problem? Can you extrapolate this insight to the evaluation of health care innovations such as disease management, care coordination, utilization management, patient-centered-medical home, pay-for-performance, accountable care organizations, etc.?

The problem is called “regression toward the mean.”  It is a type of bias that can affect the results of an analysis, leading to incorrect conclusions.  The problem occurs when a sub-population is selected from a larger population based on having extreme values of some measure of interest.  The fact that the particular members had an extreme value at that point in time is partly a result of their underlying unchanging characteristics, and partly a matter of chance (random variation).  Port au Prince, like certain other cities along tectonic plate boundaries, is earthquake prone.  This is an unchanging characteristic.  But, it was a matter of chance that a major earthquake hit Port au Prince in the particular month of January, 2010.  If you track Port au Prince in subsequent months, its theoretical risk of an earthquake will be somewhat higher than average because it is still an earthquake prone area.  But, chances are that, in any typical month, Port au Prince will not have a major earthquake.

An analogous effect can be observed when you identify “high risk” patients based on having recently experienced extremely high rates of health care utilization and associated high cost.  The high cost of such patients is partly driven by the underlying characteristics of the patients (e.g., age, gender, chronic disease status), and partly based on random chance.  If you track such patients over time, their cost-driving characteristics will lead them to have somewhat higher costs than the overall population.  But, the chance component will not remain concentrated in the selected patients.  It will be spread over the entire population.  As a result, the cost for the identified “high risk” patients will decrease substantially.  It will “regress” toward the mean.  With high risk classification methods typically used in the health care industry, my experience is that this regression is in the 20-60% range over a 3-12 month period, without any intervention at all.  Of course, the overall population cost will continue to follow its normal inflationary trend line.
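The effect is easy to demonstrate with a simulation.  In the sketch below, all of the numbers are made up: the lognormal cost distributions, the top-5% cutoff, and the population size are illustrative assumptions, not real claims data.  Each member’s annual cost is a stable member-specific component plus a year-specific random shock, and the “high risk” cohort is selected on one year of cost and simply followed into the next:

```python
import random

random.seed(0)

# Each member's annual cost = a stable member-specific component (age,
# gender, chronic disease status) plus a random year-specific shock (luck).
N = 50_000
base = [random.lognormvariate(7, 1) for _ in range(N)]    # unchanging risk
year1 = [b + random.lognormvariate(7, 1) for b in base]   # risk + this year's luck
year2 = [b + random.lognormvariate(7, 1) for b in base]   # same risk, fresh luck

# Select the "high risk" top 5% on year-1 cost, then follow them into year 2
# with no intervention at all.
cutoff = sorted(year1)[int(0.95 * N)]
cohort = [i for i in range(N) if year1[i] >= cutoff]
mean1 = sum(year1[i] for i in cohort) / len(cohort)
mean2 = sum(year2[i] for i in cohort) / len(cohort)

print(f"cohort cost dropped {1 - mean2 / mean1:.0%} with no intervention")
```

The selected cohort’s average cost falls substantially in year 2 even though nothing was done, while it still stays above the overall population average, because the unchanging component of risk persists.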

This regression-toward-the-mean phenomenon has been at play in many, many evaluations of clinical programs of health plans and wellness and care management vendors.  Sometimes unwittingly.  Sometimes on purpose.  Starting back in the 1990s, disease management vendors were fond of negotiating “guarantees” and “risk sharing” arrangements with managed care organizations where they would pick high risk people and guarantee that their program would reduce cost by a certain amount.  Based on regression toward the mean, the vendor could rely on the laws of probability to achieve the promised result, regardless of the true effectiveness of their program.  The vendor would get their negotiated share of the savings.  Good work if you can get it.  It lasted for a few years until the scheme was widely discredited.  But not widely enough, it appears.  Wellness and care management vendors still routinely compare cost before and after their intervention for a cohort of patients selected based on having extremely high cost in the pre-intervention period.  Health plans and employers eat up the dramatic savings numbers, happy to see that the data “proved” that they made wise investments.

Study designs suffering from obvious regression-toward-the-mean bias will usually be excluded from publication in peer-reviewed scientific journals.  But, they do show up in less formally-reviewed clinical program evaluations by hospitals and physician organizations.  For example, in a recent analysis of a patient-centered medical home (PCMH) pilot, the authors concluded that the program had caused a “48 percent reduction in its high-risk patient population and a 35 percent reduction in per-member-per-month costs” as shown in the following graphic.

In this PCMH program, a total of 46 “high risk poly” members were selected based on having high recent health care utilization involving 15 or more health care providers.  The intervention consisted of a personal health nurse who developed a personal health plan, a personal health record (based on health plan data), and reimbursement for monthly 1-hour visits with a primary care physician.  The analysis involved tracking the risk category (based on health plan claims data) and the per-member-per-month (PMPM) cost for the cohort of 46 patients, comparing the pre-intervention period (2009) to the intervention period (2010).  I’m sure the program designers and evaluators for this PCMH pilot are well meaning and not trying to mislead anybody.  I share their enthusiasm for the PCMH model of primary care delivery.  But, I think the evaluation methodology is not up to the task of proving whether the program did or did not save money.  Furthermore, even with a better study design to solve the problem of regression-toward-the-mean bias, the random variation in health care costs is far too large to be able to detect even a strong effect of a PCMH program in a population of only 46 patients.  Or even 4,600 patients for that matter.  I’d guess that proper power calculations would reveal that at least 46,000 patients would be required to have a chance of proving PCMH effectiveness in reducing cost.
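A back-of-the-envelope version of that power calculation can be done with the standard two-sample normal approximation.  The inputs below (the PMPM mean, its standard deviation, and the detectable effect size) are my own illustrative assumptions, not figures from the pilot:

```python
import math

# Two-sample sample size, normal approximation:
#   n per arm = 2 * (z_{alpha/2} + z_{beta})^2 * (sigma / delta)^2
z_alpha = 1.96     # two-sided alpha = 0.05
z_beta = 0.8416    # power = 0.80
sigma = 1200.0     # assumed sd of PMPM cost (cost data are highly skewed)
delta = 40.0       # detectable effect: a 10% cut of an assumed $400 PMPM mean

n_per_arm = math.ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)
print(n_per_arm)
```

Under these assumptions, detecting even a strong 10% cost reduction takes roughly 14,000 patients per arm, which is why a 46-patient (or even 4,600-patient) cohort has essentially no power to demonstrate savings.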

So, how do you solve the problem of Regression Toward the Mean?

As with any type of bias, the problem is with the comparability of the comparison group.  The gold standard study design is, of course, a randomized controlled trial (RCT), where the comparability of the comparison group is assured by randomly assigning subjects between the intervention group and the comparison group (also called the “control group”).

If randomization is not possible, one can try to find a concurrent comparison group that is not eligible for the program and is thought to be otherwise similar to the eligible population.  The same selection criteria that are applied to the eligible population to pick targets are also used in the ineligible population to pick “simulated targets” for a comparison group.  Note that in such a concurrent control design, the comparison should be between targets and the simulated targets, without considering which of the targets volunteered to actually participate in the intervention.  This aspect of the design is called an “intention to treat” approach, intended to avoid another type of bias called “volunteer bias.”  (More on that in a future post.)

Often, the evaluators do not have access to concurrent data from an ineligible population to provide a concurrent comparison group.  In such a case, an alternative is “requalification” of the population in the pre-intervention period and the post-intervention period.  Requalification involves taking the exact same selection criteria used to pick targets at baseline and shifting them forward in time to pick a comparison group.  The result will be a comparison group that is a different list of patients than the ones picked for the intervention.  Some of the targets of the intervention may be requalified to be in the comparison group.  Others will not.  Some members of the comparison group will be members who did not qualify to be in the intervention group.  It is counter-intuitive to some people that such an approach creates a better comparison group than just tracking the intervention group over time.  But, with requalification, you are assured that the same influence that luck had in selecting people based on recent utilization will be present in the comparison group.  The idea is to avoid bias in the selection process by making the selection process symmetrical between intervention and comparison groups.
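Here is a toy sketch of requalification.  The `qualify` rule, the top-2 cut, and the cost figures are all hypothetical; the point is only that the identical selection rule is applied at two points in time:

```python
# Hypothetical claims data: member id -> {year: total annual cost}
cost = {
    "A": {2009: 90_000, 2010: 12_000},   # extreme in 2009, then regresses
    "B": {2009: 4_000, 2010: 85_000},    # extreme in 2010 instead
    "C": {2009: 3_000, 2010: 2_500},
    "D": {2009: 2_000, 2010: 60_000},
    "E": {2009: 45_000, 2010: 5_000},
}

def qualify(year, top_n=2):
    """Apply the same 'highest recent cost' selection rule at any point in time."""
    ranked = sorted(cost, key=lambda m: cost[m].get(year, 0), reverse=True)
    return set(ranked[:top_n])

intervention = qualify(2009)   # targets picked at baseline
comparison = qualify(2010)     # the same rule shifted forward: a different list
print(intervention, comparison)
```

Each cohort is then tracked forward from its own qualification date, so the luck component of selection affects the intervention and comparison groups symmetrically.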

If I apply these remedies for regression toward the mean bias, does that mean I will be assured of reliable evaluation results?

Unfortunately, no.  The bad news is that clinical programs are devilishly hard to properly evaluate.  There are many other sources of bias intrinsic to many evaluation designs.  And, the random variation in the measures of interest is often very large compared to the relatively weak effects of most clinical programs.  This creates a “signal to noise” problem that is particularly bad when trying to evaluate small pilot programs or the performance of individual providers over short time periods.

If you really want to create a learning loop within your organization, there is no alternative to building a team that includes people with the expertise required to identify and solve thorny evaluation and measurement problems.

Read More

How does behavioral economics and “gamification” relate to ACOs?

The transformation of our health care system requires many different people to change their behaviors. Leaders of health care organizations have to be willing to make investments that they have historically not been willing to make. Physicians have to change the way they relate to patients, the way they interact with their non-physician team members in the clinic, the way they allocate their time during clinic visits, the orders they write, and the referral patterns they have established. Patients may have to change their care relationships with providers, and change their tendency to be passive regarding health care decisions. Most importantly, and most difficult of all, patients have to change their lifestyle and self-management behaviors.

In my experience, the leaders of health care organizations and health plans tend to be unsophisticated in their approach to affecting the behaviors of front line providers. And health care providers — particularly doctors and nurses — tend to be unsophisticated in their approach to changing the behaviors of their patients. In both cases, the unsophisticated approach relies too heavily on two things: (1) education (communicating facts) and (2) communicating negative consequences of failure to change behavior.

Furthermore, in the case of affecting provider behavior, a huge amount of attention has been paid recently to the economic incentives intrinsic to fee-for-service vs. bundled payment reimbursement models. The federal government and many commercial payers have implemented “pay for performance” or “gain sharing” programs that offer relatively small financial rewards in the relatively distant future with a relatively high degree of uncertainty. Such incentives are designed with zero data about the probability that particular behaviors are going to change in response to such incentives. The government and commercial payers just know that they should begin to initiate financial incentives. And they know that for economic and political reasons, they can’t afford for those incentives to be very large. But, paying a financial incentive that is too small to “buy” the desired behavior change is pure waste. It’s just symbolic, not transformative.

To be successful, leaders of ACOs need to raise their game in behavioral economics, following the lead of many other industries that have been studying and putting into practice effective techniques for behavior change. In the electronic world, millions of people play video and computer games, and millions interact with commercial web sites that have been “gamified” — incorporating point systems, competition, virtual rewards, and other techniques shown to modify behavior without relying on financial incentives. In the health care field, health plans have been at the forefront of exploring the application of such techniques to health and wellness. But, these applications have not yet been shown in well-designed studies to have a significant impact on health outcomes and health care costs (notwithstanding a slew of poorly-designed studies showing miraculous results).

The following video is a lecture delivered at Stanford by Rajat Paharia, the founder and Chief Product Officer of Bunchball – a developer of a technology platform that supports the incorporation of “game mechanics” into commercial web sites. Rajat’s lecture is entitled “Driving User Behavior with Game Dynamics and Behavioral Economics” and was delivered February 19, 2010. The video is an hour long. But, I recommend it for clinical program designers in ACOs, physician organizations and health plans as a clear, accessible summary of the behavioral economics evidence base and the current real-world commercial applications of that body of knowledge.

Read More

“Can I get a Fast Pass?”: Learning from Disney about health care quality

While returning from a vacation to Walt Disney World yesterday, my family and I were in a disorganized crowd at the Orlando International Airport boarding the tram that shuttles travelers between terminal buildings.  A man yelled “Can I get a fast pass?”  Everyone laughed.

Anyone who has been to a Disney park knows that a “fast pass” is a little ticket that can be obtained at the entrance to popular rides that gives an assigned time after which you can return to the ride and avoid having to stand in a long line.  They are lifesavers, particularly to families with kids on hot days.  Going through the “fast pass return” gate to a ride always makes you feel like you were lucky to win a prize or that you are a special guest.  It is a process design that anticipates a customer problem and prepares in advance to resolve the problem, thereby delighting the customer.  

Over the course of our vacation, my wife and I noted hundreds of little ways that Disney was “thinking.”  Signage that always seemed to point to where we wanted to go.   Rubber tubing inserted in the train track slots at cross walks to keep buggy wheels from getting stuck.  Masking tape applied to the pavement just before the evening parade and fireworks to mark places OK for standing and removed just before the end of the parade.   The list goes on and on.

Once aboard the Orlando Airport shuttle, after making sure my daughter had a good grip on the pole (the one with the germs of thousands of travelers!), my thoughts wandered to the business trip I had taken the week before, when I missed my return flight from Chicago because I waited for more than an hour in the security line at O’Hare Airport.  While waiting in that line, I had plenty of time to study the processes at the airport to try to figure out why they are so un-delightful.  At O’Hare, at the security entrance to terminals E & F, travelers must choose among four roped-off waiting lines.  Fifteen minutes into the line, I noticed that my line and the line next to me were moving far more slowly than the other two lines.  After another half hour of waiting and inching closer to the inspection area, I had the diagnosis.  Out of 5 x-ray baggage screening machines, Homeland Security had staffed only three of the machines during this Saturday morning of Spring Break week.  (Perhaps they lacked “intel” about school schedules in the homeland?)  Two of the waiting lines fed into their own screening machine.  My waiting line merged with the other slow line to share a single screening machine.  I finally made it up to the ID check lady.  After she checked my ID against the home-printed boarding pass (some other time we can try to figure out how that is adding security value), I politely asked if I could make a suggestion and explained the problem with the lanes and suggested that they close off one of the two slow lanes.  She looked at me with a quizzical “and you think I care?” look.  After another 10 minutes in the line, I noted that there were about the same number of staff standing and watching as actively interacting with travelers or their bags.  I overheard one of these watchers say to another “What a mess.  It’s enough to make you want to help out.”  The other watcher smiled at the joke.
After I finally got through and finished hopping on one foot to tie my shoes, putting my belt back on, and repacking my computer bag, I realized that I had missed my flight.

So, why has Disney been able to outperform O’Hare, Orlando International, and Homeland Security in customer experience?

I believe that the answer is culture.  I’m a person who usually focuses on numbers and science, and the concept of “culture” sometimes seems too vague for my taste.  But, it is undeniable that some organizations achieve a culture that emphasizes quality and customer satisfaction, while other organizations do not.  In my experience, small organizations tend to have an easier time achieving such a culture, since each member of the organization is close to each other and to the customer.  But Disney is a huge corporation.  I suspect that Disney has built up its culture over many decades, attracting employees who find Disney’s culture attractive, selecting employees who already have personality and character traits compatible with that culture, training new hires in their philosophy and the processes and techniques they use to pursue that philosophy, and creating an experience that shows employees their ideas count and their effort is appreciated and rewarded.

On Sunday morning, I overheard two Disney employees talking to each other, greeting each other with cheer.  One said “I’ve got to hurry over to the monorail, which is backing up for some reason.”  The Disney employee was in a hurry, but took a moment to express cheer and respect for a fellow employee.  The employee had a sense of urgency to solve a problem with customer experience.  But the last three words caught my attention.  “For some reason.”  The employee had an inherent interest in the causes of the problem, not just in the problem itself.  I was left with the impression that this employee was going to help out with the current situation, while simultaneously trying to figure out and subsequently address the reason that it happened this Sunday morning, so it does not happen again on the next Sunday morning.

So what can we learn from Disney and airports about the quality of care in Accountable Care Organizations (ACO)?

For ACOs to be successful, they must be competitive — not only to purchasers, but also to patients.  They have to create processes that anticipate the needs of the patient, and solve problems before they happen.  They need to be able to learn what works and does not work.  They need to be able to overcome the professional cultures that sometimes emphasize technical competency and physiologic outcomes to the exclusion of humanistic competency, and the satisfaction and delight of patients and their families.

The idea that we in the health care field can learn from those in the hospitality field is certainly not new.  At health care professional meetings, such as meetings of the American Medical Group Association (AMGA), I’ve been to numerous presentations over the years by executives from Disney, Marriott, Ritz-Carlton and other hospitality companies about how they select and train people, how they have huddles at the beginning of every shift, how they empower people to solve customers’ problems, and how they reward people that delight customers.  And many hospitals and physician organizations and some health plans have taken this advice to heart and made significant progress to nurture a patient-satisfying and quality culture.  But, I think everyone would agree that we still have a lot to learn and a long way to go.

 

Read More

Atul Gawande’s articles in the New Yorker most relevant to ACOs



Atul Gawande is a general surgeon at Brigham and Women’s Hospital in Boston, and is an amazingly compelling writer about health care issues.  He has written a series of articles in the New Yorker, blending stories of individual patients and their care providers with a larger scientific and health policy context.  Some of the most relevant of these articles for Accountable Care Organizations (ACOs) include:

December, 2007: “The Checklist: If something so simple can transform intensive care, what else can it do?” In this article, Gawande explains the work of Peter Pronovost, MD, a critical care specialist at Johns Hopkins who used simple checklists and an associated process to empower nurses and create a quality culture in hospital ICUs to dramatically reduce complications from central lines and ventilators, first at his own hospital, then throughout Michigan in the Michigan Health and Hospital Association’s Keystone Project (with funding from Blue Cross Blue Shield of Michigan).  The article laments the resistance to national implementation of the checklist approach (a resistance that has at least partially been overcome in the three years since this article was published).  Pronovost argues that the science of health care delivery should be emphasized and funded as much as the science of disease biology and therapeutics.

August, 2010: “Letting Go: What should medicine do when it can’t save your life?” In this article, Gawande describes the cultural and psychological barriers that make it difficult for patients, family members, and doctors to prepare for good end-of-life decision-making.  He reports the success of hospice programs, end-of-life telephonic care management programs, and programs to encourage advance directives.

January, 2011: “The Hot Spotters: Can we lower medical costs by giving the neediest patients better care?” In this article, Gawande enthusiastically describes the work of Jeffrey Brenner, MD and his “Camden Coalition” in Camden, NJ, and Rushika Fernandopulle in Atlantic City to develop intensive patient-centered care for high risk patients, and the analytics of Verisk Health focused on predictive modeling for high risk patients.  The article includes some encouraging pre-post study results from these programs, but acknowledges the risk that the results could be biased by the “regression to the mean” effect, in which a cohort of patients specifically selected based on recent high health care utilization is expected to have lower utilization in a subsequent time period without any intervention.  The article also points out the resistance to change in health care, evidenced by Brenner’s inability to get state legislative approval to bring his program to Medicaid patients.

Additional biographical information about Gawande, as well as a complete list of his articles in the New Yorker, is available here.

 

Read More

Why “case load” is not a good metric for case management productivity or intensity

As shown in the following graphic recently published by the Healthcare Intelligence Network, it has become common practice to use “case load” as a metric for the productivity of nurses in case management programs or as a measure of the intensity of a case management intervention.  More broadly, case load has been used for these purposes across many wellness and care management interventions, including chronic disease management, high risk care coordination, wellness coaching, care transition coordination, and other types of programs involving nurses, physician assistants, nutritionists, social workers, and physicians.

HIN Study Results - Case Manager Monthly Case Load

If everyone is doing it, it must be right.  Right?

In my opinion, case load is almost always the wrong measure to use to assess productivity or intervention intensity.  The graphic above indicates that 38% of case management programs have a case load between 50 and 99.  Let’s say that one of those case management programs told you that they had a case load of 52.  That literally means that the average nurse in that case management program has a list of 52 patients (on average) that are somewhere between their enrollment date and their discharge date for the program.  That could mean 52 new patients every day for an intervention consisting of a single 5 minute telephone call.  Or, it could mean 1 new patient each week for a year-long intervention involving an extensive up-front assessment and twice-monthly hour-long coaching calls.  Or, it could mean 1 new patient each week for a year-long intervention consisting of a 20 minute enrollment call and a 20 minute check-up call one year later.  In that context, if I tell you that my case management program has an average case load of 52, how much do you really know?

Some case management programs try to fix this problem by creating an “acuity-adjusted” or “case-mix-adjusted” measure of case load.  In such a scheme, easier cases are counted as a fraction of a case, and more demanding cases are counted as more than one case.  Such an approach requires some type of a point system to rate the difficulty of the case.  Some case management vendors charge a fee for each “case day,” with higher fees associated with “complex” cases, and lower fees for “non-complex” cases.  You can imagine how the financial incentives associated with the case definition can affect the assessment, and how reluctant the case management vendor would be to discharge cases, cutting off the most profitable days in the tail-end of a case when the work is light.

But these “adjustments” are missing the fundamental point that case load is a “stock” measure, while the thing you are trying to measure is really a “flow.”  A stock is something that you count at a point in time, like the balance of your checking account or the number of gallons of gas left in your tank.  A flow is something that you count over a period of time, like your monthly expenses or the number of gallons per hour flowing through a hose to fill up your swimming pool.  The work output delivered by a case management nurse is something that can only be understood over a period of time.
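The relationship between the stock and the flows is Little’s law: average case load equals the enrollment rate multiplied by the average time enrolled.  A minimal sketch (the two programs below are hypothetical) shows why the stock alone tells you almost nothing:

```python
from fractions import Fraction as F  # exact arithmetic for the weekly rate

# Little's law: average case load (a stock) = arrival rate * average
# time enrolled (both flows), measured in consistent units (days here).
def avg_case_load(new_patients_per_day, avg_days_enrolled):
    return new_patients_per_day * avg_days_enrolled

# 52 new patients a day, each closed out after a single one-day contact:
load_brief = avg_case_load(52, 1)
# 1 new patient a week (1/7 per day), enrolled for a full year (364 days):
load_intensive = avg_case_load(F(1, 7), 364)

print(load_brief, load_intensive)
```

Both programs truthfully report a case load of 52, yet one touches 52 new patients a day and the other enrolls one a week.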

In my experience, the best approach to measuring case management productivity and intensity is to follow “cohorts” of patients who became engaged in the intervention during a particular period of time to track how many minutes of work are done for each period of time relative to the engagement month, as shown below.

 

This graph shows that, during the calendar month in which a patient became engaged, an average of 50 minutes of nursing effort was required.  For individual patients, the nursing effort could, of course, be higher or lower.  Some patients require more time to assess.  Some patients may have become engaged at the end of the calendar month.  Some patients may drop out of the program after the first encounter.  But, on the basis of a cohort of engaged patients, the average was 50 minutes.  In subsequent months, the average minutes of nursing effort typically decrease as the initial enrollment, assessment, and care planning effort dies down.  Over time, depending on the intended design of the intervention, the effort falls off as most patients have either been discharged by the nurse, have chosen to drop out, or have been lost to follow-up.  This graph is a description of the intensity of the case management intervention.  In this example graph, the cumulative nursing time over the first 16 months relative to the month of engagement is 234 minutes, which could serve as a summary measure of intervention intensity.  If nursing minutes data are not available, some “work-driver” statistics could be substituted, such as the number of face-to-face or telephonic encounters of various types.  These could be converted to minutes based on measured or estimated average minutes of effort for each of the statistical units.
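The cumulative-intensity summary can be computed directly from the cohort’s monthly averages.  In the sketch below, only the month-1 value (50 minutes) and the 16-month cumulative total (234 minutes) come from the example above; the intermediate monthly values are made-up placeholders:

```python
# Average nursing minutes per engaged patient, by month relative to the
# engagement month (month 1 first).  Only the first value and the total
# are from the example in the text; the rest are hypothetical.
minutes_by_month = [50, 40, 25, 20, 18, 15, 12, 10, 9, 8, 7, 6, 5, 4, 3, 2]

cumulative_intensity = sum(minutes_by_month)
print(len(minutes_by_month), minutes_by_month[0], cumulative_intensity)
```

The same list, computed per nurse or per engagement cohort, supports the comparisons described below.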

This type of graph can be created for different engagement months and compared over time to determine if the intervention intensity is changing over time.  It can be created for one nurse and compared to all other nurses to determine if the intervention process delivered by that nurse appears to be similar or different than other nurses.

Then, to assess productivity, the total number of nursing minutes can be measured, and compared to the expected number of minutes for cases of the same type and the same mix of “month numbers” in the case intensity timeline.

The implications of this for Accountable Care Organizations are that information systems to support wellness and care management should be designed to explicitly capture the engagement date and the intended type of wellness or care management intervention in which the patient is becoming engaged.  Such systems should also capture the discharge dates, statistics about the quantity and types of wellness or care management services delivered to engaged patients, and preferably the number of minutes of effort required for each of those services.

Some people may object to this approach because it implies that the patient is being subjected to a “cook-book” intervention that fails to take into account the uniqueness of each patient.  And, they argue that they cannot specify up front the type of wellness or care management program in which the patient is becoming engaged, because they have not yet completed an assessment.  But, I would argue that nothing in this approach assumes that each patient is treated the same.  This approach merely looks at a population of patients receiving a particular intended type of intervention program.  Although each patient may follow a different path and receive a different mix and quantity of case management services, the overall mix and timing of services for a population of patients can be assessed.  If this mix and timing is not generally constant over time, then you are dealing with a process that is not in control, a problem that must be solved before meaningful program evaluation of any type can be done.

 

Read More