Lunches at Berkeley are never boring. This week I had an engaging discussion with a colleague from out of town who asked me what I thought about statistical top down approaches to evaluating energy efficiency programs. In my excitement, I almost forgot about my local organic Persian chicken skewer.
For the uninitiated, California’s Investor Owned Utilities (the people who keep your lights on…if you live around here) spend ratepayer money to improve the energy efficiency of their customers’ homes and businesses. Think rebates for more efficient fridges, air conditioners, lighting, and furnaces. The more efficient customers are, the less energy gets consumed, which is especially valuable at peak times. In return, the utilities are rewarded financially for the energy savings their programs produce. The million kWh question, of course, is: how much do these programs actually save? I’m glad you asked.
Multiple Ways of Looking at Energy Efficiency
The traditional way is to take the difference in energy consumption between the old and new gadget. If you’re really fancy, you adjust the estimated savings downward by a few percent to account for free riders like me, who send in energy efficiency rebates for things they would have bought anyway. These so-called “bottom up” analyses have been shown to provide decent estimates of what is possible in terms of savings, but they completely ignore human behavior. Hence, when tested for accuracy, bottom up estimates have over and over again been shown to overestimate savings. There are many factors that contribute to this bias, but the most commonly cited one is the rebound effect.
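To make the arithmetic concrete, here is a stylized sketch of a bottom up savings calculation. All numbers (the bulb wattages, operating hours, and the free-rider and rebound shares) are invented for illustration; real evaluations use measure-specific engineering assumptions.

```python
# Stylized bottom-up ("engineering") savings estimate for one measure.
# All numbers below are hypothetical, for illustration only.
old_watts, new_watts = 60.0, 9.0        # e.g. incandescent vs. LED bulb
hours_per_year = 1000.0

# Gross engineering savings: wattage difference times hours of use.
gross_kwh = (old_watts - new_watts) * hours_per_year / 1000.0  # 51.0 kWh

# The "fancy" adjustments: shave off free riders (would have bought it
# anyway) and rebound (using the device more because it is cheaper to run).
free_rider_share = 0.10
rebound_share = 0.10
net_kwh = gross_kwh * (1 - free_rider_share) * (1 - rebound_share)

print(f"claimed savings: {net_kwh:.2f} kWh/year")  # prints 41.31
```

Note that both adjustment shares are assumptions, not measurements, which is precisely the critique: the behavioral pieces are the ones the engineering calculation cannot pin down.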
Another way of course, as we have so often advocated, is using methods that have their origin in medical research. For a specific program, say a subsidy for a more efficient boiler, you give a random subset of your customers access to the subsidy and compare the energy consumption of people who had access to the program to that of the customers who didn’t. These methods have revolutionized (in a good way) the precision and validity of program evaluations. My colleagues at the Energy Institute are at the forefront of this literature and are currently teaching me (very patiently) how you do these. I am always a bit slow to the party. These methods are not easy to implement and require close collaboration with the utilities and significant upfront planning. But that is a small price to pay for high quality estimates that allow us to make the right decision as to whether to implement programs that cost ratepayers hundreds of millions of dollars.
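The core of the randomized approach fits in a few lines. The sketch below uses simulated data (sample sizes, consumption levels, and the 25 kWh “true effect” are all made up); the point is that with random assignment, a simple difference in means recovers the program’s effect, with an honest standard error attached.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated monthly consumption (kWh) for customers randomly offered the
# subsidy (treatment) vs. not (control). All numbers are hypothetical.
n = 5000
true_effect = -25.0  # the program actually saves 25 kWh on average
control = rng.normal(900, 150, n)
treatment = rng.normal(900, 150, n) + true_effect

# Random assignment means the groups are comparable, so the difference
# in means is an unbiased estimate of the average treatment effect.
effect = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1) / n + control.var(ddof=1) / n)
print(f"estimated savings: {effect:.1f} kWh (s.e. {se:.1f})")
```

The estimate lands near the true -25 kWh, and the standard error tells you how seriously to take it. No laundry list of confounders needed: randomization handles them by construction.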
A third option, which has given rise to a number of evaluation exercises, is called top down measurement. The idea here is to look at average energy consumption by households in a region (say census block group) for many such regions over a long time period and use statistical models to explain what share of changes in energy consumption over time can be explained by spending on energy efficiency programs. The proponents of these methods argue that this is an inexpensive way to do evaluation, the data requirements are small, the estimates can be updated frequently, and – maybe most importantly – that these estimates include some spillover effects (if your neighbor buys a unicorn powered fridge because you did). Sounds appealing.
The big problem with the majority of these studies is that they do not worry enough about what drives differences in the spending on these programs across households. I am sure you could come up with a better laundry list, but here is mine:
- Differences in environmental attitudes (greenness)
- Targeting by the utilities of specific areas
- Energy prices
What these aggregate methods do not allow you to do is to separate the effects of my laundry list from those of the program. Or in economics speak, they are fundamentally unidentified. No matter how fancy your computer program is, you will never be able to estimate the true effect. It’s in some sense like using an X-ray machine as a lie detector. In practice you are possibly attributing the effect of weather, for example, to programs. Cold winters make me want to be more energy efficient. It’s the winter, not the rebate, that made me buy a more efficient furnace. Further, the statistical estimates are just that: estimates. They provide a point estimate (best guess) with an uncertainty band around it. And that uncertainty band, as Meredith Fowlie, Carl Blumstein and I showed, can be big enough to drive a double-wide trailer through.
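A tiny simulation makes the identification problem vivid. In the invented data below, an unobserved “greenness” factor drives both program spending and lower consumption, while the program itself, by construction, saves nothing. A naive top down regression of consumption on spending nevertheless reports large “savings.” (The setup and all coefficients are illustrative assumptions, not estimates from any real program.)

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical census block groups. "Greenness" is unobserved and drives
# both program spending and consumption. The program's true effect is zero.
n = 2000
greenness = rng.normal(0, 1, n)
spending = 10 + 3 * greenness + rng.normal(0, 1, n)   # $/household on programs
true_effect = 0.0
consumption = (1000 - 20 * greenness                   # green areas use less
               + true_effect * spending                # program does nothing
               + rng.normal(0, 30, n))

# Naive top-down regression: consumption on spending (with an intercept).
X = np.column_stack([np.ones(n), spending])
beta = np.linalg.lstsq(X, consumption, rcond=None)[0]
print(f"naive estimate: {beta[1]:.1f} kWh per $ spent (truth: {true_effect})")
```

The regression attributes the greenness-driven consumption gap to the program and reports substantial savings per dollar where there are none. No amount of additional data from the same design fixes this; that is what “fundamentally unidentified” means in practice.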
Time to Stop Using 1950s Regression Models
So currently there is a lot of chatter about spending more ratepayer dollars on these studies, and I frankly think that the majority will not be worth the paper they are printed on. To be clear, this is a problem with the method, not the people implementing it. What we have seen so far is that the estimates are often significantly bigger than bottom up estimates, which is sometimes attributed to spillover effects, but I just don’t buy it. I think we should stop blindly applying 1950s style regression models in this context.
I am also not advocating that everything has to be done by RCT. There are recent papers using observational data in non-experimental settings that try to estimate the impacts of programs on consumption. Matt Kotchen and Grant Jacobsen’s work on building codes in Gainesville, Florida is a great example. They do a very careful comparison of energy consumption by structures built pre- and post-building code and find significant and credible effects. Lucas Davis has a number of papers in Mexico that use regression techniques to back out the efficacy of rebate programs on adoption and consumption. Judd Boomhower has a nice paper on spillover effects. They all employ 21st century methods, which allow you to make causal statements about program efficiency. These can be much cheaper to do and produce credible numbers. Let’s do more of that and work closely with utilities on implementing RCTs. It’s been a great learning experience for me and a worthwhile investment!