
I’m Not Really Down with Most Top Down Evaluations

Lunches at Berkeley are never boring. This week I had an engaging discussion with a colleague from out of town who asked me what I thought about statistical top down approaches to evaluating energy efficiency programs. In my excitement, I almost forgot about my local organic Persian chicken skewer.

For the uninitiated, California’s Investor Owned Utilities (the people who keep your lights on…if you live around here) spend ratepayer money to improve the energy efficiency of their customers’ homes and businesses. Think rebates for more efficient fridges, air conditioners, lighting, and furnaces. The more efficient customers are, the less energy gets consumed, which is especially valuable at peak times. For doing this, the utilities get rewarded financially for energy savings produced from the programs. The million kWh question of course is how much do these programs actually save? I’m glad you asked.

[Image: Larson fish cartoon]

Multiple Ways of Looking at Energy Efficiency

The traditional way is to take the difference in energy consumption between the old and new gadget. If you’re really fancy, you adjust the estimated savings downward by a few percent to account for free riders like me, who send in energy efficiency rebates for things they would have bought anyway. These so-called “bottom up” analyses have been shown to provide decent estimates of what is possible in terms of savings, but they completely ignore human behavior. Hence, when tested for accuracy, bottom up estimates have been shown over and over again to overestimate savings. There are many factors that contribute to this bias, but the most commonly cited one is the rebound effect.
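The bottom up arithmetic is simple enough to sketch. Every number below (unit consumption, program size, the free-ridership share) is an invented illustration, not a figure from any actual program:

```python
# Bottom up ("deemed") savings estimate for a hypothetical fridge rebate program.
# All numbers here are assumptions for illustration only.

old_fridge_kwh_per_year = 700    # assumed consumption of the replaced unit
new_fridge_kwh_per_year = 450    # assumed consumption of the efficient unit
units_rebated = 10_000           # assumed number of rebates paid out
free_rider_share = 0.15          # assumed share who would have bought anyway

gross_savings = (old_fridge_kwh_per_year - new_fridge_kwh_per_year) * units_rebated
net_savings = gross_savings * (1 - free_rider_share)

print(f"Gross savings: {gross_savings:,.0f} kWh/year")  # Gross savings: 2,500,000 kWh/year
print(f"Net savings:   {net_savings:,.0f} kWh/year")    # Net savings:   2,125,000 kWh/year
```

Note what the calculation leaves out entirely: any behavioral response after installation, which is exactly the blind spot described above.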

Another way of course, as we have so often advocated, is using methods that have their origin in medical research. For a specific program, say a subsidy for a more efficient boiler, you give a random subset of your customers access to the subsidy and compare the energy consumption of people who had access to the program to that of the customers who didn’t. These methods have revolutionized (in a good way) the precision and validity of program evaluations. My colleagues at the Energy Institute are at the forefront of this literature and are currently teaching me (very patiently) how you do these. I am always a bit slow to the party. These methods are not easy to implement and require close collaboration with the utilities and significant upfront planning. But that is a small price to pay for high quality estimates that allow us to make the right decision as to whether to implement programs that cost ratepayers hundreds of millions of dollars.
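In miniature, that evaluation logic looks like this. The data are simulated; the effect size, noise level, and sample size are all assumptions chosen just to make the mechanics visible:

```python
# Sketch of an RCT-style evaluation: randomize access to a boiler subsidy,
# then compare mean annual consumption across the two groups.
# All numbers are simulated assumptions, not real program data.
import random

random.seed(0)
n = 5_000
true_effect_kwh = -300  # assumed average savings caused by the program

# Baseline consumption around 10,000 kWh/year with household-level noise.
control = [10_000 + random.gauss(0, 1_500) for _ in range(n)]
treated = [10_000 + true_effect_kwh + random.gauss(0, 1_500) for _ in range(n)]

# Because access was randomized, the groups are comparable on average,
# so the simple difference in means estimates the program's causal effect.
estimate = sum(treated) / n - sum(control) / n
print(f"Estimated program effect: {estimate:.0f} kWh/year")
```

With randomization, none of the confounders discussed below (greenness, income, weather) can systematically differ between the two groups, which is precisely what the aggregate approaches cannot guarantee.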

A third option, which has given rise to a number of evaluation exercises, is called top down measurement. The idea here is to look at average energy consumption by households in a region (say census block group) for many such regions over a long time period and use statistical models to explain what share of changes in energy consumption over time can be explained by spending on energy efficiency programs. The proponents of these methods argue that this is an inexpensive way to do evaluation, the data requirements are small, the estimates can be updated frequently, and – maybe most importantly – that these estimates include some spillover effects (if your neighbor buys a unicorn powered fridge because you did). Sounds appealing.

The big problem with the majority of these studies is that they do not worry enough about what drives differences in the spending on these programs across households. I am sure you could come up with a better laundry list, but here is mine:

  • Differences in environmental attitudes (greenness)
  • Income
  • Targeting by the utilities of specific areas
  • Energy Prices
  • Weather
  • ….

What these aggregate methods do not allow you to do is to separate the effects of my laundry list from those of the program. Or in economics speak, they are fundamentally unidentified. No matter how fancy your computer program is, you will never be able to estimate the true effect. It’s in some sense like using an X-ray machine as a lie detector. In practice you are possibly attributing the effect of weather, for example, to programs. Cold winters make me want to be more energy efficient. It’s the winter, not the rebate that made me buy a more efficient furnace. Further, the statistical estimates are just that. They provide a point estimate (best guess) with an uncertainty band around it. And that uncertainty band, as Meredith Fowlie, Carl Blumstein and I showed, can be big enough to drive a double-wide trailer through.
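The weather example can be made concrete with a toy simulation. Assume (all coefficients are invented for illustration) that program spending is higher where winters are colder, and that cold winters independently push households toward efficiency. A regression of consumption changes on spending alone then reports a sizable “program effect” even when the true effect is exactly zero:

```python
# Omitted-variable bias in a top down regression, in miniature.
# All coefficients are made up; the true program effect here is zero.
import random

random.seed(1)
n = 2_000
data = []
for _ in range(n):
    cold = random.gauss(0, 1)                # winter severity (omitted variable)
    spending = cold + random.gauss(0, 1)     # programs target cold regions
    # Consumption falls with cold winters (people retrofit on their own);
    # spending itself has NO effect in this simulated world.
    d_consumption = -2.0 * cold + random.gauss(0, 1)
    data.append((spending, d_consumption))

# Univariate OLS slope: cov(x, y) / var(x).
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data) / n
var_x = sum((x - mean_x) ** 2 for x, _ in data) / n
slope = cov_xy / var_x

print(f"Estimated 'program effect': {slope:.2f} (true effect: 0)")
```

The regression dutifully returns a slope near -1 (in this setup cov(x, y) = -2·var(cold) while var(x) = 2·var(cold)), attributing the weather-driven retrofits to the program. More data does not fix this; only a research design that breaks the link between spending and the omitted drivers does.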

Time to Stop Using 1950s Regression Models

So currently there is a lot of chatter about spending more ratepayer dollars on these studies, and I frankly think the majority of them will not be worth the paper they are printed on. To be clear, this is a problem with the method, not with the people implementing it. What we have seen so far is that the estimates are often significantly bigger than bottom up estimates, which is sometimes attributed to spillover effects, but I just don’t buy it. I think we should stop blindly applying 1950s style regression models in this context.

I am also not advocating that everything has to be done by RCT. There are recent papers using observational data in non-experimental settings that try to estimate the impacts of programs on consumption. Matt Kotchen and Grant Jacobsen’s work on building codes in Gainesville, Florida is a great example. They do a very careful comparison of energy consumption by structures built pre- and post-building code and find significant and credible effects. Lucas Davis has a number of papers in Mexico that use regression techniques to back out the efficacy of rebate programs on adoption and consumption. Judd Boomhower has a nice paper on spillover effects. They all employ 21st century methods, which allow you to make causal statements about program efficiency. These can be much cheaper to do and produce credible numbers. Let’s do more of that and work closely with utilities on implementing RCTs. It’s been a great learning experience for me and a worthwhile investment!

Maximilian Auffhammer

Maximilian Auffhammer is the George Pardee Professor of International Sustainable Development at the University of California Berkeley. His fields of expertise are environmental and energy economics, with a specific focus on the impacts and regulation of climate change and air pollution.

15 thoughts on “I’m Not Really Down with Most Top Down Evaluations”

  1. I would be interested in citations of applying RCT (Randomized Clinical Trials, I assume) to energy efficiency. I have searched in vain for many years for any so-called EMV (Estimation, Measurement and Validation) Reports that do anything other than make assumptions about the differences between efficient and less efficient devices and the population characteristics of the intended programs. It should be easy for utilities to compare matched samples of customers who participated and did not participate in their energy efficiency programs but they – or their watchdogs – don’t. I wonder why? Could it be another case of foxes and chickens?

  2. Max, Your article doesn’t talk much about the meter-level version of top-down, where actual consumption at the meter is compared to a counterfactual in much the same way. This is an approach promoted by folks like Matt Golden at Effieciency.org, so-called “metered energy efficiency”. What are your thoughts on this approach? Do some of the same critiques not still apply?

    • You must be psychic. I just had a discussion about this with our executive director in the hallway. New sources of data (“bigger data”) will allow us to observe things we did not observe before. I have a few papers which make use of this. In program evaluation, bigger data does not substitute for good research design. It enables better research design and hence more reliable inference. I would be excited to hear more about what people do in practice here.

  3. Max,
    Naïve question: Why does my utility pay me to save energy?
    Naïve economist’s answer: Suppose that the supply curve of generation is upward sloping (cheapest sources are used first). If new demand is met by new generation, then the utility will get a rate increase. Inasmuch as the PUC is enjoined to promote the public interest, it should incentivize least cost expansion of capacity, including the virtual capacity created by energy-saving technology. So the PUC should add the virtual and actual supply curves and give rate increases according to the aggregate least (social) cost supply curve, regardless of how the utility actually meets its demand.
    Since the PUC promotes a myriad of political objectives instead of the public interest, we need to know the purpose of estimating bits and pieces of the virtual supply curve, regardless of whether said estimates are made by regressionistas or randomistas. How will the information lead to better policy?
    Dark interpretation of the Larson cartoon: The multitude of interest groups gobbles everything, including economics.
    Jim
    P.S. If people were fully rational, efficiency pricing would effect the optimal virtual supply. Since neither is the case, incentives for calling forth virtual supply are very difficult to design. For example, you need to know how specific forms of bounded rationality cause some energy saving techniques to be under-adopted more than others.

    • The short answer is that they pay you with your own money. I am planning a post titled “what would max do” on how I would tweak the system to better align with economic incentives. Hang on to your hat, Jim! A few weeks and there will be an entertaining post addressing your question and building on your insights here.

      • That sounds great, Max. Btw, sorry to post the same comment twice. The first one didn’t show up for a couple of days. I thought I’d pushed the wrong button or something so revised and resubmitted the above. Thanks for answering, and I’ll be looking forward to your follow up.

  4. Frankly, I think that the rapid expansion of the Internet of Things will put many modelers out of business, provide real-time consumption info and deliver even greater energy efficiencies. The IoT will also make the smart grid into an Einstein grid.

    However, we need to stop the silo-think and evolve into thinking about the bigger issues of combining energy, environment, huge supplies of wireless broadband, big data, big technology and non-partisan, light-handed policy making on the state and federal levels. After all, government regulatory agencies are 5 years behind the industries they regulate.

    Historically, micro-managing of markets and issues has never really worked, and the future has never been predictable, with the last 15 years being almost unimaginable, both in how fast things have changed in some ways and how wrong the forecasts have been in other ways.

  5. Though expensive, big energy data (SMART) meters provide incredible detail that can be used to produce estimates with less uncertainty surrounding them. One might consider investing ratepayers’ millions in rebates to put these detailed meters in more houses. The same big data may reveal to a larger extent what drives differences in the spending on these programs across households and reduce the effects bundled into the spillover effect. It may also be used to reveal the latent reasons behind “keeping up with the neighbors” buyers as well as the economic reason hiding behind “severe winter” purchases. Investment in big data collection, though expensive, can give new life to the aging 1950s regressions.

  6. Max,
    Naïve question: Why does my utility pay me to save energy?
    Naïve economist’s answer: Suppose that the supply curve of generation is upward sloping (cheapest sources are used first). If new demand is met by new generation, then the utility will get a rate increase. Inasmuch as the PUC is enjoined to promote the public interest, it should incentivize least cost expansion of capacity, including the virtual capacity created by energy-saving technology. So the PUC should add the virtual and actual supply curves and grant rate increases according to the aggregate least (social) cost supply curve, regardless of how the utility actually meets its demand.
    But is there any reason to believe that this is what the PUC is shooting for? And if it’s not, what is the point of estimating bits and pieces of the virtual supply curve, regardless of whether said estimates are made by regressionistas or randomistas? Could economics be the first fish in the Larson cartoon?
    Jim

  7. Since the 1980s I have been skeptical about the efficiency and effectiveness of energy efficiency programs after reviewing many simplistic analyses that overlook the rebound effect. Most also use low discount rates to calculate the present value of future energy savings on the tacit assumption that the future energy savings are known with certainty along with fuel prices, etc.

    Jim Sweeney at the Stanford Precourt Institute has a new book out on energy efficiency. I haven’t yet received my copy but it will be interesting to see if he gets it right. Stay tuned.

  8. To meet the demands of the changing grid and to best align with a customer-focused path toward energy efficiency/demand response success, the way programs are implemented and evaluated needs to change. All of the variables you mention are valid and not accommodated in current evaluation, but customer access to information about usage, to time-of-use rates, and to meaningful assessment of the cost of energy efficiency investment versus return over time also makes a huge difference. There are two types of green, and the second type, costs and savings to customers, is a huge component not fully tapped.

  9. Lovely post and fully agree. The key is NOT to throw out all methods, but choose the best that you can, under the circumstances, and then explain why it’s the better choice. Ignorance may be bliss, but it’s also methodologically suspect.