Generalizing from Energy Program Evaluations
The blogosphere has been lighting up with news and commentary on OPOWER’s recent announcement that they are pairing up with Facebook to let people share information about their home energy usage. OPOWER has also garnered a lot of attention from academics, in part because they used a series of randomized controlled trials (RCTs) to show that sending households letters comparing their energy usage with that of their neighbors led people to use less energy.
A paper presented at our recent POWER conference, based on the OPOWER studies, poses a conundrum for policy makers, though. Imagine you have a careful empirical study based on an RCT in the Northeast U.S. that finds providing information to consumers about their household electricity use relative to comparable households leads the average household to reduce electricity consumption by two percent. For energy policy makers in California, one question is: How useful is such a study for inferring how much the average household in California would reduce electricity consumption if it were provided similar information?
Generalizing results from one population to another is a common exercise in policy analysis, including energy policy analysis. An ex-post evaluation of the effect of a regulation in one geographic area (e.g., the Northeast) may be used to make a decision about implementing a similar regulation in another geographic area (e.g., California). Similarly, policy analysts may draw on program evaluations of the regulation of one pollutant (e.g., sulfur dioxide) in a certain geographic area to make predictions about the impact of regulating a different pollutant (e.g., greenhouse gases) in that same geographic area.
Careful econometric studies by empirical economists pay a lot of attention to whether the parameter estimates are good estimates of the true values. Confidence that a study’s parameter estimates accurately capture what really happened is referred to as internal validity. For instance, was it just random chance that people who received information about their household electricity use were more likely to reduce electricity consumption, or did the information provided help convince them to do that? Well-known statistical tests, along with economic theory, can provide confidence along that dimension.
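To make the "random chance" question concrete, here is a minimal sketch of one such test, a permutation test on the difference in average electricity use between households that received the letters and households that did not. The function name and the usage figures are hypothetical and are not drawn from any OPOWER study; an actual evaluation would use far larger samples and richer controls.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_permutations=5_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign households to groups at random, as if the letters did nothing.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical monthly electricity use in kWh (illustrative only).
treated_kwh = [588, 601, 575, 612, 590, 583, 597, 606, 579, 594]
control_kwh = [602, 615, 598, 621, 609, 596, 611, 618, 600, 607]

effect, p_value = permutation_p_value(treated_kwh, control_kwh)
print(f"difference in means: {effect:.1f} kWh, p-value: {p_value:.4f}")
```

A small p-value says the gap between the two groups would rarely arise from random assignment alone. It says nothing about whether the same gap would appear in a different population.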
While economists and policy analysts pay a lot of attention to internal validity, less attention is often paid to external validity. External validity is the term that researchers give to the degree to which parameter estimates generated from an econometric study can be generalized to a different population of interest. For example, can parameter estimates based on a sample drawn from a population in the Northeast U.S. be generalized to a target population in California or to the population of the U.S. as a whole? One concern is that there may be specific characteristics of the population in the Northeast that prevent the study from being extrapolated to other populations of interest. In other words, is there something about the Northeast that makes it difficult — or impossible — to generalize results from the Northeast to California or to the U.S. as a whole? External validity means one can extrapolate from one population to another with confidence. Since energy program evaluations are often done with the intent to apply the results in a larger context, externally valid empirical research is valuable.
Recent work by Allcott and Mullainathan (2011) suggests that there may be a number of subtle — and potentially unobservable — factors that make generalization from energy program evaluations difficult. Allcott and Mullainathan look at fourteen virtually identical energy efficiency experiments using RCTs of the type described in the introduction. OPOWER, through partnerships with a number of electric utilities, provides information to consumers about their household electricity use relative to comparable households. Allcott and Mullainathan use these RCTs to see what happens to household electricity consumption.
Their striking finding is substantial unexplained variation in the cost savings across the fourteen cities, even after controlling for observables such as demographics, weather, and energy use. Many of the differences in cost savings across the fourteen cities are economically as well as statistically significant.
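One rough way to ask whether site-level effects differ by more than sampling noise would predict is Cochran's Q together with the I-squared statistic, a standard diagnostic from meta-analysis. This is a swapped-in illustration, not the method Allcott and Mullainathan use, and it does not control for observables. The site effects and standard errors below are hypothetical.

```python
def cochran_q(effects, std_errors):
    """Return (pooled effect, Cochran's Q, I-squared) for site-level estimates."""
    # Inverse-variance weights: more precise sites count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to real differences across sites.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical percent reductions in electricity use at five sites (illustrative only).
site_effects = [1.4, 2.1, 3.3, 1.8, 2.6]
site_std_errors = [0.20, 0.25, 0.30, 0.20, 0.25]

pooled_effect, q_stat, i2 = cochran_q(site_effects, site_std_errors)
print(f"pooled effect: {pooled_effect:.2f}%, Q: {q_stat:.1f}, I^2: {i2:.0%}")
```

Under the hypothesis that every site shares one true effect, Q is approximately chi-squared with (number of sites minus one) degrees of freedom, so a Q far above that value points to genuine differences across sites rather than noise.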
As Allcott and Mullainathan point out in their work, there are at least three dimensions along which the ability to generalize may fail: people, places and time. If the people of interest, the geographic area of interest, and/or the time period of interest are substantially different between the study’s population and the target population, caution should be exercised in extrapolating the results. Less obvious, but critically important, is that there may be sample selection concerns with how the study’s population was chosen at the outset. Allcott and Mullainathan find evidence of this as well.
These substantive differences emphasize the importance of thinking carefully before generalizing from a single policy evaluation. They also set up a useful challenge for academics: Recognizing that policymakers need to make decisions, often with imperfect information, academics should think carefully about how well a particular policy experiment will generalize before undertaking it.
One last word of caution: While it will be interesting to learn what the effects of the OPOWER social media challenge are, it is important to remember that participation in the challenge is voluntary, which raises the strong possibility that the people who choose to participate behave very differently from those who choose not to. As a result, we will have to be careful about generalizing what we learn from it.
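The concern about voluntary participation can be illustrated with a small simulation. In this sketch, every number is an assumption chosen for illustration: households differ in an unobserved "motivation" to conserve, motivated households are more likely to join, and they would also have cut their usage more even without the program.

```python
import random
from statistics import mean


def simulate_voluntary_program(n_households=20_000, true_effect=-2.0, seed=1):
    """Naive participant-vs-non-participant comparison under self-selection."""
    rng = random.Random(seed)
    participants, non_participants = [], []
    for _ in range(n_households):
        # Unobserved taste for conservation (hypothetical).
        motivation = rng.gauss(0.0, 1.0)
        # More motivated households are more likely to opt in.
        joins = motivation + rng.gauss(0.0, 1.0) > 0
        # Percent change in usage that would happen with no program at all.
        change = -3.0 * motivation + rng.gauss(0.0, 1.0)
        if joins:
            participants.append(change + true_effect)
        else:
            non_participants.append(change)
    return mean(participants) - mean(non_participants)


TRUE_EFFECT = -2.0  # assumed true program effect, in percent
naive_estimate = simulate_voluntary_program(true_effect=TRUE_EFFECT)
print(f"true effect: {TRUE_EFFECT:.1f}%, naive estimate: {naive_estimate:.1f}%")
```

Because joiners were already inclined to conserve, the naive comparison attributes their motivation to the program and overstates the savings. Random assignment removes this problem, which is why the voluntary challenge calls for more caution than the RCTs.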