What do pro cycling, education reform, and vehicle emissions standards all have in common?
“Cheating scandals” is the flip answer. But a more fundamental connection is summed up by Goodhart’s law, which predicts that once a measure becomes a target for enforcing policy, it will cease to be a good measure. Goodhart (an economist) had monetary policy in mind, but look around and you’ll find lots of other examples.
Let’s start with the greatest sport of all time: pro cycling. Doping has dogged this sport for decades. In the late 1990s, regulators tried to crack down by requiring competitors to stay within acceptable limits defined in terms of things that proxy for doping (such as red blood cell counts). Once the rules were in place, it appeared as though cyclists were cleaning up and staying within (er… suspiciously close to) these limits. But we have since learned of the crazy things pro cyclists have been doing to dope up without testing positive.
Public education reform offers another unfortunate example. No Child Left Behind relies on standardized test scores to hold public schools accountable for student achievement. Initially, the news was good: many schools were rallying to hit test-based targets. But many of the changes made to meet these targets involved “teaching to the test” rather than fundamental improvements in learning. And the pressure to improve test scores led to cheating by some teachers and administrators.
Of course, a similar saga is unfolding in the world of vehicle testing. Automakers must comply with EPA tailpipe emissions standards and federal fuel economy standards if they want to sell new cars in the United States. To assess compliance, regulators rely heavily on measurements of vehicle performance collected in controlled testing environments. These tests are systematic and predictable. Predictably, automakers have learned to tune their vehicles to pass the test… or cheat.
With his trademark Auffhammerian wit, Max recently blogged about the VW cheating scandal that broke in September. The story reads with all the opprobrium of a Tour de France doping exposé. VW, while promoting its diesel vehicles as clean and green, was behind the scenes installing “defeat devices” that could detect when these cars were being tested and reduce emissions accordingly.
These vehicles performed beautifully during laboratory testing. But when real drivers got behind the wheel, emissions were up to 40 times the limit. VW has since admitted to installing these devices in approximately 11 million diesel cars worldwide. As of last week, the scope of the scandal was still widening.
Vehicle emissions testing appears broken
11 million diesel cars and counting is a big deal. But the VW scandal is just one part of a larger problem. This is not the first time a major automaker has been caught cheating on these tests. Less scandalous, but also troubling, are the superficial adjustments that manufacturers can use to pass these tests. Because testing protocols are known to automakers well in advance, manufacturers can design cars that perform better in test mode than under real-world driving conditions.
As emissions and fuel economy standards get more stringent, the gap between test results and reality appears to be widening. With regard to fuel efficiency, research by David Greene and colleagues suggests that the shortfall between test cycle estimates (used to measure compliance with regulations) and in-use estimates has been increasing since 2005. This Mind the Gap report estimates that the gap between CO2 emissions rates measured in European vehicle testing and on-road performance increased from 8 percent in 2001 to 40 percent in 2014.
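To make that “gap” concrete, here is a minimal sketch of how such a divergence is typically computed: the percent by which real-world emissions exceed the test-cycle figure. The numbers below are illustrative only, not the report’s underlying data.

```python
def emissions_gap(real_world, test_cycle):
    """Percent by which real-world CO2 emissions exceed the test-cycle figure."""
    return (real_world - test_cycle) / test_cycle * 100

# Illustrative values in grams CO2 per km (not actual fleet data):
print(round(emissions_gap(130.0, 120.4), 1))  # roughly an 8 percent gap, as in 2001
print(round(emissions_gap(168.0, 120.0), 1))  # a 40 percent gap, as in 2014
```

The point is simply that the denominator is the official test figure, so a widening gap means official ratings overstate real-world performance by an ever larger margin.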
So where does this leave us? Depressed, if you care about air quality and fuel economy. Some of the gains we think we’ve made in improving the environmental performance of our vehicles are illusory. Confused and misinformed, if you are in the market for a new car. Consumers cannot hope to make informed choices among vehicles with different environmental attributes when posted performance measures provide distorted, noisy signals.
Broken but fixable?
From my blogger’s armchair, vehicle emissions testing looks broken but fixable. Particularly when compared to other manifestations of the Goodhart problem.
First, think about the outcomes we are trying to target and improve upon. In public education, these outcomes are hard to define (let alone quantify): improvements in critical thinking, deep understanding, moral character. No chance of measuring these directly with a bubble sheet and a No. 2 pencil, so we rely on crude proxies. In contrast, the environmental performance of new cars driving around on the roads is directly measurable (in principle). To overcome Goodhart’s law, why not focus more directly on what we are targeting?
Relatedly, think about the incentives to cheat. In the case of pro cycling, the regulator’s problem is greatly complicated by the fact that she is inevitably stuck monitoring (read: drawing the blood from) athletes who have a strong incentive to cheat and a strong incentive to evade detection. Cynically, the same can be said of vehicle manufacturers. But we can (in principle) put more distance between the manufacturers with an incentive to cheat and the vehicles we seek to test.
Finally, new cars and measurement technologies are getting really smart. Max’s car can park itself. My neighbor’s car gives her a friendly warning when a pedestrian or cyclist is nearby. It seems relatively elementary, then, to ask our cars to log emissions and fuel efficiency information. It is estimated that over 60 percent of new cars will have “connected capabilities” by 2017, allowing them to exchange data digitally with other cars or infrastructure. Couldn’t this interconnectedness be leveraged to collect rich data on real-world performance?
If you recoil at the thought of inviting the EPA into your personal tailpipe space, or if you are concerned that smart car technologies could also be designed to outsmart emissions monitoring, there are other direct-measurement alternatives. Several states (including California) are using roadside devices to scan exhaust emissions (and license plates) from thousands of vehicles per day and then match these data with vehicle make, model, and year via vehicle registration records. These data can be used to estimate model-specific measures of the average emissions performance of new cars on the road. Why not use these measures in lieu of, or in conjunction with, test data to determine compliance with new vehicle standards?
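The matching-and-averaging step described above can be sketched in a few lines. Everything here is hypothetical: the plate numbers, the NOx readings, and the registry contents are invented for illustration, and real programs would of course handle far messier data.

```python
from collections import defaultdict

# Hypothetical roadside scans: (license plate, NOx reading in g/km)
scans = [("ABC123", 1.0), ("XYZ789", 0.125), ("ABC123", 1.5), ("DEF456", 0.375)]

# Hypothetical registration records: plate -> (make, model, year)
registry = {
    "ABC123": ("CarCo", "DieselWagon", 2014),
    "XYZ789": ("CarCo", "EcoHatch", 2015),
    "DEF456": ("CarCo", "EcoHatch", 2015),
}

# Pool readings by make/model/year, dropping plates with no registration match
readings = defaultdict(list)
for plate, nox in scans:
    if plate in registry:
        readings[registry[plate]].append(nox)

# Model-specific average on-road emissions
fleet_averages = {mmy: sum(v) / len(v) for mmy, v in readings.items()}
# DieselWagon 2014 averages 1.25 g/km; EcoHatch 2015 averages 0.25 g/km
```

With enough scans per model, these averages become exactly the kind of in-use performance measure that could be compared against, or substituted for, laboratory test results.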
I get that there are plenty of complications that crop up when we start to think seriously about using data collected under noisy real-world conditions to monitor compliance directly (rather than relying on test-generated proxies). But recent events have highlighted the troubling limitations of current testing approaches (perfectly predictable à la Goodhart). The EPA has recently indicated that it plans to expand random, on-road emissions testing (albeit under controlled protocols) to provide a reality check for lab tests. If recent developments ultimately move vehicle emissions testing closer to where the rubber hits the road, there may be a silver/green lining in this cheating scandal cloud.