The true cost of AlphaGo Zero
Motivation:
Rich Sutton’s Bitter Lesson for AI essentially argues that we should focus on meta-methods that scale well with compute instead of trying to understand the structure and function of biological minds. The latter, according to him, are endlessly complex and therefore unlikely to scale. Furthermore, Rich Sutton (who works at DeepMind) considers DeepMind’s work on AlphaGo an exemplary model of AI research.
After reading Shimon Whiteson’s detailed rebuttal as well as David Sussillo’s reflection on loss functions, I think it’s time to reevaluate the scientific value of AlphaGo Zero research and the carbon footprint of a win-at-any-cost research culture. In particular, I’d like to address the following questions:
 What kinds of real problems can be solved with AlphaGo Zero algorithms?
 What is the true cost of AlphaGo Zero? (i.e. the carbon footprint)
 Does Google’s carbon offsetting scheme accomplish more than virtue signalling?
 Should AI researchers be noting their carbon footprint in their publications?
 Finally, might energetic constraints present an opportunity for better AI research?
I haven’t seen these questions addressed in one manuscript but I believe they are related and timely, hence this article. I’d also like to add that we can’t seriously entertain notions of safe AI without carefully developing environmentally friendly AI, especially when the only thing that we know for certain is that we will do exponentially more FLOPs (i.e. computations) in the future.
Note: This post builds on the analyses of Andrej Karpathy and Dan Huang.
AlphaGo Zero’s contribution to humanity:
Before measuring the carbon footprint of AlphaGo Zero it’s a good idea to remind ourselves of the types of environments this ‘meta-method’ can handle:
 The environment must have deterministic dynamics, which simplifies planning considerably.
 The environment must be fully observable, which rules out large and complex environments.
 A perfect simulator must be available to the agent, which rules out any biologically plausible environments.
 Evaluation is simple and objective: win/lose. For biological organisms all rewards are internal and subjective.
 Static state-spaces and action-spaces: so we can’t generalise…not even to Go where .
These constraints effectively rule out the application of AlphaGo Zero’s algorithms to any practical problem in robotics because perfect simulators are nonexistent. They may, however, be used to solve any two-person board game, which is a historic achievement and a great publicity stunt for Google, assuming that the carbon footprint of this project is reasonable. This consideration is doubly important when you take into account the influence of DeepMind on the modern AI research culture.
However, before estimating the metric tonnes of CO2 blasted into the atmosphere by DeepMind, let’s consider a related question. How much would it cost an entity outside of Google to replicate this type of research?
The cost of AlphaGo Zero in US dollars:
In ‘Mastering the game of Go without human knowledge’ [2] the authors ran both a three-day experiment and a forty-day experiment. Let’s start with the three-day experiment.
The three day experiment:
 Over 72 hours, ~ 5 million games were played.
 Each move of self-play used ~ 0.4 seconds of computer time and each self-play machine consisted of 4 TPUs.
 How many self-play machines were used?

If the average game has 200 moves we have:
\begin{equation} \frac{72\cdot 60 \cdot 60 \cdot N_{SP}}{200 \cdot 5 \cdot 10^6} \approx 0.4 \implies N_{SP} \approx 1500 \end{equation}

Given that each self-play machine contained 4 TPUs we have:
\begin{equation} N_{TPU} = 4 \cdot N_{SP} \approx 6000 \end{equation}
If we use Google’s TPU pricing as of March 2019, taken here as ~4.5 US dollars per TPU-hour, the cost for an organisation outside of Google to replicate this experiment is therefore:
\begin{equation} \text{Cost} > 6000 \cdot 72 \cdot 4.5 \approx 2 \cdot 10^6 \quad \text{US dollars} \end{equation}
The forty day experiment:
For the forty-day experiment one thing that’s different is that the policy network has twice as many layers so, as Dan Huang pointed out in his article, it’s reasonable to infer that twice the amount of time was used per move: ~ 0.8 seconds rather than ~ 0.4 seconds.
 Over 40 days, ~ 29 million games were played.
 Each move of self-play used ~ 0.8 seconds of computer time where each self-play machine consisted of 4 TPUs.
 How many self-play machines were used?

If the average game has 200 moves we have:
\begin{equation} \frac{40 \cdot 24 \cdot 3600 \cdot N_{SP}}{200 \cdot 29 \cdot 10^6} \approx 0.8 \implies N_{SP} \approx 1300 \end{equation}

Given that each self-play machine contained 4 TPUs we have:
\begin{equation} N_{TPU} = 4 \cdot N_{SP} \approx 5000 \end{equation}
If we use Google’s TPU pricing as of March 2019, the cost for an organisation outside of Google to replicate this experiment is therefore:
\begin{equation} \text{Cost} > 5000 \cdot 960 \cdot 4.5 \approx 2 \cdot 10^7 \quad \text{US dollars} \end{equation}
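The arithmetic for both experiments can be reproduced with a short Python sketch. It is only a back-of-envelope aid under the article's own assumptions: 200 moves per game, 4 TPUs per self-play machine, and the 4.5 US dollars per TPU-hour rate that appears in the cost equations. The function name `estimate_replication_cost` and its parameter names are hypothetical, chosen for illustration.

```python
def estimate_replication_cost(games, seconds_per_move, wall_clock_hours,
                              moves_per_game=200, tpus_per_machine=4,
                              usd_per_tpu_hour=4.5):
    """Back-of-envelope estimate; all defaults are the article's assumptions."""
    # Total self-play work, in machine-seconds.
    machine_seconds = games * moves_per_game * seconds_per_move
    # Number of self-play machines needed to finish within the wall-clock window.
    machines = machine_seconds / (wall_clock_hours * 3600)
    tpus = machines * tpus_per_machine
    # Lower bound on cost: self-play only, at the assumed hourly TPU rate.
    cost_usd = tpus * wall_clock_hours * usd_per_tpu_hour
    return machines, tpus, cost_usd


three_day = estimate_replication_cost(5e6, 0.4, 72)
forty_day = estimate_replication_cost(29e6, 0.8, 40 * 24)

for label, (machines, tpus, cost) in [("3-day", three_day), ("40-day", forty_day)]:
    print(f"{label}: ~{machines:.0f} machines, ~{tpus:.0f} TPUs, ~${cost:,.0f}")
```

Because the sketch does not round the intermediate machine and TPU counts, its forty-day cost comes out somewhat above the figure obtained from the rounded value of 5000 TPUs, but the order of magnitude is the same.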
It goes without saying that this is well outside the budget of any AI lab in academia.
The true cost of Google’s 40 day experiment:
This, in my opinion, is the more important calculation. While it’s not at all clear that AI research will ‘save the world’ in the long term, in the short term what is certain is that compute-intensive AI experiments have a non-trivial carbon footprint. So I think it would be wise to use our energy budget carefully and, realistically, the only way to do this is to calculate the carbon footprint of any AI research project and place it on the front page of your research paper. Meanwhile, let’s proceed with the calculation.
This calculation involves first converting TPU hours into kilowatt-hours (KWH) and then converting this value to metric tonnes of CO2:
 ~5000 TPUs were used for 960 hours.
 ~40 Watts per TPU according to [6].

This means that we have:
\begin{equation} \text{KWH} = \frac{5000 \cdot 960 \cdot 40}{1000} \approx 1.9 \cdot 10^5 \end{equation}

This is approximately 23 American homes’ electricity for a year according to the EPA.
 In the USA, where Google Cloud TPUs are located, we have ~ 0.5 kg of CO2/KWH so AlphaGo Zero was responsible for emitting approximately 96 tonnes of CO2 into the atmosphere.
To appreciate the significance of 96 tonnes of CO2 over 40 days…this is approximately equivalent to 1000 hours of air travel and also approximately the carbon footprint of 23 American homes for a year. Relatively speaking, this is a large footprint for a board game ‘experiment’ that lasts 40 days.
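The conversion from TPU hours to tonnes of CO2 can be checked with a minimal Python sketch. The inputs are the figures quoted above (5000 TPUs, 960 hours, ~40 Watts per TPU, ~0.5 kg of CO2 per KWH); the function name `carbon_footprint` and its parameter names are hypothetical, and the estimate ignores everything except the TPUs themselves, such as cooling and host machines.

```python
def carbon_footprint(num_tpus, hours, watts_per_tpu=40.0, kg_co2_per_kwh=0.5):
    """Rough energy and CO2 estimate; defaults are the article's assumptions."""
    # Watt-hours divided by 1000 gives kilowatt-hours.
    energy_kwh = num_tpus * hours * watts_per_tpu / 1000.0
    # Kilograms of CO2 divided by 1000 gives metric tonnes.
    co2_tonnes = energy_kwh * kg_co2_per_kwh / 1000.0
    return energy_kwh, co2_tonnes


energy_kwh, co2_tonnes = carbon_footprint(num_tpus=5000, hours=960)
print(f"Energy: {energy_kwh:,.0f} KWH, CO2: {co2_tonnes:.0f} tonnes")
```

Changing `kg_co2_per_kwh` shows how strongly the result depends on the carbon intensity of the local grid, which is the one input that varies most between data centre locations.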
Is this reasonable? At this point a Googler might start talking to me about Google’s carbon offsetting scheme.
Google’s carbon offsetting scheme:
I don’t have much time for this section because Google’s carbon offsetting scheme is basically a joke but let’s break it down anyway:

According to Google, the Google Cloud is supposedly 100% sustainable because Google purchases an equal amount of renewable energy for the total amount of energy used by their Cloud infrastructure.

If you check the charts of Urs Hölzle, the Senior VP of technical infrastructure at Google, this means that they buy a lot of wind (~ 92%) and some solar (~ 8%).

Let’s suppose we can take these points at face value. Does this carbon offsetting scheme actually work out?
David J.C. MacKay, a giant of 20th century machine learning, would probably be rolling in his grave right now because he spent the last part of his life carefully assessing the potential contribution of wind and solar to humanity’s energy budget [7]. He was in fact Chief Scientific Adviser to the UK’s Department of Energy and Climate Change, and his essential contribution was to explain that the fundamental limits to wind and solar energy technologies aren’t technological; we are talking about hard physical limits. I will refer the reader to ‘Sustainable Energy - without the hot air’ by David J.C. MacKay, which is freely available online, rather than repeat his thorough calculations here.
Unfortunately, no combination of wind and solar energy can provide energy security for a country with the USA’s energy requirements. In the best case scenario, Google’s carbon offsetting scheme is thinly veiled virtue signalling. What then are the serious clean energy solutions?
Past the year 2050 it’s possible to make a strong case for nuclear fusion as being necessary for human civilisation to continue. Between now and the day we figure out how to engineer reliable nuclear fusion reactors we should use our energy budget wisely.
Boltzmann’s razor:
According to various sources the human brain uses ~20 Watts which is incredibly efficient compared to the 200 KiloWatts used by 5000 TPUs. In other words, AlphaGo Zero was ten thousand times less energy efficient than a human being for a comparable result. I don’t see how this is a strong argument for scalability at all.
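As a rough check of the ratio above, assuming ~40 Watts per TPU as in the carbon footprint calculation and ~20 Watts for the brain, and writing $P_{TPU}$ and $P_{brain}$ (symbols introduced here for illustration) for the two power draws:
\begin{equation} P_{TPU} \approx 5000 \cdot 40 = 2 \cdot 10^5 \quad \text{Watts}, \qquad \frac{P_{TPU}}{P_{brain}} \approx \frac{2 \cdot 10^5}{20} = 10^4 \end{equation}
Strictly speaking this compares power draw rather than total energy, so it says nothing about how long each system needs to reach a comparable level of play.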
The human brain isn’t an outlier. All biological organisms are energy efficient because they must first survive the second law of thermodynamics which is a minimum energy principle. Now, there are two ways organisms perform computations in an economical manner that I am aware of:

Morphological computation:
a. If you check the work of Tad McGeer [8] you will realise that it’s possible to build a walking robot without any electronics that simply exploits the laws of classical mechanics. It does computations by virtue of having a body. Some researchers might say that this is an instance of embodied cognition [12].
b. Romain Brette and his collaborators have been working on a project that involves a swimming neuron. This is an organism, the Paramecium, that has a single cell yet it’s capable of navigation, hunting, and procreation in very complex environments. How does the Paramecium do this? What is the reward function? Is it doing reinforcement learning?

The role of development:
a. If you consider any growing organism you will realise that its state space and action space are rapidly changing. This should make learning very hard. Yet, development is in some sense a form of curriculum learning and makes learning simpler.
b. I must add that during development the brain of the organism is rapidly changing. Shouldn’t this make learning impossible?
Morphospaces and developmental trajectories are fundamentally physical considerations. In some fundamental way organisms succeed in reorganizing physics locally. Termites in the desert construct mounds whose physical behavior is consistent with but not reducible to the physics of sand. Birds build nests whose physics isn’t reducible to its constituent parts. The resulting systems do computations in an economical manner by taking thermodynamics into account.
This is why energy efficiency is both a challenge and opportunity. It will force researchers to recognize the importance of understanding the biophysics of organisms at every scale where such biophysics contributes to survival. If I may distill this into a single principle I would call it Boltzmann’s razor:
Given two comparably effective intelligent systems focus on the research and development of those systems which consume less energy.
Naturally, the more economical system would be capable of accomplishing more tasks given the same amount of energy.
Discussion:
Of the AI researchers I have discussed the above issues with I noted a bimodal distribution. Roughly 30% agreed with me and roughly 70% pushed back really hard. Among the counterarguments of the second group I remember the following:
 If you force AI researchers to reduce their carbon footprint you will kill AI research.
 Why do you care about what Google does? It’s their own money and they can do whatever they want with it.
I think these are all terrible arguments.
References:
 R. Sutton. The Bitter Lesson. 2019.
 D. Silver et al. Mastering the game of Go without human knowledge. 2017.
 A. Karpathy. AlphaGo, in context. 2017.
 D. Huang. How much did AlphaGo Zero cost? 2018.
 The Twitter Thread of Shimon Whiteson: https://twitter.com/shimon8282/status/1106534178676506624
 This Tweet by David Sussillo: https://twitter.com/SussilloDavid/status/1106643708626137089
 Google Inc. In-Datacenter Performance Analysis of a Tensor Processing Unit. 2017.
 D. MacKay. Sustainable Energy - without the hot air. 2008.
 T. McGeer. Passive Dynamic Walking. 1990.
 L. Castrejon et al. Annotating Object Instances with a PolygonRNN. 2017.
 D. B. Chklovskii & C. F. Stevens. Wiring optimization in the brain. 2000.
 G. Montufar et al. A Theory of Cheap Control in Embodied Systems. 2014.