The coordination problem

During the summer of 2016 I had the opportunity to intern at Hanson Robotics in Hong Kong. It was a stimulating experience that made me think carefully about the challenges posed by human-level AI. In particular, it led me to draw interesting parallels with the present-day challenge of coordinating the actions of 7 billion human-level AIs. No pun intended.

At Hanson Robotics the core belief is that by giving robots a human form, they will experience the human condition as we do, and as a result our relationship with robots will be one of empathy and understanding. The androids would have goals similar to ours, which would reduce the likelihood of conflict. In some sense the future wouldn’t look too different from the present, so to understand this version of the future I had to reflect upon modern-day reality.

What does goal alignment look like in modern human societies? Have we solved the problem of coordinating the actions of 7 billion human-level AIs? These questions are especially pertinent at a time when DIY weapons of mass destruction are on the horizon. By this I mean bioweapons that can potentially allow a few people to eliminate an entire ethnic group, effectively committing genocide.

It’s interesting to think about our attempted solutions to these problems in the context of the human-level AIs we’re familiar with, given that the task of coordinating the actions of beyond-human AIs will probably be much harder. In general, if your solution can’t solve a simple variant of a problem, it’s unlikely to solve the more general problem.

Here are two attempts to answer the questions I raised:

  1. The Point:

    This is a platform launched by Andrew Mason in 2006 with the help of Eric Lefkofsky in order to ‘kickstart’ social good. It experienced modest success before morphing into Groupon and then inspiring the founders of Kickstarter. I think this is an important partial solution to the coordination problem, as it allows people to update their beliefs in real time about the beliefs of other people on an issue they consider important. In some sense, it allows democratic institutions to be set up on the fly within small communities, and I believe this deals with the ‘tyranny of the majority’ problem that often arises in a democracy.

  2. Democracy Earth:

    The notion of a decentralised, peer-to-peer democracy sounds very attractive and I think it’s a solution that works with platforms like ‘The Point’ in the sense that it serves small communities very well. However, I don’t see how a peer-to-peer democracy would be able to address a national security risk, environmental disaster, or the issue of maintaining cultural institutions. I imagine that the founders of this group are reasonable and don’t expect full decentralisation.

Now, the issue I have with both ‘The Point’ and ‘Democracy Earth’ is that while they both attempt to tackle the issue of coordinating small communities, it’s not clear to me how they would scale to massive communities. In fact, I don’t believe they can, and this point is very important because without large-scale (i.e. global) coordination, conflict is inevitable.

Consider the European Union, for example. Many brilliant people, including Dr. Stiglitz and Dr. Varoufakis, have laid out careful plans for solving the European coordination problem. However, there’s little political will among politicians to execute a trans-national strategy that would fundamentally change the way the EU functions. This might have devastating consequences, but I’m not here to encourage pessimism. On the contrary, I believe that a large-scale technological solution to goal alignment (i.e. coordination) might be possible.

The exact form of this solution isn’t clear to me right now although I believe that higher social network connectivity might be part of the solution. For this reason, I’d like to invite my readers to share their own solutions to this problem as I believe this would make it more likely that we discover a practical solution.

Learning integer sequences

Last Friday night I had the chance to watch Demis Hassabis’ 2011 presentation on ‘Systems neuroscience and AGI’, which I found fascinating. One of the intermediate challenges he brought up, around the 33:40 mark, was the problem of predicting the next term in an integer sequence. Given my math background, I thought I’d take a closer look at this problem.

Fanciful arguments:

On the surface this problem appears attractive for several reasons:

  1. It appears that no prior knowledge is required as all the data is contained in the n visible terms in the sequence.
  2. Small data vs Big data: little data is required in order to make a reasonable prediction.
  3. It connects machine learning with data compression. In particular, it appears to emphasize the simplest algorithm (i.e. Ockham’s razor).

I shall demonstrate that all of the above preconceptions are mainly false. Before presenting my arguments, I’d like to point out that there is a Kaggle competition on integer sequence prediction based on the OEIS dataset. However, I’d take the leaderboard with a grain of salt, as it’s very easy to collect the sequences from the OEIS database and simply run a pattern-matching algorithm on the set of known sequences.
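
To see just how easy that would be, here is a rough, hedged sketch of such a lookup against a local copy of the OEIS terms dump (the ‘stripped’ file); the file name, format and parsing details below are assumptions on my part, and this is obviously not a method I endorse:

```python
# Hedged sketch: look the visible prefix up in a local OEIS terms dump and
# return the term that follows the first match. File name and format assumed.

def predict_from_oeis(prefix, oeis_path="stripped"):
    prefix = list(prefix)
    with open(oeis_path) as f:
        for line in f:
            if line.startswith("#"):
                continue
            # Expected line format: 'A000045 ,0,1,1,2,3,5,8,...'
            _, _, terms = line.partition(" ")
            seq = [int(t) for t in terms.strip().strip(",").split(",") if t]
            for i in range(len(seq) - len(prefix)):
                if seq[i:i + len(prefix)] == prefix:
                    return seq[i + len(prefix)]
    return None  # no match: fall back to an actual prediction method

# predict_from_oeis([0, 1, 1, 2, 3, 5])  # would return 8 if Fibonacci is in the dump
```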

Examples of sequences:

Assuming that possible integer sequences are of the form (a_n)_{n=1}^\infty with a_n \in \{0, 1, 2, \ldots, 9\}, here are some examples (a short sketch of their generators follows the list):

  1. Fibonacci numbers: 0, 1, 1, 2, 3, 5, …
  2. Collatz sequence: 0, 1, 7, 2, 5, 8, 16, 3, 19, 6, … (the number of steps for n to reach 1 in the Collatz process)
  3. Catalan numbers: 1, 1, 2, 5, 14, … where a_n = \frac{1}{n+1}{2n \choose n}
  4. Khinchin’s constant: 2, 6, 8, 5, 4, 5, … (the base-10 digits of Khinchin’s constant)
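
For concreteness, here is a minimal Python sketch of generators for the first three sequences; the digits of Khinchin’s constant are hard-coded, since computing them requires high-precision numerics well beyond this sketch:

```python
from math import comb

def fibonacci(n):
    """First n Fibonacci numbers: 0, 1, 1, 2, 3, 5, ..."""
    out, a, b = [], 0, 1
    for _ in range(n):
        out.append(a)
        a, b = b, a + b
    return out

def collatz_steps(n):
    """Number of steps for k to reach 1 in the Collatz process, for k = 1..n."""
    def steps(k):
        count = 0
        while k != 1:
            k = k // 2 if k % 2 == 0 else 3 * k + 1
            count += 1
        return count
    return [steps(k) for k in range(1, n + 1)]

def catalan(n):
    """First n Catalan numbers: a_k = (1/(k+1)) * C(2k, k)."""
    return [comb(2 * k, k) // (k + 1) for k in range(n)]

KHINCHIN_DIGITS = [2, 6, 8, 5, 4, 5]  # hard-coded leading base-10 digits

print(fibonacci(6))       # [0, 1, 1, 2, 3, 5]
print(collatz_steps(10))  # [0, 1, 7, 2, 5, 8, 16, 3, 19, 6]
print(catalan(5))         # [1, 1, 2, 5, 14]
```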

These four sequences come from basic arithmetic, number theory, combinatorics and real analysis, but it’s possible to construct an integer sequence from a branch of mathematics that hasn’t been invented yet, so the space of possible integer sequences is actually quite large. In fact, the space of computable integer sequences is countably infinite.

Bayesian inference:

I’d like to make the case that this is actually a problem in Bayesian inference which can potentially involve a lot of data. Let me explain.

When trying to predict the next term in an integer sequence, a_{n+1}, what we’re actually trying to discover is the underlying number generator, and in order to do this we need to discover structure in the data. However, in order to discover any structure in the data we must make the strong assumption that the data wasn’t generated in a random manner. This hypothesis can’t be justified by any single sequence.

Moreover, assuming that there is structure, knowing the author of the sequence is actually very important. If the sequence was beamed to my mobile phone by friendly aliens from a distant galaxy, it would be difficult for me to guess a_{n+1} with a better-than-chance probability of being right, whereas if the sequence was provided by the local number theorist my odds would be slightly better. The reason is that my prior knowledge of the number theorist’s mathematical training is very useful information.

A reasonable approach would be to define a program that returns a probability distribution over at most ten sequence generators, each generating a distinct a_{n+1}, conditioned on prior knowledge. The hard part is that the body of the program would necessarily involve an algorithm capable of learning mathematical concepts.
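
To illustrate the shape of such a program, and only the shape, here is a minimal Bayesian sketch; the hypothesis class, the uniform prior and the 0/1 likelihood below are toy assumptions of mine, not the concept-learning algorithm the paragraph actually calls for:

```python
# Toy Bayesian inference over a tiny, hand-picked class of sequence generators.

def arithmetic(first, step):
    def gen(n):
        return [first + step * i for i in range(n)]
    return gen

def fibonacci_like(a, b):
    def gen(n):
        out, x, y = [], a, b
        for _ in range(n):
            out.append(x)
            x, y = y, x + y
        return out
    return gen

HYPOTHESES = {
    "arithmetic(0, 1)": arithmetic(0, 1),
    "arithmetic(1, 2)": arithmetic(1, 2),
    "fibonacci(0, 1)": fibonacci_like(0, 1),
}

def posterior_over_next_term(observed, prior=None):
    """Return P(a_{n+1} | observed prefix) under a uniform prior by default."""
    prior = prior or {name: 1.0 / len(HYPOTHESES) for name in HYPOTHESES}
    n = len(observed)
    # Likelihood is 1 if the generator reproduces the prefix exactly, else 0.
    weights = {name: prior[name] * (gen(n) == observed)
               for name, gen in HYPOTHESES.items()}
    total = sum(weights.values())
    if total == 0:
        return {}  # no hypothesis in the class explains the data
    next_term_probs = {}
    for name, w in weights.items():
        if w == 0:
            continue
        nxt = HYPOTHESES[name](n + 1)[-1]
        next_term_probs[nxt] = next_term_probs.get(nxt, 0.0) + w / total
    return next_term_probs

print(posterior_over_next_term([0, 1, 1, 2, 3]))  # {5: 1.0} -- only the Fibonacci generator fits
```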

Randomness and Data compression:

At this point someone might try to bring up a clever argument centred around Ockham’s razor. Now, I think that such an argument definitely has some merit. As a general rule it’s reasonable to assume that the sequence generator can be encoded in a language such that its description in that language is as short as or shorter than the length of the sequence. Further, if we assume that the author is an organism, they are probably highly organised in both space and time. In other words, it’s very unlikely that the sequence is random.
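
As a crude illustration of the compression angle, an off-the-shelf compressor can stand in (weakly) for description length: structured digit strings compress far better than random ones, though zlib is of course a very poor proxy for Kolmogorov complexity:

```python
import random
import zlib

random.seed(0)

structured = "0123456789" * 100  # an obvious, short generator
random_digits = "".join(random.choice("0123456789") for _ in range(1000))

for name, s in [("structured", structured), ("random", random_digits)]:
    compressed = len(zlib.compress(s.encode()))
    print(f"{name}: {len(s)} digits -> {compressed} bytes compressed")
# The structured string compresses to a small fraction of its length;
# the random one hardly compresses at all.
```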

However, this Ockham’s razor argument doesn’t really constrain the problem space. The necessary and sufficient general approach would be to use an AGI capable of abstract conceptual reasoning beyond that allowed by current machine learning algorithms. This is the only way it could possibly learn and use mathematical concepts. Good luck building that for a Kaggle competition.

Conclusion:

The main lesson drawn from this analysis is that in order to make measurable progress in the field of machine learning it’s very important to choose your problems wisely. In fact, I think it’s fair to say that choosing the right machine learning problems to work on is at least as important as the associated algorithmic challenges.

Note: It’s not surprising that nobody is prepared to offer money for such a competition. What I do find surprising is that some have tried to make ‘progress’ on this challenge.

A brief history of AI

The majority of today’s intellectuals and visionaries, including Nick Bostrom and Demis Hassabis, hold the very curious belief that the quest for strong artificial intelligence is a recent phenomenon. In fact, if one thinks carefully, this goal has been seriously pursued over the last 200 years. It is very far from a recent phenomenon, but perhaps it might help if I clearly state what I mean by artificial intelligence.

In 2007, Shane Legg, now the Chief Scientist at DeepMind, came up with a good list of definitions of artificial intelligence due to different AI researchers and eventually distilled them into a single definition:

“Intelligence measures an agent’s ability to achieve goals in a wide range of environments.” -S. Legg and M. Hutter

Using this definition, I will use concrete examples to show that there have been at least three important attempts to develop strong artificial intelligence, at varying degrees of abstraction, in the last two hundred years, and that these systems have actually been applied to important problems affecting large numbers of people.

1. Laplace’s Demon:

The goal of any grand unified theory in physics is to develop practical principles and algorithms that are capable of predicting the behaviour of any physical system. Now, in the early 1800s many scientists, including Laplace, believed that the joint development of classical mechanics and perturbation theory was sufficiently powerful to predict the behaviour of any observable system. This belief is summed up by Laplace as follows:

We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.

This entity, which later scientists and philosophers called Laplace’s demon, hasn’t quite lived up to expectations. Granted, Hamiltonian and Lagrangian methods are used today for simulating a large number of physical systems, ranging from molecules to celestial bodies. However, the big obstacle facing this approach is not only the amount of data required but the fact that we have very few closed systems, and almost all closed systems eventually behave in a chaotic (i.e. unpredictable) manner. To be precise, they have a finite Lyapunov time.
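
A toy illustration of what a finite Lyapunov time means in practice, using the logistic map at r = 4 (a standard chaotic example chosen for brevity, not one of the systems Laplace had in mind):

```python
# Sensitive dependence on initial conditions in the logistic map x -> r*x*(1-x).
# With r = 4 the Lyapunov exponent is positive, so a tiny perturbation grows
# roughly exponentially until the two trajectories decorrelate completely.

def logistic_trajectory(x0, steps, r=4.0):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.2, 50)
b = logistic_trajectory(0.2 + 1e-10, 50)  # perturb the tenth decimal place

for step in (0, 10, 25, 50):
    print(step, abs(a[step] - b[step]))
# The separation grows from ~1e-10 to order 1 within a few dozen iterations.
```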

2. Communist Central Planning:

The most advanced versions of Communism involve a large number of enlightened and benevolent technocrats who make decisions for the rest of the population in order to guarantee economic equality. The basic idea is that if you get a lot of clever and well-intentioned people together, their aggregate decisions will be much better than the accumulated economic decisions of the entire populace. This is not how Communism is usually introduced, but it is how it’s always carried out in practice.

In the early 20th century this seemed like a brilliant idea, but empirically it turned out to be a catastrophic failure. There are also very sound theoretical reasons for its failure. First, it leads to a monolithic structure that doesn’t adapt to market signals because they are non-existent. Second, the average person is not an idiot, and “good technocrats” are simply conceited people who are too stubborn to change their minds. Third, while it theoretically guarantees that everybody does “equally well”, it doesn’t guarantee that people do well at all. In fact, taking the first two points into account, the failure of a Central Planning system to adapt means that eventually everybody does “equally badly”.

The failure of Central Planning leads me to the next AI system.

3. Free markets:

Unlike a Central Planning system, the Free Market is essentially a black-box boosting algorithm. Instead of a well-defined group of elite decision makers, you have a large number of agents of variable information-processing ability which constitute what Adam Smith would call the Invisible Hand.
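
To make the analogy concrete (and it is only an analogy; every number below is a toy assumption), here is a sketch in the spirit of boosting: many agents of variable ability are re-weighted by past performance, and their weighted vote tends to track outcomes better than most individual agents do:

```python
import random

random.seed(0)
N_AGENTS, ROUNDS, ETA = 50, 200, 0.5

# Each agent predicts a binary outcome correctly with its own fixed skill level.
skills = [random.uniform(0.45, 0.65) for _ in range(N_AGENTS)]
weights = [1.0] * N_AGENTS

aggregate_correct = 0
for _ in range(ROUNDS):
    outcome = random.randint(0, 1)
    predictions = [outcome if random.random() < s else 1 - outcome for s in skills]
    # Weighted-majority aggregate prediction.
    vote_for_one = sum(w for w, p in zip(weights, predictions) if p == 1)
    aggregate = 1 if vote_for_one >= sum(weights) / 2 else 0
    aggregate_correct += (aggregate == outcome)
    # Multiplicative-weights update: penalise agents that were wrong.
    weights = [w * (1.0 if p == outcome else 1.0 - ETA)
               for w, p in zip(weights, predictions)]

print("best individual skill:", round(max(skills), 3))
print("aggregate accuracy:   ", round(aggregate_correct / ROUNDS, 3))
```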

Proponents of free market economics argue that the “Free Market” has a number of very important theoretical properties. First, it takes into account market signals, which means that it’s adaptable and, in theory, everybody is commensurately rewarded. Second, it’s regulated by a democratically elected government to prevent anti-competitive behaviour.

However, this system faces many theoretical and practical difficulties:

a) unpriced externalities: unpriced damage done to the environment, among other things
b) wealth distribution: there’s no guarantee that the Gini index is close to zero
c) information asymmetry: there’s no guarantee that every agent has access to reliable information; in fact, given today’s questions about big data and who owns it, this problem is becoming increasingly important
d) black box: no economist can predict the future behaviour of a free-market economy with any precision; there have been unpredictable market crashes in the past and there’s nothing to prevent such catastrophic events in the future

The four points given above would cause alarm if I associated them with an AI system intended to replace the “Free Market”. AI theorists would quickly throw up their hands and say, “What about goal alignment!?” However, humans in Free Market economies, and most economists, are surprisingly comfortable with the current situation.

More importantly, the main point I’m trying to drive home is semantic in nature. There is no hard and fast rule that AI has to be digital or that it must be programmed via a laptop. The key thing is that there are universal design principles for building substrate-independent AI systems. 

Meanwhile, there are many warning signs that the free market system is in danger of imminent collapse. In fact, AI risks lie in the present, not in the future as many suggest. The omnipresent AI risk is that we fail to build a more robust AI system to handle the economy while the Invisible Hand falls apart.

Note: Surprisingly, economists haven’t made a formal connection between boosting algorithms and free market systems, but I promise to write a blog post on this subject in the near future.

Behavioral Syntax v 1.1

I’ve been contributing for a while now to a Python-based behavioral Turing Test that’s part of OpenWorm, and Behavioral Syntax v 1.1 is almost ready, so it’s time I announced its existence. The methods are based on the ‘Behavioral Syntax’ paper of Andre Brown et al., which attempts to describe the locomotion of C. elegans in terms of a vocabulary of ‘worm shapes’ determined using K-means clustering.

[Figure: the reference postures; the green segment designates the head]

90 distinct postures are sufficient to account for 82% of postural variance, so we can map worm shapes from videos to these 90 postures with minimal loss of information. And lo and behold! We now have a problem that amounts to comparing sequences, and suddenly we can use methods from natural language processing and bioinformatics.
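
As a sketch of that discretisation step (the array shapes and the representation of a worm shape here are my assumptions, not the project’s actual API), each frame of a video is simply assigned to its nearest reference posture:

```python
import numpy as np

def to_posture_sequence(skeletons, postures):
    """
    skeletons: (n_frames, n_points) array of worm shapes from a video.
    postures:  (90, n_points) array of reference postures from K-means.
    Returns an (n_frames,) array of posture indices.
    """
    # Squared Euclidean distance from every frame to every reference posture.
    d = ((skeletons[:, None, :] - postures[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

# Toy usage with random data standing in for real tracking output.
rng = np.random.default_rng(0)
postures = rng.normal(size=(90, 48))
skeletons = rng.normal(size=(1000, 48))
labels = to_posture_sequence(skeletons, postures)
print(labels[:10])  # a sequence of posture labels, ready for string methods
```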

Now, I must add that this project was never intended to be merely a Python version of Dr Brown’s Matlab code. From the beginning I planned to design a fast behavioral Turing Test that would serve as a first-pass filter. It would automatically generate lab reports that would make it simple to assess whether or not a simulated worm resembled a normal C. elegans in its behavior.

The primary method I plan to use is Bayesian classification using sub-models such as minimal description length, grammatical inference, hierarchical Markov models, and simpler things such as computing posture heat-map similarity. A posture heat-map might be useful in a uniform setting such as agar Petri dishes off food. Assuming that a standard search algorithm such as Lévy flight search is in use, I would expect a certain pattern in the occurrence of postures. You might think that Lévy flights would be off-limits to my project since they represent continuous-time behavior, but my plan is actually to compare ‘continuous’ and ‘discrete’ models. It would be interesting to see what kind of ‘false’ worms can actually fool a discrete Turing test.
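
As a small sketch of the posture heat-map idea, two worms can be compared by the normalised frequency with which each of the 90 postures occurs; the cosine similarity below is my own choice of measure, not necessarily what will end up in the project:

```python
import numpy as np

def posture_histogram(labels, n_postures=90):
    counts = np.bincount(labels, minlength=n_postures).astype(float)
    return counts / counts.sum()

def heatmap_similarity(labels_a, labels_b):
    ha, hb = posture_histogram(labels_a), posture_histogram(labels_b)
    return float(ha @ hb / (np.linalg.norm(ha) * np.linalg.norm(hb)))

# Toy usage with random posture sequences standing in for real and simulated worms.
rng = np.random.default_rng(1)
real_worm = rng.integers(0, 90, size=5000)
simulated_worm = rng.integers(0, 90, size=5000)
print(round(heatmap_similarity(real_worm, simulated_worm), 3))
```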

Speaking of false worms, I’m actually planning to begin work on a ‘false worm’ repository. Initially these would be simple test cases to see whether the behavioral Turing Test can actually be fooled by a random piece-wise harmonic path on a flat surface. I can’t think of a better way to see whether these tests actually have any value.
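
For what it’s worth, here is one possible reading of a random piece-wise harmonic path (my interpretation, not a settled design): a centroid trajectory built from sinusoidal segments with random amplitude, frequency and phase, joined continuously end to end:

```python
import numpy as np

def piecewise_harmonic_path(n_segments=5, points_per_segment=200, seed=0):
    rng = np.random.default_rng(seed)
    xs, ys = [0.0], [0.0]
    heading = 0.0
    for _ in range(n_segments):
        amp, freq = rng.uniform(0.1, 1.0), rng.uniform(0.5, 3.0)
        phase = rng.uniform(0, 2 * np.pi)
        t = np.linspace(0, 2 * np.pi, points_per_segment)
        local_x = t
        # Subtract sin(phase) so each segment starts where the last one ended.
        local_y = amp * (np.sin(freq * t + phase) - np.sin(phase))
        c, s = np.cos(heading), np.sin(heading)
        x0, y0 = xs[-1], ys[-1]
        xs.extend(x0 + c * local_x - s * local_y)
        ys.extend(y0 + s * local_x + c * local_y)
        heading += rng.uniform(-np.pi / 2, np.pi / 2)  # turn before the next segment
    return np.array(xs), np.array(ys)

x, y = piecewise_harmonic_path()
print(len(x), len(y))  # a 1001-point 'false worm' centroid path
```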

Stay tuned. In less than a week it should be possible for anybody to install the package using pip.