Last Friday night I had the chance to watch the 2011 presentation of Demis Hassabis on ‘Systems neuroscience and AGI’ which I found fascinating. One of the intermediate challenges he brought up around the 33.40 minute mark was the problem of predicting the next term in an integer sequence. Given my math background, I thought I’d take a closer look at this problem.
On the surface this problem appears attractive for several reasons:
- It appears that no prior knowledge is required as all the data is contained in the n visible terms in the sequence.
- Small data vs Big data: little data is required in order to make reasonable prediction.
- It connects machine learning with data compression. In particular, it appears to emphasize the simplest algorithm(i.e. Ockham’s razor).
I shall demonstrate that all of the above preconceptions are mainly false. Before presenting my arguments, I’d like to point out that there is a Kaggle competition on integer sequence prediction based on the OEIS dataset. However, I’d take the leaderboard with a grain of salt as it’s very easy to collect the sequences from the OEIS database and simply run a pattern-matching algorithm on the set of known sequences.
Examples of sequences:
Assuming that possible integer sequences are of the form , here are some examples:
- Fibonacci numbers: 0,1,1,2,3,5…
- Collatz sequence: 0,1,7,2,5,8,16,3,19,6…number of steps for n to reach 1 in the Collatz process.
- Catalan numbers: 1,1,2,5,14…where
- Khinchin’s constant: 2,6,8,5,4,5…base-10 representation of Khinchin’s constant
These four sequences come from the areas of basic arithmetic, number theory, combinatorics and real analysis but it’s possible to construct an integer sequence from a branch of mathematics that hasn’t been invented yet so the space of possible integer sequences is actually quite large. In fact, the space of computable integer sequences is countably infinite.
I’d like to make the case that this is actually a problem in Bayesian inference which can potentially involve a lot of data. Let me explain.
When trying to predict the next term in an integer sequence,, what we’re actually trying to discover is the underlying number generator and in order to do this we need to discover structure in the data. However, in order to discover any structure in the data we must make the strong assumption that the data wasn’t generated in a random manner. This hypothesis can’t be justified by any single sequence.
Moreover, assuming that there is structure, knowing the author of the sequence is actually very important. If the sequence was beamed to my mobile phone by friendly aliens from a distant galaxy, it would be difficult for me to guess with a better than chance probability of being right whereas if the sequence was provided by the local number theorist my odds would be slightly better. The reason is that my prior knowledge of the number theorists’ mathematical training is very useful information.
A reasonable approach would be to define a program that returns a probability distribution over at most ten sequence generators which generate distinct conditioned on prior knowledge. The hard part is that the body of the program would necessarily involve an algorithm capable of learning mathematical concepts.
Randomness and Data compression:
At this point someone might try to bring up a clever argument centred around Ockham’s razor. Now, I think that such an argument definitely has some merit. As a general rule it’s reasonable to assume that the sequence generator can be encoded in a language such that its description in that language is as short or shorter than the length of the sequence. Further, if we assume that the author is an organism they are probably highly organised in a spatial and temporal manner. In other words, it’s very unlikely that the sequence is random.
However, this doesn’t constrain the problem space. The necessary and sufficient general approach would be to use an AGI capable of abstract conceptual reasoning beyond that allowed by current machine learning algorithms. This is the only way it could possibly learn and use mathematical concepts. Good luck building that for a Kaggle competition.
The main lesson drawn from this analysis is that in order to make measurable progress in the field of machine learning it’s very important to choose your problems wisely. In fact, I think it’s fair to say that choosing the right machine learning problems to work on is at least as important as the associated algorithmic challenges.
Note: It’s not surprising that nobody is prepared to offer money for such a competition. What I do find surprising is that some have tried to make ‘progress’ on this challenge.