## Introduction:

In an excellent paper published less than two years ago, Timothy Lillicrap, a theoretical neuroscientist at DeepMind, proposed a simple yet reasonable solution to the weight transport problem. Essentially, Lillicrap and his co-authors showed that it’s possible to do backpropagation with fixed random feedback weights and still obtain very competitive results on various benchmarks [2]. This is significant because it marks an important step towards biologically plausible deep learning.

## The weight transport problem:

While backpropagation is a very effective approach for training deep neural networks, at present it’s not at all clear whether the brain might actually use this method for learning. In fact, backprop has three biologically implausible requirements [1]:

1. feedback weights must be the same as feedforward weights
2. forward and backward passes require different computations
3. error gradients must be stored separately from activations

A biologically plausible solution to the second and third problems is to use a separate error propagation network that has the same topology as the feedforward network but is used only to backpropagate error signals. However, there is no known biological mechanism by which this error network could know the weights of the feedforward network. This makes the first requirement, weight symmetry, a serious obstacle.

This obstacle is also known as the weight transport problem [3].

## Random synaptic feedback:

The solution proposed by Lillicrap et al. is based on two key observations:

1. Any fixed random matrix $$B$$ may serve as a substitute for the transposed forward matrix $$W^\top$$ in backpropagation provided that on average we have:

$$e^\top WB e > 0$$

where $$e$$ is the error in the network’s output. Geometrically, this is equivalent to requiring that the backpropagated signal $$W^\top e$$ and the random feedback signal $$Be$$ are within $$90^{\circ}$$ of each other.

2. Over time we get better alignment between $$W^\top$$ and $$B$$ due to the modified update rules, which means that the first requirement becomes easier to satisfy with more iterations.
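The geometric condition in the first observation can be checked numerically. The following is a minimal NumPy sketch, not code from the paper: the helper `alignment_angle_deg` and the layer sizes are illustrative assumptions. It measures the angle between the signal backpropagation would send to the hidden layer, $$W^\top e$$, and the signal sent through the random matrix, $$Be$$.

```python
import numpy as np

def alignment_angle_deg(W, B, e):
    """Angle in degrees between the backprop signal W^T e and the feedback signal B e."""
    bp = W.T @ e   # what exact backpropagation would send to the hidden layer
    fa = B @ e     # what the fixed random feedback matrix sends instead
    cos = (bp @ fa) / (np.linalg.norm(bp) * np.linalg.norm(fa))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

rng = np.random.default_rng(0)
n_hidden, n_out = 20, 10                     # hypothetical layer sizes
W = rng.normal(size=(n_out, n_hidden))       # forward weights, y = W h
e = rng.normal(size=n_out)                   # an arbitrary output error
B = rng.normal(size=(n_hidden, n_out))       # fixed random feedback weights

# B = W^T reproduces backpropagation, so the angle is (numerically) zero.
print(alignment_angle_deg(W, W.T, e))
# For an unrelated random B the angle may fall on either side of 90 degrees.
print(alignment_angle_deg(W, B, e))
# The sign of e^T W B e says which side of 90 degrees we are on.
print(float(e @ W @ B @ e) > 0)
```

Since $$e^\top W B e$$ is the inner product of $$W^\top e$$ and $$Be$$, it is positive exactly when the reported angle is below $$90^{\circ}$$.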

## A simple example:

Let’s consider a simple three-layer linear neural network that is intended to approximate a linear mapping $$T$$:

$$\begin{cases} h = W_0 x \\ y = W h \\ e = Tx - y \end{cases}$$

The loss is given by:

$$\mathcal{L} = \frac{1}{2} e^\top e$$

From this we may derive the following backpropagation update equations:

$$\Delta W \propto -\frac{\partial \mathcal{L}}{\partial W} = -\frac{\partial \mathcal{L}}{\partial e} \frac{\partial e}{\partial y} \frac{\partial y}{\partial W} = -\left(e \cdot (-1) \cdot h^\top\right) = e h^\top$$

$$\Delta W_0 \propto -\frac{\partial \mathcal{L}}{\partial W_0} = -\frac{\partial \mathcal{L}}{\partial e} \frac{\partial e}{\partial y} \frac{\partial y}{\partial h} \frac{\partial h}{\partial W_0} = -\left(W^\top e \cdot (-1) \cdot x^\top\right) = W^\top e x^\top$$
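As a sanity check on the signs, the gradients of $$\mathcal{L}$$ for this linear network are $$-e h^\top$$ with respect to $$W$$ and $$-W^\top e x^\top$$ with respect to $$W_0$$, so the descent steps are $$e h^\top$$ and $$W^\top e x^\top$$. The sketch below compares these expressions against central finite differences. The sizes and the helper `numerical_grad` are illustrative assumptions.

```python
import numpy as np

def loss(W0, W, T, x):
    """L = 1/2 e^T e with e = T x - W (W0 x)."""
    e = T @ x - W @ (W0 @ x)
    return 0.5 * float(e @ e)

def numerical_grad(f, M, eps=1e-6):
    """Central finite differences of the scalar f() with respect to each entry of M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        old = M[idx]
        M[idx] = old + eps
        up = f()
        M[idx] = old - eps
        down = f()
        M[idx] = old                      # restore the entry
        G[idx] = (up - down) / (2 * eps)
    return G

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 4, 3, 2           # small hypothetical sizes
x = rng.normal(size=n_in)
T = rng.normal(size=(n_out, n_in))
W0 = rng.normal(size=(n_hidden, n_in))
W = rng.normal(size=(n_out, n_hidden))

h = W0 @ x
e = T @ x - W @ h

grad_W = -np.outer(e, h)                  # analytic dL/dW
grad_W0 = -np.outer(W.T @ e, x)           # analytic dL/dW0

num_W = numerical_grad(lambda: loss(W0, W, T, x), W)
num_W0 = numerical_grad(lambda: loss(W0, W, T, x), W0)

print(np.allclose(grad_W, num_W, atol=1e-6))
print(np.allclose(grad_W0, num_W0, atol=1e-6))
```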

Now the random synaptic feedback innovation is essentially to replace $$W^\top$$ in the update for $$W_0$$, giving:

$$\Delta W_0 \propto B e x^\top$$

where $$B$$ is a fixed random matrix. As a result, we no longer need explicit knowledge of the original weights in our update equations. I actually implemented this method for a three-layer sigmoid (i.e. nonlinear) neural network and obtained 89.5% accuracy on the MNIST dataset after 10 iterations, a result that is competitive with backpropagation.
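To make the update rule concrete, here is a minimal NumPy sketch of random synaptic feedback on the linear network from this section. It is not the sigmoid MNIST implementation mentioned above. The function name, layer sizes, learning rate, batch size, Gaussian inputs and zero weight initialisation are all assumptions chosen for illustration.

```python
import numpy as np

def train_feedback_alignment(steps=3000, lr=0.01, batch=32, seed=0):
    """Train y = W W0 x to match T x, using a fixed random B in place of W^T."""
    rng = np.random.default_rng(seed)
    n_in, n_hidden, n_out = 30, 20, 10                    # hypothetical sizes
    T = rng.normal(size=(n_out, n_in)) / np.sqrt(n_in)    # target linear map
    B = rng.uniform(-0.5, 0.5, size=(n_hidden, n_out))    # fixed random feedback
    W0 = np.zeros((n_hidden, n_in))                       # assumption: zero init
    W = np.zeros((n_out, n_hidden))
    X_eval = rng.normal(size=(n_in, 1000))                # fixed evaluation inputs

    def mean_loss():
        E = T @ X_eval - W @ (W0 @ X_eval)
        return 0.5 * float(np.mean(np.sum(E * E, axis=0)))

    initial = mean_loss()
    for _ in range(steps):
        X = rng.normal(size=(n_in, batch))    # one column per sample
        H = W0 @ X
        E = T @ X - W @ H
        W += lr * (E @ H.T) / batch           # Delta W  proportional to e h^T
        W0 += lr * (B @ E @ X.T) / batch      # Delta W0 proportional to B e x^T
    final = mean_loss()

    # Angle between W^T and B, treating both matrices as flat vectors.
    cos = np.sum(W.T * B) / (np.linalg.norm(W) * np.linalg.norm(B))
    angle = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return initial, final, angle

initial, final, angle = train_feedback_alignment()
print(f"loss {initial:.3f} -> {final:.3f}, angle between W^T and B: {angle:.1f} degrees")
```

With zero initial weights the backpropagation update $$W^\top e x^\top$$ for $$W_0$$ would be exactly zero, whereas $$B e x^\top$$ is not, so only the feedback version can start learning from this initialisation. The reported angle gives a rough view of the second observation, namely that $$W^\top$$ comes to align with $$B$$ during training.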

## Discussion:

In spite of its remarkable simplicity, Timothy Lillicrap’s solution to the weight transport problem is very effective and so I think it deserves further investigation. In the near future I plan to implement random synaptic feedback for much larger sigmoid and ReLU networks as well as recurrent neural networks in order to build upon the work of [1].

Considering all the approaches to biologically plausible deep learning attempted so far, I believe this work represents a very important step forward.

## References:

1. Liao, Q., Leibo, J. Z., and Poggio, T. A. 2016. How important is weight symmetry in backpropagation? AAAI.
2. Lillicrap, T. P., Cownden, D., Tweed, D. B., and Akerman, C. J. 2016. Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications 7:13276.
3. Grossberg, S. 1987. Competitive learning: From interactive activation to adaptive resonance. Cognitive Science 11(1):23–63.