A team of researchers at Oxford University developed an algorithm called zero-divergence inference learning (Z-IL), an alternative to the backpropagation (BP) algorithm for training neural network AI models. Z-IL has been shown to exactly reproduce the results of BP on any neural network, but unlike BP does not violate known principles of brain function.
The algorithm was described in a paper presented at the latest Conference on Neural Information Processing Systems (NeurIPS). Z-IL produces the same updates to a neural network's weights as BP does. Unlike BP, however, Z-IL conforms to several observed properties of "natural" neural networks (e.g., human brains), including local plasticity and autonomous updates. More recent work by the researchers also shows that Z-IL works for every possible neural network architecture and is comparable to BP in computational efficiency. According to the authors,
Overall, our work provides important evidence for the debate on the long-disputed question whether the brain can perform [backpropagation].
Artificial neural networks (ANNs), which are widely used today in deep-learning applications, are a mathematical model of neurons, the cells that make up the brains of living creatures. Early implementations of ANNs were shown to be unable to learn functions that are not linearly separable, such as the XOR logic function. In a 1986 paper in Nature, deep-learning pioneer Geoffrey Hinton and colleagues applied BP techniques to train multi-layered neural networks. During the training process, the network's prediction errors are propagated backward---from the output layer to the input layer---using the chain rule of differentiation. The resulting gradients are used to update the network's weights in a way that minimizes prediction errors.
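The backward flow of errors can be sketched on the XOR task mentioned above. The following NumPy example is purely illustrative (it is not code from any of the papers discussed); it trains a small two-layer network by propagating the output error back through the chain rule:

```python
import numpy as np

# Illustrative sketch of backpropagation on XOR (not the papers' code).
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0.0, 1.0, (2, 4))  # input -> hidden weights
W2 = rng.normal(0.0, 1.0, (4, 1))  # hidden -> output weights
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

loss_history = []
for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    loss_history.append(float(np.mean((out - y) ** 2)))

    # backward pass: the output error is chained back layer by layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # note the non-locality: updating W1 requires the error signal from
    # the layer above, propagated backward through W2
    W2 -= lr * h.T @ d_out
    W1 -= lr * X.T @ d_h
```

The comment on the last updates highlights the point at issue: each layer's weight change depends on error information carried back from later layers, which is exactly the backward information flow that biological plausibility arguments object to.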
In 1989, in another article in Nature, Francis Crick (of DNA double-helix fame) claimed that it was "highly unlikely" that BP could happen in biological brains, given that there is no backward flow of information. While deep-learning models trained with BP have achieved human-like, or even super-human, performance on some tasks, researchers who are interested in building a general AI system based on proven biological principles have worked to develop more biologically plausible training algorithms. These algorithms must exhibit local plasticity, meaning that a neuron's weights are updated based only on information local to it, with minimal external control.
Predictive coding (PC) is one such algorithm. PC is a model of brain function composed of hierarchical layers, similar to ANNs. However, while an ANN assumes the output of a lower layer is an input to a higher layer, in the PC model a layer's function is to predict the activity of the layer below it, and the prediction errors are transmitted to higher layers. The training algorithm then updates a layer's weights to minimize those errors. Unlike BP, PC does not require the ultimate output error to be "chained" back to the input layer. Instead, each layer can be updated using only information from the next higher layer. PC is consistent with neurophysiology and works as a model for many perceptual activities. Furthermore, because the updates require only local information, the process is more parallelizable.
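A toy sketch can make the layer-local scheme concrete. The example below illustrates the general PC idea only (it is not the Z-IL algorithm from the paper): a hidden layer predicts the observed data, and both the inferred activity and the weights are adjusted using nothing but the local prediction error:

```python
import numpy as np

# Toy predictive-coding sketch (illustrative; not the Z-IL algorithm).
# The hidden layer predicts the layer below; only the local prediction
# error is needed for both inference and weight updates.
rng = np.random.default_rng(1)

dim_hidden, dim_data = 8, 4
W = rng.normal(0.0, 0.1, (dim_data, dim_hidden))  # hidden -> data prediction
data = rng.normal(0.0, 1.0, (dim_data, 1))        # one observed pattern

x_hidden = np.zeros((dim_hidden, 1))              # latent activity to infer
lr_x, lr_w = 0.1, 0.05

errors = []
for _ in range(200):
    pred = W @ x_hidden          # higher layer predicts lower-layer activity
    e = data - pred              # local prediction error
    errors.append(float(np.mean(e ** 2)))

    # inference: relax hidden activity to reduce the error it causes below
    x_hidden += lr_x * (W.T @ e)

    # learning: Hebbian-style update from pre/post-synaptic quantities
    # only -- the "local plasticity" property described above
    W += lr_w * e @ x_hidden.T
```

Note that the weight update for `W` uses only the error at its own output and the activity at its own input, with no signal chained back from any other layer.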
In 2020, researchers from the University of Edinburgh and the University of Sussex published a paper on arXiv demonstrating that a PC algorithm could approximate BP on any neural network. The team open-sourced their implementation code for training recurrent neural networks (RNNs) and convolutional neural networks (CNNs). By contrast, the Oxford team has now shown that PC can produce exactly the same network weights as BP, for any neural network. The team used their algorithm to train several models of various architectures, including AlexNet, ResNet, an RNN, and a Transformer. The researchers also showed that their algorithm achieved a runtime comparable to BP training, almost an order-of-magnitude improvement over their previous PC implementation.
In a discussion on Hacker News, commenters pointed out the potential benefits of the parallel nature of PC:
This work opens the door for using new kinds of massively parallel "neuromorphic" hardware to implement orders of magnitude more layers and units, without requiring greater communications bandwidth between layers, because the model no longer needs to wait until gradients have back-propagated from the last to the first layer before moving on to the next sample. Scaling backpropagation to GPT-3 levels and beyond (think trillions of dense connections) is very hard -- it requires a lot of complicated plumbing and bookkeeping.
The Oxford researchers also note that their work might lead to new developments in neuromorphic computing. Furthermore, they claim that since they have shown how a BP equivalent can be implemented by biological systems, it may give neuroscience researchers a justification to use BP models in their work.