If you are interested in understanding and building intelligent learning systems, then the obvious place to look for inspiration is the brain. Unfortunately, we don’t really understand how the brain works at a system-wide level well enough to replicate it, and it’s not obvious that we should even try.
If it is inspiration we are looking for then neurons are a good place to start. While neurons in the brain may have inspired ML approaches like artificial neural networks (i.e. most modern deep learning), when you dig into the details you start to realize there isn’t actually a huge amount in common.
With this post I aim to cover the following:
I wanted to write this post when I saw this tweet, but only felt capable of doing so having recently finished reading The Spike by Mark Humphries. I pull a lot of the narratives and details from the book - I can take no credit for the ideas here myself. I’d recommend grabbing a copy if this post interests you!
Let’s start with artificial neurons. Inspired by real neurons, a CS/ML view of an artificial neuron might be something like the following:
Synapses (weights)
┌───┐
────►│0.2├───────────────┐
Inputs └───┘ │
│ Sum of weights x inputs
┌───┐ │ ┌──────────┐ _____
────►│0.7├──────────┤ │ │ / Activation fn
└───┘ └───┬───►│ Neuron ├─────► / ───────────►
│ │ │ / Output
┌───┐ ┌───────┘ └──────────┘ ____/
────►│0.0├──┤
└───┘ │
│
│
┌───┐ │
────►│1.3├─────────┘
└───┘
I’ve deliberately drawn this kind of like a biological neuron, but ultimately the structure in the above diagram is irrelevant as this is really just a big matrix multiplication. Or in the case of a single neuron, the activation function applied to the inner product of the inputs and the synapse weights.
This is the most basic component of pretty much all artificial neural networks used in deep learning, with an activation function (e.g. a sigmoid or ReLU) that usually provides a source of non-linearity (that is, the output is not simply a linear function of its input).
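To make the "activation function applied to an inner product" description concrete, here is a minimal sketch in plain Python. The weights are the ones from the diagram above; the input values and the function names are made up for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    """Pass positive values through unchanged and clamp negatives to zero."""
    return max(0.0, z)


def weighted_sum(inputs, weights):
    """Inner product of the inputs and the synapse weights."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))


def artificial_neuron(inputs, weights, activation=relu):
    """Apply the activation function to the weighted sum of the inputs."""
    return activation(weighted_sum(inputs, weights))


# Weights copied from the diagram; the input values are invented for illustration.
weights = [0.2, 0.7, 0.0, 1.3]
inputs = [1.0, 0.5, 2.0, -1.0]

print(weighted_sum(inputs, weights))
print(artificial_neuron(inputs, weights, relu))
print(artificial_neuron(inputs, weights, sigmoid))
```

Note that nothing about where a weight sits in the list matters here, and a negative contribution is treated exactly like a positive one. Both of those points come up again below.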
At a high level neurons have 3 components: dendrites, which receive inputs; the cell body (soma); and the axon, which carries the output.
The synapse is a connection between an axon and a dendrite.
Here’s a basic morphological diagram of a pyramidal neuron. If you don’t want to stretch your imagination to ASCII diagrams, have a look at some of the extremely detailed imagery of pyramidal neurons in the H01 dataset.
│
│ │
│ │ │ │
│ ┌──┘ │ │
└──────┤ ├───┘
│ │
│ ┌───┘
│ │
──────────┐ └─┤
│ │
│ │
────┐ │ ├───────┐
│ └────┤ │
│ │ │
└─────────┤ │ │
│ │ │
Dendrites ├───┘
│
┌────────┤ ┌─────────
│ │ │
│ │ ├────────
───────────┘ ┌──▼───┐ │
│ Cell ◄───┬──┘
┌─────────► Body │ │
│ └──┬───┘ └──────────
│ │
──────┘ │
│
│Axon
│
│
│
┌──┤
│ │
│ │
◄──────┘ │
│
└──────────────►
Pyramidal cells (PC) could probably be called “the powerhouse of the cortex”, and make up ~75% of excitatory neurons in it.
Real neurons operate on electrical voltage and their output is a spike - a positive blip in voltage for a short period of time. Spikes from a specific neuron are generally the same shape, strength etc each time. There may be one of them, or possibly a burst of them. Spikes travel down the axon of a neuron, and look something like this:
┌─┐
▲ │ │
│Voltage │ │
│ │ │
│ │ │
│ │ │
│ │ │
│ │ │
│ │ │
│ ┌─┘ │
│ ┌┘ │
│ ┌┘ │
│ ┌┘ │
│ ┌──┘ │
│ ┌─┘ └┐
├───────────────┘ └┐ ┌────────────────
│ │ ┌┘
│ └┐ ┌┘
│ └┐ ┌─┘
│ └──┘
│ Time
└───────────────────────────────────────────────────────►
Roughly the events leading up to a voltage spike travelling down the axon of a PC are as follows:
Spikes are cheap in terms of energy, can travel quickly, and over long distances. This is particularly useful as our brains and bodies have grown over several hundred million years. As they are a sort of digital signal, they are much less prone to degradation over long distances than analog ones.
Each neuron in the brain sends out either an excitatory signal or an inhibitory one. They all spike (except in the retina), and those spikes always have a positive voltage. Depending on the neurotransmitter they release, the effect on the neurons they connect to is either to contribute to a voltage increase in the dendritic tree of the target neuron (excitatory) or to inhibit any voltage spikes in the dendritic tree (inhibitory).
For a typical PC, 90% of its inputs are excitatory and 10% inhibitory, out of a total of around 7.5K synapses.
This is in contrast to an artificial neuron, where a model weight/parameter (a synapse weight) can change to be either positive or negative and is treated in the same way.
When an inhibitory neuron fires and activates a synapse on a target neuron, it doesn’t contribute a negative voltage that gets aggregated at the soma the way excitatory input does.
Rather, it creates what you can think of as a small voltage “hole” that a mini-spike travelling along the dendrite can fall into before it reaches the cell body, so the mini-spike never gets to help depolarize the cell body. This is quite different to how a negative weight works in an artificial neuron.
┌─┐
│ │ Mini-spike
│ │ ─────►
─┘ │ ┌─
└─┘
Dendrite
─────────────────────────────────────►
Spike gets sucked up
No spike continues
─────────────────────────────────────►
* * * *
┌───────┐
│ │
└─┐ ┌──┘Active
│ │ Inhibitory
│ │ Synapse
│ │
Inhibitory synapse behaviour is therefore highly dependent on the structure of the dendritic tree: an inhibitory synapse affects the mini-spikes from excitatory synapses that must pass it on their way to the cell body.
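The position dependence described above can be sketched with a deliberately crude toy model. It assumes a single unbranched dendrite, ignores attenuation and timing, and treats an active inhibitory synapse as blocking every mini-spike that starts further from the cell body than it does. The function name and the distances are invented for illustration.

```python
def soma_input_with_inhibition(excitatory, inhibitory_positions):
    """Sum the mini-spikes that reach the cell body.

    excitatory: list of (distance_from_soma, amplitude) pairs.
    inhibitory_positions: distances from the soma of active inhibitory synapses.
    A mini-spike is lost if any active inhibitory synapse lies between
    its origin and the soma (toy assumption, not a biophysical model).
    """
    total = 0.0
    for distance, amplitude in excitatory:
        blocked = any(pos < distance for pos in inhibitory_positions)
        if not blocked:
            total += amplitude
    return total


# Three excitatory mini-spikes starting at different distances from the soma.
mini_spikes = [(10, 1.0), (50, 1.0), (90, 1.0)]

print(soma_input_with_inhibition(mini_spikes, []))    # no inhibition
print(soma_input_with_inhibition(mini_spikes, [70]))  # inhibition far from the soma
print(soma_input_with_inhibition(mini_spikes, [30]))  # inhibition close to the soma
```

The same inhibitory synapse removes more or less input depending on where it sits, whereas a negative weight in an artificial neuron subtracts the same amount regardless of any notion of position.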
A dendrite may have many synapses close together. If one of those synapses activates, a mini-spike will travel down the tree with strength x. If two synapses activate near each other, the size of the spike that travels on will be > 2x. They don’t just add up; the summation is supralinear.
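As a rough illustration of what "supralinear" means here, the sketch below compares plain addition with a combination rule that grows faster than the number of active synapses. The power-law form and the exponent of 1.5 are arbitrary assumptions chosen only so that two nearby synapses give more than 2x; they are not a claim about real dendritic biophysics.

```python
def linear_sum(n_active: int, strength: float = 1.0) -> float:
    """What you would get if nearby mini-spikes simply added up."""
    return n_active * strength


def supralinear_sum(n_active: int, strength: float = 1.0, exponent: float = 1.5) -> float:
    """Toy supralinear rule: any exponent > 1 makes clusters worth more than their parts."""
    return strength * n_active ** exponent


for n in (1, 2, 3, 4):
    print(n, linear_sum(n), supralinear_sum(n))
```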
The dendritic tree has many nonlinearities thanks to dendritic tree structure, the dynamics of inhibition and the supralinear effects previously discussed, and a variety of different types of synapse with varying dynamics, timing, etc. You can’t just think of them as electrical cables. These dynamics ultimately mean that for a single “layer” of biological neurons, you may need 5-8 layers of artificial neurons to model the same dynamics.
A single spike in some far away part of the dendritic tree will likely have little to no impact on the voltage of the cell body. The signal attenuates, meaning it weakens as it travels. Spikes have to work together if they want to do anything - given this supralinear effect, a small cluster of nearby activated synapses might start to have a chance of making a mini-spike big enough to travel the whole way to the cell body.
But even then, possibly nothing will happen. A pyramidal neuron may typically have 10K synapses onto its dendritic tree, and would require at least ~150 of them to activate at roughly the same time in order to cause the voltage at the cell body to pass its tipping point and depolarize, causing an onward spike. Many factors contribute to this, including the reliability of synapses and the fairly limited range of synaptic strength. The key thing is that spikes must work together, and as Mark Humphries puts it in the book - it’s all about “The Legion”.
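The "roughly the same time" requirement can be sketched as a coincidence check over synapse activation times. The threshold of 150 comes from the text above; the 10 ms window is an assumption made purely for illustration, and the model ignores position, attenuation, and inhibition entirely.

```python
def max_coincident(times_ms, window_ms):
    """Largest number of activations that fall within any window of the given width."""
    times = sorted(times_ms)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Shrink the window from the left until it spans at most window_ms.
        while t - times[start] > window_ms:
            start += 1
        best = max(best, end - start + 1)
    return best


def soma_fires(times_ms, window_ms=10.0, threshold=150):
    """True if enough synapses activate close enough together in time."""
    return max_coincident(times_ms, window_ms) >= threshold


# 150 activations packed into 5 ms versus the same number spread over 1.5 s.
clustered = [i * 5.0 / 149 for i in range(150)]
spread_out = [i * 10.0 for i in range(150)]

print(soma_fires(clustered))
print(soma_fires(spread_out))
```

The same number of input spikes produces an output spike in one case and nothing in the other, which is the point about spikes needing to arrive together.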
Despite their analog implementation spikes come in a binary flavour - they either happen, or they don’t. But with all biological neurons those spikes still happen at a specific time and that timing is probably very important. Spikes can be reliably accurate down to ~10
Additionally, it’s well known that plasticity - the adjustment of the number or strength of synapses between neurons - is in many cases dependent on the relatively precise timing of presynaptic (the neuron that sends a signal across the synapse) and postsynaptic (the neuron to which the dendrite in question belongs) activity.
One might typically conceive of the activation of a neuron in an ANN as representing the rate of firing of the neuron (although this is clearly not a fair comparison, as by this point it is probably clear that ANNs are simply a different thing).
Generally, the more input activity there is, the higher that rate will be. Firing rates saturate at various levels depending on the type of neuron, and at the upper end approach a limit of around 500 spikes/s.
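One loose way to picture the rate interpretation is to count spikes over a period and scale the result against the upper limit mentioned above. This is only a sketch: the function names are hypothetical, and dividing by the 500 spikes/s ceiling to get a value between 0 and 1 is an illustrative assumption rather than an established mapping onto ANN activations.

```python
MAX_RATE_HZ = 500.0  # approximate upper limit on firing rate quoted in the text


def firing_rate(spike_times_s, duration_s):
    """Average firing rate in spikes per second over the recording."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(spike_times_s) / duration_s


def rate_as_activation(rate_hz, max_rate_hz=MAX_RATE_HZ):
    """Scale a firing rate into [0, 1], saturating at the maximum rate."""
    return min(rate_hz, max_rate_hz) / max_rate_hz


spikes = [0.1, 0.4, 0.9]  # invented spike times, in seconds
rate = firing_rate(spikes, duration_s=1.0)
print(rate, rate_as_activation(rate))
```

Notice what gets thrown away: the three spike times collapse into a single number, so any information carried by their precise timing is lost.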
There are such things as (artificial) spiking neural networks that attempt to more closely reflect the binary nature of a spike. While they promise some potentially significant advantages, such as the ability to map onto neuromorphic hardware and be significantly more energy efficient, there has been limited success in training such models at scale.
Whether the rate model we work with today is sufficient for all our future AI use cases is an unanswered question, but the difference is hard to ignore.
In biological neurons, particularly at synapses, sometimes things just fail. Many excitatory synapses in the cortex may only successfully influence the target neuron 25% of the time, and in some cases the success rate has been measured as low as 5%. The reliability of a synapse, as well as its strength, can change however, and in some cases strengthening a connection between neurons may actually mean making that connection more reliable rather than stronger.
Interestingly, the lack of reliability of single connections may have some functional advantages, with a parallel in artificial networks - dropout. It turns out that randomly dropping units (along with their connections) in an ANN during training aids generalization and helps avoid overfitting, and it may potentially have the same effect in biological neurons.
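The parallel between an unreliable synapse and dropout can be sketched with two tiny functions that both come down to a biased coin flip. The 25% success rate is the figure quoted above; the function names, the fixed seed, and the keep probability are illustrative assumptions, and a real dropout implementation would also rescale the surviving activations.

```python
import random


def transmitted_spikes(n_spikes, p_success, rng):
    """Count how many presynaptic spikes actually influence the target neuron."""
    return sum(1 for _ in range(n_spikes) if rng.random() < p_success)


def dropout_mask(n_units, p_keep, rng):
    """Random 0/1 mask of the kind used to silence units during ANN training."""
    return [1 if rng.random() < p_keep else 0 for _ in range(n_units)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(transmitted_spikes(1000, 0.25, rng))
print(dropout_mask(10, 0.5, rng))
```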
There are other practical effects at play too. When neurotransmitters are emitted at a synapse, some time and energy is required to “reset” the synapse, resulting in an effect known as short-term depression (STD). STD may also allow additional forms of computation thanks to the timing dynamics it can introduce.
This is at the least another finger pointing at the very limited power of a single spike, and the importance of spikes combining together to have an effect.
So far I’ve covered details about individual neurons. We can start to zoom out a little and consider how these have an impact on neuronal populations and the network overall.
In 1 second, fewer than 10% of cortical neurons will fire. 1 second is quite a long time in the brain, significantly more time than it would take you to initiate the movements required to catch a glass falling off a table, for example.
Some neurons fire a lot, and 10% of neurons contribute 50% of spikes in the cortex.
The distribution is long tailed, with many neurons that fire quite rarely, possibly not at all over the course of several minutes, and these are referred to as “dark” neurons.
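To make the shape of that distribution tangible, the sketch below measures what share of all spikes comes from the most active neurons and counts the silent ones. The firing rates are invented so that the top 10% of this tiny population happens to contribute half of the spikes; they are not recorded data, and treating a rate of exactly zero as "dark" is a simplification.

```python
def spike_share_of_top(rates, fraction=0.1):
    """Fraction of all spikes contributed by the most active `fraction` of neurons."""
    if not rates:
        raise ValueError("rates must not be empty")
    total = sum(rates)
    if total == 0:
        return 0.0
    n_top = max(1, round(len(rates) * fraction))
    top = sorted(rates, reverse=True)[:n_top]
    return sum(top) / total


def count_dark(rates, threshold=0.0):
    """Number of neurons firing at or below the threshold rate."""
    return sum(1 for r in rates if r <= threshold)


# Invented firing rates (spikes/s) for ten neurons: one busy, a few quiet, three silent.
rates = [9.0, 2.0, 2.0, 1.5, 1.5, 1.0, 1.0, 0.0, 0.0, 0.0]

print(spike_share_of_top(rates, 0.1))
print(count_dark(rates))
```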
What is their purpose? We don’t know of course, but the simplest reason might be energy efficiency. It’s possible that they are sitting there quietly, ready to pick up some new pattern and come to life in the future. Or returning to the idea of “the legion”, many less active neurons may work together by firing together to have an effect. It may also be that we simply haven’t succeeded in recording these neurons correctly.
Sparsity is a frequently explored idea in AI research, and may have some benefits in many cases. Interestingly one of the key challenges with sparsity is implementing it efficiently in computer hardware, with any performance gain only being realised when sparsity overall is at ~5% or lower. This may be at least one of the reasons why it is not a staple component of modern machine learning.
Some cells just create their own spikes at regular intervals, dubbed pacemaker neurons. In these neurons the low voltage after the spike causes other channels to open up that gradually increase the voltage back to the tipping point. These pacemaker neurons are present in many parts of the brain.
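A pacemaker of this kind can be caricatured as a voltage that creeps upward on its own until it reaches the tipping point, spikes, and drops back down. All of the numbers below (reset voltage, threshold, drift per millisecond) are illustrative assumptions rather than measured values, and the model has no inputs at all.

```python
def pacemaker_spike_times(steps_ms, reset_mv=-70.0, threshold_mv=-55.0, drift_mv_per_ms=0.5):
    """Simulate a neuron whose voltage ramps up by itself and spikes at threshold.

    Returns the time steps (in ms) at which spikes occur.
    """
    voltage = reset_mv
    spikes = []
    for t in range(steps_ms):
        voltage += drift_mv_per_ms  # channels slowly pull the voltage back up
        if voltage >= threshold_mv:
            spikes.append(t)
            voltage = reset_mv      # low voltage after the spike
    return spikes


spike_times = pacemaker_spike_times(100)
intervals = [b - a for a, b in zip(spike_times, spike_times[1:])]
print(spike_times)
print(intervals)
```

With no input and no noise, the intervals between spikes come out identical, which is the regular rhythm the name suggests.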
However, a lot of “spontaneous” activity isn't coming from specific neurons but the network itself with its many feedback connections. It is spontaneous in the sense that it isn’t directly caused as a result of sensory input.
Spontaneous activity uses a lot of energy, but the assumed role of this spontaneous activity via feedback - prediction - probably makes the cost worth it.
To be able to talk about feedback we have to have some notion of direction. While in some parts of the brain the concept of direction might be messy, we can try to draw out this concept in the ventral (“what”) visual pathway. Here’s a very simplified view:
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
Eyes │ │ │ │ │ │ │ │
────────►│ V1 ├──►│ V2 ├──►│ V4 ├──►│ IT │
│ │ │ │ │ │ │ │
└────────┘ └────────┘ └────────┘ └────────┘
As the input from the eyes hits V1, basic textures, edges, and orientations are detected. These neurons project (send their axons) onto V2, and so on, with each level capturing higher-level concepts or more abstract features, and eventually in IT and the Fusiform Face Area (FFA) we can detect faces. We can think of this as the forward direction of the pathway.
But here’s the thing - of all the incoming connections to the V1 area, only a small minority, around 5%, are actually coming from the eyes. In reality, our connections are more like this:
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│ │◄──┤ │◄──┤ │◄──┤ │
│ │ │ │ │ │ │ │
Eyes │ │ │ │ │ │ │ │
────────►│ V1 │◄──┤ V2 │◄──┤ V4 │◄──┤ IT │
│ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │
│ ├──►│ ├──►│ ├──►│ │
└────────┘ └────────┘ └────────┘ └────────┘
Even without input from the eyes, there is plenty of input into and activity happening in V1.
One elegant way to think about this is to accept the idea that “perception is a controlled hallucination”. Our brains are very, very good at filling in the gaps and predicting what comes next. Perhaps those predictions even provide the majority of our sensory experience, with our senses, in maturity, providing occasional grounding / reality checks, at least in familiar experiences or tasks. Broadly, this idea is fleshed out in much more detail in Predictive Coding.
Many biological systems are kept in a state of homeostasis, where self-regulating processes keep the system in a certain state (for example, body temperature).
The brain is no different. Through a variety of mechanisms, including at least the strengthening and weakening of both excitatory and inhibitory synapses, much of the brain is looking to maintain a state of activity that is right at the edge of criticality. That is, just a few spikes might be enough to trigger an avalanche of neuronal responses, feedback, and ultimately action.
If these mechanisms somehow fail and activity runs away, it can cause mass synchronicity across the brain and epilepsy. Thankfully, the inherent randomness throughout spiking neurons works to battle this synchronicity, as do the inhibitory neurons.
This state of balance also helps in priming neural activity such that when our senses confirm our expectations, our neural circuitry is tipped over the edge and can react quickly, a sort of priming or form of attention.
With feedback connections, we also get recurrence, i.e. loops. These loops can serve many purposes. Here are some examples:
The number of loops through neuronal circuitry in the brain is just unimaginably large, and very hard to study.
This is a very sparse (heh) view of neurons, and I barely touched the surface of so many other details that are probably quite important. I did not cover the huge variety of different neurons in the brain, neurotransmitters and modulators, any details about the macro structure of the cortex, columns or layers or the specialized areas of the midbrain, basal ganglia, cerebellum, other cell types, any discussion at all of learning mechanisms, or the fact we have two brain halves.
But with that said, hopefully you’ve discovered something about biological neurons you did not previously know, and have a greater appreciation for just how wildly complex they and their dynamics are in comparison to the artificial neurons of deep neural networks.
I make no claim on what we can or should do with this. I believe the brain is a potential source of inspiration, and understanding why the brain works in these ways has the potential to benefit work in AI research, even if we don’t directly model it.
If I were to summarize these points into a few key themes: