Singular learning theory is a way of understanding neural networks by studying how they develop during training, rather than only looking at the end product and dissecting it. It is akin to studying how a child learns and becomes an adult with a formed brain (arguably, this takes longer for some than for others, and some never develop one).
But how?
Ah, that’s the interesting question.
Notes
- At the bottom of the loss valley lie several rivers of constants. Generalisation is a balance between expressivity and simplicity: more parameters vs fewer parameters
- Four basic concepts:
  - The “truth”, $q(x)$, some distribution that is generating samples;
  - A model, $p(x \mid w)$, parametrized by weights $w \in W \subset \mathbb{R}^d$, where $W$ is compact;
  - A prior over weights, $\varphi(w)$;
  - And a dataset of samples $D_n = \{X_1, \dots, X_n\}$, where each random variable $X_i$ is i.i.d. according to $q(x)$.
- These shapes (the valleys of the loss landscape) get formed during training, through backpropagation and loss minimisation
- Two levels of learning:
  - Lower level: finding the optimal weights for a given dataset, such that the predicted output matches the truth
  - Higher level: finding the optimal model class/architecture for a given dataset
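
To make the notes above a bit more concrete, here is a minimal toy sketch of the four basic concepts and the two levels of learning. Everything specific in it is an assumption chosen for illustration, not something taken from the sources: the truth is a standard normal, the model is a unit-variance Gaussian whose mean is the product `w1 * w2`, the prior is uniform on the compact set `W = [-2, 2]^2`, weights are found by a plain grid search instead of backpropagation, and model classes are compared with a grid approximation of the Bayesian free energy, which is one common criterion for this kind of comparison. The product parametrisation is picked because many different weights give the same loss, so the bottom of the valley is a whole curve of weights instead of a single point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Truth q(x): assumed here to be a standard normal.
n = 200
X = rng.normal(loc=0.0, scale=1.0, size=n)  # dataset D_n, i.i.d. from q

def empirical_nll(model_mean):
    """Average negative log-likelihood L_n of a unit-variance Gaussian model.

    `model_mean` is an array of candidate means; the average of (x - m)^2
    over the dataset is expanded so it works on whole grids at once.
    """
    sq = np.mean(X**2) - 2.0 * model_mean * np.mean(X) + model_mean**2
    return 0.5 * np.log(2.0 * np.pi) + 0.5 * sq

def free_energy(L_grid):
    """-log of the prior-averaged likelihood, with a uniform prior on the grid."""
    a = -n * L_grid
    return -(a.max() + np.log(np.mean(np.exp(a - a.max()))))

grid = np.linspace(-2.0, 2.0, 201)

# Model class A (hypothetical): p(x|w) = N(x | w1 * w2, 1), w in [-2, 2]^2.
W1, W2 = np.meshgrid(grid, grid, indexing="ij")
L_singular = empirical_nll(W1 * W2)

# Lower level: find weights that minimise the loss for this dataset.
i, j = np.unravel_index(np.argmin(L_singular), L_singular.shape)
w_hat = (grid[i], grid[j])

# Many grid points sit almost exactly at the minimum: the valley floor is a curve.
near_min = int(np.sum(L_singular < L_singular.min() + 1e-3))

# Model class B (hypothetical): p(x|w) = N(x | w, 1), w in [-2, 2].
L_regular = empirical_nll(grid)

# Higher level: compare the two model classes on the same dataset.
F_singular = free_energy(L_singular)
F_regular = free_energy(L_regular)

print("best weights (class A):", w_hat)
print("grid points near the minimum (class A):", near_min)
print("free energy, class A vs class B:", F_singular, F_regular)
```

The grid search only stands in for training, and the grid average only approximates the integral over the prior, so the printed numbers should be read as a rough picture and not as a result.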
Further reading
- Hoogland, J. (2023). Neural networks generalize because of this one weird trick. LessWrong. Singular learning theory as an explanation for why overparameterised networks generalise.
- SLT for AI safety. LessWrong.