Singular learning theory is a way of understanding neural networks by studying how they develop during training, rather than looking at the end product and dissecting it. It is akin to studying how a child learns and becomes an adult with a formed brain (arguably, this takes longer for some than others, and some never develop one).

But how?

Ah, that’s the interesting question.

Notes

  1. At the bottom of the loss valley lie several rivers of constants. Generalisation is a balance between expressivity and simplicity: more parameters versus fewer parameters
  2. Four basic concepts:
  • The “truth”, q(x), some distribution that is generating samples;
  • A model, p(x|w), parametrized by weights w ∈ W ⊂ ℝ^d, where W is compact;
  • A prior over weights, φ(w);
  • And a dataset of samples D_n = {X_1, …, X_n}, where each random variable X_i is i.i.d. according to q(x).
  3. These shapes (the valleys and rivers of note 1) get formed during training, as backpropagation minimises the loss
  4. Two levels of learning:
  • Lower level: finding the optimal weights for a dataset such that the predicted output matches the truth
  • Higher level: finding the optimal model class/architecture for a given dataset
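
To make the four concepts and the lower level of learning concrete, here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration and is not part of the notes above: the truth q(x) is taken to be a Gaussian with mean 0.5, the model p(x|w) is a unit-variance Gaussian with a single weight w (so d = 1), the prior φ(w) is uniform on the compact set W = [-2, 2], and the function names are hypothetical. The "optimal weights" are found by a plain grid search over the average negative log-likelihood, not by backpropagation.

```python
import math
import random


def sample_truth(n, seed=0):
    """Draw D_n = {X_1, ..., X_n} i.i.d. from the assumed truth q(x) = N(0.5, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.5, 1.0) for _ in range(n)]


def log_model(x, w):
    """log p(x|w) for the assumed model: a unit-variance Gaussian with mean w."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - w) ** 2


def log_prior(w, lo=-2.0, hi=2.0):
    """log φ(w) for an assumed uniform prior on the compact set W = [lo, hi]."""
    return -math.log(hi - lo) if lo <= w <= hi else -math.inf


def empirical_nll(data, w):
    """Average negative log-likelihood of the dataset under p(x|w)."""
    return -sum(log_model(x, w) for x in data) / len(data)


def best_weight(data, grid):
    """Lower-level learning: pick the weight in the grid with the smallest loss."""
    return min(grid, key=lambda w: empirical_nll(data, w))


data = sample_truth(1000)
grid = [i / 100 for i in range(-200, 201)]  # W = [-2, 2] in steps of 0.01
w_hat = best_weight(data, grid)
print(f"estimated weight: {w_hat:.2f}")
```

For this toy model the loss is a single parabola in w, so the grid search lands on the grid point nearest the sample mean. The higher level of learning would repeat this for several model classes and compare them, which the sketch does not attempt.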

Further reading