NNUEs (efficiently updatable neural networks) are a tweak to Multi-Layer Perceptrons that makes them efficient enough to serve as the evaluation function of a chess engine. NNUEs are quite recent (introduced in 2018), and using them alongside 0x88 and MTD(f) is quite funny if you ask me! basically a honda civic with a jet engine
MLPs?
Multi-Layer Perceptrons are usually the first thing one gets introduced to in deep learning: they're the most fundamental NN one can get exposed to after the Perceptron. i suggest watching 3b1b's really awesome deep learning series before continuing on with this article!
Back-propagation
i'm writing this section mostly for myself since i got kinda confused.
let's fix our MLP to have exactly 2 hidden layers of arbitrary size, and let's define an error $L$ as $$L = (y - e)^2$$ where $y$ is the result our network gives by feed-forwarding and $e$ is what we expected. with $\sigma(x)$ being an activation function, we have: $$y = \sigma\left(out_b + \sum_{i = 0}^{\text{HL2 size} - 1} out_w[i] \times HL2[i]\right)$$ and we have that
$$HL2[i] = \sigma\left(HL2[i]_b + \sum_{j=0}^{\text{HL1 size} - 1} HL1[j] \times HL2[i]_w[j]\right) \newline HL1[j] = \sigma\left(HL1[j]_b + \sum_{k=0}^{\text{input size} - 1} input[k] \times HL1[j]_w[k]\right)$$
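to make the feed-forward equations above concrete, here's a minimal sketch in Python (the layer sizes and input values are made up for illustration, and it assumes the logistic sigmoid as $\sigma$, one common choice; nothing here is prescribed by the math above):

```python
import math
import random

def sigma(x):
    # logistic sigmoid, one common choice of activation function
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # sigma(bias + sum_j inputs[j] * weights[j]) -- one HL2[i] / HL1[j] term
    return sigma(bias + sum(i * w for i, w in zip(inputs, weights)))

def forward(x, hl1, hl2, out):
    # each layer is a list of (weights, bias) pairs, one per neuron
    a1 = [neuron(x, w, b) for w, b in hl1]   # HL1[j] values
    a2 = [neuron(a1, w, b) for w, b in hl2]  # HL2[i] values
    return neuron(a2, *out)                  # final scalar output y

def rand_layer(n_neurons, n_inputs):
    # random (weights, bias) pairs; a real net would train these
    return [([random.uniform(-1, 1) for _ in range(n_inputs)],
             random.uniform(-1, 1)) for _ in range(n_neurons)]

random.seed(0)
x   = [0.5, -0.2, 0.1]    # input vector of size 3
hl1 = rand_layer(4, 3)    # HL1: 4 neurons, each seeing the 3 inputs
hl2 = rand_layer(3, 4)    # HL2: 3 neurons, each seeing HL1's 4 outputs
out = ([random.uniform(-1, 1) for _ in range(3)], 0.0)  # output weights + bias

print(forward(x, hl1, hl2, out))  # y lands in (0, 1) since sigma is the sigmoid
```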
so if we can find the partial derivatives of $L$ with respect to all the weights and biases, we can just do gradient descent. so let's get into the rabbit hole of finding those (partial) derivatives!
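(concretely, "doing gradient descent" means nudging every parameter a little against its gradient; with $\eta$ a small learning rate — the symbol $\eta$ is my choice, it's not defined anywhere in the article — every weight and bias $w$ gets the update $$w \leftarrow w - \eta \frac{\partial L}{\partial w}$$ repeated over many training examples.)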
let's try to find it for some values (yes, by hand!). writing $z = out_b + \sum_{i} out_w[i] \times HL2[i]$ for the pre-activation of the output neuron (so $y = \sigma(z)$): $$\frac{\partial L}{\partial out_b} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial out_b} = 2(y - e) \times \sigma'(z) \times 1 = 2(y - e)\,\sigma'(z)$$ the trailing $\times 1$ is $\frac{\partial z}{\partial out_b}$, and the $\sigma'(z)$ factor is the one that's easy to forget: $y$ depends on $out_b$ only through the activation.
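a nice way to double-check a hand-derived gradient is to compare it against a finite-difference estimate. here's a minimal sketch with made-up numbers, assuming the logistic sigmoid as $\sigma$ and letting `s` stand in for the whole weighted sum $\sum_i out_w[i] \times HL2[i]$ feeding the output neuron (so $z = out_b + s$ and $y = \sigma(z)$):

```python
import math

def sigma(x):
    # logistic sigmoid
    return 1.0 / (1.0 + math.exp(-x))

def dsigma(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x)) for the logistic sigmoid
    s = sigma(x)
    return s * (1.0 - s)

def loss(out_b, s, e):
    # L = (y - e)^2 with y = sigma(out_b + s), s being the fixed weighted sum
    y = sigma(out_b + s)
    return (y - e) ** 2

# made-up numbers just for the check
out_b, s, e = 0.3, 0.7, 1.0
z = out_b + s

# chain-rule result from above: dL/dout_b = 2(y - e) * sigma'(z)
analytic = 2.0 * (sigma(z) - e) * dsigma(z)

# central finite difference as an independent estimate of the same derivative
h = 1e-6
numeric = (loss(out_b + h, s, e) - loss(out_b - h, s, e)) / (2 * h)

print(analytic, numeric)  # the two values agree to many decimal places
```

if the two numbers disagree, the hand derivation (or the code) has a bug somewhere — this trick scales to checking whole backprop implementations, one parameter at a time.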