Sigmoid and Quadratic Cost
$$a^L = \frac{1}{1 + e^{-z^L}}$$
$$C = \frac{1}{2}\sum_j (a^L_j-y_j)^2$$
Backpropagating the loss to the biases and weights of the output layer:
$$\frac{\partial C}{\partial b_j^L} = \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial b_j^L} = (a_j^L-y_j)\sigma'(z_j^L)$$
$$\frac{\partial C}{\partial w_{jk}^L} = \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial w_{jk}^L} = a_k^{L-1}(a_j^L-y_j)\sigma'(z_j^L)$$
Since $\sigma'(z_j^L) = \sigma(z_j^L)(1- \sigma(z_j^L))$, both $\frac{\partial C}{\partial b_j^L}$ and $\frac{\partial C}{\partial w_{jk}^L}$ become small when $\sigma(z_j^L)\approx 0$ or $\sigma(z_j^L) \approx 1$. This is a problem when the neuron saturates at the wrong extreme: the output is badly wrong, yet the gradient is tiny, so learning is slow.
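As an illustration, here is a minimal NumPy sketch (with made-up values for the input activation, target, weight and bias, not taken from the text) showing that a single sigmoid output neuron saturated at the wrong extreme gets an almost-zero gradient even though its error is large:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Single output neuron: a = sigmoid(w*x + b), quadratic cost C = 0.5*(a - y)**2
x, y = 1.0, 0.0                        # input activation a^{L-1} and target (made-up values)
for w, b in [(0.5, 0.5), (5.0, 5.0)]:  # unsaturated vs. saturated at the wrong extreme
    z = w * x + b
    a = sigmoid(z)
    dC_db = (a - y) * sigmoid_prime(z)       # dC/db^L = (a^L - y) sigma'(z^L)
    dC_dw = x * (a - y) * sigmoid_prime(z)   # dC/dw^L = a^{L-1} (a^L - y) sigma'(z^L)
    print(f"z={z:+.2f}  a={a:.5f}  dC/db={dC_db:.6f}  dC/dw={dC_dw:.6f}")
```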
Sigmoid and Cross-entropy
$$a^L = \frac{1}{1 + e^{-z^L}}$$
$$C = -\sum_j \left[ y_j \ln a^L_j + (1-y_j) \ln (1-a^L_j) \right]$$
Backpropagating the loss to the biases and weights of the output layer:
$$\frac{\partial C}{\partial b_j^L} = \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial b_j^L} = a_j^L-y_j$$
$$\frac{\partial C}{\partial w_{jk}^L} = \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial w_{jk}^L} = a_k^{L-1}(a_j^L-y_j)$$
Here the $\sigma'(z_j^L)$ factor cancels, so learning does not slow down when the output neuron saturates at the wrong extreme.
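A quick sanity check of these expressions (a NumPy sketch with hypothetical random parameters and a hand-picked target): compute the analytic gradients and compare one component against a finite difference.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(a, y):
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

rng = np.random.default_rng(0)
a_prev = rng.random(3)                              # a^{L-1} (hypothetical previous-layer activations)
W, b = rng.normal(size=(2, 3)), rng.normal(size=2)  # hypothetical output-layer parameters
y = np.array([1.0, 0.0])                            # targets

z = W @ a_prev + b
a = sigmoid(z)

dC_db = a - y                                       # dC/db^L: no sigma'(z) factor
dC_dW = np.outer(a - y, a_prev)                     # dC/dw_{jk}^L = a_k^{L-1} (a_j^L - y_j)

# Finite-difference check on one bias component
eps = 1e-6
b_eps = b.copy(); b_eps[0] += eps
numeric = (cross_entropy(sigmoid(W @ a_prev + b_eps), y) - cross_entropy(a, y)) / eps
print(dC_db[0], numeric)                            # the two numbers should agree closely
```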
Linear Output and Quadratic Cost
$$a^L = z^L$$
$$C = \frac{1}{2}\sum_j (a^L_j-y_j)^2$$
Backpropagating the loss to the biases and weights of the output layer:
$$\frac{\partial C}{\partial b_j^L} = \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial b_j^L} = a_j^L-y_j$$
$$\frac{\partial C}{\partial w_{jk}^L} = \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial w_{jk}^L} = a_k^{L-1}(a_j^L-y_j)$$
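The same kind of finite-difference check works for the linear output layer; again a sketch with hypothetical random parameters, not the author's code:

```python
import numpy as np

rng = np.random.default_rng(1)
a_prev = rng.random(3)                              # a^{L-1}
W, b = rng.normal(size=(2, 3)), rng.normal(size=2)  # hypothetical output-layer parameters
y = np.array([0.5, -1.0])                           # arbitrary real-valued targets

cost = lambda out: 0.5 * np.sum((out - y) ** 2)     # quadratic cost
a = W @ a_prev + b                                  # linear output: a^L = z^L

dC_db = a - y                                       # dC/db^L = a^L - y
dC_dW = np.outer(a - y, a_prev)                     # dC/dw_{jk}^L = a_k^{L-1} (a_j^L - y_j)

# Finite-difference check on one weight
eps = 1e-6
W_eps = W.copy(); W_eps[0, 1] += eps
numeric = (cost(W_eps @ a_prev + b) - cost(a)) / eps
print(dC_dW[0, 1], numeric)                         # should agree closely
```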
Softmax Output and Log-likelihood
$$a^L_j = \frac{e^{z^L_j}}{\sum_k e^{z^L_k}}$$
$$C = -\ln a^L_y$$
where $y$ is the target class, so that in the formulas below $y_j = 1$ for $j = y$ and $y_j = 0$ otherwise.
Backpropagating the loss to the biases and weights of the output layer (because the softmax couples the outputs, $a^L_y$ depends on every $z^L_j$, yet the result takes the same form):
$$\frac{\partial C}{\partial b_j^L} = a_j^L-y_j$$
$$\frac{\partial C}{\partial w_{jk}^L} = a_k^{L-1}(a_j^L-y_j)$$
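And the softmax / log-likelihood pair, again as a sketch with hypothetical random parameters and a one-hot target:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                         # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(2)
a_prev = rng.random(3)                              # a^{L-1}
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)  # hypothetical output-layer parameters
y = np.array([0.0, 1.0, 0.0, 0.0])                  # one-hot target, correct class is index 1

z = W @ a_prev + b
a = softmax(z)

dC_db = a - y                                       # dC/db^L = a^L - y
dC_dW = np.outer(a - y, a_prev)                     # dC/dw_{jk}^L = a_k^{L-1} (a_j^L - y_j)

# Finite-difference check on one bias component
eps = 1e-6
b_eps = b.copy(); b_eps[2] += eps
loglik = lambda out: -np.log(out[np.argmax(y)])     # C = -ln a^L_y
numeric = (loglik(softmax(W @ a_prev + b_eps)) - loglik(a)) / eps
print(dC_db[2], numeric)                            # should agree closely
```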