
Problem with Sigmoid and Sum of Squares Loss

Sigmoid and Quadratic Cost

$$a^L = \frac{1}{1 + e^{-z^L}}$$
$$C = \frac{1}{2}\sum_j (a^L_j-y_j)^2$$

Backpropagating the loss to the biases and weights of the output layer:

$$\frac{\partial C}{\partial b_j^L} = \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial b_j^L} = (a_j^L-y_j)\sigma'(z_j^L)$$
$$\frac{\partial C}{\partial w_{jk}^L} = \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial w_{jk}^L} = a_k^{L-1}(a_j^L-y_j)\sigma'(z_j^L)$$

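Since $\sigma'(z_j^L) = \sigma(z_j^L) (1- \sigma(z_j^L))$, both $\frac{\partial C}{\partial b_j^L}$ and $\frac{\partial C}{\partial w_{jk}^L}$ shrink toward zero when $\sigma(z_j^L)\approx 0$ or $\sigma(z_j^L) \approx 1$, that is, when the output neuron saturates. This is harmful when the neuron saturates at the wrong extreme (for example $a_j^L \approx 0$ while $y_j = 1$): the error $a_j^L - y_j$ is large, yet the gradient is tiny, so learning is slowest exactly when the prediction is most wrong.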
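The saturation effect can be seen numerically for a single output neuron. The following is a minimal sketch, assuming the conventional factor of $\frac{1}{2}$ on the quadratic cost so that the bias gradient is exactly $(a - y)\sigma'(z)$; the function names are illustrative and not part of the original post.

```python
import math

def sigmoid(z):
    """Logistic sigmoid of a scalar pre-activation."""
    return 1.0 / (1.0 + math.exp(-z))

def quadratic_bias_grad(z, y):
    """dC/db for one sigmoid output neuron with C = 0.5 * (a - y)**2."""
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)

# The target is 1, so a large negative z means the neuron is confidently wrong.
for z in (0.0, -2.0, -5.0, -10.0):
    a = sigmoid(z)
    grad = quadratic_bias_grad(z, 1.0)
    print(f"z={z:6.1f}  a={a:.5f}  error={a - 1.0:+.5f}  dC/db={grad:+.6f}")
```

As $z$ becomes more negative the error approaches $-1$ while the printed gradient shrinks toward zero, which is the slowdown described above.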

Sigmoid and Cross-entropy

$$a^L = \frac{1}{1 + e^{-z^L}}$$
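$$C = -\sum_j \left[ y_j \ln a^L_j + (1-y_j)\ln(1-a^L_j) \right]$$

Backpropagating the loss to the biases and weights of the output layer, the factor $\sigma'(z_j^L) = a_j^L(1-a_j^L)$ cancels against the denominator of $\frac{\partial C}{\partial a_j^L} = \frac{a_j^L-y_j}{a_j^L(1-a_j^L)}$:

$$\frac{\partial C}{\partial b_j^L} = \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial b_j^L} = \frac{a_j^L-y_j}{a_j^L(1-a_j^L)}\,\sigma'(z_j^L) = a_j^L-y_j$$
$$\frac{\partial C}{\partial w_{jk}^L} = \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial w_{jk}^L} = a_k^{L-1}(a_j^L-y_j)$$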
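A finite-difference check is one way to confirm that the gradient no longer contains $\sigma'(z)$. This sketch assumes the two-term binary cross-entropy $-[y \ln a + (1-y)\ln(1-a)]$ for a single output neuron and uses $\frac{\partial z}{\partial b} = 1$, so $\frac{\partial C}{\partial b} = \frac{\partial C}{\partial z}$; the helper names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic sigmoid of a scalar pre-activation."""
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(z, y):
    """Binary cross-entropy of a sigmoid output for target y in {0, 1}."""
    a = sigmoid(z)
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

def numeric_grad(f, z, eps=1e-6):
    """Central-difference estimate of df/dz."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

def check(z, y):
    """Return (analytic, numeric) values of dC/dz = dC/db."""
    analytic = sigmoid(z) - y
    numeric = numeric_grad(lambda t: cross_entropy(t, y), z)
    return analytic, numeric

# A confidently wrong neuron (target 1, z = -5) still gets a gradient near -1.
analytic, numeric = check(-5.0, 1.0)
print(f"analytic={analytic:+.6f}  numeric={numeric:+.6f}")
```

Unlike the quadratic case, the gradient magnitude here tracks the error itself, so a saturated but wrong neuron still receives a strong update.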

Linear Output and Quadratic Cost

$$a^L = z^L$$
$$C = \frac{1}{2}\sum_j (a^L_j-y_j)^2$$

Backpropagating the loss to the biases and weights of the output layer:

$$\frac{\partial C}{\partial b_j^L} = \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial b_j^L} = a_j^L-y_j$$
$$\frac{\partial C}{\partial w_{jk}^L} = \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial w_{jk}^L} = a_k^{L-1}(a_j^L-y_j)$$
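The weight formula $a_k^{L-1}(a_j^L - y_j)$ can be checked the same way on a tiny layer. This sketch assumes $z_j^L = \sum_k w_{jk}^L a_k^{L-1} + b_j^L$ and a factor of $\frac{1}{2}$ on the quadratic cost; the layer sizes and numbers are made up for illustration.

```python
def forward(W, b, x):
    """Linear output layer: a_j = sum_k W[j][k] * x[k] + b[j]."""
    return [sum(w * xk for w, xk in zip(row, x)) + bj for row, bj in zip(W, b)]

def cost(W, b, x, y):
    """Quadratic cost 0.5 * sum_j (a_j - y_j)**2."""
    a = forward(W, b, x)
    return 0.5 * sum((aj - yj) ** 2 for aj, yj in zip(a, y))

def analytic_weight_grad(W, b, x, y):
    """dC/dW[j][k] = x[k] * (a_j - y_j), with x the previous-layer activations."""
    a = forward(W, b, x)
    return [[xk * (aj - yj) for xk in x] for aj, yj in zip(a, y)]

def numeric_weight_grad(W, b, x, y, eps=1e-6):
    """Central-difference estimate of dC/dW[j][k]."""
    grad = []
    for j, row in enumerate(W):
        grad_row = []
        for k in range(len(row)):
            W_plus = [r[:] for r in W]
            W_minus = [r[:] for r in W]
            W_plus[j][k] += eps
            W_minus[j][k] -= eps
            diff = cost(W_plus, b, x, y) - cost(W_minus, b, x, y)
            grad_row.append(diff / (2.0 * eps))
        grad.append(grad_row)
    return grad

W = [[0.5, -1.0], [2.0, 0.25]]  # illustrative output-layer weights
b = [0.1, -0.3]                 # illustrative biases
x = [1.5, -2.0]                 # previous-layer activations a^{L-1}
y = [1.0, 0.0]                  # targets

print(analytic_weight_grad(W, b, x, y))
print(numeric_weight_grad(W, b, x, y))
```

The two printed matrices should agree up to floating-point error, since the cost is exactly quadratic in each weight.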

Softmax Output and Log-likelihood

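$$a^L_j = \frac{e^{z^L_j}}{\sum_k e^{z^L_k}}$$
$$C = -\ln a^L_y$$

Here $y$ in the cost is the index of the correct class. Backpropagating the loss to the biases and weights of the output layer, with $y_j$ the one-hot target ($y_j = 1$ if $j = y$ and $0$ otherwise):

$$\frac{\partial C}{\partial b_j^L} = a_j^L-y_j$$
$$\frac{\partial C}{\partial w_{jk}^L} = a_k^{L-1}(a_j^L-y_j)$$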
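The softmax result can also be verified numerically. This is a sketch under the assumption that the target is a single class index and $y_j$ is its one-hot encoding; subtracting the maximum logit before exponentiating is a standard numerical-stability step, not something the derivation requires.

```python
import math

def softmax(z):
    """Softmax of a list of logits, shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood_cost(z, label):
    """C = -ln a_label, where label is the index of the correct class."""
    return -math.log(softmax(z)[label])

def analytic_grad(z, label):
    """dC/dz_j = a_j - y_j with y the one-hot encoding of label."""
    a = softmax(z)
    return [aj - (1.0 if j == label else 0.0) for j, aj in enumerate(a)]

def numeric_grad(z, label, eps=1e-6):
    """Central-difference estimate of dC/dz_j for every j."""
    grad = []
    for j in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        diff = log_likelihood_cost(z_plus, label) - log_likelihood_cost(z_minus, label)
        grad.append(diff / (2.0 * eps))
    return grad

z = [2.0, -1.0, 0.5]  # illustrative logits
label = 1             # correct class has the smallest logit: confidently wrong

print(analytic_grad(z, label))
print(numeric_grad(z, label))
```

Because $\frac{\partial z_j^L}{\partial b_j^L} = 1$, these values are also the bias gradients; multiplying by $a_k^{L-1}$ gives the weight gradients.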

Published

Jan 17, 2017

Category

Theoretical ML

Tags

  • Analysis 3