# Sparse Autoencoder Notation Summary

Here is a summary of the symbols used in our derivation of the sparse autoencoder:

| Symbol | Meaning |
|---|---|
| $\textstyle x$ | Input features for a training example, $\textstyle x \in \Re^{n}$. |
| $\textstyle y$ | Output/target values. Here, $\textstyle y$ can be vector-valued. In the case of an autoencoder, $\textstyle y = x$. |
| $\textstyle (x^{(i)}, y^{(i)})$ | The $\textstyle i$-th training example. |
| $\textstyle h_{W,b}(x)$ | Output of our hypothesis on input $\textstyle x$, using parameters $\textstyle W,b$. This should be a vector of the same dimension as the target value $\textstyle y$. |
| $\textstyle W^{(l)}_{ij}$ | The parameter associated with the connection between unit $\textstyle j$ in layer $\textstyle l$ and unit $\textstyle i$ in layer $\textstyle l+1$. |
| $\textstyle b^{(l)}_{i}$ | The bias term associated with unit $\textstyle i$ in layer $\textstyle l+1$. Can also be thought of as the parameter associated with the connection between the bias unit in layer $\textstyle l$ and unit $\textstyle i$ in layer $\textstyle l+1$. |
| $\textstyle \theta$ | Our parameter vector. It is useful to think of this as the result of taking the parameters $\textstyle W,b$ and unrolling them into a long column vector. |
| $\textstyle a^{(l)}_i$ | Activation (output) of unit $\textstyle i$ in layer $\textstyle l$ of the network. In addition, since layer $\textstyle L_1$ is the input layer, we also have $\textstyle a^{(1)}_i = x_i$. |
| $\textstyle f(\cdot)$ | The activation function. Throughout these notes, we used $\textstyle f(z) = \tanh(z)$. |
| $\textstyle z^{(l)}_i$ | Total weighted sum of inputs to unit $\textstyle i$ in layer $\textstyle l$. Thus, $\textstyle a^{(l)}_i = f(z^{(l)}_i)$. |
| $\textstyle \alpha$ | Learning rate parameter. |
| $\textstyle s_l$ | Number of units in layer $\textstyle l$ (not counting the bias unit). |
| $\textstyle n_l$ | Number of layers in the network. Layer $\textstyle L_1$ is usually the input layer, and layer $\textstyle L_{n_l}$ the output layer. |
| $\textstyle \lambda$ | Weight decay parameter. |
| $\textstyle \hat{x}$ | For an autoencoder, its output; i.e., its reconstruction of the input $\textstyle x$. Same meaning as $\textstyle h_{W,b}(x)$. |
| $\textstyle \rho$ | Sparsity parameter, which specifies our desired level of sparsity. |
| $\textstyle \hat\rho_i$ | The average activation of hidden unit $\textstyle i$ (in the sparse autoencoder). |
| $\textstyle \beta$ | Weight of the sparsity penalty term (in the sparse autoencoder objective). |
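To make the notation concrete, here is a minimal NumPy sketch of one forward pass through a single-hidden-layer autoencoder, mapping each variable in the code to a symbol in the table. The sizes, random initialization, and variable names (`W1`, `b1`, etc.) are illustrative assumptions, not part of the notes.

```python
import numpy as np

def f(z):
    # Activation function f(.); these notes use f(z) = tanh(z).
    return np.tanh(z)

rng = np.random.default_rng(0)
n, s2 = 4, 3                              # s_1 = n input units, s_2 hidden units

x = rng.standard_normal(n)                # input x in R^n; a^{(1)} = x

W1 = 0.1 * rng.standard_normal((s2, n))   # W^{(1)}_{ij}: layer 1 -> layer 2
b1 = np.zeros(s2)                         # b^{(1)}_i
W2 = 0.1 * rng.standard_normal((n, s2))   # W^{(2)}: layer 2 -> layer 3
b2 = np.zeros(n)

z2 = W1 @ x + b1                          # z^{(2)}_i: weighted input to hidden unit i
a2 = f(z2)                                # a^{(2)}_i = f(z^{(2)}_i)
z3 = W2 @ a2 + b2
x_hat = f(z3)                             # reconstruction x_hat = h_{W,b}(x); target y = x

# rho_hat_i: average activation of hidden unit i over a (hypothetical) batch of m examples
m = 10
X = rng.standard_normal((m, n))
rho_hat = f(X @ W1.T + b1).mean(axis=0)   # shape (s_2,)
```

Note that $\textstyle \hat{x}$ has the same dimension as $\textstyle x$, since for an autoencoder the target is the input itself.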
