# Sparse Autoencoder Notation Summary

Here is a summary of the symbols used in our derivation of the sparse autoencoder:

| Symbol | Meaning |
| --- | --- |
| $x$ | Input features for a training example, $x \in \Re^{n}$. |
| $y$ | Output/target values. Here, $y$ can be vector-valued. In the case of an autoencoder, $y = x$. |
| $(x^{(i)}, y^{(i)})$ | The $i$-th training example. |
| $h_{W,b}(x)$ | Output of our hypothesis on input $x$, using parameters $W, b$. This should be a vector of the same dimension as the target value $y$. |
| $W^{(l)}_{ij}$ | The parameter associated with the connection between unit $j$ in layer $l$ and unit $i$ in layer $l+1$. |
| $b^{(l)}_{i}$ | The bias term associated with unit $i$ in layer $l+1$. Can also be thought of as the parameter associated with the connection between the bias unit in layer $l$ and unit $i$ in layer $l+1$. |
| $\theta$ | Our parameter vector. It is useful to think of this as the result of taking the parameters $W, b$ and unrolling them into a long column vector. |
| $a^{(l)}_i$ | Activation (output) of unit $i$ in layer $l$ of the network. In addition, since layer $L_1$ is the input layer, we also have $a^{(1)}_i = x_i$. |
| $f(\cdot)$ | The activation function. Throughout these notes, we used $f(z) = \tanh(z)$. |
| $z^{(l)}_i$ | Total weighted sum of inputs to unit $i$ in layer $l$. Thus, $a^{(l)}_i = f(z^{(l)}_i)$. |
| $\alpha$ | Learning rate parameter. |
| $s_l$ | Number of units in layer $l$ (not counting the bias unit). |
| $n_l$ | Number of layers in the network. Layer $L_1$ is usually the input layer, and layer $L_{n_l}$ the output layer. |
| $\lambda$ | Weight decay parameter. |
| $\hat{x}$ | For an autoencoder, its output; i.e., its reconstruction of the input $x$. Same meaning as $h_{W,b}(x)$. |
| $\rho$ | Sparsity parameter, which specifies our desired level of sparsity. |
| $\hat\rho_i$ | The average activation of hidden unit $i$ (in the sparse autoencoder). |
| $\beta$ | Weight of the sparsity penalty term (in the sparse autoencoder objective). |
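
To make the indexing conventions concrete, here is a minimal NumPy sketch of one forward pass through a three-layer autoencoder ($n_l = 3$). It is an illustration under stated assumptions rather than the implementation behind these notes: the variable names (`W1`, `b1`, `W2`, `b2`, `rho_hat`), the layer sizes, the random initialization, the number of examples `m`, and the order in which the parameters are unrolled into $\theta$ are all chosen here for the example. The only relations it relies on are those in the table, together with the weighted sum $z^{(l+1)}_i = \sum_j W^{(l)}_{ij} a^{(l)}_j + b^{(l)}_i$ that follows from the definitions of $W^{(l)}_{ij}$ and $b^{(l)}_i$.

```python
import numpy as np

def forward(x, W1, b1, W2, b2, f=np.tanh):
    """Forward pass for a 3-layer autoencoder; returns (a2, a3)."""
    a1 = x                  # a^(1) = x, since layer L_1 is the input layer
    z2 = W1 @ a1 + b1       # z^(2)_i = sum_j W^(1)_ij a^(1)_j + b^(1)_i
    a2 = f(z2)              # a^(2)_i = f(z^(2)_i), hidden activations
    z3 = W2 @ a2 + b2       # z^(3)_i = sum_j W^(2)_ij a^(2)_j + b^(2)_i
    a3 = f(z3)              # h_{W,b}(x), i.e. the reconstruction x_hat
    return a2, a3

# Illustrative sizes (assumptions): s_1 = n = 4 inputs, s_2 = 3 hidden units,
# s_3 = n outputs, and m = 5 training examples.
rng = np.random.default_rng(0)
n, s2, m = 4, 3, 5
W1 = rng.normal(scale=0.1, size=(s2, n))   # W^(1) has shape (s_2, s_1)
b1 = np.zeros(s2)                          # b^(1) has one entry per unit in layer 2
W2 = rng.normal(scale=0.1, size=(n, s2))   # W^(2) has shape (s_3, s_2)
b2 = np.zeros(n)                           # b^(2) has one entry per unit in layer 3
X = rng.uniform(-1.0, 1.0, size=(m, n))    # row i holds x^(i); for an autoencoder y^(i) = x^(i)

hidden = np.empty((m, s2))
recon = np.empty((m, n))
for i in range(m):
    hidden[i], recon[i] = forward(X[i], W1, b1, W2, b2)

# rho_hat_j: average activation of hidden unit j over the training set.
rho_hat = hidden.mean(axis=0)

# theta: W and b unrolled into one long vector (this ordering is an assumption).
theta = np.concatenate([W1.ravel(), W2.ravel(), b1, b2])
```

Note that `W1[i, j]` plays the role of $W^{(1)}_{ij}$: the row index is the receiving unit in layer $l+1$ and the column index is the sending unit in layer $l$, which is why the forward step is a plain matrix-vector product.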
