# Neural Networks

### From Ufldl

This "neuron" is a computational unit that takes as input <math>x_1, x_2, x_3</math> (and a +1 intercept term), and
outputs <math>\textstyle h_{W,b}(x) = f(W^Tx) = f(\sum_{i=1}^3 W_{i}x_i +b)</math>, where <math>f : \Re \mapsto \Re</math> is
called the '''activation function'''. In these notes, we will choose
<math>f(\cdot)</math> to be the sigmoid function:

:<math>f(z) = \frac{1}{1+\exp(-z)}.</math>


<div align=center>
[[Image:Sigmoid_Function.png|400px|top|Sigmoid activation function.]]
[[Image:Tanh_Function.png|400px|top|Tanh activation function.]]
</div>

The <math>\tanh(z)</math> function is a rescaled version of the sigmoid, and its output range is
<math>[-1,1]</math> instead of <math>[0,1]</math>.

Note that unlike some other venues (including the OpenClassroom videos, and parts of CS229), we are not using the convention
here of <math>x_0=1</math>. Instead, the intercept term is handled separately by the parameter <math>b</math>.
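The single-neuron computation above can be sketched in a few lines of NumPy. The weights, bias, and input below are illustrative values (not from the text), chosen only to show <math>h_{W,b}(x) = f(W^Tx + b)</math> with both activation choices:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input for a single neuron with 3 inputs.
# The intercept is the separate parameter b, not an x_0 = 1 input.
W = np.array([0.5, -0.3, 0.8])
b = 0.1
x = np.array([1.0, 2.0, 3.0])

h_sigmoid = sigmoid(W @ x + b)  # output lies in (0, 1)
h_tanh = np.tanh(W @ x + b)     # output lies in (-1, 1)

# tanh is a rescaled sigmoid: tanh(z) = 2*sigmoid(2z) - 1,
# which is why its range is [-1, 1] instead of [0, 1].
```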


A neural network is put together by hooking together many of our simple
"neurons," so that the output of a neuron can be the input of another. For
example, here is a small neural network:


In this figure, we have used circles to also denote the inputs to the network. The circles
labeled "+1" are called '''bias units''', and correspond to the intercept term.
The leftmost layer of the network is called the '''input layer''', and the
rightmost layer the '''output layer''' (which, in this example, has only one
node).


In the sequel, we also let <math>z^{(l)}_i</math> denote the total weighted sum of inputs to unit <math>i</math> in layer <math>l</math>,
including the bias term (e.g., <math>\textstyle z_i^{(2)} = \sum_{j=1}^n W^{(1)}_{ij} x_j + b^{(1)}_i</math>), so that
<math>a^{(l)}_i = f(z^{(l)}_i)</math>.


If we extend the activation function <math>f(\cdot)</math>
to apply to vectors in an element-wise fashion (i.e.,
<math>f([z_1, z_2, z_3]) = [f(z_1), f(z_2), f(z_3)]</math>), then we can write
the equations above more
compactly as:

:<math>\begin{align}
z^{(2)} &= W^{(1)} x + b^{(1)} \\
a^{(2)} &= f(z^{(2)}) \\
z^{(3)} &= W^{(2)} a^{(2)} + b^{(2)} \\
h_{W,b}(x) &= a^{(3)} = f(z^{(3)})
\end{align}</math>

We call this step '''forward propagation'''. More generally, recalling that we also use <math>a^{(1)} = x</math> to denote the values from the input layer,
then given layer <math>l</math>'s activations <math>a^{(l)}</math>, we can compute layer <math>l+1</math>'s activations <math>a^{(l+1)}</math> as:

:<math>\begin{align}
z^{(l+1)} &= W^{(l)} a^{(l)} + b^{(l)} \\
a^{(l+1)} &= f(z^{(l+1)})
\end{align}</math>

By organizing our parameters in matrices and using matrix-vector operations, we can take

advantage of fast linear algebra routines to quickly perform calculations in our network.
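The matrix-vector form of forward propagation maps directly onto NumPy. Below is a minimal sketch for the example network with 3 inputs, 3 hidden units, and 1 output; the randomly initialized parameters are illustrative, not values from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative random parameters for a 3-input, 3-hidden, 1-output network.
W1 = rng.standard_normal((3, 3))  # W^{(1)}, shape: hidden x input
b1 = rng.standard_normal(3)       # b^{(1)}
W2 = rng.standard_normal((1, 3))  # W^{(2)}, shape: output x hidden
b2 = rng.standard_normal(1)       # b^{(2)}

x = np.array([0.5, -1.0, 2.0])    # a^{(1)} = x

z2 = W1 @ x + b1                  # z^{(2)} = W^{(1)} x + b^{(1)}
a2 = sigmoid(z2)                  # a^{(2)} = f(z^{(2)})
z3 = W2 @ a2 + b2                 # z^{(3)} = W^{(2)} a^{(2)} + b^{(2)}
h = sigmoid(z3)                   # h_{W,b}(x) = a^{(3)} = f(z^{(3)})
```

Each `@` is a single matrix-vector product, so the whole forward pass is a handful of BLAS calls rather than per-unit loops.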

+ | |||

We have so far focused on one example neural network, but one can also build neural
networks with other '''architectures''' (meaning patterns of connectivity between neurons), including ones with multiple hidden layers.
The most common choice is an <math>\textstyle n_l</math>-layered network
where layer <math>\textstyle 1</math> is the input layer, layer <math>\textstyle n_l</math> is the output layer, and each
layer <math>\textstyle l</math> is densely connected to layer <math>\textstyle l+1</math>. In this setting, to compute the
output of the network, we can successively compute all the activations in layer
<math>\textstyle L_2</math>, then layer <math>\textstyle L_3</math>, and so on, up to layer <math>\textstyle L_{n_l}</math>, using the equations above that describe the forward propagation step. This is one
example of a '''feedforward''' neural network, since the connectivity graph
does not have any directed loops or cycles.
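Computing the output layer by layer, as described above, is just a loop over the recurrence <math>a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)})</math>. The sketch below assumes sigmoid activations and illustrative layer sizes of my own choosing:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Successively apply a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)})."""
    a = x  # a^{(1)} = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a  # a^{(n_l)}, the output layer's activations

# Illustrative densely connected n_l = 4 layer network:
# 4 inputs, hidden layers of 5 and 3 units, 2 outputs.
sizes = [4, 5, 3, 2]
rng = np.random.default_rng(1)
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

output = forward(np.ones(4), weights, biases)
```

Because the connectivity graph has no cycles, one pass through this loop visits every layer exactly once.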

+ | |||

Neural networks can also have multiple output units. For example, here is a network


patient, and the different outputs <math>y_i</math>'s might indicate presence or absence
of different diseases.)

+ | |||

+ | |||

+ | {{Sparse_Autoencoder}} | ||

+ | |||

+ | |||

+ | {{Languages|神经网络|中文}} |