Neural Networks

From Ufldl

Revision as of 05:29, 26 February 2011 by Ang (Talk | contribs)

Consider a supervised learning problem where we have access to labeled training examples (x^{(i)}, y^{(i)}). Neural networks give a way of defining a complex, non-linear form of hypotheses h_{W,b}(x), with parameters W, b that we can fit to our data.

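To describe neural networks, we will begin by describing the simplest possible neural network, one which comprises a single "neuron." We will use the following diagram to denote a single neuron:

[Figure: diagram of a single neuron with inputs x_1, x_2, x_3 and a +1 intercept term, and output h_{W,b}(x).]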
This "neuron" is a computational unit that takes as input x_1, x_2, x_3 (and a +1 intercept term), and outputs h_{W,b}(x) = f(W^T x + b) = f(\sum_{i=1}^3 W_i x_i + b), where f : \Re \to \Re is called the activation function. In these notes, we will choose f(\cdot) to be the sigmoid function:

f(z) = \frac{1}{1+\exp(-z)}.

Thus, our single neuron corresponds exactly to the input-output mapping defined by logistic regression.
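As a rough illustration of the mapping above, here is a minimal Python sketch of a single sigmoid neuron. The function names (sigmoid, neuron_output) and the numeric values of W, b, and x are hypothetical, chosen only for this example; the sketch makes no attempt at numerical hardening for very large negative z.

```python
import math


def sigmoid(z):
    """Sigmoid activation: f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights, bias, inputs):
    """Compute h_{W,b}(x) = f(sum_i W_i * x_i + b) for a single neuron."""
    # Weighted sum of the inputs plus the intercept (bias) term.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Hypothetical parameters and input, for illustration only.
W = [0.5, -0.25, 0.1]
b = 0.2
x = [1.0, 2.0, 3.0]

# Here z is about 0.5, so the output is roughly 0.62.
print(neuron_output(W, b, x))
```

Because the output is the sigmoid of a linear function of the input, this is the same functional form as the hypothesis used in logistic regression.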

Although these notes will use the sigmoid function, it is worth noting that another common choice for f is the hyperbolic tangent, or tanh, function:

f(z) = \tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}.
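To compare the two activation functions numerically, the following Python sketch evaluates both at a few sample points. It also checks the standard identity tanh(z) = 2 * sigmoid(2z) - 1, which is not stated in these notes but follows directly from the two definitions. The function names and sample points are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Sigmoid activation: f(z) = 1 / (1 + exp(-z)); values lie in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh_from_definition(z):
    """tanh(z) = (e^z - e^{-z}) / (e^z + e^{-z}); values lie in (-1, 1)."""
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))


# Sample points chosen only for illustration.
for z in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    s = sigmoid(z)
    t = tanh_from_definition(z)
    # Rescaled sigmoid; should agree with tanh up to rounding error.
    rescaled = 2.0 * sigmoid(2.0 * z) - 1.0
    print(f"z={z:+.1f}  sigmoid={s:.4f}  tanh={t:.4f}  2*sigmoid(2z)-1={rescaled:.4f}")
```

The last two columns should agree, which shows that tanh is a shifted and rescaled sigmoid whose outputs are centered at 0 rather than at 0.5.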

[Figure: plots of the sigmoid and tanh functions.]
