# Neural Networks

Consider a supervised learning problem where we have access to labeled training examples $(x^{(i)}, y^{(i)})$. Neural networks provide a way to define complex, non-linear hypotheses $h_{W,b}(x)$, with parameters $W,b$ that we can fit to our data.

To describe neural networks, we will begin by describing the simplest possible neural network, one which comprises a single "neuron."  We will use the following diagram to denote a single neuron:

This "neuron" is a computational unit that takes as input $x_1, x_2, x_3$ (and a +1 intercept term), and outputs $h_{W,b}(x) = f(W^Tx + b) = f(\sum_{i=1}^3 W_{i}x_i +b)$, where $f : \Re \to \Re$ is called the **activation function**.  In these notes, we will choose