# Gradient checking and advanced optimization


\begin{align} \theta := \theta - \alpha \frac{d}{d\theta}J(\theta). \end{align}
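
The update rule above can be sketched for a scalar parameter as follows. This is a minimal illustration, not the original tutorial's code: the objective `J`, its derivative `dJ`, the learning rate `alpha`, and the iteration count are all assumed values chosen for the example.

```python
# Hypothetical objective J(theta) = (theta - 3)^2 and its analytic derivative.
def J(theta):
    return (theta - 3.0) ** 2

def dJ(theta):
    return 2.0 * (theta - 3.0)

alpha = 0.1   # learning rate (assumed value)
theta = 0.0   # initial guess (assumed value)

# Repeatedly apply theta := theta - alpha * dJ/dtheta.
for _ in range(100):
    theta = theta - alpha * dJ(theta)

print(theta)  # approaches the minimizer theta = 3
```

If `dJ` contained a subtle bug, the loop might still appear to make progress, which is why an independent numerical check of the derivative is useful.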

\begin{align} \frac{d}{d\theta}J(\theta) = \lim_{\epsilon \rightarrow 0} \frac{J(\theta+ \epsilon) - J(\theta-\epsilon)}{2 \epsilon}. \end{align}

\begin{align} \frac{J(\theta+{\rm EPSILON}) - J(\theta-{\rm EPSILON})}{2 \times {\rm EPSILON}} \end{align}

\begin{align} g(\theta) \approx \frac{J(\theta+{\rm EPSILON}) - J(\theta-{\rm EPSILON})}{2 \times {\rm EPSILON}}. \end{align}
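
A minimal sketch of this scalar check is shown below. The test function `J`, the candidate derivative `g`, and the value `EPSILON = 1e-4` are assumptions made for illustration; any differentiable `J` with a hand-written `g` could be substituted.

```python
EPSILON = 1e-4  # small constant (assumed value)

def numerical_derivative(J, theta, epsilon=EPSILON):
    """Two-sided difference approximation to dJ/dtheta at a scalar theta."""
    return (J(theta + epsilon) - J(theta - epsilon)) / (2.0 * epsilon)

# Hypothetical objective and the derivative we claim to have implemented.
def J(theta):
    return theta ** 3

def g(theta):
    return 3.0 * theta ** 2

theta = 2.0
approx = numerical_derivative(J, theta)
difference = abs(approx - g(theta))

# A correct g should agree with the numerical estimate to several
# significant digits; a large difference suggests a bug in g.
print(approx, g(theta), difference)
```

How closely the two values agree depends on `J` and on `EPSILON`: a very small `EPSILON` amplifies numerical roundoff error, while a large one makes the difference quotient a poor approximation.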

\begin{align} \vec{e}_i = \begin{bmatrix}0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0\end{bmatrix} \end{align}

\begin{align} g_i(\theta) \approx \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2 \times {\rm EPSILON}}. \end{align}
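
The componentwise check can be sketched with NumPy as below, reading $\theta^{(i+)}$ and $\theta^{(i-)}$ as $\theta \pm {\rm EPSILON} \times \vec{e}_i$. This is an illustrative sketch under stated assumptions: `theta` is a one-dimensional (unrolled) parameter vector, and the objective `J` and gradient `g` are hypothetical examples rather than the tutorial's own functions.

```python
import numpy as np

def numerical_gradient(J, theta, epsilon=1e-4):
    """Approximate each partial derivative of J with a two-sided difference."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = 1.0                           # i-th basis vector
        theta_plus = theta + epsilon * e_i     # theta^(i+)
        theta_minus = theta - epsilon * e_i    # theta^(i-)
        grad[i] = (J(theta_plus) - J(theta_minus)) / (2.0 * epsilon)
    return grad

# Hypothetical objective J(theta) = theta_1^2 + 3 * theta_1 * theta_2
# and the analytic gradient we want to verify.
def J(theta):
    return theta[0] ** 2 + 3.0 * theta[0] * theta[1]

def g(theta):
    return np.array([2.0 * theta[0] + 3.0 * theta[1], 3.0 * theta[0]])

theta = np.array([4.0, 10.0])
num_grad = numerical_gradient(J, theta)
analytic_grad = g(theta)

# Relative error between the two gradients; it should be very small
# when the analytic gradient is implemented correctly.
relative_error = (np.linalg.norm(num_grad - analytic_grad)
                  / np.linalg.norm(num_grad + analytic_grad))
print(num_grad, analytic_grad, relative_error)
```

Each component costs two evaluations of `J`, so this check is practical only for debugging on small problems and would normally be switched off before running the real optimization.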

\begin{align} \nabla_{W^{(l)}} J(W,b) &= \left( \frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)} \\ \nabla_{b^{(l)}} J(W,b) &= \frac{1}{m} \Delta b^{(l)}. \end{align}
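
The two expressions above can be written out for a single layer as in the sketch below. All values are made up for illustration: `delta_W` and `delta_b` stand for the accumulated $\Delta W^{(l)}$ and $\Delta b^{(l)}$, `m` for the number of training examples, and `lam` for $\lambda$. Note that the weight decay term applies to `W` only, not to the bias term.

```python
import numpy as np

m = 4        # number of training examples (assumed)
lam = 0.5    # weight decay parameter lambda (assumed)

W = np.ones((2, 3))                 # hypothetical layer weights
delta_W = 2.0 * np.ones((2, 3))     # hypothetical accumulated Delta W
delta_b = np.array([4.0, 8.0])      # hypothetical accumulated Delta b

# Gradient with respect to W: averaged accumulation plus weight decay.
grad_W = (1.0 / m) * delta_W + lam * W

# Gradient with respect to b: averaged accumulation, no weight decay.
grad_b = (1.0 / m) * delta_b

print(grad_W)
print(grad_b)
```

Comparing `grad_W` and `grad_b` against numerically estimated gradients is one way to catch mistakes such as omitting the `lam * W` term or applying it to the bias.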

## Chinese-English glossary

off-by-one error 缺位错误

bias term 偏置项

numerically checking 数值检验

numerical roundoff errors 数值舍入误差

significant digits 有效数字

unrolling 组合扩展

learning rate 学习速率

Hessian matrix Hessian矩阵

Newton's method 牛顿法

step-size 步长值

## Chinese translators
