Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
Quoted from en.wikipedia.org
Now, we have a one-dimensional function:
$$ f(x)=(x-1)^2-2 $$
The graph looks like this:
Of course, in this instance you could also find the minimum directly from the graph, but that is not the point here.
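The original figure is not reproduced here; as a rough sketch, the curve can be drawn like this, assuming NumPy and Matplotlib are installed (the plotting range of -2 to 4 is an arbitrary choice):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2, 4, 200)
y = (x - 1)**2 - 2                     # f(x) = (x - 1)^2 - 2

plt.plot(x, y)
plt.scatter([1], [-2], color='red')    # the minimum at (1, -2)
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('f(x) = (x - 1)^2 - 2')
plt.show()
```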
We are going to take the derivative of this function.
The derivative:
$$ \nabla f(x)= 2(x-1) $$
Then we set the derivative of this function equal to 0:
$$ 0=2(x-1) $$
We find that when x equals 1, we get the minimum of this function.
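As a quick check of this result, here is a minimal sketch that repeats the calculation symbolically, assuming SymPy is available:

```python
import sympy as sp

x = sp.symbols('x')
f = (x - 1)**2 - 2

df = sp.diff(f, x)              # derivative: 2*(x - 1)
critical = sp.solve(df, x)      # solve 2*(x - 1) = 0

print(critical)                 # [1]
print(f.subs(x, critical[0]))   # -2, the minimum value of f
```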
In this one-dimensional instance, the gradient descent algorithm changes the value of x step by step, reducing the value of the function.
We let x start at the value -1:
$$ x_{0}= -1 $$
$$ \nabla f(x_{0})= 2(x_{0}-1) $$
How do we keep changing the value of x so that the value of the function gets closer to the minimum?
Now we are going to focus on the derivative of this function.
We know that when the derivative of the function equals 0, the function has a minimum or maximum.
When the derivative of the function is greater than 0, the value of the function increases as x increases.
When the derivative of the function is less than 0, the value of the function decreases as x increases.
Let me give you a simple example; it is worked out below.
As x gets closer to the minimum or maximum, the absolute value of the derivative gets smaller.
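For example, with our starting point x₀ = -1 from above:

$$ \nabla f(x_{0})= 2(-1-1) = -4 < 0 $$

The derivative is negative, so the value of the function decreases as x increases; to move toward the minimum at x = 1 we should increase x. Closer to the minimum, for example at x = 0.5, the derivative is 2(0.5-1) = -1, which is smaller in absolute value.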
Then we can just let the next value of x equal:
$$ x_{1}= x_{0}-\gamma \nabla f(x_{0}) $$
$$ \gamma $$
This gamma is called the learning rate; it controls how fast the value of x changes. We will talk about it later.
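Putting the update rule together with the derivative, here is a minimal sketch in Python; the learning rate of 0.1 and the 50 iterations are arbitrary choices for illustration, not values from the text:

```python
# A minimal sketch of gradient descent on f(x) = (x - 1)^2 - 2.
def grad_f(x):
    return 2 * (x - 1)          # derivative of f

x = -1.0                        # starting point x_0 = -1
gamma = 0.1                     # learning rate (arbitrary choice)

for _ in range(50):
    x = x - gamma * grad_f(x)   # x_{k+1} = x_k - gamma * grad_f(x_k)

print(x)                        # approaches 1, where f reaches its minimum of -2
```

Each step moves x against the sign of the derivative, so x approaches 1 and f(x) approaches its minimum value of -2.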
What about multidimensional functions? That's a good question; let's talk about it later.