In my understanding, gradient descent is a method for finding the value of the independent variable that gives the minimum (or maximum) of an objective function. So we need an objective function \(\mathcal{L}\).

How do we find the \(x_0\) at which \(f(x)\) attains its minimum value, via gradient descent? Take \(f(x) = x^2 + 2x + 1\) as the example.
Start with an arbitrary \(x\) and calculate the value of \(f(x)\):
```python
import random

def func(x):
    # the objective function f(x) = x^2 + 2x + 1, whose minimum is at x = -1
    return x*x + 2*x + 1

def gred(x):
    # the gradient of f(x)
    return 2*x + 2

x = random.uniform(-10.0, 10.0)  # randomly pick a float in the interval (-10, 10)
# x = 10
print('x starts at:', x)
y0 = func(x)   # first evaluation
delta = 0.5    # the delta_x used for the very first step
x = x + delta

# === iteration ===
for i in range(100):
    print('i=', i)
    y1 = func(x)
    delta = -0.08 * gred(x)   # step against the gradient, with step size 0.08
    print('  delta=', delta)
    if y1 > y0:
        print('  y1>y0')      # the last step increased f(x)
    else:
        print('  y1<=y0')     # the last step decreased (or kept) f(x)
    # either way, the update moves x against the gradient:
    # if gred(x) is positive, x decreases; if gred(x) is negative, x increases.
    x = x + delta
    y0 = y1
    print('  x=', x, 'f(x)=', y1)
```
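With this step size the iterates close in on \(x = -1\): since \(f(x) = (x+1)^2\), each update maps \(x\) to \(0.84x - 0.16\), so the distance to \(-1\) shrinks by a factor of \(0.84\) per step.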
Let's discuss how to determine the `some_value`, i.e. the `delta` that gets added to `x` in each iteration of the code above.
If \(y_1 - y_0\) is a large positive difference, i.e. \(y_1 \gg y_0\), then \(x\) should shift back heavily, so `some_value` should scale with the (negative) gradient. Let's set `some_value` to \(\lambda = -r \times \mathrm{gred}(x)\), where \(r = 0.08\) is the step size.
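In its compact form the whole procedure reduces to repeating the update \(x \leftarrow x - r \cdot \mathrm{gred}(x)\). A minimal sketch, assuming the same `func` and `gred` defined above:

```python
r = 0.08            # step size
x = 10.0            # arbitrary starting point
for _ in range(100):
    x = x - r * gred(x)              # move against the gradient
print('x =', x, 'f(x) =', func(x))   # x ends up very close to -1
```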
The basic gradient descent has many shortcomings, which can be found by searching for 'shortcomings of gradient descent'.
Another problem with the GD algorithm: what if \(\mathcal{L}\) does not have an explicit expression for its gradient?
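One common workaround (a general technique, not something from the code above) is to approximate the gradient numerically with a central finite difference, so GD only needs to evaluate the objective itself. The `loss` function below is a hypothetical stand-in for an \(\mathcal{L}\) whose gradient formula is unknown:

```python
def numerical_grad(f, x, h=1e-5):
    # central-difference approximation of the derivative f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def loss(x):
    # hypothetical black-box objective; pretend we cannot differentiate it by hand
    return (x - 3.0) ** 2

x = 0.0
r = 0.1
for _ in range(200):
    x = x - r * numerical_grad(loss, x)   # same GD update, estimated gradient
print(x)   # approaches 3.0, the minimizer of loss
```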
Stochastic Gradient Descent (SGD) is another algorithm in the GD family.
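As a rough illustration of the difference (a sketch, with data and names made up for this example): instead of computing the gradient over the whole dataset, SGD estimates it from one randomly chosen sample (or a small batch) per step. Here a single parameter `w` is fitted by least squares:

```python
import random

# made-up data generated from y = 2*x plus a little noise
data = [(x, 2.0 * x + random.gauss(0, 0.1)) for x in [i / 100 for i in range(100)]]

w = 0.0     # the parameter to learn
r = 0.05    # step size

for step in range(5000):
    x, y = random.choice(data)       # one random sample per update
    grad = 2 * (w * x - y) * x       # gradient of the per-sample loss (w*x - y)^2
    w = w - r * grad
print('w =', w)   # should end up close to 2.0
```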