Class 2 Gradient Descent
For $n\times n$ matrices A, B, and C:
tr(AB)=tr(BA)
tr(ABC)=tr(CAB)=tr(BCA)
tr(A)=tr($A^T$)
Here tr() denotes the trace of a matrix, which is the sum of its diagonal elements.
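The trace identities can be spot-checked numerically. The following is a minimal sketch, assuming NumPy is available; the random matrices and the variable names (`ok_swap`, `ok_cyclic`, `ok_transpose`) are illustrative choices, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Three random n x n matrices (illustrative data only).
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

# tr(AB) = tr(BA)
ok_swap = np.isclose(np.trace(A @ B), np.trace(B @ A))

# tr(ABC) = tr(CAB) = tr(BCA): the trace is invariant under cyclic shifts.
ok_cyclic = (
    np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
    and np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
)

# tr(A) = tr(A^T): transposing leaves the diagonal unchanged.
ok_transpose = np.isclose(np.trace(A), np.trace(A.T))

# The trace is the sum of the diagonal elements.
ok_diagonal = np.isclose(np.trace(A), np.sum(np.diag(A)))

print(ok_swap, ok_cyclic, ok_transpose, ok_diagonal)
```

A numerical check on one random sample is not a proof, but it is a quick way to catch a misremembered identity.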
For $A\in R^{m\times n}$ and $f(A)\in R^1$:
$\nabla_A f(A)=\left[\frac{\partial f(A)}{\partial A_{ij}}\right]_{m\times n}$
$\nabla_A \mathrm{tr}(ABA^TC)=CAB+C^TAB^T$
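The gradient identity for tr(ABA^TC) can be checked against a finite-difference estimate of the matrix of partial derivatives. This is a sketch under stated assumptions: A is m×n, B is n×n, C is m×m (the shapes that make the product well defined), and the helper names `trace_form` and `numerical_gradient` are hypothetical, introduced only for this example.

```python
import numpy as np


def trace_form(A, B, C):
    """f(A) = tr(A B A^T C), a scalar function of the matrix A."""
    return np.trace(A @ B @ A.T @ C)


def numerical_gradient(f, A, eps=1e-6):
    """Central-difference estimate of the m x n matrix [df/dA_ij]."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            step = np.zeros_like(A)
            step[i, j] = eps
            grad[i, j] = (f(A + step) - f(A - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, m))

# Closed-form gradient from the identity in the notes.
analytic = C @ A @ B + C.T @ A @ B.T
# Entry-by-entry numerical estimate of the same gradient.
numeric = numerical_gradient(lambda M: trace_form(M, B, C), A)

max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

Both results have shape m×n, matching the definition of the gradient with respect to A, and the largest entrywise difference should be at the level of floating-point round-off.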
Least squares closed-form solution
Use $x\times \theta$ to predict $y$.
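$x=\begin{bmatrix}
1 & x_{11} & x_{12} & x_{13} & \cdots & x_{1n} \\
1 & x_{21} & x_{22} & x_{23} & \cdots & x_{2n} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & x_{m1} & x_{m2} & x_{m3} & \cdots & x_{mn}
\end{bmatrix}$
where m is the number of observations and n is the number of features.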
$\theta=[\theta_0, \theta_1, \theta_2, ..., \theta_n]^T$ is the parameter vector.
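Setting the gradient of the squared error $\frac{1}{2}(x\times\theta-y)^T(x\times\theta-y)$ with respect to $\theta$ to zero gives the normal equation:
$x^T\times x\times\theta=x^T\times y$
Assuming $x^T\times x$ is invertible, solving for $\theta$ gives:
$\theta=(x^T\times x)^{-1}\times x^T\times y$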
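The closed-form solution can be tried on synthetic data. The sketch below assumes NumPy; the data, the noise level, and `true_theta` are made up for illustration. It solves the normal equation with `np.linalg.solve` instead of forming the inverse explicitly, which gives the same $\theta$ when $x^T x$ is invertible and is usually better conditioned numerically, and then compares the result with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3  # m observations, n features

# Design matrix x: a leading column of ones (for theta_0) followed by n features.
features = rng.standard_normal((m, n))
x = np.hstack([np.ones((m, 1)), features])  # shape (m, n + 1)

# Synthetic targets from a known parameter vector plus small noise (assumed data).
true_theta = np.array([2.0, -1.0, 0.5, 3.0])
y = x @ true_theta + 0.01 * rng.standard_normal(m)

# Normal equation: (x^T x) theta = x^T y, solved as a linear system.
theta = np.linalg.solve(x.T @ x, x.T @ y)

# Cross-check against NumPy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)

print(theta)
print(theta_lstsq)
```

With low noise, both estimates should land close to `true_theta` and agree with each other to within floating-point tolerance.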