>pca <- read.csv("D:/pca.csv")
>pca
x1 x2 x3 x4
1 40 2.0 5 20
2 10 1.5 5 30
3 120 3.0 13 50
4 250 4.5 18 0
5 120 3.5 9 50
6 10 1.5 12 50
7 40 1.0 19 40
8 270 4.0 13 60
9 280 3.5 11 60
10 170 3.0 9 60
11 180 3.5 14 40
12 130 2.0 30 50
13 220 1.5 17 20
14 160 1.5 35 60
15 220 2.5 14 30
16 140 2.0 20 20
17 220 2.0 14 10
18 40 1.0 10 0
19 20 1.0 12 60
20 120 2.0 20 0
> P=scale(pca) # standardize the raw data and store the result as matrix P
> P
[,1] [,2] [,3] [,4]
[1,] -1.10251269 -0.3081296 -1.3477550 -0.7084466
[2,] -1.44001658 -0.7821750 -1.3477550 -0.2513843
[3,] -0.20250233 0.6399614 -0.2695510 0.6627404
[4,] 1.26001451 2.0620978 0.4043265 -1.6225713
[5,] -0.20250233 1.1140068 -0.8086530 0.6627404
[6,] -1.44001658 -0.7821750 -0.4043265 0.6627404
[7,] -1.10251269 -1.2562205 0.5391020 0.2056781
[8,] 1.48501710 1.5880523 -0.2695510 1.1198028
[9,] 1.59751839 1.1140068 -0.5391020 1.1198028
[10,] 0.36000414 0.6399614 -0.8086530 1.1198028
[11,] 0.47250544 1.1140068 -0.1347755 0.2056781
[12,] -0.09000104 -0.3081296 2.0216325 0.6627404
[13,] 0.92251062 -0.7821750 0.2695510 -0.7084466
[14,] 0.24750285 -0.7821750 2.6955100 1.1198028
[15,] 0.92251062 0.1659159 -0.1347755 -0.2513843
[16,] 0.02250026 -0.3081296 0.6738775 -0.7084466
[17,] 0.92251062 -0.3081296 -0.1347755 -1.1655090
[18,] -1.10251269 -1.2562205 -0.6738775 -1.6225713
[19,] -1.32751528 -1.2562205 -0.4043265 1.1198028
[20,] -0.20250233 -0.3081296 0.6738775 -1.6225713
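As a rough check of what scale() does here, the sketch below standardizes each column by hand (subtract the column mean, divide by the column standard deviation) and compares the result with scale(). The data are retyped from the listing above; the name P_manual is introduced only for this illustration.

```r
# Raw data copied from the listing above (20 cases, 4 variables)
pca <- data.frame(
  x1 = c(40, 10, 120, 250, 120, 10, 40, 270, 280, 170,
         180, 130, 220, 160, 220, 140, 220, 40, 20, 120),
  x2 = c(2.0, 1.5, 3.0, 4.5, 3.5, 1.5, 1.0, 4.0, 3.5, 3.0,
         3.5, 2.0, 1.5, 1.5, 2.5, 2.0, 2.0, 1.0, 1.0, 2.0),
  x3 = c(5, 5, 13, 18, 9, 12, 19, 13, 11, 9,
         14, 30, 17, 35, 14, 20, 14, 10, 12, 20),
  x4 = c(20, 30, 50, 0, 50, 50, 40, 60, 60, 60,
         40, 50, 20, 60, 30, 20, 10, 0, 60, 0)
)

# Standardize by hand: subtract each column mean, divide by the column sd
P_manual <- sapply(pca, function(col) (col - mean(col)) / sd(col))

# scale() performs the same centering and scaling (and stores the
# means and sds it used as attributes of the result)
P <- scale(pca)

# Compare the numbers only, ignoring dimnames and attributes
same <- isTRUE(all.equal(unname(P_manual), unname(P), check.attributes = FALSE))
print(same)

print(round(colMeans(P), 10))   # column means after standardization
print(apply(P, 2, sd))          # column standard deviations after standardization
```

After standardization every column has mean 0 and standard deviation 1, which is why cov(P) in the next step is the correlation matrix of the raw data.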
> eigen(cov(P)) # eigenvalues and eigenvectors of the covariance matrix of P; the first column of $vectors (0.69996363, ...) is the eigenvector for the first eigenvalue (1.7182516), and so on
$values
[1] 1.7182516 1.0935358 0.9813470 0.2068656
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.69996363 0.09501037 -0.24004879 0.6658833
[2,] 0.68979810 -0.28364662 0.05846333 -0.6635550
[3,] 0.08793923 0.90415870 -0.27031356 -0.3188955
[4,] 0.16277651 0.30498307 0.93053167 0.1208302
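Eigendecomposition yields eigenvalues and eigenvectors: an eigenvalue tells how important a feature is, and its eigenvector tells what that feature is. Singular values σ play a similar role. They are also arranged from largest to smallest in the matrix Σ, and they often shrink very quickly: in many cases the first 10% or even 1% of the singular values account for more than 99% of the sum of all singular values.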
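To make "how important" concrete, the share of each eigenvalue in the total can be computed directly. The sketch below uses the four eigenvalues printed above; the names share and cum_share are introduced only for this illustration.

```r
# Eigenvalues reported by eigen(cov(P)) in the output above
ev <- c(1.7182516, 1.0935358, 0.9813470, 0.2068656)

# Share of the total carried by each component, and the running total
share <- ev / sum(ev)
cum_share <- cumsum(share)

print(round(share, 4))
print(round(cum_share, 4))
```

For this small data set the first component carries only about 43% of the total and three components are needed to pass 90%, so the very fast decay described above is typical of large matrices rather than guaranteed in every case.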
> svd(cov(P)) # singular value decomposition of the same matrix: the covariance matrix of the standardized data (a square matrix)
$d
[1] 1.7182516 1.0935358 0.9813470 0.2068656
$u
[,1] [,2] [,3] [,4]
[1,] -0.69996363 0.09501037 -0.24004879 -0.6658833
[2,] -0.68979810 -0.28364662 0.05846333 0.6635550
[3,] -0.08793923 0.90415870 -0.27031356 0.3188955
[4,] -0.16277651 0.30498307 0.93053167 -0.1208302
$v
[,1] [,2] [,3] [,4]
[1,] -0.69996363 0.09501037 -0.24004879 -0.6658833
[2,] -0.68979810 -0.28364662 0.05846333 0.6635550
[3,] -0.08793923 0.90415870 -0.27031356 0.3188955
[4,] -0.16277651 0.30498307 0.93053167 -0.1208302
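The result matches the eigendecomposition: the singular values equal the eigenvalues, and the singular vectors equal the eigenvectors up to the sign of individual columns (columns 1 and 4 are flipped here). The left and right singular vectors are also equal to each other, which agrees with the theory for a symmetric covariance matrix: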
http://blog.csdn.net/wangzhiqing3/article/details/7446444
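The agreement between eigen() and svd() on a symmetric positive semi-definite matrix can be checked on any small example. The sketch below uses a made-up 3 x 3 matrix A (not taken from the data above) and compares absolute values, because each column is only determined up to its sign.

```r
# A small symmetric positive definite matrix (illustrative values only)
A <- matrix(c(2, 1, 0,
              1, 2, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)

e <- eigen(A)
s <- svd(A)

# Singular values coincide with the eigenvalues for this kind of matrix
print(all.equal(s$d, e$values))

# Vectors agree up to a sign flip per column, so compare absolute values
print(all.equal(abs(s$u), abs(e$vectors)))

# Left and right singular vectors coincide as well
print(all.equal(abs(s$u), abs(s$v)))
```

This only holds because A is symmetric with non-negative eigenvalues; for a general square matrix the singular values and the eigenvalues differ.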
2. Singular Value Decomposition
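The discussion above covered the decomposition of square matrices. In LSA, however, the matrix to be decomposed is the Term-Document matrix, which is clearly not square, so singular value decomposition is needed instead. The derivation of the singular value decomposition builds on the square-matrix decomposition described above.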
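Suppose C is an M x N matrix, U is an M x M matrix whose columns are the orthogonal eigenvectors of CC^T, and V is an N x N matrix whose columns are the orthogonal eigenvectors of C^TC. Let r be the rank of C. Then there exists a singular value decomposition:
C = U Σ V^T
where Σ is an M x N matrix whose diagonal entries Σ_ii = sqrt(λ_i), i = 1, ..., r, are the singular values in decreasing order (λ_i being the nonzero eigenvalues of CC^T, which are the same as those of C^TC) and whose other entries are zero.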
Singular value decomposition is a factorization that applies to any matrix; SVD handles a general m x n matrix. To be continued......
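A minimal sketch of the decomposition on a non-square matrix follows. The 4 x 3 matrix C is made up for illustration and merely stands in for a term-document matrix. Note that R's svd() returns the thin form by default (u is M x min(M, N)); passing nu = nrow(C) asks for the full M x M matrix U used in the statement above.

```r
# A small 4 x 3 matrix standing in for a term-document matrix (made-up values)
C <- matrix(c(1, 0, 1,
              0, 1, 1,
              1, 1, 0,
              0, 0, 1), nrow = 4, byrow = TRUE)

s <- svd(C)   # thin SVD: u is 4 x 3, d has 3 entries, v is 3 x 3

# Rebuild C from its factors: C = U diag(d) V^T
C_rebuilt <- s$u %*% diag(s$d) %*% t(s$v)
print(all.equal(C, C_rebuilt))

# Squared singular values are the eigenvalues of t(C) %*% C
print(all.equal(s$d^2, eigen(crossprod(C))$values))

# The columns of V are orthonormal
print(all.equal(crossprod(s$v), diag(3)))

# Full form: request all M left singular vectors
s_full <- svd(C, nu = nrow(C))
print(dim(s_full$u))
```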
> svd(P) # singular value decomposition applied directly to the standardized data matrix (20 x 4)
$d
[1] 5.713736 4.558199 4.318054 1.982535
$u
[,1] [,2] [,3] [,4]
[1,] 0.213188874 -0.31854661 0.01117934 -0.09356334
[2,] 0.298743593 -0.26550125 -0.09966088 -0.02040249
[3,] -0.067184471 -0.05316902 -0.17961537 -0.19846043
[4,] -0.363306085 -0.13041863 0.41709916 -0.43090590
[5,] -0.116116981 -0.18960342 -0.21978181 -0.27040771
[6,] 0.258181271 -0.01720108 -0.23759348 -0.11644174
[7,] 0.272565880 0.17588846 -0.05485743 -0.02402906
[8,] -0.401395312 -0.04641079 -0.19713543 0.07886498
[9,] -0.353798967 -0.06803484 -0.20133714 0.31867233
[10,] -0.140818406 -0.11779839 -0.28058876 0.10504376
[11,] -0.196159553 -0.07244560 -0.04157562 -0.17994125
[12,] -0.001770250 0.46264984 -0.01709488 -0.21189013
[13,] -0.002549413 0.07396825 0.23141706 0.48510618
[14,] -0.009279184 0.66343445 -0.04822489 -0.02040610
[15,] -0.123807056 -0.03464961 0.09477346 0.26067355
[16,] 0.044254060 0.10591148 0.20027671 -0.04088441
[17,] -0.040535151 -0.06631368 0.29818366 0.36362322
[18,] 0.343318979 -0.18704239 0.26319302 0.05965465
[19,] 0.288607918 0.04522412 -0.32341707 0.10786442
[20,] 0.097860256 0.04005869 0.38476035 -0.17217051
$v
[,1] [,2] [,3] [,4]
[1,] -0.69996363 0.09501037 0.24004879 0.6658833
[2,] -0.68979810 -0.28364662 -0.05846333 -0.6635550
[3,] -0.08793923 0.90415870 0.27031356 -0.3188955
[4,] -0.16277651 0.30498307 -0.93053167 0.1208302

Singular values and latent semantic indexing (LSI)
Book <- read.csv("D:/Book.csv")
Book
K=as.matrix(Book[,-1]) # assumption: the first column X holds the row labels, so it is dropped to keep K numeric
s=svd(K)
s
kk=s$u # assumption: kk and v are not defined in the original; they are taken to be the left and right singular vectors
v=s$v
rownames(kk)=Book$X
kk
rownames(v)=paste('T',1:9,sep='')
plot(rnorm,xlim=c(-0.8,0),ylim=c(-0.8,0.6),lty=0) # used only to set up the axes; lty=0 hides the curve
points(v[,3],v[,2],col='red') # the 9 columns of K, labelled T1..T9
points(kk[,3],kk[,2],col='blue') # the rows of K, labelled by Book$X
text(kk[,3],kk[,2],Book$X)
text(v[,3],v[,2],paste('T',1:9,sep=''))
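The svd(P) result shows that the right singular vector matrix ($v) is the eigenvector matrix of the covariance matrix of the standardized data obtained earlier, up to the sign of individual columns.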
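The link between svd(P) and eigen(cov(P)) can be stated a little more fully: since cov(P) = P^T P / (n - 1) for column-centered P, the squared singular values of P divided by n - 1 should equal the eigenvalues of cov(P). The sketch below checks this on the data retyped from the listing at the top; the variable names are chosen only for this illustration.

```r
# Raw data copied from the listing at the top (20 cases, 4 variables)
X <- cbind(
  x1 = c(40, 10, 120, 250, 120, 10, 40, 270, 280, 170,
         180, 130, 220, 160, 220, 140, 220, 40, 20, 120),
  x2 = c(2.0, 1.5, 3.0, 4.5, 3.5, 1.5, 1.0, 4.0, 3.5, 3.0,
         3.5, 2.0, 1.5, 1.5, 2.5, 2.0, 2.0, 1.0, 1.0, 2.0),
  x3 = c(5, 5, 13, 18, 9, 12, 19, 13, 11, 9,
         14, 30, 17, 35, 14, 20, 14, 10, 12, 20),
  x4 = c(20, 30, 50, 0, 50, 50, 40, 60, 60, 60,
         40, 50, 20, 60, 30, 20, 10, 0, 60, 0)
)

P <- scale(X)        # standardized data, 20 x 4
n <- nrow(P)

s <- svd(P)          # SVD of the data matrix itself
e <- eigen(cov(P))   # eigendecomposition of its covariance matrix

# Squared singular values divided by (n - 1) reproduce the eigenvalues
print(all.equal(s$d^2 / (n - 1), e$values))

# Right singular vectors match the eigenvectors up to column signs
print(all.equal(abs(s$v), abs(e$vectors), check.attributes = FALSE))
```

With the printed values this reads 5.713736^2 / 19 ≈ 1.71825, which is the first eigenvalue shown earlier.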
svd can therefore be used to compress the columns (variables) as well as the rows (cases).
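One way to read "compressing columns and rows" is to keep only the first k singular vectors. The sketch below is an illustration under assumptions, not the author's procedure: M is random stand-in data of the same shape as P, and k = 2 is an arbitrary choice.

```r
set.seed(1)                                       # arbitrary seed so the sketch is repeatable
M <- matrix(rnorm(20 * 4), nrow = 20, ncol = 4)   # stand-in data: 20 cases x 4 variables

s <- svd(M)
k <- 2                                            # number of components kept (arbitrary)

# Column compression: each case is described by k scores instead of 4 variables
case_scores <- M %*% s$v[, 1:k]                   # 20 x k, equals u[, 1:k] scaled by d[1:k]

# Row compression: each variable is described by k numbers instead of 20 cases
var_scores <- t(M) %*% s$u[, 1:k]                 # 4 x k, equals v[, 1:k] scaled by d[1:k]

# Rank-k approximation of M built from the retained pieces
M_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])

print(dim(case_scores))
print(dim(var_scores))

# The squared approximation error equals the sum of the discarded squared singular values
print(all.equal(sum((M - M_k)^2), sum(s$d[(k + 1):4]^2)))
```

The same two projections are what the LSI plot relies on: rows and columns of the matrix are placed in one low-dimensional space spanned by a few singular vectors.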