Graph Embedding Study Notes (1): Locally Linear Embedding (LLE)

Date 2020-01-03

Tags: graph embedding, study notes, locally linear, lle


Paper information

Roweis, Sam T. and Laurence K. Saul (2000). "Nonlinear Dimensionality
Reduction by Locally Linear Embedding." Science, 290: 2323–2326.
doi:10.1126/science.290.5500.2323

we introduce locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Unlike clustering methods for local dimensionality reduction, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima.

Notes

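LLE is, at its core, a dimensionality reduction method. Principal component analysis (PCA) is a linear dimensionality reduction method, whereas LLE is a nonlinear one.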

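In recent years the machine learning community has taken to presenting dimensionality reduction under the name "embedding". The precise meaning is: when some object X is said to be embedded in another object Y, the embedding is given by some injective and structure-preserving map f : X → Y.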

Key point: the defining property of LLE can be understood as neighborhood-preserving.

On manifold data, LLE preserves neighborhoods much better than PCA does. What is manifold data? One example is the spiral curve in the figure below.

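If PCA is used to reduce such data, that is, if the curve is described by its first principal component, the order of the points along the spiral is lost (the order in which the reduced coordinate starts at the dense center and moves steadily outward along the spiral). The straight line in the figure below is the first principal component: it captures only the direction of largest variance, so its structure-preserving quality is poor. The root cause is that a linear reduction cannot express a nonlinear structure such as a spiral: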
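The point about lost ordering can be checked numerically. The sketch below is only an illustration: the spiral is a hypothetical Archimedean spiral generated here (not the data behind the post's figure), and the variable names are made up for the example. It projects the points onto the first principal component with `prcomp` and checks whether the score changes monotonically along the curve.

```r
# Hypothetical spiral: the radius grows with the curve parameter t
t <- seq(0, 4 * pi, length.out = 200)
spiral <- cbind(x = t * cos(t), y = t * sin(t))

# Score of every point on the first principal component
pca <- prcomp(spiral)
pc1 <- pca$x[, 1]

# Walking along the curve, does the score move in one direction only?
steps <- diff(pc1)
pc1.is.monotone <- all(steps > 0) || all(steps < 0)

# Share of the total variance captured by the first component
pc1.share <- pca$sdev[1]^2 / sum(pca$sdev^2)

print(pc1.is.monotone)
print(pc1.share)
```

A score that moves back and forth as t increases means that points far apart along the spiral can receive nearly the same one-dimensional coordinate.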

How can this result be improved? Take a local piece of the spiral and apply PCA to that piece only. The piece has low curvature and is close to a straight line, so PCA fits it well:
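As a rough check of the local-linearity idea, the sketch below applies PCA to a short arc of a hypothetical Archimedean spiral. The parameter range 10 to 10.3 is an arbitrary choice for the illustration, not a value from the post.

```r
# A short piece of a hypothetical spiral (about 0.3 radians of arc)
t.local <- seq(10, 10.3, length.out = 50)
piece <- cbind(x = t.local * cos(t.local), y = t.local * sin(t.local))

# PCA restricted to this local piece
pca.local <- prcomp(piece)
local.share <- pca.local$sdev[1]^2 / sum(pca.local$sdev^2)

# On a nearly straight piece the first component keeps the point order
local.steps <- diff(pca.local$x[, 1])
local.is.monotone <- all(local.steps > 0) || all(local.steps < 0)

print(local.share)
print(local.is.monotone)
```

On such a piece the first component carries almost all of the variance and orders the points the same way the curve does, which is the behavior LLE relies on within each neighborhood.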

This idea of cutting out local pieces and fitting each one linearly is the core of LLE. Here is the result of applying LLE:

Another example, this time in three dimensions:

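Next, an image recognition example, where the horizontal and vertical axes are the first two LLE coordinates. Along the horizontal axis, the expression of the person in the images changes gradually from unhappy to happy; along the vertical axis, the face turns gradually from one side to frontal and then to the other side.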

The basic LLE procedure is shown in the figure below:

The basic formulas are as follows:
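The figure holding the formulas did not survive in this copy. What follows is a reconstruction in standard LLE notation, so the symbols are assumptions chosen to match the R code further down rather than the post's own typesetting: x_i are the n inputs, N(i) is the index set of the k nearest neighbors of x_i, w_ij are the reconstruction weights, and y_i are the q-dimensional outputs stacked as the rows of Y.

```latex
% Step 1: for each x_i, find the index set N(i) of its k nearest neighbors.

% Step 2: reconstruction weights, with the neighbors held fixed
\min_{W} \sum_{i=1}^{n} \Bigl\| x_i - \sum_{j \in N(i)} w_{ij} x_j \Bigr\|^2
\quad \text{s.t.} \quad \sum_{j \in N(i)} w_{ij} = 1, \qquad w_{ij} = 0 \ \text{for } j \notin N(i)

% Step 3: low-dimensional coordinates, with the weights held fixed
\min_{Y} \sum_{i=1}^{n} \Bigl\| y_i - \sum_{j=1}^{n} w_{ij} y_j \Bigr\|^2
\quad \text{s.t.} \quad \sum_{i=1}^{n} y_i = 0, \qquad Y^{\top} Y = I_q
```

The paper normalizes the outputs as (1/n) Y^T Y = I; the version above differs only by a constant scale and matches the unit-length eigenvectors returned by R's `eigen`.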

Taking the third step as an example, here is how it is converted into an eigenvalue problem:
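The original figure is missing here, so the step is reconstructed below under assumed notation: Y is the n×q matrix whose rows are the outputs y_i, W is the n×n weight matrix, and M is defined exactly as in `coords.from.weights` in the R code. The i-th row of (I − W)Y is y_i minus its weighted neighbor average, which is what turns the sum into a matrix norm.

```latex
\Phi(Y) = \sum_{i=1}^{n} \Bigl\| y_i - \sum_{j=1}^{n} w_{ij} y_j \Bigr\|^2
        = \bigl\| (I - W) Y \bigr\|_F^2
        = \operatorname{tr}\bigl( Y^{\top} (I - W)^{\top} (I - W) Y \bigr)
        = \operatorname{tr}\bigl( Y^{\top} M Y \bigr),
\qquad M = (I - W)^{\top} (I - W)
```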

Next, Lagrange multipliers are used to turn it into an unconstrained problem:
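The figure for this step is also missing. A sketch of the usual argument, written for a single output coordinate y in R^n (one column of Y) and assuming the unit-length normalization y^T y = 1, with M = (I − W)^T (I − W):

```latex
\min_{y \in \mathbb{R}^n} \; y^{\top} M y
\quad \text{s.t.} \quad y^{\top} y = 1

\mathcal{L}(y, \lambda) = y^{\top} M y - \lambda \bigl( y^{\top} y - 1 \bigr)
```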

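Differentiating then shows that this is an eigenvalue problem for M. Because the objective is to be minimized, we take the eigenvectors belonging to the smallest eigenvalues as the result, skipping the trivial constant eigenvector: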
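Since the figure is missing, here is a reconstruction of the differentiation step under the same assumed setup: a single coordinate y with y^T y = 1, the Lagrangian L(y, λ) = y^T M y − λ(y^T y − 1), and M symmetric.

```latex
\frac{\partial \mathcal{L}}{\partial y} = 2 M y - 2 \lambda y = 0
\quad \Longrightarrow \quad M y = \lambda y

% Value of the objective at such a stationary point
y^{\top} M y = \lambda \, y^{\top} y = \lambda

% Rows of W sum to 1, so the constant vector is a trivial solution
(I - W) \mathbf{1} = 0 \quad \Longrightarrow \quad M \mathbf{1} = 0
```

The objective equals the eigenvalue, so smaller eigenvalues give better embeddings. The constant vector has eigenvalue zero but carries no information, which is why the R code below discards the very last eigenvector and keeps the q before it.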

R implementation

# Local linear embedding of data vectors
# Inputs: n*p matrix of vectors, number of dimensions q to find (< p),
# number of nearest neighbors per vector, scalar regularization setting
# Calls: find.kNNs, reconstruction.weights, coords.from.weights
# Output: n*q matrix of new coordinates
lle <- function(x,q,k=q+1,alpha=0.01) {
  stopifnot(q>0, q<ncol(x), k>q, alpha>0) # sanity checks
  kNNs = find.kNNs(x,k) # should return an n*k matrix of indices
  w = reconstruction.weights(x,kNNs,alpha) # n*n weight matrix
  coords = coords.from.weights(w,q) # n*q coordinate matrix
  return(coords)
}

# Find multiple nearest neighbors in a data frame
# Inputs: n*p matrix of data vectors, number of neighbors to find,
# optional arguments to dist function
# Calls: smallest.by.rows
# Output: n*k matrix of the indices of nearest neighbors
find.kNNs <- function(x,k,...) {
  x.distances = dist(x,...) # Uses the built-in distance function
  x.distances = as.matrix(x.distances) # need to make it a matrix
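  # Each point is at distance 0 from itself, so it comes first in its
  # own row; ask for k+1 columns and drop the first to exclude it
  kNNs = smallest.by.rows(x.distances,k+1)
  return(kNNs[,-1,drop=FALSE])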
}
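The reason for asking for k+1 neighbors and then dropping the first column can be seen on a tiny example. The five points below are hypothetical values chosen so that no two distances tie; the code mirrors the ordering step of smallest.by.rows but is a standalone sketch.

```r
# Five hypothetical points on a line, spaced so that no distances tie
x <- matrix(c(0, 1, 3, 7, 15), ncol = 1)
d <- as.matrix(dist(x))

# Column indices of each row, sorted from nearest to farthest
row.orders <- t(apply(d, 1, order))

# The nearest "neighbor" of every point is the point itself
self.first <- unname(row.orders[, 1])

# Skip that first column and keep the next k columns
k <- 2
kNNs <- unname(row.orders[, 2:(k + 1)])

print(self.first)
print(kNNs)
```

With duplicated points the self-match is no longer guaranteed to be first, which is a limitation of this simple approach.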

# Find the k smallest entries in each row of an array
# Inputs: n*p array, p >= k, number of smallest entries to find
# Output: n*k array of column indices for smallest entries per row
smallest.by.rows <- function(m,k) {
  stopifnot(ncol(m) >= k) # Otherwise "k smallest" is meaningless
  row.orders = t(apply(m,1,order))
  k.smallest = row.orders[,1:k]
  return(k.smallest)
}

# Least-squares weights for linear approx. of data from neighbors
# Inputs: n*p matrix of vectors, n*k matrix of neighbor indices,
# scalar regularization setting
# Calls: local.weights
# Outputs: n*n matrix of weights
reconstruction.weights <- function(x,neighbors,alpha) {
  stopifnot(is.matrix(x),is.matrix(neighbors),alpha>0)
  n=nrow(x)
  stopifnot(nrow(neighbors) == n)
  w = matrix(0,nrow=n,ncol=n)
  for (i in 1:n) {
    i.neighbors = neighbors[i,]
    w[i,i.neighbors] = local.weights(x[i,],x[i.neighbors,],alpha)
  }
  return(w)
}


# Calculate local reconstruction weights from vectors
# Inputs: focal vector (length p), k*p matrix of neighbors,
# scalar regularization setting
# Outputs: length k vector of weights, summing to 1
local.weights <- function(focal,neighbors,alpha) {
  # basic shape sanity checks; focal arrives as a plain vector
  # because x[i,] drops the matrix dimensions
  focal = as.vector(focal)
  stopifnot(is.matrix(neighbors),length(focal)==ncol(neighbors))
  # Should really sanity-check the rest (is.numeric, etc.)
  k = nrow(neighbors)
  # Center on the focal vector
  neighbors=t(t(neighbors)-focal) # exploits recycling rule, which
  # has a weird preference for columns
  gram = neighbors %*% t(neighbors)
  # Try to solve the problem without regularization
  weights = try(solve(gram,rep(1,k)),silent=TRUE)
  # The try function tries to evaluate its argument and returns
  # the value if successful; otherwise it returns an object
  # of class "try-error" holding the error message
  if (inherits(weights,"try-error")) {
    # Un-regularized solution failed, try to regularize
    # TODO: look at the error, check if it's something
    # regularization could fix!
    weights = solve(gram+alpha*diag(k),rep(1,k))
  }
  # Enforce the unit-sum constraint
  weights = weights/sum(weights)
  return(weights)
}
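A small worked case may help show what the regularization fallback does. In the sketch below the focal point sits exactly halfway between two hypothetical neighbors, so the Gram matrix is exactly singular, the plain solve fails, and the regularized solve should split the weight evenly. The numbers and variable names are made up for the illustration.

```r
# Focal point at the midpoint of two hypothetical neighbors
focal <- c(0, 0)
neighbors <- rbind(c(-1, 0), c(1, 0))
k <- nrow(neighbors)

# Center the neighbors on the focal point and form the Gram matrix
centered <- sweep(neighbors, 2, focal)
gram <- centered %*% t(centered)

# The Gram matrix is singular here, so the plain solve raises an error
unregularized <- try(solve(gram, rep(1, k)), silent = TRUE)
plain.solve.failed <- inherits(unregularized, "try-error")

# Regularized solve, followed by the unit-sum normalization
alpha <- 0.01
w <- solve(gram + alpha * diag(k), rep(1, k))
w <- w / sum(w)

# Weighted combination of the neighbors, to compare with the focal point
reconstruction <- as.vector(t(w) %*% neighbors)

print(w)
print(reconstruction)
```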

# Get approximation weights from indices of point and neighbors
# Inputs: index of focal point, n*p matrix of vectors, n*k matrix
# of nearest neighbor indices, scalar regularization setting
# Calls: local.weights
# Output: vector of n reconstruction weights
local.weights.for.index <- function(focal,x,NNs,alpha) {
  n = nrow(x)
  stopifnot(n> 0, 0 < focal, focal <= n, nrow(NNs)==n)
  w = rep(0,n)
  neighbors = NNs[focal,]
  wts = local.weights(x[focal,],x[neighbors,],alpha)
  w[neighbors] = wts
  return(w)
}

# Local linear approximation weights, without iteration
# Inputs: n*p matrix of vectors, n*k matrix of neighbor indices,
# scalar regularization setting
# Calls: local.weights.for.index
# Outputs: n*n matrix of reconstruction weights
reconstruction.weights.2 <- function(x,neighbors,alpha) {
  # Sanity-checking should go here
  n = nrow(x)
  w = sapply(1:n,local.weights.for.index,x=x,NNs=neighbors,
             alpha=alpha)
  w = t(w) # sapply returns the transpose of the matrix we want
  return(w)
}

# Find intrinsic coordinates from local linear approximation weights
# Inputs: n*n matrix of weights, number of dimensions q, numerical
# tolerance for checking the row-sum constraint on the weights
# Output: n*q matrix of new coordinates on the manifold
coords.from.weights <- function(w,q,tol=1e-7) {
  n=nrow(w)
  stopifnot(ncol(w)==n) # Needs to be square
  # Check that the weights are normalized
  # to within tol > 0 to handle round-off error
  stopifnot(all(abs(rowSums(w)-1) < tol))
  # Make the Laplacian
  M = t(diag(n)-w)%*%(diag(n)-w)
  # diag(n) is n*n identity matrix
  soln = eigen(M,symmetric=TRUE) # eigenvalues and eigenvectors (here,
  # eigenfunctions), in order of decreasing eigenvalue
  coords = soln$vectors[,((n-q):(n-1)),drop=FALSE] # bottom eigenfunctions
  # except for the trivial one; drop=FALSE keeps an n*q matrix when q=1
  return(coords)
}
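To see why the very last eigenvector is skipped, it may help to look at a weight matrix small enough to reason about. The sketch below uses a hypothetical ring of 10 points in which every point is the plain average of its two ring neighbors; this W is invented for the illustration and is not produced by the functions above.

```r
# Hypothetical weights: each point is the average of its two ring neighbors
n <- 10
w <- matrix(0, n, n)
for (i in 1:n) {
  left <- if (i == 1) n else i - 1
  right <- if (i == n) 1 else i + 1
  w[i, c(left, right)] <- 0.5
}

# Same matrix as in coords.from.weights
M <- t(diag(n) - w) %*% (diag(n) - w)
ev <- eigen(M, symmetric = TRUE)

# Rows of w sum to 1, so M maps the constant vector to zero
constant.residual <- max(abs(M %*% rep(1, n)))

# The smallest eigenvalue is numerically zero ...
smallest.value <- ev$values[n]

# ... and its eigenvector has the same entry everywhere
bottom.vector.spread <- sd(ev$vectors[, n])

print(constant.residual)
print(smallest.value)
print(bottom.vector.spread)
```

A constant coordinate places every point at the same location, so it satisfies the reconstruction equations perfectly while saying nothing about the data; the useful coordinates are the eigenvectors just above it.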