SC3 | 拉普拉斯矩陣 | Laplacian matrix | 圖論 | 聚類 | R代碼

Laplacian和PCA貌似是同一種性質的方法,座標系變換。html

 

最近在看SC3聚類這篇文章,SC3使用了這個工具。算法

SC3: consensus clustering of single-cell RNA-seq dataexpress

All distance matrices are then transformed using either principal component analysis (PCA) or by calculating the eigenvectors of the associated graph Laplacian (L = I – D–1/2AD–1/2, where I is the identity matrix, A is a similarity matrix (A = e–A′/max(A′)), where A′ is a distance matrix) and D is the degree matrix of A, a diagonal matrix that contains the row-sums of A on the diagonal (Dii = ΣjAij). The columns of the resulting matrices are then sorted in ascending order by their corresponding eigenvalues.dom

先看下該工具的功能:SC3 package manualide

跑一下常規代碼:函數

library(SingleCellExperiment)
library(SC3)
library(scater)

head(ann)
yan[1:3, 1:3]

# create a SingleCellExperiment object
sce <- SingleCellExperiment(
  assays = list(
    counts = as.matrix(yan),
    logcounts = log2(as.matrix(yan) + 1)
  ), 
  colData = ann
)

# define feature names in feature_symbol column
rowData(sce)$feature_symbol <- rownames(sce)
# remove features with duplicated names
sce <- sce[!duplicated(rowData(sce)$feature_symbol), ]

# define spike-ins
isSpike(sce, "ERCC") <- grepl("ERCC", rowData(sce)$feature_symbol)

plotPCA(sce, colour_by = "cell_type1")

sce <- sc3(sce, ks = 2:4, biology = TRUE)
# sc3_interactive(sce)
# sc3_export_results_xls(sce)

######################################
sce <- sc3_prepare(sce)

sce <- sc3_estimate_k(sce)

sce <- sc3_calc_dists(sce)
names(metadata(sce)$sc3$distances)

sce <- sc3_calc_transfs(sce)
names(metadata(sce)$sc3$transformations)
metadata(sce)$sc3$distances

sce <- sc3_kmeans(sce, ks = 2:4)
names(metadata(sce)$sc3$kmeans)

col_data <- colData(sce)
head(col_data[ , grep("sc3_", colnames(col_data))])
sce <- sc3_calc_consens(sce)
names(metadata(sce)$sc3$consensus)
names(metadata(sce)$sc3$consensus$`3`)

col_data <- colData(sce)
head(col_data[ , grep("sc3_", colnames(col_data))])

sce <- sc3_calc_biology(sce, ks = 2:4)

sce <- sc3_run_svm(sce, ks = 2:4)
col_data <- colData(sce)
head(col_data[ , grep("sc3_", colnames(col_data))])

  

接下來會嘗試拆一下該工具。工具

怎麼拆這個工具?idea

這種封裝的很好的R包其實比較難拆,通常的經過函數名字就能夠看到R代碼,但這裏你輸入函數名,如sc3_calc_dists,看到的只是如下的封裝好的代碼:spa

new("nonstandardGenericFunction", .Data = function (object) 
{
    standardGeneric("sc3_calc_dists")
}, generic = structure("sc3_calc_dists", package = "SC3"), package = "SC3", 
    group = list(), valueClass = character(0), signature = "object", 
    default = NULL, skeleton = (function (object) 
    stop("invalid call in method dispatch to 'sc3_calc_dists' (no default method)", 
        domain = NA))(object))  

暫時還不熟悉這種形式,因此只能經過函數名去GitHub裏面查了。插件

GitHub真的很優秀,能夠直接查文件內部代碼,能夠很快定位到sc3_calc_dists。

再配合這個目錄插件,效率提升了很多,https://www.octotree.io/?utm_source=lite&utm_medium=extension

如下是封裝前的代碼:

#' Calculate distances between the cells.
#' 
#' This function calculates distances between the cells. It
#' creates and populates the following items of the \code{sc3} slot of the \code{metadata(object)}:
#' \itemize{
#'   \item \code{distances} - contains a list of distance matrices corresponding to
#'   Euclidean, Pearson and Spearman distances.
#' }
#' 
#' @name sc3_calc_dists
#' @aliases sc3_calc_dists, sc3_calc_dists,SingleCellExperiment-method
#' 
#' @param object an object of \code{SingleCellExperiment} class
#' 
#' @return an object of \code{SingleCellExperiment} class
#' 
#' @importFrom doRNG %dorng%
#' @importFrom foreach foreach %dopar%
#' @importFrom parallel makeCluster stopCluster
#' @importFrom doParallel registerDoParallel
sc3_calc_dists.SingleCellExperiment <- function(object) {
    dataset <- get_processed_dataset(object)
    
    # check whether in the SVM regime
    if (!is.null(metadata(object)$sc3$svm_train_inds)) {
        dataset <- dataset[, metadata(object)$sc3$svm_train_inds]
    }
    
    # NULLing the variables to avoid notes in R CMD CHECK
    i <- NULL
    
    distances <- c("euclidean", "pearson", "spearman")
    
    message("Calculating distances between the cells...")
    
    if (metadata(object)$sc3$n_cores > length(distances)) {
        n_cores <- length(distances)
    } else {
        n_cores <- metadata(object)$sc3$n_cores
    }
    
    cl <- parallel::makeCluster(n_cores, outfile = "")
    doParallel::registerDoParallel(cl, cores = n_cores)
    
    # calculate distances in parallel
    dists <- foreach::foreach(i = distances) %dorng% {
        try({
            calculate_distance(dataset, i)
        })
    }
    
    # stop local cluster
    parallel::stopCluster(cl)
    
    names(dists) <- distances
    
    metadata(object)$sc3$distances <- dists
    return(object)
}

#' @rdname sc3_calc_dists
#' @aliases sc3_calc_dists
setMethod("sc3_calc_dists", signature(object = "SingleCellExperiment"), sc3_calc_dists.SingleCellExperiment)  

經過setMethod連接到一塊兒的。

順路找到了原函數:

#' Calculate a distance matrix
#'
#' Distance between the cells, i.e. columns, in the input expression matrix are
#' calculated using the Euclidean, Pearson and Spearman metrics to construct
#' distance matrices.
#'
#' @param data expression matrix
#' @param method one of the distance metrics: 'spearman', 'pearson', 'euclidean'
#' @return distance matrix
#'
#' @importFrom stats cor dist
#' 
#' @useDynLib SC3
#' @importFrom Rcpp sourceCpp
#'
calculate_distance <- function(data, method) {
    return(if (method == "spearman") {
        as.matrix(1 - cor(data, method = "spearman"))
    } else if (method == "pearson") {
        as.matrix(1 - cor(data, method = "pearson"))
    } else {
        ED2(data)
    })
}

#' Distance matrix transformation
#'
#' All distance matrices are transformed using either principal component 
#' analysis (PCA) or by calculating the 
#' eigenvectors of the graph Laplacian (Spectral). 
#' The columns of the resulting matrices are then sorted in 
#' descending order by their corresponding eigenvalues.
#'
#' @param dists distance matrix
#' @param method transformation method: either 'pca' or
#' 'laplacian'
#' @return transformed distance matrix
#'
#' @importFrom stats prcomp cmdscale
#'
transformation <- function(dists, method) {
    if (method == "pca") {
        t <- prcomp(dists, center = TRUE, scale. = TRUE)
        return(t$rotation)
    } else if (method == "laplacian") {
        L <- norm_laplacian(dists)
        l <- eigen(L)
        # sort eigenvectors by their eigenvalues
        return(l$vectors[, order(l$values)])
    }
}

#' Calculate consensus matrix
#'
#' Consensus matrix is calculated using the Cluster-based Similarity 
#' Partitioning Algorithm (CSPA). For each clustering solution a binary 
#' similarity matrix is constructed from the corresponding cell labels: 
#' if two cells belong to the same cluster, their similarity is 1, otherwise 
#' the similarity is 0. A consensus matrix is calculated by averaging all 
#' similarity matrices.
#'
#' @param clusts a matrix containing clustering solutions in columns
#' @return consensus matrix
#' 
#' @useDynLib SC3
#' @importFrom Rcpp sourceCpp
#' @export
consensus_matrix <- function(clusts) {
    res <- consmx(clusts)
    colnames(res) <- as.character(c(1:nrow(clusts)))
    rownames(res) <- as.character(c(1:nrow(clusts)))
    return(res)
}

 

  • 距離計算
  • 轉換
  • consensus

都在這裏。。。  

 

ED2是他們實驗室本身用Rcpp寫的一個計算歐氏距離的工具。

transformation輸入的是對稱的距離矩陣(行列都是樣本細胞),而後作完PCA,返回了rotation,不知道這樣作有什麼意義?

還真有用PCA來處理距離類似度矩陣的,MDS,目的就是降維,由於後面要用kmean聚類;

而後對每個降維了的矩陣用kmeans;

consensus用的是這個算法:Cluster-based Similarity Partitioning Algorithm (CSPA),作這個的意義何在?輸入是每一個細胞的多重聚類結果,而後作了一個一致性統一。

 

 

參考:

拉普拉斯矩陣(Laplacian matrix)

相關文章
相關標籤/搜索