Single-Cell Data Integration Methods | Comprehensive Integration of Single-Cell Data

Date: 2019-11-05


Code: https://satijalab.org/seurat/

Underlying algorithms

CCA

Canonical Correlation Analysis | R Data Analysis Examples

MNN

The Mutual Nearest Neighbor Method in Functional Nonparametric Regression

Comprehensive Integration of Single-Cell Data

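I really did not expect the integration method from Seurat v3 to be published in Cell itself.

Sure enough: a leading lab plus a frontier field equals unlimited possibilities.

The preprint was posted on bioRxiv on November 02, 2018, and the paper was formally published in Cell on June 06, 2019.

The idea behind the method probably already existed by late 2017, when I had only just started working on single-cell analysis.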

Single-cell transcriptomics has transformed our ability to characterize cell states, but deep biological understanding requires more than a taxonomic listing of clusters.

As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets to better understand cellular identity and function.

Here, we develop a strategy to "anchor" diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities.

After demonstrating improvement over existing methods for integrating scRNA-seq data, we anchor scRNA-seq experiments with scATAC-seq to explore chromatin differences in closely related interneuron subsets and project protein expression measurements onto a bone marrow atlas to characterize lymphocyte populations.

Lastly, we harmonize in situ gene expression and scRNA-seq datasets, allowing transcriptome-wide imputation of spatial gene expression patterns.

Our work presents a strategy for the assembly of harmonized references and transfer of information across datasets.

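Highlight 1: an anchoring approach integrates many kinds of data, across different platforms and different modalities.

Highlight 2: it can also integrate scATAC-seq data.

Highlight 3: analysis of spatial gene expression patterns.

Major single-cell breakthroughs to date: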

immunophenotype (Stoeckius et al., 2017; Peterson et al., 2017),
genome sequence (Navin et al., 2011; Vitak et al., 2017),
lineage origins (Raj et al., 2018; Spanjaard et al., 2018; Alemany et al., 2018),
DNA methylation landscape (Luo et al., 2018; Kelsey et al., 2017),
chromatin accessibility (Cao et al., 2018; Lake et al., 2018; Preissl et al., 2018),
spatial positioning

The two major questions in single-cell data integration:

how can disparate single-cell datasets, produced across individuals, technologies, and modalities be harmonized into a single reference
once a reference has been constructed, how can its data and meta-data improve the analysis of new experiments?

These questions are well suited to established fields in statistical learning.

The second question is analogous to reference assembly (Li et al., 2010) and mapping (Langmead et al., 2009) for genomic DNA sequences.

identify shared subpopulations across datasets

canonical correlation analysis (CCA)
mutual nearest neighbors (MNNs)
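
To make the CCA entry concrete, here is a minimal sketch of textbook CCA in NumPy. It assumes two views whose rows are paired observations, and it is not Seurat's implementation; the function name `cca`, the ridge term `reg`, and the toy data are all hypothetical choices for illustration.

```python
import numpy as np

def cca(x, y, n_components=1, reg=1e-8):
    """Textbook CCA for two views whose rows are paired observations.

    Returns the canonical variates of x and y and the canonical correlations.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    # Covariances; the small ridge term is an assumption for numerical stability.
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    # Whiten each view using its Cholesky factor (cxx = lx @ lx.T).
    lx = np.linalg.cholesky(cxx)
    ly = np.linalg.cholesky(cyy)
    m = np.linalg.solve(lx, cxy) @ np.linalg.inv(ly).T
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(m)
    wx = np.linalg.solve(lx.T, u[:, :n_components])
    wy = np.linalg.solve(ly.T, vt[:n_components].T)
    return x @ wx, y @ wy, s[:n_components]

# Toy data: both views contain the same hidden signal z, plus unrelated noise.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
x = np.column_stack([z, rng.normal(size=200)])
y = np.column_stack([rng.normal(size=200), 2.0 * z + 1.0])
u, v, corr = cca(x, y)  # corr[0] is close to 1 because z is shared exactly
```

The first pair of canonical variates recovers the shared signal even though it sits in different columns and on different scales in the two views, which is the property that makes CCA useful for finding shared structure across datasets.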

Problems with the second kind of integration:

only a subset of cell types are shared across datasets
significant technical variation masks shared biological signal.

This paper addresses three problems:

reference assembly
transfer learning for transcriptomic, epigenomic, proteomic,
spatially resolved single-cell data

Core summary

Through the identification of cell pairwise correspondences between single cells across datasets, termed "anchors," we can transform datasets into a shared space, even in the presence of extensive technical and/or biological differences.
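
The pairwise correspondences described above build on mutual nearest neighbors. Below is a minimal sketch of the MNN step alone, assuming both datasets are already embedded in the same low-dimensional space. As I understand the paper, the full method also normalizes the embedding and then filters and scores the anchors; none of that is shown here. The function name `mutual_nearest_neighbors` and the toy coordinates are hypothetical.

```python
import numpy as np

def mutual_nearest_neighbors(a, b, k=1):
    """Return (i, j) pairs where a[i] and b[j] are in each other's k nearest neighbors."""
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    # For each cell in a, the indices of its k nearest cells in b.
    nn_ab = np.argsort(d, axis=1)[:, :k]
    # For each cell in b, the indices of its k nearest cells in a.
    nn_ba = np.argsort(d, axis=0)[:k, :].T
    pairs = []
    for i in range(a.shape[0]):
        for j in nn_ab[i]:
            if i in nn_ba[j]:  # keep the pair only if the relation is mutual
                pairs.append((i, int(j)))
    return pairs

# Toy example: two cells in dataset a, three cells in dataset b.
a = np.array([[0.0, 0.0], [10.0, 10.0]])
b = np.array([[0.1, 0.0], [10.0, 10.1], [0.5, 0.0]])
anchors = mutual_nearest_neighbors(a, b, k=1)
```

In the toy data, the third cell of `b` is close to the first cell of `a`, but that cell's own nearest neighbor is a different cell in `b`, so the relation is not mutual and no pair is formed. Requiring mutuality is what lets a cell type that is present in only one dataset remain unpaired.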

This enables the construction of harmonized atlases at the tissue or organismal scale, as well as effective transfer of discrete or continuous data from a reference onto a query dataset.
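
The transfer of discrete data from a reference onto a query can be pictured as a weighted vote over anchors. The sketch below is a simplified illustration, not the paper's weighting scheme: it assumes anchors are given as `(query_index, reference_index)` pairs and uses a Gaussian kernel on distance within the query. The function name `transfer_labels` and the parameters `k` and `sigma` are hypothetical.

```python
import numpy as np

def transfer_labels(query, anchors, ref_labels, k=2, sigma=1.0):
    """Predict a label for every query cell from the labels of nearby anchors.

    query      : array of query-cell coordinates, shape (n_cells, n_dims)
    anchors    : list of (query_index, reference_index) pairs
    ref_labels : labels of the reference cells, indexed by reference_index
    """
    anchor_q = np.array([q for q, _ in anchors])
    anchor_labels = [ref_labels[r] for _, r in anchors]
    classes = sorted(set(anchor_labels))
    predicted = []
    for cell in query:
        # Distance from this query cell to the query side of every anchor.
        d = np.linalg.norm(query[anchor_q] - cell, axis=1)
        nearest = np.argsort(d)[:k]
        # Gaussian kernel: closer anchors get a larger vote (an assumed weighting).
        w = np.exp(-(d[nearest] ** 2) / (2 * sigma ** 2))
        scores = {c: 0.0 for c in classes}
        for idx, weight in zip(nearest, w):
            scores[anchor_labels[idx]] += weight
        predicted.append(max(scores, key=scores.get))
    return predicted

# Toy example: two groups of query cells, one anchor in each group.
query = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
anchors = [(0, 0), (2, 1)]
ref_labels = ["B cell", "T cell"]
labels = transfer_labels(query, anchors, ref_labels)
```

The same weights could be used to average a continuous reference measurement instead of voting on a label, which is one way to picture the "continuous data" case mentioned above.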

Some single-cell basics

false negatives ("drop-outs") due to transcript abundance and protocol-specific biases

expression derived from fluorescence in situ hybridization (FISH) exhibits probe-specific noise due to sequence specificity and background binding