Single-cell data integration methods | Comprehensive Integration of Single-Cell Data

Code for hands-on use: https://satijalab.org/seurat/

Algorithms it depends on

CCA

CANONICAL CORRELATION ANALYSIS | R DATA ANALYSIS EXAMPLES
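The CCA step can be sketched as follows. As I understand the Seurat v3 paper, the within-dataset covariance is treated as diagonal, so CCA reduces to a singular value decomposition of the cross-product Z = X^T Y of the two scaled gene-by-cell matrices. This minimal sketch assumes both inputs are already scaled and share the same genes as rows, and it computes only the first canonical pair by power iteration instead of a full truncated SVD; the function names are hypothetical, not Seurat's API.

```python
import math

def cross_product(x, y):
    """Z = X^T Y for gene-by-cell matrices X (g x n1) and Y (g x n2)."""
    n_genes, n1, n2 = len(x), len(x[0]), len(y[0])
    return [[sum(x[g][i] * y[g][j] for g in range(n_genes)) for j in range(n2)]
            for i in range(n1)]

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def first_canonical_pair(x, y, iters=200):
    """Top singular vectors of X^T Y via power iteration.

    u holds one canonical score per cell of X, v one per cell of Y,
    and sigma is the corresponding singular value.
    """
    z = cross_product(x, y)
    n1, n2 = len(z), len(z[0])
    v = normalize([1.0] * n2)  # start vector; assumed not orthogonal to the top one
    u = [0.0] * n1
    for _ in range(iters):
        u = normalize([sum(z[i][j] * v[j] for j in range(n2)) for i in range(n1)])
        v = normalize([sum(z[i][j] * u[i] for i in range(n1)) for j in range(n2)])
    sigma = sum(u[i] * z[i][j] * v[j] for i in range(n1) for j in range(n2))
    return u, v, sigma
```

With X as the 2 × 2 identity and Y = [[2, 0], [0, 1]], Z equals Y, and the sketch returns sigma ≈ 2 with both u and v concentrated on the first cell.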

MNN

The Mutual Nearest Neighbor Method in Functional Nonparametric Regression
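Mutual nearest neighbors, as used for single-cell batch correction, can be sketched like this: cell i in dataset A and cell j in dataset B form a pair when each is among the other's k nearest neighbors in the opposite dataset. This brute-force sketch assumes cells are already embedded in a shared low-dimensional space with Euclidean distance; the function names are hypothetical.

```python
import math

def knn(query, reference, k):
    """For each query point, the set of indices of its k nearest reference points."""
    result = []
    for q in query:
        order = sorted(range(len(reference)), key=lambda j: math.dist(q, reference[j]))
        result.append(set(order[:k]))
    return result

def mutual_nearest_neighbors(a, b, k):
    """Pairs (i, j) where b[j] is among a[i]'s k nearest cells in b and vice versa."""
    a_to_b = knn(a, b, k)
    b_to_a = knn(b, a, k)
    return sorted((i, j) for i in range(len(a)) for j in a_to_b[i] if i in b_to_a[j])

print(mutual_nearest_neighbors([[0, 0], [10, 10]], [[0.5, 0], [10, 9], [50, 50]], k=1))
```

Here the cell at (50, 50) in b has a nearest neighbor in a, but that neighbor prefers another cell, so no pair is formed. That asymmetry is what lets MNN ignore cell types present in only one dataset.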

 

Comprehensive Integration of Single-Cell Data

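I honestly did not expect the integration method in Seurat V3 to be published in Cell itself.

Sure enough: big name + frontier field = unlimited possibilities.

bioRxiv shows it was posted on November 02, 2018, and it was formally published in Cell on June 06, 2019.

The idea behind the method probably already existed at the end of 2017, which is when I had only just started working on single cell.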

Single-cell transcriptomics has transformed our ability to characterize cell states, but deep biological understanding requires more than a taxonomic listing of clusters.

As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets to better understand cellular identity and function.

Here, we develop a strategy to "anchor" diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities.

After demonstrating improvement over existing methods for integrating scRNA-seq data, we anchor scRNA-seq experiments with scATAC-seq to explore chromatin differences in closely related interneuron subsets and project protein expression measurements onto a bone marrow atlas to characterize lymphocyte populations.

Lastly, we harmonize in situ gene expression and scRNA-seq datasets, allowing transcriptome-wide imputation of spatial gene expression patterns.

Our work presents a strategy for the assembly of harmonized references and transfer of information across datasets.

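Highlight 1: integrates many kinds of data through anchoring, across platforms and across modalities.

Highlight 2: can also integrate scATAC-seq data.

Highlight 3: analysis of spatial gene expression patterns.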

 

Major single-cell breakthroughs to date:

  • immunophenotype (Stoeckius et al., 2017; Peterson et al., 2017),
  • genome sequence (Navin et al., 2011; Vitak et al., 2017),
  • lineage origins (Raj et al., 2018; Spanjaard et al., 2018; Alemany et al., 2018),
  • DNA methylation landscape (Luo et al., 2018; Kelsey et al., 2017),
  • chromatin accessibility (Cao et al., 2018; Lake et al., 2018; Preissl et al., 2018),
  • spatial positioning

 

The two big questions in single-cell data integration:

  1. How can disparate single-cell datasets, produced across individuals, technologies, and modalities, be harmonized into a single reference?
  2. Once a reference has been constructed, how can its data and meta-data improve the analysis of new experiments?

These questions are well suited to established fields in statistical learning.

These two questions parallel reference assembly (Li et al., 2010) and mapping (Langmead et al., 2009) for genomic DNA sequences, respectively.

 

Existing methods to identify shared subpopulations across datasets:

  • canonical correlation analysis (CCA)
  • mutual nearest neighbors (MNNs)

 

Challenges for the second kind of integration:

  • only a subset of cell types are shared across datasets
  • significant technical variation masks shared biological signal.

 

This paper addresses three problems:

  • reference assembly
  • transfer learning for transcriptomic, epigenomic, and proteomic data
  • spatially resolved single-cell data

 

Core idea in brief

Through the identification of pairwise correspondences between single cells across datasets, termed "anchors," we can transform datasets into a shared space, even in the presence of extensive technical and/or biological differences.

This enables the construction of harmonized atlases at the tissue or organismal scale, as well as effective transfer of discrete or continuous data from a reference onto a query dataset.
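One way to picture "transforming datasets into a shared space" with anchors: each anchor pair defines a correction vector (reference cell minus query cell), and every query cell is shifted by a weighted average of those vectors, with nearby anchors weighted more heavily. This is a simplified sketch, not Seurat's exact procedure: it assumes a Gaussian kernel over all anchors with a hypothetical bandwidth `sigma`, ignores anchor scores, and computes distances and corrections in the same space. `correct_query` is a hypothetical name.

```python
import math

def correct_query(query, anchors, sigma=1.0):
    """Shift each query cell by a kernel-weighted average of anchor corrections.

    anchors: list of (query_anchor_cell, reference_anchor_cell) vector pairs.
    Simplified: Gaussian kernel over all anchors, no anchor scores.
    """
    # Correction vector per anchor: where its reference partner sits
    # relative to its query cell.
    corrections = [[r - q for q, r in zip(qa, ra)] for qa, ra in anchors]
    corrected = []
    for cell in query:
        d2 = [math.dist(cell, qa) ** 2 for qa, _ in anchors]
        d2_min = min(d2)
        # Subtracting d2_min keeps the closest anchor's weight at 1 (no underflow).
        w = [math.exp(-(x - d2_min) / (2 * sigma ** 2)) for x in d2]
        total = sum(w)
        shift = [sum(wi * c[dim] for wi, c in zip(w, corrections)) / total
                 for dim in range(len(cell))]
        corrected.append([x + s for x, s in zip(cell, shift)])
    return corrected
```

With a single anchor every query cell receives the same shift. With several anchors, each cell mostly follows its nearest ones, which lets the correction differ between regions of the data such as distinct cell types.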

 

Some single-cell basics

false negatives ("drop-outs") due to transcript abundance and protocol-specific biases

expression derived from fluorescence in situ hybridization (FISH) exhibits probe-specific noise due to sequence specificity and background binding

 

Results

Identifying Anchor Correspondences across Single-Cell Datasets

Basic assumption: we assume that there are correspondences between datasets and that at least a subset of cells represent a shared biological state.
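Anchors that join cells with consistent local neighborhoods are more trustworthy than isolated pairs. The paper, as I recall, scores anchors by the overlap of their neighborhoods; the sketch below assumes a simple version of that idea, the Jaccard index between the k-nearest-neighbor sets of the two anchor cells in the pooled data, so a spurious pair linking different clusters gets a low score. The pooling scheme and the function names are assumptions for illustration.

```python
import math

def neighborhood(cells, idx, k):
    """Indices of the k nearest cells to cells[idx], including idx itself."""
    order = sorted(range(len(cells)), key=lambda j: math.dist(cells[idx], cells[j]))
    return set(order[:k])

def score_anchors(a, b, anchors, k):
    """Jaccard overlap of each anchor pair's neighborhoods in the pooled data.

    anchors: (i, j) pairs with i indexing a and j indexing b.
    """
    pooled = a + b  # b's cells sit at offset len(a) in the pooled list
    scores = []
    for i, j in anchors:
        ni = neighborhood(pooled, i, k)
        nj = neighborhood(pooled, len(a) + j, k)
        scores.append(len(ni & nj) / len(ni | nj))
    return scores

a = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
b = [[0.0, 0.4], [10.0, 11.0]]
print(score_anchors(a, b, [(0, 0), (2, 1), (0, 1)], k=3))
```

The two within-cluster pairs score 1.0, while the cross-cluster pair (0, 1) shares only one neighbor out of five and scores 0.2.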

 

Constructing Integrated Atlases at the Scale of Organs and Organisms

Evaluates how accurately different tools integrate data across platforms and across subtypes.

 

Leveraging Anchor Correspondences to Classify Cell States

Integrates case and control data to classify cell states.
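For classifying cell states, each query cell can be labeled by letting nearby anchors vote with the labels of their reference cells. A minimal sketch, assuming Gaussian kernel weights with a hypothetical bandwidth `sigma` and no anchor scores; the returned score is the winning label's share of the total weight, loosely analogous to a prediction score. `transfer_labels` is a hypothetical name.

```python
import math
from collections import defaultdict

def transfer_labels(query, anchors, sigma=1.0):
    """Predict a label per query cell by kernel-weighted anchor voting.

    anchors: list of (query_anchor_cell, reference_label) pairs.
    Returns (label, score) per cell; score is the winner's weight share.
    """
    predictions = []
    for cell in query:
        d2 = [math.dist(cell, qa) ** 2 for qa, _ in anchors]
        d2_min = min(d2)
        votes = defaultdict(float)
        for dist2, (_, label) in zip(d2, anchors):
            # Closest anchor gets weight 1; farther anchors decay smoothly.
            votes[label] += math.exp(-(dist2 - d2_min) / (2 * sigma ** 2))
        total = sum(votes.values())
        best = max(votes, key=votes.get)
        predictions.append((best, votes[best] / total))
    return predictions

anchors = [([0, 0], "T cell"), ([0, 1], "T cell"), ([10, 10], "B cell")]
print(transfer_labels([[0, 0.5], [9, 9]], anchors))
```

Both query cells land next to anchors of a single type, so each prediction score comes out very close to 1.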

 

Projecting Cellular States across Modalities

Integrating scATAC-seq

 

Transferring Continuous and Multimodal Data across Experiments

 

 

Predicting Protein Expression in Human Bone Marrow Cells

CITE-seq: predicting protein expression
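Continuous data such as CITE-seq protein levels can be transferred the same way, replacing the vote with a weighted average of the values attached to the anchors' reference cells. This sketch assumes only the k nearest anchors contribute, with Gaussian weights; `k`, `sigma`, and the function name `impute_feature` are hypothetical choices, not the paper's parameters.

```python
import math

def impute_feature(query, anchor_cells, anchor_values, k=2, sigma=1.0):
    """Predict a continuous value (e.g. a protein level) for each query cell.

    Each query cell averages the reference values attached to its k nearest
    anchors, weighted by a Gaussian kernel on squared distance.
    """
    predictions = []
    for cell in query:
        nearest = sorted(range(len(anchor_cells)),
                         key=lambda a: math.dist(cell, anchor_cells[a]))[:k]
        d2 = [math.dist(cell, anchor_cells[a]) ** 2 for a in nearest]
        # d2[0] is the smallest distance, so the closest anchor has weight 1.
        w = [math.exp(-(x - d2[0]) / (2 * sigma ** 2)) for x in d2]
        total = sum(w)
        predictions.append(sum(wi * anchor_values[a] for wi, a in zip(w, nearest)) / total)
    return predictions

cells = [[0.0, 0.0], [2.0, 0.0], [100.0, 100.0]]
print(impute_feature([[1.0, 0.0]], cells, [10.0, 20.0, 999.0], k=2))
```

The query cell sits halfway between the first two anchors and gets their average, 15.0; the distant anchor with value 999 never contributes because it is not among the two nearest, which is the point of restricting the average to k neighbors.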

 

Spatial Mapping of Single-Cell Sequencing Data in the Mouse Cortex

Spatial mapping of the mouse cerebral cortex

 


 

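What's my problem?

I had long realized that this was an important and valuable problem, but I was working alone: I never truly distilled the problem, never thought it through deeply or understood it, and certainly never tried to use statistical thinking to solve it.

The big names clearly spotted this valuable problem early, had already assembled a team to discuss and think it through, systematically proposed their own solution using statistical methods, and ultimately, on the strength of their ability and reputation, published the results in the very top journal.

 

What is holding me back and keeping me going in circles?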
