Code: https://satijalab.org/seurat/
Underlying algorithms
CCA
CANONICAL CORRELATION ANALYSIS | R DATA ANALYSIS EXAMPLES
MNN
The Mutual Nearest Neighbor Method in Functional Nonparametric Regression
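To make the MNN idea concrete, here is a minimal sketch of finding mutual nearest neighbor pairs between two small point sets. It assumes both datasets are already embedded in a shared low-dimensional space and uses plain Euclidean distance. The function names and the toy coordinates are made up for illustration; this is not Seurat's implementation.

```python
from math import dist


def nearest_neighbors(source, target, k):
    """For each point in source, return the indices of its k nearest points in target."""
    result = []
    for point in source:
        order = sorted(range(len(target)), key=lambda j: dist(point, target[j]))
        result.append(set(order[:k]))
    return result


def mutual_nearest_neighbors(a, b, k=1):
    """Return sorted (i, j) pairs where b[j] is a k-nearest neighbor of a[i] and vice versa."""
    a_to_b = nearest_neighbors(a, b, k)
    b_to_a = nearest_neighbors(b, a, k)
    return sorted(
        (i, j) for i in range(len(a)) for j in a_to_b[i] if i in b_to_a[j]
    )


# Toy 2-D embeddings (hypothetical numbers, one tuple per cell).
dataset_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
dataset_b = [(0.1, 0.0), (5.2, 5.1), (9.0, 9.0)]

pairs = mutual_nearest_neighbors(dataset_a, dataset_b, k=1)
print(pairs)  # [(0, 0), (2, 1)]
```

The second cell of dataset_a picks the first cell of dataset_b as its nearest neighbor, but that choice is not returned, so it forms no pair. The same happens to the last cell of dataset_b. Requiring the match in both directions is what filters out cells that have no real counterpart in the other dataset.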
Comprehensive Integration of Single-Cell Data
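I really did not expect the integration method in Seurat v3 to be published in Cell itself.
Sure enough: a big name + a frontier field = unlimited possibilities.
As you can see, it was posted on bioRxiv on November 02, 2018 and formally published in Cell on June 06, 2019.
The idea behind the method probably existed by the end of 2017, when I had only just started working on single cell.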
Single-cell transcriptomics has transformed our ability to characterize cell states, but deep biological understanding requires more than a taxonomic listing of clusters.
As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets to better understand cellular identity and function.
Here, we develop a strategy to "anchor" diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities.
After demonstrating improvement over existing methods for integrating scRNA-seq data, we anchor scRNA-seq experiments with scATAC-seq to explore chromatin differences in closely related interneuron subsets and project protein expression measurements onto a bone marrow atlas to characterize lymphocyte populations.
Lastly, we harmonize in situ gene expression and scRNA-seq datasets, allowing transcriptome-wide imputation of spatial gene expression patterns.
Our work presents a strategy for the assembly of harmonized references and transfer of information across datasets.
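Highlight 1: an anchoring approach integrates many kinds of data, across different platforms and different modalities.
Highlight 2: it can also integrate scATAC-seq data.
Highlight 3: analysis of spatial gene expression patterns.
Major single-cell breakthroughs to date:
Two major problems in single-cell data integration: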
These questions are well suited to established fields in statistical learning.
The second question is analogous to reference assembly (Li et al., 2010) and mapping (Langmead et al., 2009) for genomic DNA sequences.
identify shared subpopulations across datasets
The second type of integration problem:
This paper addresses three problems:
The core idea in brief
Through the identification of cell pairwise correspondences between single cells across datasets, termed "anchors," we can transform datasets into a shared space, even in the presence of extensive technical and/or biological differences.
This enables the construction of harmonized atlases at the tissue or organismal scale, as well as effective transfer of discrete or continuous data from a reference onto a query dataset.
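As a rough illustration of transferring discrete data from a reference onto a query, the sketch below takes a list of anchors as given and lets each query cell inherit a label from its nearest anchors. The exp(-distance) weighting, the function name, and all the data are my own simplifying assumptions for this toy example. The weighting used in the paper is more elaborate, so treat this only as the general shape of the idea.

```python
from collections import defaultdict
from math import dist, exp


def transfer_labels(query, anchors, ref_labels, k=2):
    """Predict a label for each query cell from its k nearest anchors.

    anchors is a list of (ref_index, query_index) pairs, assumed to be given.
    Each anchor votes for the label of its reference cell with weight
    exp(-d), where d is the distance from the query cell being labeled
    to the query-side cell of that anchor (a simplifying assumption).
    """
    predictions = []
    for cell in query:
        nearest = sorted(anchors, key=lambda a: dist(cell, query[a[1]]))[:k]
        votes = defaultdict(float)
        for ref_idx, query_idx in nearest:
            votes[ref_labels[ref_idx]] += exp(-dist(cell, query[query_idx]))
        predictions.append(max(votes, key=votes.get))
    return predictions


# Hypothetical toy data: three labeled reference cells, four unlabeled query cells.
ref_labels = ["T cell", "B cell", "B cell"]
query = [(0.0, 0.0), (0.2, 0.1), (4.0, 4.0), (4.1, 3.9)]
anchors = [(0, 0), (1, 2)]  # (reference cell index, query cell index)

predicted = transfer_labels(query, anchors, ref_labels, k=2)
print(predicted)  # ['T cell', 'T cell', 'B cell', 'B cell']
```

Only two query cells are anchors, yet all four receive a label, because the non-anchor cells borrow from whichever anchor sits closest to them. Replacing the vote with a weighted average of numeric values would give the continuous case, for example imputed expression.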
Some single-cell basics
false negatives ("drop-outs") due to transcript abundance and protocol-specific biases
expression derived from fluorescence in situ hybridization (FISH) exhibits probe-specific noise due to sequence specificity and background binding
Basic assumption: we assume that there are correspondences between datasets and that at least a subset of cells represent a shared biological state.
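Evaluating how accurately different tools integrate data from different platforms and different subtypes
Starting to integrate case and control samples, and cell states
Integrating scATAC-seq
CITE-seq: predicting protein expression
Spatial mapping of the mouse cerebral cortex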
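What's my problem?
I also realized long ago that this was an important and valuable problem, but I was working alone. I never truly distilled the problem, never thought about it or understood it deeply, and certainly never thought of using statistical thinking to solve it.
Clearly the big names spotted this valuable problem early. They had already gathered people to discuss and think it through, systematically proposed their own solution using statistical methods, and in the end, on the strength of their ability and reputation, published the results in the very top journal.
What is holding me back, keeping me going in circles?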