Reposted from: http://blog.sina.com.cn/s/blog_83f77c940102xuro.html
Kalatskaya I, Trinh Q M, Spears M, et al. ISOWN: accurate somatic mutation identification in the absence of normal tissue controls[J]. Genome Medicine, 2017, 9(1):59.
Variants fall into three classes:
single nucleotide variant (SNV), insertion and deletion (indel), and structural variant (SV, including copy number variation, duplication, translocation, etc.). An SNV affects a single base, and small indels are typically ≤10 bp.
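As a rough illustration of the size-based distinction above, here is a minimal sketch that labels a REF/ALT allele pair. The function name, the returned labels, and the handling of same-length multi-base substitutions are assumptions made for this example; only the 10 bp cutoff comes from the text, and real SV detection needs far more than allele lengths.

```python
def classify_variant(ref: str, alt: str, small_indel_max: int = 10) -> str:
    """Label a simple REF/ALT pair by type and size (illustrative only)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(ref) - len(alt))
    if size == 0:
        # Same-length multi-base substitution; outside the three classes above.
        return "other"
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    # The 10 bp cutoff follows the text; longer events are only flagged here.
    if size <= small_indel_max:
        return "small " + kind
    return "large " + kind


print(classify_variant("A", "G"))
print(classify_variant("A", "ATTT"))
print(classify_variant("ACGTACGTACGTA", "A"))
```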
Alignment:
BWA (for Illumina reads) and TMAP (for Ion Torrent reads) for DNA reads
splice-aware aligners such as TopHat and STAR for RNA sequencing
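On whether to run quality control before alignment: at this stage QC amounts to trimming adapter sequences. Many variant callers use a position-based detection strategy, so the overall quality of an entire read matters less. Local realignment and base quality score recalibration (BQSR) are further, separate processing steps applied after alignment. Amplicon (PCR-based) data do not need PCR duplicate removal after alignment.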
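Tumor-normal variant calling mode
Heuristic-based variant callers include VarScan2, qSNP, Shimmer, RADIA, SOAPsnv, and VarDict.
Callers that add genotype analysis include SomaticSniper, FaSD-somatic, SAMtools, JointSNVMix2, Virmid, SNVSniffer, Seurat, and CaVEMan. These tools are typically used on low-coverage data (WGS, WES, or targeted sequencing with low depth), but they are not sensitive to low-frequency mutations.
Callers that use a haplotype-based strategy do not need local realignment, because they call variants from locally assembled reads. Such tools include Platypus, HapMuC, LocHap, FreeBayes, and MuTect2.
Machine-learning-based tools include MutationSeq, SomaticSeq, SNooPer, and BAYSIC.
For low-frequency mutations in high-depth sequencing data, the recommended tools are Strelka, MuTect, LoFreq, EBCall, deepSNV, LoLoPicker, and MuSE. Heuristic callers also perform well on low-frequency mutations (1% variant calling with VarDict and < 5% variant calling with VarScan2).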
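To make the tumor-normal idea concrete, here is a toy threshold-based classifier for a single site. It is a sketch of the general heuristic approach only, not the method of VarScan2, VarDict, or any other tool named above. The function names, labels, and all thresholds (`min_tumor_vaf`, `max_normal_vaf`, `min_alt_reads`) are hypothetical values chosen for illustration.

```python
def vaf(alt_reads: int, depth: int) -> float:
    """Variant allele frequency: fraction of reads supporting the alternate allele."""
    return alt_reads / depth if depth else 0.0


def classify_site(t_alt: int, t_depth: int, n_alt: int, n_depth: int,
                  min_tumor_vaf: float = 0.01,    # hypothetical threshold
                  max_normal_vaf: float = 0.005,  # hypothetical threshold
                  min_alt_reads: int = 5) -> str:  # hypothetical threshold
    """Toy tumor-normal comparison at one genomic position."""
    t_vaf = vaf(t_alt, t_depth)
    n_vaf = vaf(n_alt, n_depth)
    # Too little evidence in the tumor: make no call at all.
    if t_alt < min_alt_reads or t_vaf < min_tumor_vaf:
        return "no_call"
    # Allele also present in the matched normal: not tumor-specific.
    if n_vaf > max_normal_vaf:
        return "germline_or_shared"
    return "somatic_candidate"


print(classify_site(12, 1000, 0, 800))
print(classify_site(500, 1000, 400, 800))
print(classify_site(2, 1000, 0, 800))
```

With fixed thresholds like these, sensitivity to low-frequency mutations depends directly on sequencing depth, which is why the 1% and < 5% figures above apply to high-depth data.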
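Single-sample variant calling mode
SNVMix2, Shearwater, SPLINTER, SNVer, OutLyzer, and Pisces can all call variants from a single sample, but they cannot distinguish somatic from germline variants.
ISOWN, SomVarIUS, and SiNVICT call variants from a single sample and can also distinguish somatic from germline variants. ISOWN relies on MuTect2 for calling and then uses somatic (COSMIC) and germline (ExAC and dbSNP) mutation databases to separate the two. OutLyzer, Pisces, ISOWN, SomVarIUS, and SiNVICT have already been applied to targeted sequencing.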
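The database-based separation can be pictured with a simple membership lookup. This is only a sketch of the idea of using somatic and germline databases as evidence; it is not ISOWN's actual procedure. The variant keys, the two in-memory sets standing in for COSMIC and ExAC/dbSNP, and the output labels are all hypothetical.

```python
def label_by_databases(variant, somatic_db, germline_db) -> str:
    """Label a variant by which reference set contains it (illustrative only)."""
    in_somatic = variant in somatic_db
    in_germline = variant in germline_db
    if in_somatic and not in_germline:
        return "likely_somatic"
    if in_germline and not in_somatic:
        return "likely_germline"
    # Present in both or in neither: the lookup alone cannot decide.
    return "ambiguous"


# Hypothetical stand-ins for COSMIC and ExAC/dbSNP; keys are (chrom, pos, ref, alt).
somatic_db = {("chrA", 100, "A", "T"), ("chrB", 250, "G", "C")}
germline_db = {("chrB", 250, "G", "C"), ("chrC", 42, "C", "T")}

for v in [("chrA", 100, "A", "T"), ("chrC", 42, "C", "T"),
          ("chrB", 250, "G", "C"), ("chrD", 7, "T", "G")]:
    print(v, label_by_databases(v, somatic_db, germline_db))
```

The "ambiguous" case is where a lookup alone runs out of information and additional evidence is needed.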
UMI-based variant calling
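Low-frequency mutations are generally defined as those with a variant allele frequency (VAF) ≤5%. The goal is to rule out sequencing errors (Illumina: 0.01–0.1). Three UMI-based analysis tools are currently available: DeepSNVMiner, MAGERI, and smCounter.
DeepSNVMiner and MAGERI take raw reads as input, whereas only smCounter takes BAM files. For the PGM platform there is already a plugin that handles UMIs, TVC (Torrent Variant Caller). There is also an open-source tool, Fgbio. Illumina recommends 30 ng of DNA input and a sequencing depth of 40,000X, which gives a median coverage of about 2,500X and sensitive variant detection down to 0.4%.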
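The core idea behind UMI-based error suppression can be sketched as a per-family majority vote: reads sharing a UMI derive from the same original molecule, so a base seen in only a minority of a family's reads is more likely an error than a real variant. This is a minimal sketch, not the algorithm of DeepSNVMiner, MAGERI, smCounter, or Fgbio. The function name and the `min_family_size` and `min_agreement` parameters are hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(observations, min_family_size=2, min_agreement=0.8):
    """Collapse (umi, base) observations at one position into one base per UMI family.

    Families that are too small, or whose reads disagree too much, are dropped.
    """
    families = defaultdict(list)
    for umi, base in observations:
        families[umi].append(base)

    consensus = {}
    for umi, bases in families.items():
        if len(bases) < min_family_size:
            continue  # a single read cannot be error-corrected
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[umi] = base
    return consensus


# Made-up observations at one position.
obs = ([("AAC", "A")] * 5 + [("AAC", "T")]      # one discordant read is outvoted
       + [("GGT", "T")] * 3                      # consistent variant-supporting family
       + [("CCA", "A")]                          # singleton family, dropped
       + [("TTG", "A"), ("TTG", "T")])           # split family, dropped

calls = umi_consensus(obs)
alt_fraction = sum(1 for b in calls.values() if b == "T") / len(calls)
print(calls, alt_fraction)
```

Counting consensus molecules rather than raw reads is what lets the allele fraction be estimated below the raw per-read error rate.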
RNA-seq variant calling
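Tools that call variants from RNA data include RADIA, Seurat, VarDict, VarScan2, SNPiR, and eSNVdetect, but RADIA and Seurat also need DNA data to be integrated with the RNA data.
In 2014 the Genome in a Bottle Consortium (GIAB) released a high-confidence variant call set for the NA12878 cell line, built by integrating multiple sequencing technologies with multiple alignment and analysis tools.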
Zook J M, Chapman B, Wang J, et al. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls[J]. Nature Biotechnology, 2014, 32(3):246-51.
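A benchmark call set like this is typically used to score a caller's output. Below is a minimal sketch that compares a call set against a truth set by exact key match. The function name and the example variants are hypothetical. It deliberately ignores steps a real comparison needs, such as restricting to high-confidence regions and normalizing variant representation.

```python
def benchmark(calls: set, truth: set) -> dict:
    """Score a call set against a truth set by exact (chrom, pos, ref, alt) match."""
    tp = len(calls & truth)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Made-up variants for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
calls = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 10, "A", "C")}

result = benchmark(calls, truth)
print(result)
```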