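A Summary of Genome Sequencing, Assembly, and Analysis

1. Preparation before sequencing

Collect information about the species, such as genome size and heterozygosity.

1.1 Obtaining the genome size

Knowing the genome size is what later lets you judge whether the size of the assembly is plausible. If the genome is too large (>10 Gb), the memory required by current de novo assembly software exceeds what available machines can provide, so assembly is not feasible in practice.

The genome size of most species can be looked up in this database (http://www.genomesize.com/). If the species is not listed, consider measuring the genome size experimentally (flow cytometry).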

1.1.1 An example of estimating genome size by flow cytometry:

Yoshida, S., J. K. Ishida, et al. (2010). "A full-length enriched cDNA library and expressed sequence tag analysis of the parasitic weed, Striga hermonthica." BMC Plant Biol 10: 55.

1.1.2 A description of estimating genome size by Feulgen staining:

This book is a classic and highly recommended: Gregory, T. (2005). The evolution of the genome, Academic Press.

1.1.3 Examples of estimating genome size by quantitative PCR:

Wilhelm, J., A. Pingoud, et al. (2003). "Real-time PCR-based method for the estimation of genome sizes." Nucleic Acids Res 31(10): e56.

Jeyaprakash, A. and M. A. Hoy (2009). "The nuclear genome of the phytoseiid Metaseiulus occidentalis (Acari: Phytoseiidae) is among the smallest known in arthropods." Exp Appl Acarol 47(4): 263-273.

1.1.4 An example of estimating genome size from k-mers:

Kim, E. B., X. Fang, et al. (2011). "Genome sequencing reveals insights into physiology and longevity of the naked mole rat." Nature 479(7372): 223-227.
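
The idea behind k-mer based estimation can be sketched roughly as follows: count how often each k-mer occurs in the reads, find the main peak of the depth histogram, and divide the total number of k-mers by the peak depth. This is a minimal sketch and not the pipeline of the paper cited above. The function names, the `min_depth` cutoff for discarding likely error k-mers, and the toy histogram are assumptions for illustration; real tools also count k-mers on both strands, which is skipped here.

```python
from collections import Counter


def kmer_depth_histogram(reads, k):
    """Return {depth: number of distinct k-mers observed at that depth}."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size as (total k-mers) / (peak k-mer depth).

    min_depth is an assumed cutoff: k-mers seen fewer times are
    treated as sequencing errors and ignored.
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # The peak is the depth shared by the largest number of distinct k-mers.
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (made-up numbers): error k-mers at depth 1-2, main peak at 20.
toy_histogram = {1: 1000, 2: 200, 18: 50, 19: 100, 20: 300, 21: 100, 22: 50}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.0f} bp")
```

On real data the histogram would normally come from a dedicated k-mer counter rather than from the pure-Python loop above, which is only practical for very small inputs.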

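1.2 Estimating heterozygosity

The main effect of heterozygosity on assembly is that the two homologous chromosomes cannot be merged: in highly heterozygous regions both haplotypes are assembled separately, so the assembly comes out larger than the actual genome.

One common approach is to use SSR markers and check SSR polymorphism in the progeny of the sequenced parent. If heterozygosity is above 0.5%, assembly is considered fairly difficult; above 1%, the genome is very hard to assemble.

Heterozygosity is also commonly estimated by k-mer analysis; here is an example: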

http://www.nature.com/nature/journal/vaop/ncurrent/full/nature11413.html
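
As a rough illustration of how a k-mer histogram can hint at heterozygosity (this is a simplified sketch, not the method of the paper linked above): an isolated heterozygous SNP creates about k distinct k-mers on each haplotype, and these appear at roughly half the depth of the homozygous peak. The function name, the depth windows used to separate the two peaks, and the toy numbers are all assumptions; indels, clustered SNPs, and repeats are ignored.

```python
def estimate_heterozygosity(histogram, k, hom_peak):
    """Rough heterozygosity estimate from a k-mer depth histogram.

    histogram: {depth: number of distinct k-mers at that depth}
    hom_peak:  depth of the homozygous (main) peak
    Assumes isolated SNPs only, so each SNP contributes 2 * k distinct
    k-mers to the half-depth peak.
    """
    half = hom_peak / 2
    # Assumed windows: [0.5, 1.5) * half for the heterozygous peak,
    # [1.5, 3.0] * half for the homozygous peak.
    n_het = sum(n for d, n in histogram.items() if 0.5 * half <= d < 1.5 * half)
    n_hom = sum(n for d, n in histogram.items() if 1.5 * half <= d <= 3.0 * half)
    haploid_positions = n_hom + n_het / 2
    if haploid_positions == 0:
        raise ValueError("no k-mers found around the expected peaks")
    snps = n_het / (2 * k)
    return snps / haploid_positions


# Toy histogram (made-up numbers) with peaks at depth 20 and 40.
toy = {1: 50000, 20: 3400, 40: 98300}
rate = estimate_heterozygosity(toy, k=17, hom_peak=40)
print(f"estimated heterozygosity = {rate:.2%}")
```

Model-based tools fit the whole histogram instead of cutting it into windows, so treat the number from this sketch as an order-of-magnitude check only.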

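Heterozygosity can be reduced by many generations of inbreeding.

High heterozygosity does not mean that nothing can be assembled; rather, the assembled sequence is unsuitable for downstream biological analyses such as copy number and complete gene structure.

1.3 Whether a genetic map is available

As the quality requirements for sequencing rise and the related techniques mature, a genetic map is becoming an almost mandatory part of a de novo genome project. For the concepts behind genetic map construction, see this book (The handbook of plant genome mapping: genetic and physical mapping).

1.4 Surveying the biological questions

This step is also very important.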

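2. Preparing the sequencing sample

If step 1 raises no problems, the species is a reasonable candidate for sequencing. For some species the sample itself is a major issue; merely obtaining the material can be a challenge.

The sample used for genome sequencing should preferably come from a single individual, which reduces the effect of heterozygosity between individuals on the assembly. Large-insert libraries do not have this requirement.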

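3. Choosing a sequencing strategy

Sequencing generally uses libraries with a gradient of insert sizes: small inserts (200, 500, 800 bp) and large inserts (1 kb, 2 kb, 5 kb, 10 kb, 20 kb, 40 kb). For species with high heterozygosity and many repeats, a fosmid-by-fosmid or fosmid pooling strategy may be needed.

Needless to say, the latter is considerably more expensive.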
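
When planning libraries like these, it can help to work out how many read pairs each one needs. The sketch below uses the standard coverage relation (sequence coverage = 2 × read length × pairs / genome size) and also reports physical coverage (insert size × pairs / genome size), which is what matters for scaffolding with large inserts. The coverage targets, read length, and genome size are hypothetical values chosen for illustration, not recommendations from the text.

```python
import math


def read_pairs_needed(genome_size, read_length, coverage):
    """Smallest number of pairs with 2 * read_length * pairs / genome_size >= coverage."""
    return math.ceil(coverage * genome_size / (2 * read_length))


def physical_coverage(genome_size, insert_size, pairs):
    """Average number of inserts spanning a given genome position."""
    return pairs * insert_size / genome_size


# Hypothetical plan: {insert size in bp: target sequence coverage}.
plan = {500: 30, 5000: 5}
genome_size = 1_000_000_000  # assumed 1 Gb genome
read_length = 100            # assumed read length in bp

for insert_size, coverage in plan.items():
    pairs = read_pairs_needed(genome_size, read_length, coverage)
    phys = physical_coverage(genome_size, insert_size, pairs)
    print(f"insert {insert_size} bp: {pairs} pairs, physical coverage {phys:.0f}x")
```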

4. Genome assembly

4.1 Reviews on assembly:

Li, Z., Y. Chen, et al. (2012). "Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph." Brief Funct Genomics 11(1): 25-37.

Treangen, T. J. and S. L. Salzberg (2012). "Repetitive DNA and next-generation sequencing: computational challenges and solutions." Nat Rev Genet 13(1): 36-46.

http://www.cbcb.umd.edu/research/assembly_primer.shtml

Schatz, M. C., J. Witkowski, et al. (2012). "Current challenges in de novo plant genome sequencing and assembly." Genome Biol 13(4): 243

Baker, M. (2012). "De novo genome assembly: what every biologist should know." Nat Methods 9(4): 333-337. (highly recommended)

Compeau, P. E., et al. (2011). "How to apply de Bruijn graphs to genome assembly." Nat Biotechnol 29(11): 987-991.

Birney, E. (2011). "Assemblies: the good, the bad, the ugly." Nat Methods 8(1): 59-60.

Schatz, M. C., et al. (2010). "Assembly of large genomes using second-generation sequencing." Genome Res 20(9): 1165-1173.

4.2 Error-correction software:

Kelley, D. R., M. C. Schatz, et al. (2010). "Quake: quality-aware detection and correction of sequencing errors." Genome Biol 11(11): R116.

4.3 Comparisons of assembly software

Salzberg, S. L., A. M. Phillippy, et al. (2012). "GAGE: A critical evaluation of genome assemblies and assembly algorithms." Genome Res 22(3): 557-567.

Zhang, W., et al. (2011). "A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies." PLoS One 6(3): e17915.

Narzisi, G. and B. Mishra (2011). "Comparing de novo genome assembly: the long and short of it." PLoS One 6(4): e19175.

Lin, Y., et al. (2011). "Comparative Studies of de novo Assembly Tools for Next-generation Sequencing Technologies." Bioinformatics.

Hayden, E. C. (2011). "Genome builders face the competition." Nature 471(7339): 425.

Finotello, F., et al. (2011). "Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data." Brief Bioinform.

Earl, D. A., et al. (2011). "Assemblathon 1: A competitive assessment of de novo short read assembly methods." Genome Res.

4.4 Assessing assembly quality

Schatz, M. C., et al. (2011). "Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies." Brief Bioinform.

Riba-Grognuz, O., et al. (2011). "Visualization and quality assessment of de novo genome assemblies." Bioinformatics.

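Personal view:

At present, the mainstream tools for de novo assembly of large genomes are still ALLPATHS-LG and SOAPdenovo.

Strengths of ALLPATHS-LG: the best contiguity and the best accuracy. However, it consumes a lot of memory and is not very easy to use.

Strengths of SOAPdenovo: it is fast, its memory use is acceptable, and its contiguity is reasonable. However, it makes relatively more errors.

Of course, these remarks do not hold in every case; the two tools may perform differently on different species and different data.

Among overlap-layout based assemblers, CABOG is the first choice; it derives from the Celera Assembler, which was used to assemble the Drosophila genome. In addition, the soon-to-be-released MSR-CA also looks good: it combines the strengths of all the tools above and is gaining ground fast.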
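
Contiguity, as used in the comparison above, is commonly summarized by the N50 statistic: the length of the contig (or scaffold) at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size. The following is a minimal sketch; the function name and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where the cumulative length
        # covers at least half of the whole assembly.
        if 2 * running >= total:
            return length


# Toy assembly (made-up contig lengths, total 300).
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))
```

N50 says nothing about correctness, so a misassembled genome can still score well; it is best read alongside the accuracy assessments in the evaluations listed in 4.3 and 4.4.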

5. Genome annotation

Yandell, M. and D. Ence (2012). "A beginner's guide to eukaryotic genome annotation." Nat Rev Genet 13(5): 329-342.

6. Genome visualization

Nielsen, C. B., M. Cantor, et al. (2010). "Visualizing genomes: techniques and challenges." Nat Methods 7(3 Suppl): S5-S15.

7. Evolutionary analysis

Yang, Z. and B. Rannala (2012). "Molecular phylogenetics: principles and practice." Nat Rev Genet 13(5): 303-314.

8. Classic case studies

Colbourne, J. K., M. E. Pfrender, et al. (2011). "The ecoresponsive genome of Daphnia pulex." Science 331(6017): 555-561.

Kim, E. B., X. Fang, et al. (2011). "Genome sequencing reveals insights into physiology and longevity of the naked mole rat." Nature 479(7372): 223-227.

Grbic, M., T. Van Leeuwen, et al. (2011). "The genome of Tetranychus urticae reveals herbivorous pest adaptations." Nature 479(7374): 487-492.

The content above is reposted from: 測序中國 seq.cn (http://seq.cn/4607-48597)
