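The miRDeep2 folder ships with its own tutorial, and working through that example is a good way to learn miRDeep2.
The tutorial_dir folder contains the following files; the .fa extension denotes FASTA format.
cel_cluster.fa: # genome file of the species under study
mature_ref_this_species.fa: # mature miRNAs of the species under study; downloadable from miRBase
mature_ref_other_species.fa: # mature miRNAs of related species; downloadable from miRBase
precursors_ref_this_species.fa: # miRNA precursors of the species under study; downloadable from miRBase
reads.fa: # deep sequencing reads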
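Before running anything, it can help to confirm that the five input files are present and to see how many sequences each one holds. The sketch below is not part of miRDeep2: `count_fasta_records` is a hypothetical helper that simply counts header lines starting with `>`, and the loop assumes it is run from inside the tutorial folder.

```shell
#!/bin/sh
# Hypothetical helper (not part of miRDeep2): count FASTA records by
# counting header lines, which start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Report every tutorial input file expected in the current directory.
for f in cel_cluster.fa mature_ref_this_species.fa \
         mature_ref_other_species.fa precursors_ref_this_species.fa reads.fa
do
    if [ -f "$f" ]; then
        printf '%s\t%s records\n' "$f" "$(count_fasta_records "$f")"
    else
        printf '%s\tmissing\n' "$f"
    fi
done
```

A file reported as missing here would make one of the later steps fail, so it is cheaper to catch it at this point.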
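~~~~~~~~~~Step 1~~~~~~~~~
# Build an index of the genome file with bowtie-build
bowtie-build cel_cluster.fa cel_cluster
# cel_cluster.fa is the genome file and cel_cluster is the prefix of the index files.
# The prefix can be any string; it does not have to match the genome file name.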
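Since the index only needs to be built once, a small guard can skip the step when the index is already there. This is a sketch under an assumption: it expects the six `.ebwt` files that bowtie 1 normally writes for a small genome (large-index builds use a different extension), and `bowtie_index_exists` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if all six small-index files that
# bowtie-build (bowtie 1) normally writes for a prefix are present.
# Assumption: small index with the .ebwt extension.
bowtie_index_exists() {
    prefix=$1
    for suffix in 1 2 3 4 rev.1 rev.2; do
        [ -f "$prefix.$suffix.ebwt" ] || return 1
    done
    return 0
}

if bowtie_index_exists cel_cluster; then
    echo "index cel_cluster already present"
else
    echo "index cel_cluster not found; run bowtie-build first"
fi
```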
~~~~~~~~~~Step 2~~~~~~~~~
# Process the reads file and map the reads to the genome
perl mapper.pl reads.fa -c -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m -p cel_cluster -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v
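Parameters
-c the input file is in FASTA format; the related options are -a (seq.txt format), -b (qseq.txt format), -e (fastq format) and -d (config file)
-j discard reads whose sequence contains letters other than a,c,g,t,u,n,A,C,G,T,U,N
-k clip the 3' adapter; the adapter sequence follows the option (TCGTATGCCGTCTTCTGCTTGT in the example)
-l discard reads shorter than the given length (shorter than 18 nt in the example)
-m collapse the reads, so that identical sequences are merged into a single entry
-p map the processed reads to a genome that was indexed beforehand; the argument is the index prefix (cel_cluster in the example)
-s write the processed reads to the named file (reads_collapsed.fa in the example)
-t write the mappings to the named file (reads_collapsed_vs_genome.arf in the example)
-v print progress to the screen. Appendix 1 compares the output with and without -v: with -v the screen shows not only the mapping summary but also each mapper action (discarding, clipping, collapsing, trimming); without -v only the summary appears.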
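To make the effect of -j, -l and -m concrete, here is a rough imitation of those three steps using standard tools. It is only an illustration, not how mapper.pl is implemented: it assumes one sequence line per FASTA record, and the function names and the count/sequence output layout are invented for this sketch.

```shell
#!/bin/sh
# Illustrative imitation of what -j, -l and -m do conceptually.
# Assumes one sequence line per FASTA record; mapper.pl is more general.

# Keep only reads made of a,c,g,t,u,n (either case) with length >= $2.
filter_reads() {
    awk -v min="$2" '
        /^>/ { header = $0; next }
        /^[acgtunACGTUN]+$/ && length($0) >= min { print header; print $0 }
    ' "$1"
}

# Merge identical sequences; print "count<TAB>sequence", most frequent first.
collapse_reads() {
    grep -v '^>' "$1" | LC_ALL=C sort | uniq -c |
        awk '{ print $1 "\t" $2 }' | LC_ALL=C sort -k1,1nr -k2,2
}
```

Running `filter_reads reads.fa 18 > kept.fa` followed by `collapse_reads kept.fa` gives a quick feel for how many distinct sequences survive, although adapter clipping (-k) is not imitated here, so the numbers will not match mapper.pl.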
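Parameters not used in the example
Processing/mapping parameters
-g three-letter prefix for the read IDs, "seq" by default; this is why the read IDs in the -s and -t output files begin with seq
-h parse the input to FASTA format
-i convert the RNA alphabet to DNA (so that the reads can be mapped against the genome)
-q allow one mismatch in the seed when mapping (mapping takes longer)
-r maximum number of genome positions a read may map to; the default is 5
-u do not remove the directory holding the temporary files
-n overwrite existing files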
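The alphabet conversion that -i refers to amounts to replacing U with T in the sequences. A minimal stand-in, assuming a plain FASTA file and using a made-up function name, could look like this:

```shell
#!/bin/sh
# Illustrative equivalent of an RNA-to-DNA alphabet conversion:
# replace u/U with t/T on sequence lines, leaving header lines untouched.
rna_to_dna() {
    sed '/^>/!y/uU/tT/' "$1"
}
```

Header lines are skipped on purpose, so a read name that happens to contain the letter u is not altered.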
~~~~~~~~~~Step 3~~~~~~~~~
# Fast quantitation of reads mapping to known miRBase precursors.
# (This step is not required for identifying known and novel miRNAs in the deep sequencing data with miRDeep2.pl.)
quantifier.pl -p precursors_ref_this_species.fa -m mature_ref_this_species.fa -r reads_collapsed.fa -t cel -y 16_19
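Parameters
-p miRNA precursor file; downloadable from miRBase
-m mature miRNA sequence file; downloadable from miRBase
-r reads file
-t species; if given, only data for that species are considered, and if omitted, all species are analysed
-y [time] timestamp used to label the output files; optional, and a new one is generated if it is omitted
Screen output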
getting samples and corresponding read numbers
seq 374333 reads
Converting input files
building bowtie index
mapping mature sequences against index
# reads processed: 174
# reads with at least one reported alignment: 6 (3.45%)
# reads that failed to align: 168 (96.55%)
Reported 6 alignments to 1 output stream(s)
mapping read sequences against index
# reads processed: 1505
# reads with at least one reported alignment: 1088 (72.29%)
# reads that failed to align: 417 (27.71%)
Reported 1099 alignments to 1 output stream(s)
analyzing data
6 mature mappings to precursors
Expressed miRNAs are written to expression_analyses/expression_analyses_16_19/miRNA_expressed.csv
not expressed miRNAs are written to expression_analyses/expression_analyses_16_19/miRNA_not_expressed.csv
Creating miRBase.mrd file
after READS READ IN thing
make_html2.pl -q expression_analyses/expression_analyses_16_19/miRBase.mrd -k
mature_ref_this_species.fa -z -t C.elegans -y 16_19 -o -i
expression_analyses/expression_analyses_16_19/mature_ref_this_species_mapped.arf -l -m cel
miRNAs_expressed_all_samples_16_19.csv
miRNAs_expressed_all_samples_16_19.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files
creating pdf for cel-mir-39 finished
creating pdf for cel-mir-40 finished
creating pdf for cel-mir-37 finished
creating pdf for cel-mir-36 finished
creating pdf for cel-mir-38 finished
creating pdf for cel-mir-41 finished
#
The run produces several outputs: expression_16_19.html, the expression_analyses folder (which holds many files), miRNAs_expressed_all_samples_16_19.csv, and the pdfs_16_19 folder.
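Every output name carries the value passed to -y, so the names can be derived from it. The helper below is hypothetical and only reproduces the naming pattern seen in this tutorial run; other miRDeep2 versions may name things differently.

```shell
#!/bin/sh
# Hypothetical helper: print the output names seen in this tutorial run
# for a given -y timestamp. The pattern is taken from this run only.
quantifier_outputs() {
    ts=$1
    printf '%s\n' \
        "expression_$ts.html" \
        "expression_analyses/expression_analyses_$ts" \
        "miRNAs_expressed_all_samples_$ts.csv" \
        "pdfs_$ts"
}

quantifier_outputs 16_19
```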
~~~~~~~~~~Step 4~~~~~~~~~
# Identify known and novel miRNAs in the deep sequencing data
miRDeep2.pl reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf mature_ref_this_species.fa mature_ref_other_species.fa precursors_ref_this_species.fa -t C.elegans 2> report.log
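# reads_collapsed.fa: the reads processed by mapper.pl
# cel_cluster.fa: the genome file
# reads_collapsed_vs_genome.arf: the mapping results
# mature_ref_this_species.fa: mature miRNAs of the species under study; downloadable from miRBase
# mature_ref_other_species.fa: mature miRNAs of related species; downloadable from miRBase
# precursors_ref_this_species.fa: miRNA precursors of the species under study; downloadable from miRBase
# If you only have the reads, the arf file and the genome file, write none in place of each missing file (miRNAs_ref/none miRNAs_other/none precursors/none): no mature miRNAs for this species, none for related species, and no precursors.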
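The "none" convention can be wrapped in a few lines of shell. The function below is a hypothetical convenience, not something miRDeep2 provides: it prints (rather than runs) a miRDeep2.pl command line and writes the literal word none for any reference file that is not given or does not exist. It assumes file names without spaces.

```shell
#!/bin/sh
# Hypothetical wrapper: print a miRDeep2.pl command line, substituting the
# literal word "none" for any optional reference file that is empty or absent.
# Arguments: reads genome arf [mature_this] [mature_other] [precursors]
mirdeep2_cmd() {
    reads=$1 genome=$2 arf=$3
    shift 3
    cmd="miRDeep2.pl $reads $genome $arf"
    for ref in "${1:-}" "${2:-}" "${3:-}"; do
        if [ -n "$ref" ] && [ -f "$ref" ]; then
            cmd="$cmd $ref"
        else
            cmd="$cmd none"
        fi
    done
    printf '%s\n' "$cmd"
}

# With no reference files supplied, all three slots become "none".
mirdeep2_cmd reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf
```

Printing the command instead of executing it makes it easy to check the argument order before committing to a long run.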
Parameters
-t species
2> report.log redirects the standard error stream, which carries the step-by-step messages, into the file report.log
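For readers less familiar with shell redirection, the toy example below shows what `2>` does in general. `demo_tool` is a stand-in invented for this illustration and has nothing to do with miRDeep2 itself.

```shell
#!/bin/sh
# Stand-in for a program that writes results to stdout and progress to stderr.
demo_tool() {
    echo "result line"
    echo "#progress step" >&2
}

log=$(mktemp)
out=$(demo_tool 2> "$log")   # 2> captures only stderr in the log file
echo "stdout: $out"
echo "log:    $(cat "$log")"
rm -f "$log"
```

Only the message sent to standard error ends up in the log; standard output is untouched and can still be piped or captured separately.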
# Screen output
#####################################
# #
# miRDeep2 #
# #
# last change: 07/07/2011 #
# #
#####################################
miRDeep2 started at 19:44:43
#Starting miRDeep2
#testing input files
#Quantitation of known miRNAs in data
#parsing genome mappings
#excising precursors
#preparing signature
#folding precursors
#computing randfold p-values
#running miRDeep core algorithm
#running permuted controls
#doing survey of accuracy
#producing graphic results
miRDeep runtime:
started: 19:44:43
ended: 19:46:15
total:0h:1m:32s
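~~~~~~~~~~Step 5~~~~~~~~~
# Browse the results
Open the .html file in a web browser.
Note that cel-miR-37 is predicted twice, because two potential precursors at this locus can both fold into hairpins. The annotated hairpin, however, scores far higher than the unannotated one (miRDeep2 score 6.1e+4 vs. -0.2).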
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~Appendix 1~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
###### With -v ### screen output ####
discarding sequences with non-canonical letters
clipping 3' adapters
discarding short reads
collapsing reads
mapping reads to genome index
# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends
###### Without -v ### screen output ####
# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)
~~~~~~~~~~~~~~Appendix 1~~~~~~~~~~~~~~~~~~
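The percentages in the summary follow directly from the two read counts (470 of 1609 reads aligned). As a worked check, the sketch below recomputes the alignment rate from the summary text; `alignment_rate` is a hypothetical helper, and it assumes the summary lines look exactly like the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: read a bowtie summary on stdin and print the
# percentage of reads with at least one reported alignment.
alignment_rate() {
    awk -F': ' '
        /^# reads processed/ { total = $2 + 0 }
        /^# reads with at least one reported alignment/ { aligned = $2 + 0 }
        END { if (total > 0) printf "%.2f\n", 100 * aligned / total }
    '
}

# Summary copied from the mapper.pl run above; prints 29.21
alignment_rate <<'EOF'
# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)
EOF
```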
Original source: http://blog.sina.com.cn/s/blog_7cffd1400101m3i3.html http://blog.sina.com.cn/s/blog_7cffd1400100twvb.html