miRDeep2: Installation and Tutorial Notes

Date: 2019-11-11


1. Installing miRDeep2

Download and unpack

wget http://mdc.helmholtz.de/38350089/en/research/research_teams/systems_biology_of_gene_regulatory_elements/projects/miRDeep/mirdeep2_0_0_5.zip

unzip mirdeep2_0_0_5.zip

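If you install with the install.pl script bundled with miRDeep2, you may find that some of the files it tries to download no longer exist, bowtie for example.

In that case you have to install several programs yourself. The README in the unpacked directory explains in detail how to install miRDeep2 manually, although a few details need adjusting.

First, download the required packages. I downloaded them to /home/disk6/src and unpacked them all there.

(PS: the download URLs for all the dependencies are listed in the README in the downloaded mirdeep2 directory.)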

bowtie #version 0.12.7

ViennaRNA-1.8.5.tar.gz

squid-1.9g.tar.gz

randfold-2.0.tar.gz

PDF-API2-0.73.tar.gz

perl # my version is 5.10.1

~~~~~~~~~~Install bowtie

unzip bowtie-0.12.7-linux-x86_64.zip

The archive contains ready-to-run binaries, so nothing needs to be compiled.

Add the bowtie directory to the PATH environment variable.
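A minimal sketch of adding a directory to PATH without creating duplicate entries. The helper name add_to_path is my own, and /home/disk6/src/bowtie-0.12.7 is only an assumed unzip location; adjust it to wherever the archive actually unpacked. The same helper can be reused later for randfold and the miRDeep2 directory.

```shell
# Hypothetical helper: append a directory to PATH only if it is not there yet.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;     # append at the end
    esac
    export PATH
}

# Assumed location of the unpacked bowtie binaries.
add_to_path /home/disk6/src/bowtie-0.12.7
```

To make the change permanent, put the same lines in ~/.bash_profile.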

~~~~~~~~~Install ViennaRNA

tar -zxf ViennaRNA-1.8.5.tar.gz

cd ViennaRNA-1.8.5

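./configure --prefix=/home/disk6/tools/ViennaRNA # /home/disk6/tools/ is where I install software: frequently used tools are either installed there or symlinked (ln -s) into a matching directory under tools, and each one is then added to PATH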

make

make install

~~~~~~~~~Install squid-1.9g.tar.gz and randfold-2.0.tar.gz

tar -zxf squid-1.9g.tar.gz

cd squid-1.9g

./configure --prefix=/home/disk6/tools/squid # squid.h only exists after configure has run, and randfold 2.0 below needs that file

make

make install

tar -zxf randfold-2.0.tar.gz

cd randfold2.0

Edit the Makefile and replace the INCLUDE=-I line with INCLUDE=-I. -I/home/disk6/src/squid-1.9g/ -L/home/disk6/src/squid-1.9g/
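If you would rather not edit the Makefile by hand, the same substitution can be sketched with sed. The function name patch_randfold_makefile is made up for this note, and the pattern assumes the Makefile has a single line beginning with INCLUDE=-I; check the file first. A .bak copy of the original is kept next to it.

```shell
# Hypothetical helper: rewrite the INCLUDE line so randfold can find squid.
# $1 = path to the randfold Makefile, $2 = squid source directory
patch_randfold_makefile() {
    makefile=$1
    squid_dir=${2%/}    # drop a trailing slash so it is not doubled below
    sed -i.bak "s|^INCLUDE=-I.*|INCLUDE=-I. -I${squid_dir}/ -L${squid_dir}/|" "$makefile"
}

# Usage, run from inside the randfold source directory:
#   patch_randfold_makefile Makefile /home/disk6/src/squid-1.9g
```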

make

Add randfold to PATH.

~~~~~~~~~~~~Install PDF-API2-0.73.tar.gz

tar -zxf PDF-API2-0.73.tar.gz

cd PDF-API2-0.73

mkdir ../mirdeep2/lib/ # do not forget this step: mirdeep2 was unpacked at the very beginning, and a lib directory has to be created inside it

perl Makefile.PL PREFIX=/home/disk6/src/mirdeep2 LIB=/home/disk6/src/mirdeep2/lib

make

make test

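make install # at this point /home/disk6/src/mirdeep2/lib contains two directories, PDF and x86_64-linux-thread-multi

~~~~~~~~~~~~Set PERL5LIB for miRDeep2 (so that Perl can find the PDF module just installed)

Add the following to ~/.bash_profile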

export PERL5LIB=$PERL5LIB:/home/disk6/src/mirdeep2/lib
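A slightly more defensive variant, as a sketch: it avoids a leading colon when PERL5LIB was empty before, and then asks Perl whether PDF::API2 can actually be loaded. The variable name MIRDEEP2_LIB is just a label used here.

```shell
# Library directory created during the PDF-API2 install above.
MIRDEEP2_LIB=/home/disk6/src/mirdeep2/lib

# Append to PERL5LIB; the ${VAR:+...} form adds the separator only if PERL5LIB is already set.
export PERL5LIB="${PERL5LIB:+$PERL5LIB:}$MIRDEEP2_LIB"

# Optional check: try to load the module, if perl is available at all.
if command -v perl >/dev/null 2>&1; then
    if perl -MPDF::API2 -e 1 2>/dev/null; then
        echo "PDF::API2 can be loaded"
    else
        echo "PDF::API2 not found; check PERL5LIB"
    fi
fi
```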

~~~~~~~~~Test that all the installed software works

To test that everything is installed properly, type:

1) bowtie

2) RNAfold -h

3) randfold

4) make_html.pl
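Before running the four commands one by one, it can help to confirm that each is on PATH at all. A small sketch follows; the function name check_tools is my own, and it only checks that the commands can be found, not that they run correctly.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Returns the number of missing commands (0 means all were found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "MISSING: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

check_tools bowtie RNAfold randfold make_html.pl || echo "some tools are not on PATH yet"
```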

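~~~~~~~~~~Finally, add the miRDeep2 directory to PATH

2. Introduction to miRDeep2

The miRDeep2 folder ships with its own tutorial; the notes below learn miRDeep2 by working through that example.

The tutorial_dir folder contains the following files; .fa files are in FASTA format.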

cel_cluster.fa: # genome file of the species under study

mature_ref_this_species.fa: # mature miRNAs of the species under study, downloadable from miRBase

mature_ref_other_species.fa: # mature miRNAs of related species, downloadable from miRBase

precursors_ref_this_species.fa: # miRNA precursors of the species under study, downloadable from miRBase

reads.fa: # deep sequencing reads

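~~~~~~~~~~Step 1~~~~~~~~~

# Build an index of the genome file with bowtie-build

bowtie-build cel_cluster.fa cel_cluster

# cel_cluster.fa is the genome file; cel_cluster is the prefix of the index files.
# The prefix can be any string and does not have to match the genome file name.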
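A quick way to confirm the index was built, sketched under the assumption that bowtie 0.12.7 writes its usual six .ebwt files for the chosen prefix. The function name index_complete is made up for this note.

```shell
# Hypothetical helper: check that all six bowtie index files exist and are non-empty.
# $1 = index prefix passed to bowtie-build (cel_cluster in the tutorial)
index_complete() {
    prefix=$1
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -s "$prefix.$suffix.ebwt" ]; then
            echo "missing $prefix.$suffix.ebwt"
            return 1
        fi
    done
    echo "index $prefix looks complete"
}

index_complete cel_cluster || echo "run bowtie-build first"
```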

~~~~~~~~~~Step 2~~~~~~~~~

# Process the reads file and map the reads to the genome

perl mapper.pl reads.fa -c -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m -p cel_cluster -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v

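Parameter notes
-c the input file is in FASTA format; the related options are -a (seq.txt format), -b (qseq.txt format), -e (fastq format), -d (contig file)
-j remove entries with non-canonical letters (anything other than a,c,g,t,u,n,A,C,G,T,U,N)
-k clip the adapter; followed by the adapter sequence, TCGTATGCCGTCTTCTGCTTGT in the example
-l discard reads shorter than the given length; in the example, reads shorter than 18 nt are discarded
-m collapse the reads (identical sequences are merged into a single entry)
-p map the processed reads to the genome index built earlier, cel_cluster in the example
-s write the processed reads to the named file, reads_collapsed.fa in the example
-t write the mapping results to the named file, reads_collapsed_vs_genome.arf in the example
-v print progress to the screen. See Note 1 for the difference with and without -v: with -v the screen shows not only the final summary but also each mapper action (discarding, clipping, collapsing, trimming); without -v only the summary is shown.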

Options not used in the example
Processing/mapping options
-g prefix for the read IDs, default seq. With the default, the read IDs in the -s and -t output files start with the three letters seq.
-h parse to fasta format
-i convert RNA to DNA alphabet (to map against the genome)
-q map with one mismatch in the seed (mapping takes longer)
-r maximum number of genome positions a read may map to, default 5
-u do not remove the directory with temporary files
-n overwrite existing files
~~~~~~~~~~Step 3~~~~~~~~~
# Fast quantitation of reads mapping to known miRBase precursors.
# (This step is not required for identification of known and novel miRNAs in the deep sequencing data when using miRDeep2.pl.)

quantifier.pl -p precursors_ref_this_species.fa -m mature_ref_this_species.fa -r reads_collapsed.fa -t cel -y 16_19

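Parameter notes
-p miRNA precursor file, downloadable from miRBase
-m mature miRNA sequence file, downloadable from miRBase
-r reads file
-t species; if given, only data for that species are considered in the analysis, otherwise everything is analyzed
-y [time] optional timestamp used to label the output (16_19 here); if omitted, a new one is generated

Screen output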
getting samples and corresponding read numbers

seq 374333 reads

Converting input files
building bowtie index
mapping mature sequences against index
# reads processed: 174
# reads with at least one reported alignment: 6 (3.45%)
# reads that failed to align: 168 (96.55%)
Reported 6 alignments to 1 output stream(s)
mapping read sequences against index
# reads processed: 1505
# reads with at least one reported alignment: 1088 (72.29%)
# reads that failed to align: 417 (27.71%)
Reported 1099 alignments to 1 output stream(s)
analyzing data

6 mature mappings to precursors

Expressed miRNAs are written to expression_analyses/expression_analyses_16_19/miRNA_expressed.csv
not expressed miRNAs are written to expression_analyses/expression_analyses_16_19/miRNA_not_expressed.csv

Creating miRBase.mrd file

after READS READ IN thing

make_html2.pl -q expression_analyses/expression_analyses_16_19/miRBase.mrd -k mature_ref_this_species.fa -z -t C.elegans -y 16_19 -o -i expression_analyses/expression_analyses_16_19/mature_ref_this_species_mapped.arf -l -m cel miRNAs_expressed_all_samples_16_19.csv
miRNAs_expressed_all_samples_16_19.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files
creating pdf for cel-mir-39 finished
creating pdf for cel-mir-40 finished
creating pdf for cel-mir-37 finished
creating pdf for cel-mir-36 finished
creating pdf for cel-mir-38 finished
creating pdf for cel-mir-41 finished

# This produces several outputs: expression_16_19.html, the expression_analyses folder (which contains many files), miRNAs_expressed_all_samples_16_19.csv, and the pdfs_16_19 folder.
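A small sketch for confirming that a quantifier.pl run left the outputs listed above in the current directory. The function name check_quantifier_outputs is my own; the argument is the timestamp given with -y.

```shell
# Hypothetical helper: check for the outputs of one quantifier.pl run.
# $1 = timestamp passed with -y (16_19 in the tutorial)
check_quantifier_outputs() {
    ts=$1
    status=0
    for path in \
        "expression_${ts}.html" \
        "expression_analyses/expression_analyses_${ts}" \
        "miRNAs_expressed_all_samples_${ts}.csv" \
        "pdfs_${ts}"
    do
        if [ -e "$path" ]; then
            echo "ok      $path"
        else
            echo "missing $path"
            status=1
        fi
    done
    return "$status"
}

check_quantifier_outputs 16_19 || echo "some outputs are missing; check the quantifier.pl messages"
```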

~~~~~~~~~~Step 4~~~~~~~~~

# Identify known and novel miRNAs in the deep sequencing data

miRDeep2.pl reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf mature_ref_this_species.fa mature_ref_other_species.fa precursors_ref_this_species.fa -t C.elegans 2> report.log

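# reads_collapsed.fa: the reads after processing by mapper.pl
# cel_cluster.fa: the genome file
# reads_collapsed_vs_genome.arf: the mapping results
# mature_ref_this_species.fa: mature miRNAs of the species under study, downloadable from miRBase
# mature_ref_other_species.fa: mature miRNAs of related species, downloadable from miRBase
# precursors_ref_this_species.fa: miRNA precursors of the species under study, downloadable from miRBase
# If you only have the reads, the arf file, and the genome file, put none in place of each missing file (miRNAs_ref/none miRNAs_other/none precursors/none): no mature miRNAs for this species, none for related species, and no precursors.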

Parameter notes
-t species
2> report.log redirects standard error to report.log, so all of the progress messages for each step end up in that file
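The none placeholders can be filled in automatically. The sketch below substitutes none for any reference file that is missing or empty and prints the resulting command line instead of running it, so it can be inspected first. The helper name file_or_none is hypothetical.

```shell
# Hypothetical helper: echo the file name if it exists and is non-empty, else "none".
file_or_none() {
    if [ -s "$1" ]; then
        printf '%s\n' "$1"
    else
        printf 'none\n'
    fi
}

mature_this=$(file_or_none mature_ref_this_species.fa)
mature_other=$(file_or_none mature_ref_other_species.fa)
precursors=$(file_or_none precursors_ref_this_species.fa)

# Print the command for review; copy it to the shell to run it.
echo "miRDeep2.pl reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf $mature_this $mature_other $precursors 2> report.log"
```

With none of the three reference files present, the printed command ends in none none none 2> report.log.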

# Screen output

#####################################
#                                   #
# miRDeep2                          #
#                                   #
# last change: 07/07/2011           #
#                                   #
#####################################

miRDeep2 started at 19:44:43

#Starting miRDeep2
#testing input files
#Quantitation of known miRNAs in data
#parsing genome mappings
#excising precursors
#preparing signature
#folding precursors
#computing randfold p-values
#running miRDeep core algorithm
#running permuted controls
#doing survey of accuracy
#producing graphic results

miRDeep runtime:

started: 19:44:43
ended: 19:46:15
total:0h:1m:32s

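~~~~~~~~~~Step 5~~~~~~~~~

# Browse the results

Open the .html file in a browser.
Note that cel-miR-37 is predicted twice, because both potential precursors at this locus can fold into hairpin structures. However, the annotated hairpin scores far higher than the unannotated one (miRDeep2 score 6.1e+4 vs. -0.2).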

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~Note 1~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

######With -v, the screen output is as follows######

discarding sequences with non-canonical letters
clipping 3' adapters
discarding short reads
collapsing reads
mapping reads to genome index
# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends

######Without -v, the screen output is as follows######

# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)

~~~~~~~~~~~~~~End of Note 1~~~~~~~~~~~~~~~~~~

Original posts: http://blog.sina.com.cn/s/blog_7cffd1400101m3i3.html http://blog.sina.com.cn/s/blog_7cffd1400100twvb.html