轉錄組入門(5)：序列比對

時間 2019-12-11

標籤轉錄入門序列比對简体版

原文原文鏈接

任務列表

比對軟件
hisat2的用法
下載index文件
比對、排序、索引
質量控制
載入IGV，截圖幾個基因

hisat2的用法

本做業是比對到基因組，因此使用gapped or splices mapper，此流程已經更新。TopHat首次被髮表已是7年前，STAR的比對速度是TopHat的50倍，HISAT更是STAR的1.2倍。HISAT2是TopHat2/Bowti2的繼任者，使用改進的BWT算法，實現了更快的速度和更少的資源佔用，做者推薦TopHat2/Bowti2和HISAT的用戶轉換到HISAT2。

官網：https://ccb.jhu.edu/software/hisat2/index.shtml（學習一個軟件最好的方法就是結合現有中文資料，加上閱讀官方說明書和HELP文檔，通常剛開始學習的時候先使用默認參數，不要亂調參數）

下載index文件

cd ~/reference
mkdir -p index/hisat && cd index/hisat
wget -c ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/hg19.tar.gz
wget -c ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/mm10.tar.gz
tar zxvf hg19.tar.gz
tar xvzf mm10.tar.gz

-c：斷點續傳

比對、排序、索引

把fastq格式的reads比對上去獲得sam文件，接着用samtools把它轉爲bam文件，而且排序（注意N和P兩種排序區別）索引好（可使用管道實現，省去中間SAM保存的過程，直接輸出BAM文件）

編寫bash腳本：map.sh

#! usr/bin/bash
set -u
set -e
set -o pipefail
hg19_ref=/mnt/hgfs/2017/reference/index/hisat/hg19/genome
mm10_ref=/mnt/hgfs/2017/reference/index/hisat/mm10/genome
data_path=/mnt/hgfs/2017/rna_seq/data
NUM_THREADS=25
ls --color=never Homo*1.fastq.gz | while read id;do(~/biosoft/hisat2-2.1.0/hisat2 -t -p $NUM_THREADS -x $hg19_ref -1 $data_path/${id%_*}_1.fastq.gz -2 $data_path/${id%_*}_2.fastq.gz 2 > ${id%_*}_map.log | samtools view -Sb  - > ${id%_*}.bam);done
ls --color=never Mus*1.fastq.gz | while read id;do(~/biosoft/hisat2-2.1.0/hisat2 -t -p $NUM_THREADS -x $mm10_ref -1 $data_path/${id%_*}_1.fastq.gz -2 $data_path/${id%_*}_2.fastq.gz 2 > ${id%_*}_map.log | samtools view -Sb  - > ${id%_*}.bam);done
ls --color=never *.bam | while read id;do(samtools sort --threads $NUM_THREADS $id -o ${id%.*}_sorted.bam);done
ls --color=never *_sorted.bam | while read id;do(samtools index $id);done

運行腳本：