To build your own BLAST database, you first need to download every gene sequence it should contain. Biopython makes this quick.
1. First, use org.Hs.eg.db to convert your gene symbols into RefSeq accession IDs.
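```r
# Map gene symbols to RefSeq accessions with org.Hs.eg.db
library(org.Hs.eg.db)
symbol <- c("PDCD1", "CD274", "IL4", "IL7")
# keys must be the symbol vector defined above (the original passed an undefined `x`)
accession <- mapIds(org.Hs.eg.db, keys = symbol, column = "REFSEQ",
                    keytype = "SYMBOL", multiVals = "first")
accession <- as.matrix(accession)
# Writes two tab-separated columns: symbol, accession
write.table(accession, "accession.txt", row.names = TRUE, col.names = FALSE,
            quote = FALSE, sep = "\t")
```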
The result:
PDCD1	NM_005018
CD274	NM_001267706
IL4	NM_000589
IL7	NM_000880
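If a symbol has no RefSeq match, mapIds returns NA, and write.table writes that out literally. As far as I know, the REFSEQ column can also hold protein (NP_) or predicted (XM_) accessions. For both reasons it may be worth cleaning accession.txt before downloading.

The following is a minimal Python sketch. It assumes the two-column, tab-separated layout shown above. The helper names `read_accessions` and `mrna_only` are hypothetical.

```python
from pathlib import Path


def read_accessions(path):
    """Return {symbol: accession} from a two-column, tab-separated file.

    Assumes the symbol<TAB>accession layout written by write.table.
    Rows with a blank or "NA" accession (no RefSeq match) are skipped.
    """
    mapping = {}
    for line in Path(path).read_text().splitlines():
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        symbol, accession = fields[0], fields[1]
        if accession in ("", "NA"):
            continue  # mapIds found no REFSEQ value for this symbol
        mapping[symbol] = accession
    return mapping


def mrna_only(mapping):
    """Keep only accessions with the NM_ (curated mRNA) prefix."""
    return {sym: acc for sym, acc in mapping.items() if acc.startswith("NM_")}
```

For example, `mrna_only(read_accessions("accession.txt"))` gives a dict whose values can be fed to the download step.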
2. Download the sequences with Biopython.
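```python
from Bio import Entrez
from Bio import SeqIO

file_in_name = "accession.txt"
file_out_name = "result.fasta"
Entrez.email = "xxxx@xx.com"  # your e-mail address; NCBI asks for one

# "w" overwrites result.fasta on each run ("a" would append duplicates on re-runs)
with open(file_in_name) as input_file, open(file_out_name, "w") as output_file:
    for line in input_file:
        fields = line.strip().split("\t")
        # Skip blank lines and symbols for which mapIds returned NA
        if len(fields) < 2 or fields[1] == "NA":
            continue
        record_id = fields[1]  # second column: RefSeq accession
        # Fetch the GenBank record, then write it back out as FASTA
        result_handle = Entrez.efetch(db="nucleotide", id=record_id,
                                      rettype="gb", retmode="text")
        seq_record = SeqIO.read(result_handle, "genbank")
        result_handle.close()
        output_file.write(seq_record.format("fasta"))
```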
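The loop above sends one request per gene. For longer gene lists, a possibly quicker variant asks NCBI for FASTA directly (`rettype="fasta"`), which skips the GenBank parsing step. It also sends several comma-separated accessions per `Entrez.efetch` call.

The following is a sketch only. The function names and the batch size of 100 are illustrative assumptions, not values from the original post. Biopython is imported inside `fetch_fasta` so that the batching helper works without it.

```python
def batch_ids(ids, batch_size=100):
    """Split accessions into comma-joined strings, one string per efetch request."""
    ids = [i for i in ids if i]  # drop empty entries
    return [",".join(ids[i:i + batch_size]) for i in range(0, len(ids), batch_size)]


def fetch_fasta(ids, out_path, email, batch_size=100):
    """Download nucleotide records as FASTA, several accessions per request.

    Needs Biopython and network access.
    """
    from Bio import Entrez  # lazy import: batch_ids itself has no dependency

    Entrez.email = email  # NCBI asks for a contact address
    with open(out_path, "w") as out:
        for batch in batch_ids(ids, batch_size):
            handle = Entrez.efetch(db="nucleotide", id=batch,
                                   rettype="fasta", retmode="text")
            out.write(handle.read())  # FASTA text for every ID in this batch
            handle.close()
```

For example: `fetch_fasta(["NM_005018", "NM_001267706", "NM_000589", "NM_000880"], "result.fasta", "xxxx@xx.com")`.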
Result (sequences truncated):
>NM_005018.3 Homo sapiens programmed cell death 1 (PDCD1), mRNA
GCTCACCTCCGCCTGAGCAGTGGAGAAGGCGGCACTCTGGTGGGGCTGCTCCAGGCATGCAGATCCCACAGGCGCCCTGGCCAGTCGTCTGGGCGGTGCTACAACTGGGCTGGCGGCCAG....
>NM_001267706.1 Homo sapiens CD274 molecule (CD274), transcript variant 2, mRNA
GGCGCAACGCTGAGCAGCTGGCGCGTCCCGCGCGGCCCCAGTTCTGCGCAGCTTCCCGAGGCTCCGCACCAGCCGCGCTTCTGTCCGCCTGCAGGGCATTCCAGAAAGATGAGGATATTT...
>NM_000589.4 Homo sapiens interleukin 4 (IL4), transcript variant 1, mRNA
ATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCGACACCTATTAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATGTGCCGGCA...
>NM_000880.4 Homo sapiens interleukin 7 (IL7), transcript variant 1, mRNA
ACACTTGTGGCTTCCGTGCACACATTAACAACTCATGGTTCTAGCTCCCAGTCGCCAAGCGTTGCCAAGGCGTTGAGAGATCATCTGGGAAGTCTTTTACCCAGAATTGCTTTGATTCAG...
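Because the listing above is truncated, it helps to confirm that every accession actually came back. One way is to count the records and sequence lengths in result.fasta.

The following is a stdlib-only sketch. `fasta_summary` and `missing_accessions` are hypothetical helpers. Comparing accessions without their version suffix (.3, .1 and so on) is an assumption, chosen because accession.txt holds unversioned IDs.

```python
def fasta_summary(path):
    """Return {record_id: sequence_length} for every record in a FASTA file."""
    lengths = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                current = line[1:].split()[0]  # first word of the header, e.g. NM_005018.3
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)  # sequences may span several lines
    return lengths


def missing_accessions(expected, lengths):
    """Return the expected accessions (version ignored) with no record in the file."""
    found = {rec_id.split(".")[0] for rec_id in lengths}
    return [acc for acc in expected if acc.split(".")[0] not in found]
```

For example, `missing_accessions(["NM_005018", "NM_000880"], fasta_summary("result.fasta"))` should return an empty list if both downloads succeeded.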
Done.
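The post stops once result.fasta is written. Turning that file into the BLAST database mentioned at the start is usually done with `makeblastdb` from NCBI BLAST+.

The following is a sketch. It assumes BLAST+ is installed and on PATH. The database name `mydb` and both function names are hypothetical.

```python
import shutil
import subprocess


def makeblastdb_cmd(fasta_path, db_name):
    """Build the makeblastdb command line for a nucleotide database."""
    return [
        "makeblastdb",
        "-in", fasta_path,
        "-dbtype", "nucl",   # the downloaded sequences are mRNA, i.e. nucleotide
        "-parse_seqids",     # parse IDs such as NM_005018.3 so records can be looked up
        "-out", db_name,
    ]


def build_blast_db(fasta_path, db_name):
    """Run makeblastdb, failing early if BLAST+ is not available."""
    if shutil.which("makeblastdb") is None:
        raise FileNotFoundError("makeblastdb (NCBI BLAST+) is not on PATH")
    subprocess.run(makeblastdb_cmd(fasta_path, db_name), check=True)
```

After `build_blast_db("result.fasta", "mydb")`, a search such as `blastn -db mydb -query query.fasta` can run against the four genes.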