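PyTorch is a relatively new, Python-first deep learning framework that provides tensors and dynamic neural networks with strong GPU acceleration.
Beginners who have not used PyTorch before can start with the official 60-minute introduction: http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
1. The corpus I am currently using is reasonably clean (see the example below), but loading it still requires some preprocessing, such as handling letter case, digits, and characters like "\n" and "\t". I use the third-party library torchtext to load and preprocess the data.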
You Should Pay Nine Bucks for This : Because you can hear about suffering Afghan refugees on the news and still be unaffected . ||| 2
Dramas like this make it human . ||| 4
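# Field belongs to the classic torchtext API (torchtext.legacy.data in torchtext 0.9-0.11).
import torchtext.data as data

# lowercase all words
text_field = data.Field(lower=True)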
import re
from torchtext import data

def clean_str(string):
    # Keep letters, digits and a few punctuation marks; replace everything else with a space.
    string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string)
    # Split common English contractions off as separate tokens.
    string = re.sub(r"\'s", " 's", string)
    string = re.sub(r"\'ve", " 've", string)
    string = re.sub(r"n\'t", " n't", string)
    string = re.sub(r"\'re", " 're", string)
    string = re.sub(r"\'d", " 'd", string)
    string = re.sub(r"\'ll", " 'll", string)
    # Surround punctuation with spaces so each mark becomes its own token.
    string = re.sub(r",", " , ", string)
    string = re.sub(r"!", " ! ", string)
    string = re.sub(r"\(", r" \( ", string)
    string = re.sub(r"\)", r" \) ", string)
    string = re.sub(r"\?", r" \? ", string)
    # Collapse runs of whitespace into a single space.
    string = re.sub(r"\s{2,}", " ", string)
    return string.strip()

text_field.preprocessing = data.Pipeline(clean_str)
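As a minimal sketch of reading the `sentence ||| label` format from the sample above without torchtext: the helper name `parse_line` and the plain whitespace tokenization are assumptions made for illustration, not part of the original pipeline.

```python
import re


def parse_line(line):
    """Split one 'sentence ||| label' line into (tokens, label)."""
    # Split on the last '|||' so the sentence itself may contain any text.
    text, label = line.rsplit("|||", 1)
    # Normalize tabs/newlines/multiple spaces and lowercase the sentence.
    text = re.sub(r"\s+", " ", text).strip().lower()
    return text.split(" "), int(label)


tokens, label = parse_line("Dramas like this make it human . ||| 4\n")
print(tokens, label)
```

Lowercasing and whitespace normalization here play the same role as `lower=True` and the cleaning pipeline in the torchtext version.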
import random

# Shuffle each split in place before building the iterators.
if shuffle:
    random.shuffle(examples_train)
    random.shuffle(examples_dev)
    random.shuffle(examples_test)
class Iterator(object):
    """Defines an iterator that loads batches of data from a Dataset.

    Attributes:
        dataset: The Dataset object to load Examples from.
        batch_size: Batch size.
        sort_key: A key to use for sorting examples in order to batch together
            examples with similar lengths and minimize padding. The sort_key
            provided to the Iterator constructor overrides the sort_key
            attribute of the Dataset, or defers to it if None.
        train: Whether the iterator represents a train set.
        repeat: Whether to repeat the iterator for multiple epochs.
        shuffle: Whether to shuffle examples between epochs.
        sort: Whether to sort examples according to self.sort_key.
            Note that repeat, shuffle, and sort default to train, train, and
            (not train).
        device: Device to create batches on. Use -1 for CPU and None for the
            currently active GPU device.
    """
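The docstring mentions that sorting by `sort_key` batches examples of similar length together to minimize padding. The sketch below illustrates that idea in plain Python; `make_batches` and `padding_needed` are hypothetical helpers, not torchtext functions.

```python
def make_batches(examples, batch_size, sort=True):
    """Group token lists into batches, optionally sorted by length first."""
    if sort:
        examples = sorted(examples, key=len)
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]


def padding_needed(batches):
    """Count the pad tokens required to make every batch rectangular."""
    return sum(max(len(x) for x in batch) - len(x) for batch in batches for x in batch)


# Four toy "sentences" of lengths 1, 5, 2 and 6.
examples = [["a"], ["a"] * 5, ["a"] * 2, ["a"] * 6]
unsorted_pad = padding_needed(make_batches(examples, 2, sort=False))
sorted_pad = padding_needed(make_batches(examples, 2, sort=True))
print(unsorted_pad, sorted_pad)
```

With sorting, each batch pairs sentences of nearly equal length, so far fewer pad tokens are needed.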
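1. Simply put, a word embedding maps each word in the corpus to its corresponding word vector. The most commonly used method for training word vectors is probably word2vec (see http://www.cnblogs.com/bamtercelboo/p/7181899.html).
2. The vocabulary has already been built with torchtext above. There are two ways to load word vectors: load external vectors that were pretrained on a corpus, or initialize the vectors randomly. Comparing the two, pretrained vectors usually work much better. Vectors you train yourself will not necessarily work well, because your corpus may be too small. Well-known pretrained vectors such as the Google News vectors are widely accepted in the field, but they are very large, so if your hardware (GPU) is not up to it, it is better not to try them.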
glove-vectors (https://nlp.stanford.edu/projects/glove/)
# load word embedding
def load_my_vecs(path, vocab, freqs):
    # freqs is not used here; it is kept so existing callers keep working.
    word_vecs = {}
    with open(path, encoding="utf-8") as f:
        # Skip the first line (assumed to be a header, as in the word2vec text format).
        lines = f.readlines()[1:]
        for line in lines:
            values = line.rstrip().split(" ")
            word = values[0]
            # word = word.lower()
            if word in vocab:  # keep only words that appear in the vocabulary
                word_vecs[word] = [float(val) for val in values[1:]]
    return word_vecs
# solve unknown words by the average word embedding
def add_unknown_words_by_avg(word_vecs, vocab, k=100):
    # Collect the vectors of all vocabulary words that have a pretrained vector.
    word_vecs_numpy = []
    for word in vocab:
        if word in word_vecs:
            word_vecs_numpy.append(word_vecs[word])
    print(len(word_vecs_numpy))
    # Sum each of the k dimensions over all known vectors.
    col = []
    for i in range(k):
        total = 0.0
        for j in range(len(word_vecs_numpy)):
            total += word_vecs_numpy[j][i]
        col.append(round(total, 6))
    # Average vector, shared by every OOV word.
    avg_vec = []
    for m in range(k):
        avg = col[m] / len(word_vecs_numpy)
        avg_vec.append(float(round(avg, 6)))

    list_word2vec = []
    oov = 0  # words without a pretrained vector
    iov = 0  # words with a pretrained vector
    for word in vocab:
        if word not in word_vecs:
            # Alternatives: np.random.uniform(-0.25, 0.25, k).tolist() or [0.0] * k
            oov += 1
            word_vecs[word] = avg_vec
        else:
            iov += 1
        list_word2vec.append(word_vecs[word])
    print("oov count", oov)
    print("iov count", iov)
    return list_word2vec
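The same column-wise average can be written more compactly with NumPy. This is a sketch only; `average_vector` is a hypothetical helper name, and the toy vectors are made up for illustration.

```python
import numpy as np


def average_vector(word_vecs, vocab):
    """Average the pretrained vectors of all vocabulary words that have one."""
    known = np.array([word_vecs[w] for w in vocab if w in word_vecs], dtype=np.float64)
    # mean over axis 0 averages each embedding dimension separately
    return known.mean(axis=0).round(6).tolist()


word_vecs = {"good": [1.0, 2.0], "bad": [3.0, 4.0]}
vocab = ["good", "bad", "meh"]  # "meh" is out of vocabulary
avg = average_vector(word_vecs, vocab)
print(avg)
```

The resulting vector can then be assigned to every OOV word, exactly as in the loop-based version.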
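Random initialization or all zeros: with either approach, all OOV words can share a single vector, or each OOV word can get its own random vector. Which works better depends on your own experiments.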
import numpy as np

# solve unknown words by uniform(-0.25, 0.25)
def add_unknown_words_by_uniform(word_vecs, vocab, k=100):
    list_word2vec = []
    oov = 0  # words without a pretrained vector
    iov = 0  # words with a pretrained vector
    # To share one random vector across all OOV words, draw it once here:
    # uniform = np.random.uniform(-0.25, 0.25, k).round(6).tolist()
    for word in vocab:
        if word not in word_vecs:
            oov += 1
            # Each OOV word gets its own random vector.
            word_vecs[word] = np.random.uniform(-0.25, 0.25, k).round(6).tolist()
            # word_vecs[word] = np.random.uniform(-0.1, 0.1, k).round(6).tolist()
            # word_vecs[word] = uniform
        else:
            iov += 1
        list_word2vec.append(word_vecs[word])
    print("oov count", oov)
    print("iov count", iov)
    return list_word2vec
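To make the two OOV options concrete (one shared random vector versus one random vector per word), here is a small sketch. The function name `fill_oov`, its `shared` flag, and the fixed seed are assumptions for illustration.

```python
import numpy as np


def fill_oov(word_vecs, vocab, k, shared=False, seed=0):
    """Return one vector per vocabulary word, drawing random vectors for OOV words."""
    rng = np.random.RandomState(seed)
    shared_vec = rng.uniform(-0.25, 0.25, k).round(6).tolist()
    rows = []
    for word in vocab:
        if word in word_vecs:
            rows.append(word_vecs[word])
        elif shared:
            rows.append(shared_vec)  # every OOV word reuses the same vector
        else:
            rows.append(rng.uniform(-0.25, 0.25, k).round(6).tolist())  # fresh vector
    return rows


word_vecs = {"a": [0.0, 0.0, 0.0]}
vocab = ["a", "b", "c"]  # "b" and "c" are out of vocabulary
shared_rows = fill_oov(word_vecs, vocab, k=3, shared=True)
indep_rows = fill_oov(word_vecs, vocab, k=3, shared=False)
print(shared_rows[1] == shared_rows[2], indep_rows[1] == indep_rows[2])
```

Unlike the functions above, this sketch does not modify `word_vecs` in place.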
if args.word_Embedding:
    # Copy the pretrained vectors into the embedding layer's weight matrix.
    pretrained_weight = np.array(args.pretrained_weight)
    self.embed.weight.data.copy_(torch.from_numpy(pretrained_weight))
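A self-contained version of the copy step might look like the sketch below. The sizes `V` and `D` and the random "pretrained" matrix are placeholders; in practice the matrix would come from one of the loading functions above.

```python
import numpy as np
import torch
import torch.nn as nn

V, D = 5, 4  # hypothetical vocabulary size and embedding dimension
# Stand-in for real pretrained vectors, one row per vocabulary index.
pretrained_weight = np.random.uniform(-0.25, 0.25, (V, D)).astype(np.float32)

embed = nn.Embedding(V, D)
# Overwrite the randomly initialized weights with the pretrained ones.
embed.weight.data.copy_(torch.from_numpy(pretrained_weight))

ids = torch.tensor([0, 3])
out = embed(ids)  # rows 0 and 3 of the pretrained matrix
print(out.shape)
```

The row order of the matrix must match the vocabulary's index order, otherwise words are looked up with the wrong vectors.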
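nn.Conv2d() in PyTorch has a weight and a bias. Initializing the weight is well worth doing: skipping it may slow convergence and hurt the final result.
PyTorch provides initialization functions in torch.nn.init; for usage, see http://pytorch.org/docs/master/nn.html#torch-nn-init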
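import torch.nn.init as init
# xavier_normal_ and constant_ replace the deprecated xavier_normal and uniform
init.xavier_normal_(conv.weight, gain=np.sqrt(args.init_weight_value))
init.constant_(conv.bias, 0)  # the original uniform(bias, 0, 0) also set the bias to zero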
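As a runnable sketch of the same initialization on a standalone layer: the channel counts, kernel size, and `gain=1.0` below are arbitrary values chosen for illustration.

```python
import torch.nn as nn
import torch.nn.init as init

# A convolution over (sentence length, embedding dim) with hypothetical sizes.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=(3, 50))

# Xavier (Glorot) normal initialization for the weight, zeros for the bias.
init.xavier_normal_(conv.weight, gain=1.0)
init.constant_(conv.bias, 0.0)

print(conv.weight.shape, conv.bias.abs().sum().item())
```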
nn.LSTM() in PyTorch has an all_weights attribute that contains the weights and biases as a nested list of parameter tensors.
if args.init_weight:
    print("Initing W .......")
    # For a single-layer bidirectional LSTM, all_weights[0] is the forward direction
    # and all_weights[1] the backward one; within each, index 0 is the input-hidden
    # weight and index 1 the hidden-hidden weight.
    init.xavier_normal_(self.bilstm.all_weights[0][0], gain=np.sqrt(args.init_weight_value))
    init.xavier_normal_(self.bilstm.all_weights[0][1], gain=np.sqrt(args.init_weight_value))
    init.xavier_normal_(self.bilstm.all_weights[1][0], gain=np.sqrt(args.init_weight_value))
    init.xavier_normal_(self.bilstm.all_weights[1][1], gain=np.sqrt(args.init_weight_value))
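An alternative that avoids hard-coded indices is to loop over `named_parameters()`. The sketch below assumes a single-layer bidirectional LSTM with made-up sizes; zeroing the biases is an extra choice made here for illustration and is not something the code above does.

```python
import torch.nn as nn
import torch.nn.init as init

bilstm = nn.LSTM(input_size=10, hidden_size=6, bidirectional=True)

for name, param in bilstm.named_parameters():
    if name.startswith("weight"):
        init.xavier_normal_(param)   # weight_ih_* and weight_hh_* matrices
    else:
        init.constant_(param, 0.0)   # bias_ih_* and bias_hh_* vectors

names = [name for name, _ in bilstm.named_parameters()]
print(names)
```

Parameter names ending in `_reverse` belong to the backward direction, which corresponds to `all_weights[1]` in the indexed version.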
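CNN kernel size: I read a paper (A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification) that tests different kernel sizes. Based on its results, most settings are combinations of sizes in the range 1-10; the actual effect still depends on your own task.
Batch size: this needs to be tuned appropriately. According to related blogs, it is usually set no higher than 128 and may be quite small; in my current task, batch size = 16 works well.
Learning rate: the usual initial value differs from optimizer to optimizer. There are said to be some classic settings, such as lr = 0.001 for Adam.
LSTM hidden size: the hidden dimension of the LSTM also has some influence on the results. If you use external 300-dimensional word vectors, consider hidden size = 150 or 300. The largest hidden size I have tried is 600; because of hardware limits, training at 600 was already very slow. If your hardware allows, you can try more hidden sizes, but keep in mind the relationship between hidden size and embedding dimension (I personally believe the two are related).
L2-norm constraint: the max_norm and norm_type parameters of Embedding in PyTorch implement the L2-norm constraint.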
if args.max_norm is not None:
    print("max_norm = {} ".format(args.max_norm))
    # Embedding rows whose norm exceeds max_norm are rescaled when they are looked up.
    self.embed = nn.Embedding(V, D, max_norm=args.max_norm)
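The effect of `max_norm` can be checked on a toy embedding. In this sketch the sizes and the scaling factor of 100 are arbitrary; the weights are inflated on purpose so that the constraint has something to do.

```python
import torch
import torch.nn as nn

embed = nn.Embedding(10, 4, max_norm=1.0, norm_type=2.0)

# Inflate all rows so their L2 norms are far above max_norm.
with torch.no_grad():
    embed.weight.mul_(100.0)

ids = torch.tensor([1, 2])
out = embed(ids)                  # looked-up rows are renormalized in place
norms = out.norm(p=2, dim=1)      # L2 norm of each returned vector
untouched = embed.weight[0].norm(p=2)  # row 0 was never looked up
print(norms, untouched)
```

Only rows that are actually looked up get rescaled, which is why row 0 keeps its large norm here.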
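PyTorch implements L2 regularization, also called weight decay, in the optimizer through the weight_decay parameter (L1 regularization is no longer provided by PyTorch, but you can implement it yourself). It is usually set to 1e-8.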
if args.Adam is True:
    print("Adam Training......")
    # weight_decay adds the L2 penalty inside the optimizer.
    optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.init_weight_decay)
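Since L1 regularization has to be implemented by hand, one common way is to add the sum of absolute parameter values to the loss. This is a minimal sketch: the toy linear model, the random data, and the coefficient `l1_lambda` are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(4, 2)
# L2 regularization is handled by the optimizer through weight_decay.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-8)

l1_lambda = 1e-5                      # hypothetical L1 coefficient
x = torch.randn(8, 4)                 # toy batch of 8 examples
y = torch.randint(0, 2, (8,))         # toy class labels

loss = F.cross_entropy(model(x), y)
# Manual L1 penalty: sum of absolute values of all parameters.
l1_penalty = sum(p.abs().sum() for p in model.parameters())
total = loss + l1_lambda * l1_penalty

optimizer.zero_grad()
total.backward()
optimizer.step()
print(loss.item(), total.item())
```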
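import torch.nn.utils as utils

# Clip the total gradient norm; call this after loss.backward() and before optimizer.step().
# clip_grad_norm_ replaces the deprecated clip_grad_norm.
if args.init_clip_max_norm is not None:
    utils.clip_grad_norm_(model.parameters(),
                          max_norm=args.init_clip_max_norm)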
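The following sketch shows where gradient clipping sits in a training step and checks its effect. The toy model and the deliberately large loss are made up so that the gradients exceed the threshold.

```python
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_norm_

model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# A deliberately large loss so that the gradients are large too.
loss = (model(torch.randn(8, 4)) * 1000).pow(2).sum()
optimizer.zero_grad()
loss.backward()

# Rescale all gradients so that their combined L2 norm is at most max_norm.
norm_before = clip_grad_norm_(model.parameters(), max_norm=1.0)
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
optimizer.step()
print(float(norm_before), float(norm_after))
```

`clip_grad_norm_` returns the total norm measured before clipping, which is handy for logging.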
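Dynamic learning rate: PyTorch has supported learning-rate scheduling since version 0.2 (the latest version when this was written); for usage, see http://pytorch.org/docs/master/optim.html#how-to-adjust-learning-rate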
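As one example of a learning-rate schedule, the sketch below uses `StepLR` from `torch.optim.lr_scheduler`. The step size, decay factor, and toy model are arbitrary choices for illustration; other schedulers follow the same pattern.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Halve the learning rate every 2 epochs.
scheduler = StepLR(optimizer, step_size=2, gamma=0.5)

lrs = []
for epoch in range(4):
    lrs.append(optimizer.param_groups[0]["lr"])  # learning rate used in this epoch
    # ... the training batches for this epoch would go here ...
    optimizer.step()
    scheduler.step()  # advance the schedule once per epoch
print(lrs)
```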
Batch normalization: PyTorch provides BatchNorm1d() and BatchNorm2d(), which can be used directly. Their momentum parameter can be tuned as a hyperparameter.
if args.batch_normalizations is True:
    print("using batch_normalizations in the model......")
    # One BatchNorm2d for the convolution outputs, one BatchNorm1d per fully connected layer.
    self.convs1_bn = nn.BatchNorm2d(num_features=Co, momentum=args.bath_norm_momentum,
                                    affine=args.batch_norm_affine)
    self.fc1_bn = nn.BatchNorm1d(num_features=in_fea//2, momentum=args.bath_norm_momentum,
                                 affine=args.batch_norm_affine)
    self.fc2_bn = nn.BatchNorm1d(num_features=C, momentum=args.bath_norm_momentum,
                                 affine=args.batch_norm_affine)
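To see what the `momentum` hyperparameter controls, the sketch below runs one training-mode batch through `BatchNorm1d` and compares the running mean with the expected update. The feature count and the input offset of 5.0 are arbitrary.

```python
import torch
import torch.nn as nn

momentum = 0.1
bn = nn.BatchNorm1d(num_features=3, momentum=momentum, affine=True)
bn.train()

x = torch.randn(16, 3) + 5.0   # toy batch with mean around 5
_ = bn(x)

# running_mean starts at 0, so after one batch it equals
# (1 - momentum) * 0 + momentum * batch_mean.
expected = momentum * x.mean(dim=0)
print(bn.running_mean, expected)
```

A larger momentum makes the running statistics follow recent batches more closely; a smaller one smooths them over more batches.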
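# Note: a plain Python list does not register the layers' parameters with the module;
# wrap the list in nn.ModuleList unless that is done elsewhere.
if args.wide_conv is True:
    print("using wide convolution")
    # padding=(K//2, 0) pads only the sentence-length dimension.
    self.convs1 = [nn.Conv2d(in_channels=Ci, out_channels=Co, kernel_size=(K, D), stride=(1, 1),
                             padding=(K//2, 0), dilation=1, bias=True) for K in Ks]
else:
    print("using narrow convolution")
    self.convs1 = [nn.Conv2d(in_channels=Ci, out_channels=Co, kernel_size=(K, D), bias=True) for K in Ks]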
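The difference between the two variants shows up in the output length. In this sketch the batch size, sentence length, embedding dimension, and channel count are made-up values.

```python
import torch
import torch.nn as nn

K, D, Co = 3, 5, 4                 # kernel height, embedding dim, output channels
x = torch.randn(2, 1, 7, D)        # (batch, Ci=1, sentence length=7, D)

narrow = nn.Conv2d(1, Co, kernel_size=(K, D))
wide = nn.Conv2d(1, Co, kernel_size=(K, D), padding=(K // 2, 0))

narrow_out = narrow(x)  # length shrinks to 7 - K + 1 = 5
wide_out = wide(x)      # length is 7 + 2 * (K // 2) - K + 1 = 7
print(narrow_out.shape, wide_out.shape)
```

With the wide variant, words at the edges of the sentence are covered by as many windows as words in the middle (for odd K), and sentences shorter than K no longer cause an error.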
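Optimizer: PyTorch provides several optimizers. The one we use most often is Adam, which works quite well; for details, see http://pytorch.org/docs/master/optim.html#algorithms
Fine-tune or no fine-tune: this is an important strategy. In general, fine-tuning gives noticeably better results than not fine-tuning.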
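Assuming the choice refers to whether the pretrained word vectors are updated during training, the no-fine-tune setting can be sketched by freezing the embedding weights. The toy layers and sizes below are placeholders.

```python
import torch
import torch.nn as nn

embed = nn.Embedding(5, 3)
fc = nn.Linear(3, 2)

# No fine-tuning: exclude the embedding weights from gradient updates.
embed.weight.requires_grad = False
params = [p for p in list(embed.parameters()) + list(fc.parameters()) if p.requires_grad]
optimizer = torch.optim.Adam(params, lr=0.001)

embed_before = embed.weight.detach().clone()
fc_before = fc.weight.detach().clone()

ids = torch.tensor([0, 1, 2])
loss = fc(embed(ids)).pow(2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(torch.equal(embed_before, embed.weight), torch.equal(fc_before, fc.weight))
```

Leaving `requires_grad` at its default of `True` and passing all parameters to the optimizer gives the fine-tune setting instead.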
