ESIM for Inference

I recently got to know ESIM, a fairly influential model in the natural language inference field, and took the chance to put Colab's free GPU to work.

An overview of the ESIM model

The paper Enhanced LSTM for Natural Language Inference proposes a model for measuring the similarity of two sentences. The model consists of three parts:

Input Encoding

First, the word vectors of the two input sentences, the premise a=(a_1,...,a_{l_a}) and the hypothesis b=(b_1,...,b_{l_b}), are passed through a BiLSTM to obtain new word representations (\bar{a_1}, \dots, \bar{a_{l_a}}) and (\bar{b_1}, \dots, \bar{b_{l_b}}).

Local Inference

The paper notes that a good way to measure how related two words are is the inner product of their word vectors, i.e. e_{ij}=\bar{a_i}^T\bar{b_j}. Computing this similarity (attention) for every word pair between the two sentences yields a matrix

(e_{ij})_{l_a \times l_b} = (\bar{a_i}^T\bar{b_j})_{l_a \times l_b}

Then comes an interesting idea: to judge the similarity of two sentences, we need to see whether each can be represented by the other. That is, the word vectors \bar{a_i} of the premise and \bar{b_j} of the hypothesis are each used to reconstruct the other side.

The formulas in the paper are:

\widetilde{a_i} = \sum_{j=1}^{l_b}{\frac{exp(e_{ij})}{\sum_{k=1}^{l_b}{exp(e_{ik})}}\bar{b_j}}
\widetilde{b_j} = \sum_{i=1}^{l_a}{\frac{exp(e_{ij})}{\sum_{k=1}^{l_a}{exp(e_{kj})}}\bar{a_i}}

In other words, since the model does not know which pair a_i, b_j is actually similar or opposed, it enumerates every combination, and the similarity matrix computed above serves as the weights. The weight at each position is the softmax over the corresponding row of the similarity matrix (when computing \widetilde{a_i}; when computing \widetilde{b_j} it is the column).

To enhance the inference information (Enhancement of inference information), the paper stacks the intermediate results obtained so far:

m_a = [\bar{a};\widetilde{a};\bar{a}-\widetilde{a};\bar{a} \odot \widetilde{a}]
m_b = [\bar{b};\widetilde{b};\bar{b}-\widetilde{b};\bar{b} \odot \widetilde{b}]

Inference Composition

Inference composition takes the vectors m_a and m_b from the previous step and again uses a BiLSTM to capture the contextual information of the two sequences.

Once all the information has been combined, it is fed into fully connected layers for the final blending.
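
Before moving on to the implementation, here is a minimal sketch of the Local Inference step with toy tensors (all sizes and values below are illustrative, not taken from the paper or from the code later in this post):

import torch
import torch.nn.functional as F

# Toy BiLSTM outputs: batch = 1, l_a = 3, l_b = 4, feature dim = 5
a_bar = torch.randn(1, 3, 5)   # encoded premise
b_bar = torch.randn(1, 4, 5)   # encoded hypothesis

# e_ij = a_i^T b_j for every word pair -> shape (1, l_a, l_b)
e = torch.matmul(a_bar, b_bar.transpose(1, 2))

# a_tilde: softmax over each row of e weights the hypothesis vectors
a_tilde = torch.matmul(F.softmax(e, dim=2), b_bar)                  # (1, l_a, 5)
# b_tilde: softmax over each column of e weights the premise vectors
b_tilde = torch.matmul(F.softmax(e, dim=1).transpose(1, 2), a_bar)  # (1, l_b, 5)

# Enhancement: m_a = [a; a_tilde; a - a_tilde; a * a_tilde], likewise for m_b
m_a = torch.cat([a_bar, a_tilde, a_bar - a_tilde, a_bar * a_tilde], dim=-1)  # (1, l_a, 20)
m_b = torch.cat([b_bar, b_tilde, b_bar - b_tilde, b_bar * b_tilde], dim=-1)  # (1, l_b, 20)
print(m_a.shape, m_b.shape)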

Import the required libraries

import os
import time
import logging
import pickle
from tqdm import tqdm_notebook as tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import torchtext
from torchtext import data, datasets
from torchtext.vocab import GloVe

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import nltk
from nltk import word_tokenize
import spacy
from keras_preprocessing.text import Tokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
cuda

Mount Google Drive

from google.colab import drive
drive.mount('/content/drive')
Go to this URL in a browser: https://accounts.google.com/o/oauth2/xxxxxxxx

Enter your authorization code:
··········
Mounted at /content/drive
!nvidia-smi
Fri Aug  9 04:45:35 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   60C    P0    62W / 149W |   6368MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Preparing the data with torchtext

The way torchtext is used here follows: github.com/pytorch/exa…

GloVe in torchtext can be used directly, but unlike torchvision it does not read the raw source files, only its own cache, so it is best to:

  1. Download GloVe to a local directory first
  2. Open a terminal in that directory and use torchtext there to generate the cache
  3. From then on, pass the cache argument when using GloVe, so torchtext reads from the cache instead of downloading the huge GloVe files again (a sketch follows below)

But if you are just riding Colab's free GPU anyway, it hardly matters (~ ̄▽ ̄)~
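
A minimal sketch of step 3, assuming the vectors were previously cached to a local directory (the path ./.vector_cache here is illustrative):

from torchtext.vocab import GloVe

# Point `cache` at the directory where torchtext built its cache earlier,
# so the pretrained vectors are loaded locally instead of being downloaded again.
glove_vectors = GloVe(name='6B', dim=100, cache='./.vector_cache')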

torchtext can also load the SNLI dataset directly, as long as the directory layout is as follows:

  • root
    • snli_1.0
      • snli_1.0_train.jsonl
      • snli_1.0_dev.jsonl
      • snli_1.0_test.jsonl
TEXT = data.Field(batch_first=True, lower=True, tokenize="spacy")
LABEL = data.Field(sequential=False)

# Split into training / dev / test sets
tic = time.time()
train, dev, test = datasets.SNLI.splits(TEXT, LABEL)
print(f"Cost: {(time.time() - tic) / 60:.2f} min")

# Load the pretrained GloVe vectors
tic = time.time()
glove_vectors = GloVe(name='6B', dim=100)
print(f"Create GloVe done. Cost: {(time.time() - tic) / 60:.2f} min")

# Build the vocabulary
tic = time.time()
TEXT.build_vocab(train, dev, test, vectors=glove_vectors)
LABEL.build_vocab(train)
print(f"Build vocab done. Cost: {(time.time() - tic) / 60:.2f} min")

print(f"TEXT.vocab.vectors.size(): {TEXT.vocab.vectors.size()}")
num_words = int(TEXT.vocab.vectors.size()[0])

# Save the token-to-index and label-to-index dictionaries
if os.path.exists("/content/drive/My Drive/Colab Notebooks"):
    glove_stoi_path = "/content/drive/My Drive/Colab Notebooks/vocab_label_stoi.pkl"
else:
    glove_stoi_path = "./vocab_label_stoi.pkl"
pickle.dump([TEXT.vocab.stoi, LABEL.vocab.stoi], open(glove_stoi_path, "wb"))

batch_sz = 128

train_iter, dev_iter, test_iter = data.BucketIterator.splits(
    datasets=(train, dev, test),
    batch_sizes=(batch_sz, batch_sz, batch_sz),
    shuffle=True,
    device=device
)
Cost: 7.94 min
Create GloVe done. Cost: 0.00 min
Build vocab done. Cost: 0.12 min
TEXT.vocab.vectors.size(): torch.Size([34193, 100])

Global configuration

When doing the "alchemy" of model training, it is best to keep a single global recipe so the hyperparameters are easy to adjust.

class Config:

    def __init__(self):
        # For data
        self.batch_first = True
        try:
            self.batch_size = batch_sz
        except NameError:
            self.batch_size = 512

        # For Embedding
        self.n_embed = len(TEXT.vocab)
        self.d_embed = TEXT.vocab.vectors.size()[-1]

        # For Linear
        self.linear_size = self.d_embed

        # For LSTM
        self.hidden_size = 300

        # For output
        self.d_out = len(LABEL.vocab)  # dimensionality of the output
        self.dropout = 0.5

        # For training
        self.save_path = r"/content/drive/My Drive/Colab Notebooks" if os.path.exists(
            r"/content/drive/My Drive/Colab Notebooks") else "./"
        self.snapshot = os.path.join(self.save_path, "ESIM.pt")

        self.device = device
        self.epoch = 64
        self.scheduler_step = 3
        self.lr = 0.0004
        self.early_stop_ratio = 0.985  # allows stopping training early


args = Config()

ESIM model implementation

The code references: github.com/pengshuang/…

Using nn.BatchNorm1d

Normalizing the data removes the problem that different dimensions follow different distributions; geometrically, it reshapes an "ellipsoid" in n-dimensional space into a "sphere", which lowers the difficulty of training and speeds it up.

Normalizing the entire input dataset, however, would take a lot of time. Batch Normalization is a compromise: it only normalizes the batch_size samples of the current input. Probabilistically, it estimates the distribution of all samples from the distribution of those batch_size samples.

As the name suggests, PyTorch's nn.BatchNorm1d batch-normalizes one-dimensional data, so there are two constraints:

  1. During training (i.e. with model.train()), the batch size must be at least 2; during evaluation or inference (model.eval()) there is no constraint on the batch size
  2. By default the second-to-last dimension is treated as the "batch"

Each batch produced by my earlier data pipeline, after the embedding lookup, has shape batch * seq_len * embed_dim, i.e. three dimensions. Moreover, with torchtext's data.BucketIterator.splits the seq_len of each batch is dynamic (equal to the length of the longest sentence in that batch). Feeding this to BatchNorm1d without further handling usually produces an error like:

RuntimeError: running_mean should contain xxx elements not yyy
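
If one did want to batch-normalize the embeddings, a common workaround (just a sketch, not what this post ends up doing) is to move the embedding dimension into the channel position that BatchNorm1d expects:

import torch
import torch.nn as nn

embed_dim = 100
bn = nn.BatchNorm1d(embed_dim)     # normalizes over the embedding (feature) dimension

x = torch.randn(8, 27, embed_dim)  # batch * seq_len * embed_dim; seq_len changes per batch

# bn(x) would raise an error like the one above, because BatchNorm1d reads dim 1 as the
# channel dimension and here dim 1 is the (dynamic) seq_len. Transposing fixes the shapes:
y = bn(x.transpose(1, 2)).transpose(1, 2)  # back to batch * seq_len * embed_dim
print(y.shape)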

Is a BatchNorm1d layer needed after the Embedding?

The reference implementation is very clean and shows the author's solid coding skills. However, that author does not seem to use pretrained word vectors for the Embedding layer, whereas I use pretrained GloVe vectors and will not fine-tune them, so is it actually necessary to add nn.BatchNorm1d?

Since blindly adding more layers to a network does not necessarily help, the best approach is to first check whether each dimension of the GloVe vectors is already roughly "normalized".

glove = TEXT.vocab.vectors

means, stds = glove.mean(dim=0).numpy(), glove.std(dim=0).numpy()
dims = [i for i in range(glove.shape[1])]

plt.scatter(dims, means)
plt.scatter(dims, stds)
plt.legend(["mean", "std"])
plt.xlabel("Dims")
plt.ylabel("Features")
plt.show()

print(f"mean(means)={means.mean():.4f}, std(means)={means.std():.4f}")
print(f"mean(stds)={stds.mean():.4f}, std(stds)={stds.std():.4f}")

mean(means)=0.0032, std(means)=0.0809
mean(stds)=0.4361, std(stds)=0.0541

The plot shows that the distribution of each dimension is fairly stable, so I decided not to use nn.BatchNorm1d after the Embedding layer.

Using nn.LSTM

nn.LSTM(input_size, hidden_size, num_layers, bias=True, batch_first=False, dropout=0, bidirectional=False)

nn.LSTM defaults to batch_first=False, which is quite disorienting for someone used to the CV data layout, so it is best to set it to True.

Below is the input/output format of the LSTM. Inputs may omit h_0 and c_0, in which case the LSTM automatically creates all-zero h_0 and c_0 (a quick shape check follows the list).

  • Inputs: input, (h_0, c_0)
  • Outputs: output, (h_n, c_n)
  • input: (seq_len, batch, input_size)
  • output: (seq_len, batch, num_directions * hidden_size)
  • h / c: (num_layers * num_directions, batch, hidden_size)
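
A quick shape check under the settings used later in this post (the concrete sizes are illustrative):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=300, num_layers=1,
               batch_first=True, bidirectional=True)

x = torch.randn(8, 27, 100)      # batch * seq_len * input_size
output, (h_n, c_n) = lstm(x)     # h_0 / c_0 omitted, so they default to zeros

print(output.shape)  # (8, 27, 600): batch * seq_len * (num_directions * hidden_size)
print(h_n.shape)     # (2, 8, 300): (num_layers * num_directions) * batch * hidden_size
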
class ESIM(nn.Module):

    def __init__(self, args):
        super(ESIM, self).__init__()
        self.args = args

        self.embedding = nn.Embedding(
            args.n_embed, args.d_embed)  # parameter initialization can be done later
        # self.bn_embed = nn.BatchNorm1d(args.d_embed)

        self.lstm1 = nn.LSTM(args.d_embed, args.hidden_size,
                             num_layers=1, batch_first=True, bidirectional=True)
        self.lstm2 = nn.LSTM(args.hidden_size * 8, args.hidden_size,
                             num_layers=1, batch_first=True, bidirectional=True)

        self.fc = nn.Sequential(
            nn.BatchNorm1d(args.hidden_size * 8),
            nn.Linear(args.hidden_size * 8, args.linear_size),
            nn.ELU(inplace=True),
            nn.BatchNorm1d(args.linear_size),
            nn.Dropout(args.dropout),
            nn.Linear(args.linear_size, args.linear_size),
            nn.ELU(inplace=True),
            nn.BatchNorm1d(args.linear_size),
            nn.Dropout(args.dropout),
            nn.Linear(args.linear_size, args.d_out),
            nn.Softmax(dim=-1)
        )

    def submul(self, x1, x2):
        mul = x1 * x2
        sub = x1 - x2
        return torch.cat([sub, mul], -1)

    def apply_multiple(self, x):
        # input: batch_size * seq_len * (2 * hidden_size)
        p1 = F.avg_pool1d(x.transpose(1, 2), x.size(1)).squeeze(-1)
        p2 = F.max_pool1d(x.transpose(1, 2), x.size(1)).squeeze(-1)
        # output: batch_size * (4 * hidden_size)
        return torch.cat([p1, p2], 1)

    def soft_attention_align(self, x1, x2, mask1, mask2):
        '''
        x1: batch_size * seq_len * dim
        x2: batch_size * seq_len * dim
        '''
        # attention: batch_size * seq_len * seq_len
        attention = torch.matmul(x1, x2.transpose(1, 2))
        # Purpose of the mask: keep masked positions from producing abnormal values in the softmax
        mask1 = mask1.float().masked_fill_(mask1, float('-inf'))
        mask2 = mask2.float().masked_fill_(mask2, float('-inf'))

        # weight: batch_size * seq_len * seq_len
        weight1 = F.softmax(attention + mask2.unsqueeze(1), dim=-1)
        x1_align = torch.matmul(weight1, x2)
        weight2 = F.softmax(attention.transpose(
            1, 2) + mask1.unsqueeze(1), dim=-1)
        x2_align = torch.matmul(weight2, x1)

        # x_align: batch_size * seq_len * (2 * hidden_size)
        return x1_align, x2_align

    def forward(self, sent1, sent2):
        """ sent1: batch * la sent2: batch * lb """
        mask1, mask2 = sent1.eq(0), sent2.eq(0)
        x1, x2 = self.embedding(sent1), self.embedding(sent2)
        # x1, x2 = self.bn_embed(x1), self.bn_embed(x2)

        # batch * [la | lb] * dim
        o1, _ = self.lstm1(x1)
        o2, _ = self.lstm1(x2)

        # Local Inference
        # batch * [la | lb] * (2 * hidden_size)
        q1_align, q2_align = self.soft_attention_align(o1, o2, mask1, mask2)

        # Inference Composition
        # batch_size * seq_len * (8 * hidden_size)
        q1_combined = torch.cat([o1, q1_align, self.submul(o1, q1_align)], -1)
        q2_combined = torch.cat([o2, q2_align, self.submul(o2, q2_align)], -1)

        # batch_size * seq_len * (2 * hidden_size)
        q1_compose, _ = self.lstm2(q1_combined)
        q2_compose, _ = self.lstm2(q2_combined)

        # Aggregate
        q1_rep = self.apply_multiple(q1_compose)
        q2_rep = self.apply_multiple(q2_compose)

        # Classifier
        similarity = self.fc(torch.cat([q1_rep, q2_rep], -1))
        return similarity


def take_snapshot(model, path):
    """保存模型訓練結果到Drive上,防止Colab重置後丟失"""
    torch.save(model.state_dict(), path)
    print(f"Snapshot has been saved to {path}")


def load_snapshot(model, path):
    model.load_state_dict(torch.load(path))
    print(f"Load snapshot from {path} done.")


model = ESIM(args)
# if os.path.exists(args.snapshot):
# load_snapshot(model, args.snapshot)

# Do not train the embedding vectors
model.embedding.weight.data.copy_(TEXT.vocab.vectors)
model.embedding.weight.requires_grad = False

model.to(args.device)
ESIM(
  (embedding): Embedding(34193, 100)
  (lstm1): LSTM(100, 300, batch_first=True, bidirectional=True)
  (lstm2): LSTM(2400, 300, batch_first=True, bidirectional=True)
  (fc): Sequential(
    (0): BatchNorm1d(2400, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): Linear(in_features=2400, out_features=100, bias=True)
    (2): ELU(alpha=1.0, inplace)
    (3): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): Dropout(p=0.5)
    (5): Linear(in_features=100, out_features=100, bias=True)
    (6): ELU(alpha=1.0, inplace)
    (7): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): Dropout(p=0.5)
    (9): Linear(in_features=100, out_features=4, bias=True)
    (10): Softmax()
  )
)

Training

A few details here:

The shape of batch.label

batch.label is a one-dimensional vector of shape (batch), while Y_pred is a two-dimensional tensor of shape batch \times 4; extracting the maximum with .topk(1).indices still yields a two-dimensional tensor.

So if batch.label is not expanded with an extra dimension, PyTorch broadcasts it automatically and the final result is no longer batch \times 1 but batch \times batch, making the computed accuracy absurdly large. That is what the following code accounts for:

(Y_pred.topk(1).indices == batch.label.unsqueeze(1))
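
A toy demonstration of the broadcasting pitfall (the numbers are made up):

import torch

pred_idx = torch.tensor([[1], [2], [0]])   # batch * 1, like Y_pred.topk(1).indices
labels = torch.tensor([1, 0, 0])           # batch, like batch.label

print((pred_idx == labels).shape)               # torch.Size([3, 3]): broadcast to batch * batch
print((pred_idx == labels.unsqueeze(1)).shape)  # torch.Size([3, 1]): the intended comparison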

Division between tensors and scalars

In Python 3.6 the / operator produces a float by default, but PyTorch does not behave this way, which is another detail that is easy to overlook.

(Y_pred.topk(1).indices == batch.label.unsqueeze(1))

The result of the code above can be regarded as boolean (it is actually torch.uint8). After calling .sum(), the result has type torch.LongTensor, and integer division in PyTorch does not yield a float.

# just like the following code, which yields 0
In [2]: torch.LongTensor([1]) / torch.LongTensor([5])
Out[2]: tensor([0])

The variable acc accumulates the number of correctly classified samples in each batch. Due to automatic type promotion, acc ends up pointing to a torch.LongTensor, so the integer value must be extracted with .item() when computing the final accuracy. Overlooking this detail makes the resulting accuracy 0.
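
A tiny illustration of that pitfall with made-up counts (the numbers are hypothetical):

import torch

acc = torch.LongTensor([4200])  # what acc ends up holding after summing the uint8 comparisons
cnt = 5000

# With the 2019-era PyTorch used in this post, dividing an integer tensor truncates to tensor([0]);
# extracting the Python int first gives the expected 0.84.
print(acc / cnt)
print(acc.item() / cnt)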

def training(model, data_iter, loss_fn, optimizer):
    """訓練部分"""
    model.train()
    data_iter.init_epoch()
    acc, cnt, avg_loss = 0, 0, 0.0

    for batch in data_iter:
        Y_pred = model(batch.premise, batch.hypothesis)
        loss = loss_fn(Y_pred, batch.label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        avg_loss += loss.item() / len(data_iter)
        # unsqueeze because label is a 1-D vector; same below
        acc += (Y_pred.topk(1).indices == batch.label.unsqueeze(1)).sum()
        cnt += len(batch.premise)

    return avg_loss, (acc.item() / cnt)  # without extracting .item(), the accuracy would be 0


def validating(model, data_iter, loss_fn):
    """驗證部分"""
    model.eval()
    data_iter.init_epoch()
    acc, cnt, avg_loss = 0, 0, 0.0

    with torch.set_grad_enabled(False):
        for batch in data_iter:
            Y_pred = model(batch.premise, batch.hypothesis)

            avg_loss += loss_fn(Y_pred, batch.label).item() / len(data_iter)
            acc += (Y_pred.topk(1).indices == batch.label.unsqueeze(1)).sum()
            cnt += len(batch.premise)

    return avg_loss, (acc.item() / cnt)


def train(model, train_data, val_data):
    """訓練過程"""
    optimizer = optim.Adam(model.parameters(), lr=args.lr)
    loss_fn = nn.CrossEntropyLoss()
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=args.scheduler_step, verbose=True)

    train_losses, val_losses, train_accs, val_accs = [], [], [], []

    # Before train
    tic = time.time()
    train_loss, train_acc = validating(model, train_data, loss_fn)
    val_loss, val_acc = validating(model, val_data, loss_fn)
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    train_accs.append(train_acc)
    val_accs.append(val_acc)
    min_val_loss = val_loss
    print(f"Epoch: 0/{args.epoch}\t"
          f"Train loss: {train_loss:.4f}\tacc: {train_acc:.4f}\t"
          f"Val loss: {val_loss:.4f}\tacc: {val_acc:.4f}\t"
          f"Cost time: {(time.time()-tic):.2f}s")

    try:
        for epoch in range(args.epoch):
            tic = time.time()
            train_loss, train_acc = training(
                model, train_data, loss_fn, optimizer)
            val_loss, val_acc = validating(model, val_data, loss_fn)
            train_losses.append(train_loss)
            val_losses.append(val_loss)
            train_accs.append(train_acc)
            val_accs.append(val_acc)
            scheduler.step(val_loss)

            print(f"Epoch: {epoch + 1}/{args.epoch}\t"
                  f"Train loss: {train_loss:.4f}\tacc: {train_acc:.4f}\t"
                  f"Val loss: {val_loss:.4f}\tacc: {val_acc:.4f}\t"
                  f"Cost time: {(time.time()-tic):.2f}s")

            if val_loss < min_val_loss:  # save a snapshot right away
                min_val_loss = val_loss
                take_snapshot(model, args.snapshot)

            # Early-stop:
            # if len(val_losses) >= 3 and (val_loss - min_val_loss) / min_val_loss > args.early_stop_ratio:
            # print(f"Early stop with best loss: {min_val_loss:.5f}")
            # break
            # args.early_stop_ratio *= args.early_stop_ratio

    except KeyboardInterrupt:
        print("Interrupted by user")

    return train_losses, val_losses, train_accs, val_accs


train_losses, val_losses, train_accs, val_accs = train(
    model, train_iter, dev_iter)
Epoch: 0/64	Train loss: 1.3871	acc: 0.3335	Val loss: 1.3871	acc: 0.3331	Cost time: 364.32s
Epoch: 1/64	Train loss: 1.0124	acc: 0.7275	Val loss: 0.9643	acc: 0.7760	Cost time: 998.41s
Snapshot has been saved to /content/drive/My Drive/Colab Notebooks/ESIM.pt
Epoch: 2/64	Train loss: 0.9476	acc: 0.7925	Val loss: 0.9785	acc: 0.7605	Cost time: 1003.32s
Epoch: 3/64	Train loss: 0.9305	acc: 0.8100	Val loss: 0.9204	acc: 0.8217	Cost time: 999.49s
Snapshot has been saved to /content/drive/My Drive/Colab Notebooks/ESIM.pt
Epoch: 4/64	Train loss: 0.9183	acc: 0.8227	Val loss: 0.9154	acc: 0.8260	Cost time: 1000.97s
Snapshot has been saved to /content/drive/My Drive/Colab Notebooks/ESIM.pt
Epoch: 5/64	Train loss: 0.9084	acc: 0.8329	Val loss: 0.9251	acc: 0.8156	Cost time: 996.99s
....
Epoch: 21/64	Train loss: 0.8236	acc: 0.9198	Val loss: 0.8912	acc: 0.8514	Cost time: 992.48s
Epoch: 22/64	Train loss: 0.8210	acc: 0.9224	Val loss: 0.8913	acc: 0.8514	Cost time: 996.35s
Epoch    22: reducing learning rate of group 0 to 5.0000e-05.
Epoch: 23/64	Train loss: 0.8195	acc: 0.9239	Val loss: 0.8940	acc: 0.8485	Cost time: 1000.48s
Epoch: 24/64	Train loss: 0.8169	acc: 0.9266	Val loss: 0.8937	acc: 0.8490	Cost time: 1006.78s
Interrupted by user

Plotting the loss-accuracy curves

# In case a KeyboardInterrupt left the loss lists with unequal lengths
min_len = min(len(train_losses), len(val_losses))
iters = [i + 1 for i in range(min_len)]

# Plot with two y-axes
fig, ax1 = plt.subplots()
ax1.plot(iters, train_losses[: min_len], '-', label='train loss')
ax1.plot(iters, val_losses[: min_len], '-.', label='val loss')
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Loss")

# Create the secondary y-axis
ax2 = ax1.twinx()
ax2.plot(iters, train_accs[: min_len], ':', label='train acc')
ax2.plot(iters, val_accs[: min_len], '--', label='val acc')
ax2.set_ylabel("Accuracy")

# Add a combined legend for the two axes
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
plt.legend(handles1 + handles2, labels1 + labels2, loc='center right')
plt.show()

[Figure: loss and accuracy curves]

Prediction

Besides producing training metrics, the model also needs to be usable in practice.

nlp = spacy.load("en")

# Reload the best model parameters saved during training
load_snapshot(model, args.snapshot)
# For small inputs the CPU is actually faster
model.to(torch.device("cpu"))

with open(r"/content/drive/My Drive/Colab Notebooks/vocab_label_stoi.pkl", "rb") as f:
    vocab_stoi, label_stoi = pickle.load(f)
Load snapshot from /content/drive/My Drive/Colab Notebooks/ESIM.pt done.
def sentence2tensor(stoi, sent1: str, sent2: str):
    """將兩個句子轉化爲張量"""
    sent1 = [str(token) for token in nlp(sent1.lower())]
    sent2 = [str(token) for token in nlp(sent2.lower())]

    tokens1, tokens2 = [], []

    for token in sent1:
        tokens1.append(stoi[token])

    for token in sent2:
        tokens2.append(stoi[token])

    delt_len = len(tokens1) - len(tokens2)
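    # Pad the shorter sentence with index 1, the <pad> token in the TEXT vocabulary built above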

    if delt_len > 0:
        tokens2.extend([1] * delt_len)
    else:
        tokens1.extend([1] * (-delt_len))

    tensor1 = torch.LongTensor(tokens1).unsqueeze(0)
    tensor2 = torch.LongTensor(tokens2).unsqueeze(0)

    return tensor1, tensor2


def use(model, premise: str, hypothesis: str):
    """Run the model on a pair of sentences"""
    label_itos = {0: '<unk>', 1: 'entailment',
                  2: 'contradiction', 3: 'neutral'}

    model.eval()
    with torch.set_grad_enabled(False):
        tensor1, tensor2 = sentence2tensor(vocab_stoi, premise, hypothesis)
        predict = model(tensor1, tensor2)
        top1 = predict.topk(1).indices.item()

    print(f"The answer is '{label_itos[top1]}'")

    prob = predict.cpu().squeeze().numpy()
    plt.bar(["<unk>", "entailment", "contradiction", "neutral"], prob)
    plt.ylabel("probability")
    plt.show()

After the two sentences are entered, the most likely prediction is printed and the probability of each label is shown as a bar chart.

# Entailment
use(model,
    "A statue at a museum that no seems to be looking at.",
    "There is a statue that not many people seem to be interested in.")

# Contradiction
use(model,
    "A land rover is being driven across a river.",
    "A sedan is stuck in the middle of a river.")

# Neutral
use(model,
    "A woman with a green headscarf, blue shirt and a very big grin.",
    "The woman is young.")
The answer is 'entailment'

The answer is 'contradiction'

The answer is 'neutral'
