[NLP] Foundational Models: Word Vectors

Date: 2019-11-08

Tags: NLP, foundational models, word vectors

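I increasingly feel that the fundamentals matter a great deal. To be a competent algorithm engineer rather than someone who only calls library packages, you need to understand the HOW & WHY of each foundational model. After all, these models were the SOTA of their day, and their ideas strongly influenced the NLP models that followed. I recently found a fairly good nlp-tutorial and plan to set aside time to work through the foundational models. The general ideas and the math will only be covered briefly; the focus is on implementation details.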

1. NNLM

1.1 Idea

Word vectors are learned by training a neural language model; the network structure is shown in the figure.

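It addresses the following problems of statistical language models (n-gram models):

Curse of dimensionality: data sparsity in high dimensions drives many estimated probabilities to 0; the paper proposes a distributed word representation instead.
Long-range dependencies: in n-gram models, n is usually at most 3.
Word similarity: in the paper, words are represented as vectors, and after language-model training similar words end up with similar word vectors.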
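The sparsity problem in the first point can be seen in a few lines. The following is a minimal sketch of a count-based trigram model, assuming a toy corpus that is made up for illustration and is not from the original post; `ngram_counts` and `mle_prob` are hypothetical helper names.

```python
from collections import Counter

# Toy corpus: an assumption for illustration, not from the original post.
corpus = "i like dog . i love coffee . i hate milk .".split()

def ngram_counts(tokens, n):
    """Count every contiguous n-gram in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

trigrams = ngram_counts(corpus, 3)
bigrams = ngram_counts(corpus, 2)

def mle_prob(w1, w2, w3):
    """Maximum-likelihood estimate P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2)."""
    context = bigrams[(w1, w2)]
    return trigrams[(w1, w2, w3)] / context if context else 0.0

vocab_size = len(set(corpus))
possible = vocab_size ** 3   # number of distinct trigrams the vocabulary allows
observed = len(trigrams)     # number of distinct trigrams actually seen

print(mle_prob("i", "like", "dog"))     # trigram that occurs in the corpus
print(mle_prob("i", "like", "coffee"))  # plausible but unseen trigram
print(f"{observed} of {possible} possible trigrams observed")
```

The unseen trigram gets probability 0 even though it is a perfectly reasonable continuation, and the gap between observed and possible trigrams grows rapidly with vocabulary size and n. A distributed representation lets the model share statistics between similar words instead.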

1.2 Source code

$y = b + Wx + U\tanh(d + Hx)$
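To make the shapes in this formula concrete, here is a small sketch that evaluates it with plain Python lists. The sizes, the random weights, and the helper names `rand_matrix` and `matvec` are assumptions chosen for illustration. It uses the column-vector convention of the formula, so each matrix is stored as [output_dim, input_dim]. In the NNLM paper the scores y are then passed through a softmax to obtain next-word probabilities, which the last lines reproduce.

```python
import math
import random

random.seed(0)

# Hypothetical toy sizes: 2 context words, embedding dim 2, 3 hidden units, 4-word vocabulary.
n_step, m, n_hidden, n_class = 2, 2, 3, 4

def rand_matrix(rows, cols):
    """Matrix of uniform random weights, stored as a list of rows."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    """Matrix-vector product M @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

x = [random.uniform(-1, 1) for _ in range(n_step * m)]  # concatenated context embeddings
H = rand_matrix(n_hidden, n_step * m)  # input -> hidden weights
d = [0.0] * n_hidden                   # hidden bias
U = rand_matrix(n_class, n_hidden)     # hidden -> output weights
W = rand_matrix(n_class, n_step * m)   # direct input -> output weights
b = [0.0] * n_class                    # output bias

# y = b + Wx + U tanh(d + Hx)
hidden = [math.tanh(di + hi) for di, hi in zip(d, matvec(H, x))]
y = [bi + wi + ui for bi, wi, ui in zip(b, matvec(W, x), matvec(U, hidden))]

# Softmax over the scores gives a distribution over the next word.
max_y = max(y)
exps = [math.exp(v - max_y) for v in y]
total = sum(exps)
probs = [e / total for e in exps]

print(y)
print(probs)
```

The `Wx` term is a direct linear connection from the input to the output that bypasses the hidden layer; setting W to zero leaves an ordinary one-hidden-layer network.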

import torch
import torch.nn as nn

# n_class, m, n_step, n_hidden and dtype are free names here: they must be
# defined at module level before this class is instantiated.
class NNLM(nn.Module):
    def __init__(self):
        super(NNLM, self).__init__()
        self.C = nn.Embedding(n_class, m)  # word embedding table
        self.H = nn.Parameter(torch.randn(n_step * m, n_hidden).type(dtype))  # input -> hidden
        self.W = nn.Parameter(torch.randn(n_step * m, n_class).type(dtype))  # input -> output (direct)
        self.d = nn.Parameter(torch.randn(n_hidden).type(dtype))  # hidden bias
        self.U = nn.Parameter(torch.randn(n_hidden, n_class).type(dtype))  # hidden -> output
        self.b = nn.Parameter(torch.randn(n_class).type(dtype))  # output bias

    def forward(self, X):
        X = self.C(X)  # [batch_size, n_step, emb_dim]
        X = X.view(-1, n_step * m)  # [batch_size, n_step * emb_dim]
        tanh = torch.tanh(self.d + torch.mm(X, self.H))  # [batch_size, n_hidden]
        output = self.b + torch.mm(X, self.W) + torch.mm(tanh, self.U)  # [batch_size, n_class]
        return output
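As a usage illustration, the model can be trained on a handful of three-word sentences, with the first two words as context and the last word as the target. This is a minimal sketch: the sentences, hyperparameters, optimizer choice, and number of steps are assumptions, and the class is a variant of the one above that takes its sizes as constructor arguments so the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy data and hyperparameters: assumptions for illustration only.
sentences = ["i like dog", "i love coffee", "i hate milk"]
word_list = sorted(set(" ".join(sentences).split()))
word_dict = {w: i for i, w in enumerate(word_list)}
n_class, n_step, m, n_hidden = len(word_dict), 2, 2, 2

class NNLM(nn.Module):
    """Same computation as the class above, with sizes passed in explicitly."""
    def __init__(self, n_class, n_step, m, n_hidden):
        super().__init__()
        self.n_step, self.m = n_step, m
        self.C = nn.Embedding(n_class, m)
        self.H = nn.Parameter(torch.randn(n_step * m, n_hidden))
        self.W = nn.Parameter(torch.randn(n_step * m, n_class))
        self.d = nn.Parameter(torch.randn(n_hidden))
        self.U = nn.Parameter(torch.randn(n_hidden, n_class))
        self.b = nn.Parameter(torch.randn(n_class))

    def forward(self, X):
        X = self.C(X).view(-1, self.n_step * self.m)  # [batch_size, n_step * m]
        hidden = torch.tanh(self.d + X @ self.H)      # [batch_size, n_hidden]
        return self.b + X @ self.W + hidden @ self.U  # [batch_size, n_class]

# The first n_step words of each sentence are the input, the last word is the target.
inputs = torch.tensor([[word_dict[w] for w in s.split()[:-1]] for s in sentences])
targets = torch.tensor([word_dict[s.split()[-1]] for s in sentences])

model = NNLM(n_class, n_step, m, n_hidden)
criterion = nn.CrossEntropyLoss()  # applies log-softmax to the raw scores internally
optimizer = optim.Adam(model.parameters(), lr=0.01)

logits = model(inputs)             # raw scores before any training
initial_loss = criterion(logits, targets).item()

for _ in range(200):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
print(logits.shape, initial_loss, final_loss)
```

Because `nn.CrossEntropyLoss` applies the softmax itself, `forward` returns unnormalized scores rather than probabilities, which matches the formula for y.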

Points to note: