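In NLP, sequence labeling is a common task for deep learning models, but are we really familiar with how such models are evaluated?
In this article, I will introduce the methods used to evaluate sequence labeling models and show how to use seqeval.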
A sequence labeling model typically produces a tag sequence like the following:
['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
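Common tagging schemes for sequence labeling include BIO, IOBES, and BMES. An entity is a run of consecutive non-O tags of the same type (for example PER/LOC/ORG) that starts with a B- tag.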
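To make the difference between the tagging schemes concrete, here is a minimal sketch that converts a BIO sequence to IOBES. The helper `bio_to_iobes` is a hypothetical name used for illustration only (it is not part of seqeval), and it assumes the input is well-formed BIO.

```python
def bio_to_iobes(tags):
    """Convert one BIO-tagged sequence to IOBES (illustrative helper, not a seqeval API)."""
    iobes = []
    for i, tag in enumerate(tags):
        if tag == 'O':
            iobes.append(tag)
            continue
        prefix, label = tag.split('-', 1)
        # The entity continues only if the next tag is an I- tag of the same type.
        next_tag = tags[i + 1] if i + 1 < len(tags) else 'O'
        continues = next_tag == 'I-' + label
        if prefix == 'B':
            # A B- tag that is not continued is a single-token entity (S-).
            iobes.append(('B-' if continues else 'S-') + label)
        else:
            # An I- tag that is not continued closes the entity (E-).
            iobes.append(('I-' if continues else 'E-') + label)
    return iobes


tags = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
print(bio_to_iobes(tags))
```

In IOBES, every entity boundary is explicit: the last token of a multi-token entity gets E-, and a one-token entity gets S-.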
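Common evaluation metrics for sequence labeling models are accuracy, precision, recall, and the F1 score. Accuracy is computed over tags, while the other three are computed over entities:
accuracy = number of correctly predicted tags / total number of tags
precision = number of correctly predicted entities / number of predicted entities
recall = number of correctly predicted entities / number of annotated entities
F1 = 2 * precision * recall / (precision + recall)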
As an example, take the following true sequence y_true and predicted sequence y_pred:
y_true = ['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
y_pred = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
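The list has 9 elements in total, 6 of which are predicted correctly, so the accuracy is 6/9 = 2/3. There are 2 annotated entities and 3 predicted entities, and only 1 predicted entity is correct, so precision = 1/3, recall = 1/2, and F1 = 0.4.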
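The arithmetic above can be checked by hand with a short script. This is a minimal sketch, not seqeval's implementation: `get_entities` and `evaluate` are illustrative names, an entity counts as correct only when its type and both boundaries match exactly, and stray I- tags with no preceding B- tag are simply ignored (which may differ from how seqeval treats them).

```python
def get_entities(tags):
    """Return the set of (type, start, end) spans in one BIO sequence (end inclusive)."""
    entities = set()
    start, label = None, None
    # A sentinel 'O' at the end closes an entity that runs to the last token.
    for i, tag in enumerate(list(tags) + ['O']):
        # Close the open entity unless this tag continues it.
        if label is not None and tag != 'I-' + label:
            entities.add((label, start, i - 1))
            start, label = None, None
        if tag.startswith('B-'):
            start, label = i, tag[2:]
    return entities


def evaluate(y_true, y_pred):
    """Tag-level accuracy plus entity-level precision, recall and F1."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    true_entities, pred_entities = get_entities(y_true), get_entities(y_pred)
    n_correct = len(true_entities & pred_entities)
    precision = n_correct / len(pred_entities) if pred_entities else 0.0
    recall = n_correct / len(true_entities) if true_entities else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


y_true = ['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
y_pred = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
accuracy, precision, recall, f1 = evaluate(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

Up to floating-point rounding, the script reproduces the values 2/3, 1/3, 1/2, and 0.4 from the worked example.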
Sequence labeling models are usually evaluated with the conlleval.pl script, which is written in Perl. Python has a third-party module for the same purpose, seqeval, whose PyPI page is https://pypi.org/project/seqeval/0.0.3/ .
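seqeval supports the BIO and IOBES tagging schemes and can be used to evaluate tasks such as named entity recognition, part-of-speech tagging, and semantic role labeling.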
The official documentation gives two examples, which I have modified as follows.
Example 1:
# -*- coding: utf-8 -*-
from seqeval.metrics import f1_score
from seqeval.metrics import precision_score
from seqeval.metrics import accuracy_score
from seqeval.metrics import recall_score
from seqeval.metrics import classification_report

# A single sentence given as a flat list of tags
y_true = ['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
y_pred = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']

print("accuracy: ", accuracy_score(y_true, y_pred))
print("p: ", precision_score(y_true, y_pred))
print("r: ", recall_score(y_true, y_pred))
print("f1: ", f1_score(y_true, y_pred))
print("classification report: ")
print(classification_report(y_true, y_pred))
The output is as follows:
accuracy:  0.6666666666666666
p:  0.3333333333333333
r:  0.5
f1:  0.4
classification report: 
           precision    recall  f1-score   support

     MISC       0.00      0.00      0.00         1
      PER       1.00      1.00      1.00         1

micro avg       0.33      0.50      0.40         2
macro avg       0.50      0.50      0.50         2
Example 2:
# -*- coding: utf-8 -*-
from seqeval.metrics import f1_score
from seqeval.metrics import precision_score
from seqeval.metrics import accuracy_score
from seqeval.metrics import recall_score
from seqeval.metrics import classification_report

# The same tags split into two sentences (a list of lists)
y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER']]
y_pred = [['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER']]

print("accuracy: ", accuracy_score(y_true, y_pred))
print("p: ", precision_score(y_true, y_pred))
print("r: ", recall_score(y_true, y_pred))
print("f1: ", f1_score(y_true, y_pred))
print("classification report: ")
print(classification_report(y_true, y_pred))
The output is the same as in Example 1.
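The report lists both a micro and a macro average, and the two differ noticeably here. The following sketch shows one way to arrive at both from the entity spans of the example. The spans are written out by hand as (type, start, end) tuples, and `prf` and `micro_macro` are hypothetical helper names, not seqeval functions.

```python
# Entity spans of the example, as (type, start, end) with inclusive end.
true_entities = {('MISC', 3, 5), ('PER', 7, 8)}
pred_entities = {('MISC', 2, 3), ('MISC', 4, 5), ('PER', 7, 8)}


def prf(n_correct, n_pred, n_true):
    """Precision, recall and F1 from raw counts, with 0.0 for empty denominators."""
    p = n_correct / n_pred if n_pred else 0.0
    r = n_correct / n_true if n_true else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(true_entities, pred_entities):
    types = sorted({e[0] for e in true_entities | pred_entities})
    per_type = {}
    for t in types:
        true_t = {e for e in true_entities if e[0] == t}
        pred_t = {e for e in pred_entities if e[0] == t}
        per_type[t] = prf(len(true_t & pred_t), len(pred_t), len(true_t))
    # Micro average: pool the counts of all types, then compute the scores once.
    micro = prf(len(true_entities & pred_entities), len(pred_entities), len(true_entities))
    # Macro average: compute the scores per type, then take the unweighted mean.
    macro = tuple(sum(s[i] for s in per_type.values()) / len(per_type) for i in range(3))
    return per_type, micro, macro


per_type, micro, macro = micro_macro(true_entities, pred_entities)
print(per_type)
print(micro)
print(macro)
```

The micro average is pulled down by the two wrong MISC predictions, while the macro average gives MISC and PER equal weight regardless of how many entities each type has.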
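More than a year ago I wrote the article "Implementing Named Entity Recognition (NER) with Deep Learning". Here we adapt the model training code from that article so that it reports the F1 score during training.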
Download the DL_4_NER project from GitHub at https://github.com/percent4/DL_4_NER . Change the folder paths in utils.py, and modify the model training code (DL_4_NER/Bi_LSTM_Model_training.py) as follows:
# -*- coding: utf-8 -*-
import pickle
import numpy as np
import pandas as pd
from utils import BASE_DIR, CONSTANTS, load_data
from data_processing import data_processing
from keras.utils import np_utils, plot_model
from keras.models import Sequential
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Bidirectional, LSTM, Dense, Embedding, TimeDistributed


# Prepare the input data for the model
def input_data_for_model(input_shape):
    # Load the data
    input_data = load_data()
    # Preprocess the data
    data_processing()
    # Load the dictionaries
    with open(CONSTANTS[1], 'rb') as f:
        word_dictionary = pickle.load(f)
    with open(CONSTANTS[2], 'rb') as f:
        inverse_word_dictionary = pickle.load(f)
    with open(CONSTANTS[3], 'rb') as f:
        label_dictionary = pickle.load(f)
    with open(CONSTANTS[4], 'rb') as f:
        output_dictionary = pickle.load(f)

    vocab_size = len(word_dictionary.keys())
    label_size = len(label_dictionary.keys())

    # Group the rows by sentence into lists of (word, pos, label) tuples
    aggregate_function = lambda group: [(word, pos, label) for word, pos, label in
                                        zip(group['word'].values.tolist(),
                                            group['pos'].values.tolist(),
                                            group['tag'].values.tolist())]
    grouped_input_data = input_data.groupby('sent_no').apply(aggregate_function)
    sentences = [sentence for sentence in grouped_input_data]

    # Map words and labels to ids, pad to a fixed length, one-hot encode the labels
    x = [[word_dictionary[word[0]] for word in sent] for sent in sentences]
    x = pad_sequences(maxlen=input_shape, sequences=x, padding='post', value=0)
    y = [[label_dictionary[word[2]] for word in sent] for sent in sentences]
    y = pad_sequences(maxlen=input_shape, sequences=y, padding='post', value=0)
    y = [np_utils.to_categorical(label, num_classes=label_size + 1) for label in y]

    return x, y, output_dictionary, vocab_size, label_size, inverse_word_dictionary


# Define the deep learning model: Bi-LSTM
def create_Bi_LSTM(vocab_size, label_size, input_shape, output_dim, n_units, out_act, activation):
    model = Sequential()
    model.add(Embedding(input_dim=vocab_size + 1, output_dim=output_dim,
                        input_length=input_shape, mask_zero=True))
    model.add(Bidirectional(LSTM(units=n_units, activation=activation,
                                 return_sequences=True)))
    model.add(TimeDistributed(Dense(label_size + 1, activation=out_act)))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model


# Train the model
def model_train():
    # Split the data into training and test sets with a 9:1 ratio
    input_shape = 60
    x, y, output_dictionary, vocab_size, label_size, inverse_word_dictionary = input_data_for_model(input_shape)
    train_end = int(len(x)*0.9)
    train_x, train_y = x[0:train_end], np.array(y[0:train_end])
    test_x, test_y = x[train_end:], np.array(y[train_end:])

    # Model hyperparameters
    activation = 'selu'
    out_act = 'softmax'
    n_units = 100
    batch_size = 32
    epochs = 10
    output_dim = 20

    # Fit the model
    lstm_model = create_Bi_LSTM(vocab_size, label_size, input_shape, output_dim, n_units, out_act, activation)
    lstm_model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=epochs, batch_size=batch_size, verbose=1)


model_train()
The training output is as follows (intermediate epochs omitted):
...... 12598/12598 [==============================] - 26s 2ms/step - loss: 0.0075 - acc: 0.9981 - val_loss: 0.2131 - val_acc: 0.9592
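Next, we replace the lstm_model.fit call with the following code, which passes seqeval's F1Metrics callback to Keras:
# Keras callback shipped with seqeval
from seqeval.callbacks import F1Metrics

# Map each label id to its tag string
labels = ['O', 'B-MISC', 'I-MISC', 'B-ORG', 'I-ORG', 'B-PER', 'B-LOC', 'I-PER', 'I-LOC', 'sO']
id2label = dict(zip(range(len(labels)), labels))
callbacks = [F1Metrics(id2label)]
lstm_model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=epochs, batch_size=batch_size, verbose=1, callbacks=callbacks)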
The output now looks like this:
12598/12598 [==============================] - 26s 2ms/step - loss: 0.0089 - acc: 0.9978 - val_loss: 0.2145 - val_acc: 0.9560 - f1: 95.40
           precision    recall  f1-score   support

     MISC     0.9707    0.9833    0.9769     15844
      PER     0.9080    0.8194    0.8614      1157
      LOC     0.7517    0.8095    0.7795       677
      ORG     0.8290    0.7289    0.7757       745
       sO     0.7757    0.8300    0.8019       100

micro avg     0.9524    0.9556    0.9540     18523
macro avg     0.9520    0.9556    0.9535     18523
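This is what makes seqeval so powerful: with one callback, each epoch reports an entity-level F1 score and a per-type report alongside the tag-level accuracy.
For anything that is unclear about using seqeval with Keras, see the project's GitHub page: https://github.com/chakki-works/seqeval .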
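Before entity-level metrics can be computed on a Keras model's output, the padded one-hot labels and predicted probability vectors have to be turned back into tag strings. The sketch below shows one way this decoding step might look in plain Python. It is not the implementation of seqeval's callback: `decode_predictions`, the `id2label` mapping, and the choice of index 0 as the padding id are all assumptions made for illustration.

```python
def decode_predictions(y_true_onehot, y_pred_prob, id2label, pad_id=0):
    """Turn padded one-hot labels and predicted probabilities into tag sequences.

    Hypothetical helper: positions whose true label is pad_id are treated as padding.
    """
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)

    true_tags, pred_tags = [], []
    for true_sent, pred_sent in zip(y_true_onehot, y_pred_prob):
        sent_true, sent_pred = [], []
        for true_row, pred_row in zip(true_sent, pred_sent):
            true_id = argmax(true_row)
            if true_id == pad_id:
                # Skip padded positions so they do not affect the scores.
                continue
            sent_true.append(id2label[true_id])
            # A prediction of the padding id on a real token is mapped to 'O'.
            sent_pred.append(id2label.get(argmax(pred_row), 'O'))
        true_tags.append(sent_true)
        pred_tags.append(sent_pred)
    return true_tags, pred_tags


# Toy data: one sentence of three tokens padded to length four (assumed label ids).
id2label = {1: 'O', 2: 'B-PER', 3: 'I-PER'}
y_true_onehot = [[[0, 0, 1, 0], [0, 0, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0]]]
y_pred_prob = [[[0.1, 0.1, 0.7, 0.1], [0.1, 0.6, 0.1, 0.2],
                [0.0, 0.9, 0.05, 0.05], [0.9, 0.1, 0.0, 0.0]]]
true_tags, pred_tags = decode_predictions(y_true_onehot, y_pred_prob, id2label)
print(true_tags)
print(pred_tags)
```

The two resulting lists of lists have the same shape as the inputs in Example 2, so they can be passed to functions such as f1_score or classification_report from seqeval.metrics.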
Thank you for reading. That concludes this post.
You are welcome to follow my WeChat official account: Python爬蟲與算法.