To keep the code approachable, I will show the result of each step as it runs, so readers get an intuitive feel for what every piece of code does. I will start with just a few data samples so each snippet's output is easy to follow; at the end of the article I run the same pipeline on a larger dataset and report the results for reference.
1. First, my data lives in two Excel files: one holds the positive (pos) reviews and the other the negative (neg) reviews, named poss.xlsx and negg.xlsx respectively. Their contents are shown below:
Contents of poss.xlsx:
Contents of negg.xlsx:
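If you don't have the original spreadsheets, a minimal stand-in with the same layout (one review per row, single column, no header) can be generated as below. The example reviews are hypothetical, not the author's data:

import pandas as pd

# hypothetical stand-in data: one review per row, no header, single column
pd.DataFrame(['这个手机很好用', '物流很快,非常满意']).to_excel('poss.xlsx', header=False, index=False)
pd.DataFrame(['质量太差了', '用了一天屏幕就坏了']).to_excel('negg.xlsx', header=False, index=False)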
2. Next, read in the data. The code is as follows:
import numpy as np
import pandas as pd

pos = pd.read_excel('poss.xlsx', header=None)  # read the positive reviews into a DataFrame
pos['label'] = 1                               # label column: 1 = positive
neg = pd.read_excel('negg.xlsx', header=None)
neg['label'] = 0                               # label column: 0 = negative
all = pos.append(neg, ignore_index=True)       # merge the corpora (on pandas >= 2.0 use pd.concat instead)
print(all)

Running this code produces the following output:
Next comes word segmentation:
import jieba

cw = lambda s: list(jieba.cut(s))  # tokenize a sentence with jieba
all['words'] = all[0].apply(cw)    # segment every review in column 0
print(all['words'])

Running this code produces the following output:
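If jieba is new to you, here is a quick standalone illustration of what cut returns (the exact segmentation may vary with your jieba version and dictionary):

import jieba

# segment a hypothetical review into a list of tokens
print(list(jieba.cut('这个手机很好用')))
# typical output: ['这个', '手机', '很', '好用']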
Now collect all the words into one big vocabulary:
content = []
for i in all['words']:
    content.extend(i)  # flatten the per-review token lists into one list
abc = pd.Series(content).value_counts()  # word frequencies, sorted descending
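For readers unfamiliar with value_counts(): it returns a Series indexed by the unique items and sorted by count in descending order, which is what makes the frequency-rank numbering in the next step work. A toy example:

import pandas as pd

print(pd.Series(['好', '好', '差']).value_counts())
# 好    2
# 差    1
# dtype: int64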
Assign each word a fixed index:
abc[:] = range(1, len(abc) + 1)  # number the words from 1 by frequency rank
abc[''] = 0                      # reserve index 0 for the padding token

maxlen = 10  # for this small demo, every review is truncated/padded to 10 tokens

def doc2num(s, maxlen):
    s = [i for i in s if i in abc.index]             # drop words not in the vocabulary
    s = s[:maxlen] + [''] * max(0, maxlen - len(s))  # truncate or right-pad to maxlen
    return list(abc[s])                              # map words to their indices

all['doc2num'] = all['words'].apply(lambda s: doc2num(s, maxlen))
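To make doc2num concrete, here is a hypothetical trace; the actual indices depend on the word frequencies in your own corpus:

# suppose, hypothetically, that abc['好'] == 1 and abc['用'] == 25
print(doc2num(['好', '用'], 10))
# -> [1, 25, 0, 0, 0, 0, 0, 0, 0, 0]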
The result is as follows:
Shuffle the data and build the input arrays for Keras:
idx = np.arange(len(all))  # use an array: range() cannot be shuffled in Python 3
np.random.shuffle(idx)
all = all.loc[idx]
x = np.array(list(all['doc2num']))
y = np.array(list(all['label']))
y = y.reshape((-1, 1))     # Keras expects a 2-D target array
First, let's look at the form of the data inside x, shown in the figure below:
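You can also inspect x directly; each row is one review encoded as a fixed-length vector of word indices, zero-padded on the right:

print(x.shape)  # (number_of_reviews, maxlen)
print(x[0])     # the first review as a row of word indices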
Next, build the convolutional neural network with Keras:
model = Sequential()
model.add(Embedding(len(abc), embedding_vecor_length, input_length=maxlen))
model.add(Convolution1D(nb_filter=nb_filter,
                        filter_length=filter_length,
                        border_mode='valid',
                        activation='relu'))
model.add(GlobalMaxPooling1D())   # max over the sequence dimension
model.add(Dense(128))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))  # binary sentiment output
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          validation_data=(X_test, y_test))
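A note for readers on newer versions: this snippet uses the Keras 1 API (Convolution1D, nb_filter, filter_length, border_mode, nb_epoch). On Keras 2 you would replace the convolution layer and the fit call with the equivalent spellings below; the behavior is the same:

from keras.layers import Conv1D

# Keras 2 spelling of the same convolution layer
model.add(Conv1D(filters=nb_filter, kernel_size=filter_length,
                 padding='valid', activation='relu'))
# and nb_epoch was renamed to epochs in fit()
model.fit(X_train, y_train, batch_size=batch_size, epochs=nb_epoch,
          validation_data=(X_test, y_test))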
Finally, here is the complete code for classifying the sentiment of 1000 positive and 1000 negative reviews:
from __future__ import print_function
import jieba
import pandas as pd
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import Convolution1D, GlobalMaxPooling1D

# hyperparameters
embedding_vecor_length = 32
maxlen = 200        # pad/truncate every review to 200 tokens
min_count = 5       # (declared but not used below)
batch_size = 32
nb_epoch = 10
nb_filter = 128
filter_length = 3

# load the corpora and label them
pos = pd.read_excel('poss.xlsx', header=None)
pos['label'] = 1
neg = pd.read_excel('negg.xlsx', header=None)
neg['label'] = 0
all = pos.append(neg, ignore_index=True)

# segment every review with jieba
cw = lambda s: list(jieba.cut(s))
all['words'] = all[0].apply(cw)

# build the vocabulary and index words by frequency rank
content = []
for i in all['words']:
    content.extend(i)
abc = pd.Series(content).value_counts()
abc[:] = range(1, len(abc) + 1)
abc[''] = 0  # index 0 is the padding token

def doc2num(s, maxlen):
    s = [i for i in s if i in abc.index]
    s = s[:maxlen] + [''] * max(0, maxlen - len(s))
    return list(abc[s])

all['doc2num'] = all['words'].apply(lambda s: doc2num(s, maxlen))

# shuffle and split into train/test sets
idx = np.arange(len(all))  # range() cannot be shuffled in Python 3
np.random.shuffle(idx)
all = all.loc[idx]
x = np.array(list(all['doc2num']))
y = np.array(list(all['label']))
y = y.reshape((-1, 1))

train_num = 1600
X_train, y_train = x[:train_num], y[:train_num]
X_test, y_test = x[train_num:], y[train_num:]

# build and train the CNN
model = Sequential()
model.add(Embedding(len(abc), embedding_vecor_length, input_length=maxlen))
model.add(Convolution1D(nb_filter=nb_filter, filter_length=filter_length,
                        border_mode='valid', activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(128))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

print('Train...')
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          validation_data=(X_test, y_test))
score, acc = model.evaluate(X_test, y_test, verbose=0)
print('Test score:', score)
print('Test accuracy:', acc)
The result is as follows:
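After training, the same doc2num pipeline can be reused to score a new, unseen review. A minimal sketch (the helper predict_sentiment below is mine, not part of the author's script; a sigmoid output above 0.5 reads as positive):

def predict_sentiment(text):
    # encode the new review exactly like the training data
    vec = np.array([doc2num(list(jieba.cut(text)), maxlen)])
    return model.predict(vec)[0][0]  # sigmoid score in [0, 1]

print(predict_sentiment('这个产品非常好用'))  # close to 1 => predicted positive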