Keras Mini-Program (1): Classification with a CNN

 

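To make the code easier to follow, I show the output of every step, so that readers get an intuitive sense of what each piece of code does. I first walk through the code on just a few rows of data so the effect of each snippet is easy to see. At the end of the article I run the experiment on a larger dataset and report the results for reference.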

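1. First, my data is stored in two Excel files: one holds the positive (pos) reviews and the other the negative (neg) reviews. They are poss.xlsx and negg.xlsx, with the following contents: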

Contents of poss.xlsx:

Contents of negg.xlsx:

2. Next, read in the data. The code is as follows:

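import numpy as np
import pandas as pd

pos = pd.read_excel('poss.xlsx', header=None)  # read the data into a pandas DataFrame
pos['label'] = 1  # add a label column set to 1
neg = pd.read_excel('negg.xlsx', header=None)
neg['label'] = 0  # add a label column set to 0
# merge the two corpora (DataFrame.append was removed in pandas 2.0, so use pd.concat)
all = pd.concat([pos, neg], ignore_index=True)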

 

Running print(all) produces the following output:

Next comes word segmentation:

import jieba

cw = lambda s: list(jieba.cut(s))  # segment a string into words with jieba
all['words'] = all[0].apply(cw)

Running print(all['words']) produces the following output:

 

Combine all the words into one large dictionary:

 

all['words'] = all[0].apply(cw)
content = []
for i in all['words']:
    content.extend(i)  # flatten every document's tokens into one list
abc = pd.Series(content).value_counts()  # word -> frequency, most frequent first
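To see what this counting step produces without needing jieba or the Excel files, here is a minimal pure-Python sketch. The token lists are made-up stand-ins for segmented reviews, and `collections.Counter` is swapped in for `value_counts()`; it is an illustration of the idea, not the author's code.

```python
from collections import Counter

# Hypothetical, already-segmented reviews (stand-ins for jieba.cut output).
docs = [
    ["very", "good", "phone"],
    ["not", "good", "phone"],
    ["very", "good"],
]

# Flatten all documents into one token list, like the loop over all['words'].
content = [token for doc in docs for token in doc]

# Count occurrences; most_common() sorts by descending frequency,
# the same ordering value_counts() uses.
counts = Counter(content).most_common()
print(counts)
```

The most frequent word comes first, which is why the next step can hand out small IDs to common words simply by numbering the rows in order.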

 

Assign each word a fixed ID:

 

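abc[:] = range(1, len(abc)+1)  # replace the counts with IDs 1..N (1 = most frequent word)
abc[''] = 0  # reserve ID 0 for the empty padding token
maxlen = 10  # every document is truncated or padded to this many tokens
def doc2num(s, maxlen):
    s = [i for i in s if i in abc.index]  # drop words that are not in the dictionary
    s = s[:maxlen] + ['']*max(0, maxlen-len(s))  # truncate, then pad with ''
    return list(abc[s])  # look up the ID of each token
all['doc2num'] = all['words'].apply(lambda s: doc2num(s, maxlen))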

The result is as follows:
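A toy version of this encoding that can be checked by hand may help. The vocabulary below is hypothetical (a plain dict rather than the pandas Series `abc`), but the truncate-then-pad logic mirrors `doc2num`.

```python
# Hypothetical vocabulary: word -> ID, smaller IDs for more frequent words.
# ID 0 is reserved for the empty padding token '', as in the article.
vocab = {"good": 1, "very": 2, "phone": 3, "not": 4, "": 0}

def doc2num(tokens, maxlen):
    """Map tokens to IDs, dropping unknown words, then truncate or pad to maxlen."""
    known = [t for t in tokens if t in vocab]
    padded = known[:maxlen] + [""] * max(0, maxlen - len(known))
    return [vocab[t] for t in padded]

short = doc2num(["very", "good"], maxlen=4)                          # padded with 0
long_ = doc2num(["not", "very", "good", "phone", "good"], maxlen=4)  # truncated
unknown = doc2num(["very", "zzz", "good"], maxlen=4)                 # 'zzz' dropped
print(short, long_, unknown)
```

Every document ends up with exactly `maxlen` integers, which is what lets the rows be stacked into a single rectangular array for Keras.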

Shuffle the data and generate the input data for Keras:

 

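# shuffled row positions (np.random.shuffle cannot shuffle a range object in place on Python 3)
idx = np.random.permutation(len(all))
all = all.loc[idx]
x = np.array(list(all['doc2num']))
y = np.array(list(all['label']))
y = y.reshape((-1,1))  # column vector of shape (n_samples, 1)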
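The important detail in this step is that documents and labels are reordered with the same index list, so each document keeps its own label. Here is a small standard-library sketch of that idea followed by a train/test split; the data and the split size are made up for illustration.

```python
import random

random.seed(1337)  # fixed seed so the shuffle is repeatable

# Hypothetical encoded documents and labels (positives first, then negatives).
x = [[1, 2, 0], [3, 1, 0], [4, 1, 2], [4, 3, 0], [2, 2, 1], [4, 4, 0]]
y = [1, 1, 1, 0, 0, 0]

# Shuffle a list of indices rather than the data itself, then apply the
# same order to both x and y so the pairs stay aligned.
idx = list(range(len(x)))
random.shuffle(idx)
x_shuffled = [x[i] for i in idx]
y_shuffled = [y[i] for i in idx]

# Keep the first rows for training and hold out the rest for testing.
train_num = 4
x_train, y_train = x_shuffled[:train_num], y_shuffled[:train_num]
x_test, y_test = x_shuffled[train_num:], y_shuffled[train_num:]
print(idx, y_train, y_test)
```

Without the shuffle, a split taken from the head of the merged table would contain only positive reviews, because the positive file was concatenated first.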

 

First, let's look at what the data in x looks like, as shown in the figure below:

Next, build the convolutional neural network model with Keras:

model = Sequential()
# map each word ID to a dense vector of length embedding_vecor_length
model.add(Embedding(len(abc), embedding_vecor_length,input_length=maxlen))
# 1D convolution over the word sequence (Keras 2 argument names)
model.add(Convolution1D(filters=nb_filter,
                        kernel_size=filter_length,
                        padding='valid',
                        activation='relu'))
# keep only the strongest response of each filter
model.add(GlobalMaxPooling1D())

model.add(Dense(128))
model.add(Dropout(0.2))
model.add(Activation('relu'))

# a single sigmoid unit gives the probability of the positive class
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=nb_epoch,
          validation_data=(X_test, y_test))
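To make the first three layers less of a black box, the sketch below traces one short document through an embedding lookup, a single 'valid' convolution filter with ReLU, and global max pooling, all in plain Python. The sizes and weights are made-up toy values (5 time steps, embedding size 2, one filter of length 3); it illustrates the arithmetic, not Keras internals.

```python
# Toy embedding table: word ID -> vector (values are arbitrary assumptions).
embedding = {
    0: [0.0, 0.0],   # padding
    1: [1.0, 0.0],
    2: [0.0, 1.0],
    3: [1.0, 1.0],
}
doc = [1, 3, 2, 0, 0]   # an encoded, zero-padded document

# One filter with shape (filter length, embedding size), plus a bias.
kernel = [[1.0, 0.0],
          [0.0, 1.0],
          [0.0, -1.0]]
bias = -0.5

def conv1d_valid(seq, kernel, bias):
    """Slide the kernel over seq with no padding ('valid') and apply ReLU."""
    k = len(kernel)
    out = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        total = bias + sum(w * v
                           for w_row, v_row in zip(kernel, window)
                           for w, v in zip(w_row, v_row))
        out.append(max(0.0, total))  # ReLU clips negative responses to 0
    return out

embedded = [embedding[i] for i in doc]               # 5 vectors of size 2
feature_map = conv1d_valid(embedded, kernel, bias)   # length 5 - 3 + 1 = 3
pooled = max(feature_map)                            # global max pooling
print(feature_map, pooled)
```

With 'valid' padding the output has `len(doc) - filter_length + 1` positions, and global max pooling reduces each filter to one number regardless of document length. With many filters, the dense layers therefore receive one value per filter.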

Finally, here is the sentiment classification code for 1,000 positive and 1,000 negative reviews:

from __future__ import print_function
import jieba
import pandas as pd

import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Embedding
from keras.layers import Convolution1D, GlobalMaxPooling1D
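embedding_vecor_length = 32  # size of each word vector
maxlen = 200  # number of tokens kept per review
min_count = 5  # defined here but not used below
batch_size = 32
nb_epoch = 10  # number of training epochs
nb_filter = 128  # number of convolution filters
filter_length = 3  # width of each filter, in tokens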
pos = pd.read_excel('poss.xls', header=None)
pos['label'] = 1
neg = pd.read_excel('negg.xls', header=None)
neg['label'] = 0
# DataFrame.append was removed in pandas 2.0, so use pd.concat
all = pd.concat([pos, neg], ignore_index=True)
cw=lambda s: list(jieba.cut(s))
all['words'] = all[0].apply(cw)
content = []
for i in all['words']:
    content.extend(i)
abc = pd.Series(content).value_counts()
abc[:] = range(1, len(abc)+1)
abc[''] = 0 
def doc2num(s, maxlen):
    s = [i for i in s if i in abc.index]
    s = s[:maxlen] + ['']*max(0, maxlen-len(s))
    return list(abc[s])
all['doc2num'] = all['words'].apply(lambda s: doc2num(s, maxlen))
# np.random.shuffle cannot shuffle a range object in place on Python 3
idx = np.random.permutation(len(all))
all = all.loc[idx]
x = np.array(list(all['doc2num']))
y = np.array(list(all['label']))
y = y.reshape((-1,1)) 
train_num=1600
X_train=x[:train_num]
y_train=y[:train_num]
X_test=x[train_num:]
y_test=y[train_num:]
model = Sequential()
model.add(Embedding(len(abc), embedding_vecor_length,input_length=maxlen))
# Keras 2 argument names: filters, kernel_size, padding
model.add(Convolution1D(filters=nb_filter,
                        kernel_size=filter_length,
                        padding='valid',
                        activation='relu'))
model.add(GlobalMaxPooling1D())


model.add(Dense(128))
model.add(Dropout(0.2))
model.add(Activation('relu'))

model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=nb_epoch,
          validation_data=(X_test, y_test))
score, acc = model.evaluate(X_test, y_test,verbose=0)
print('Test score:', score)
print('Test accuracy:', acc)



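For this model, `model.evaluate` returns the loss followed by the compiled metric, so `score` is the mean binary cross-entropy on the test set and `acc` is the fraction of correct predictions. The sketch below computes both by hand on made-up sigmoid outputs. The clipping constant and the 0.5 threshold are assumptions chosen to match common defaults, not values taken from the article.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions that fall on the correct side of the threshold."""
    hits = sum((p > threshold) == bool(t) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)

y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]   # made-up model outputs
score = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print('Test score:', score)
print('Test accuracy:', acc)
```

The third example (true label 1, predicted 0.4) is the only misclassification and also contributes most of the loss, which shows how the two numbers can move differently: accuracy only sees which side of the threshold a prediction lands on, while the loss also penalizes low confidence.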

 

The results are as follows:
