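This post predicts the probability that a Titanic passenger survived, based on a classic Kaggle competition.
Dataset:
1. Download the data from the Kaggle Titanic competition page: https://www.kaggle.com/c/titanic
2. Baidu Netdisk: https://pan.baidu.com/s/1BfRZdCz6Z1XR6aDXxiHmHA (extraction code: jzb3)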
Reading the data:
```python
#%%
import tensorflow as tf
import keras
import pandas as pd
import numpy as np

# train.csv from the Kaggle download, placed in a local titanic/ folder
data = pd.read_csv("titanic/train.csv")
print(data.head())
print(data.describe())
```
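Data preprocessing:
```python
#%%
# Keep the label (Survived) plus seven candidate feature columns
strs = "Survived Pclass Sex Age SibSp Parch Fare Embarked"
cols = strs.split(" ")
print(cols)
#%%
# .copy() avoids pandas' SettingWithCopyWarning when columns are modified below
x_datas = data[cols].copy()
print(x_datas.head())
#%%
# Count missing values per column (Age and Embarked have gaps in train.csv)
print(x_datas.isnull().sum())
#%%
# Fill missing Age with the mean and missing Embarked with the most frequent port
x_datas["Age"] = x_datas["Age"].fillna(x_datas["Age"].mean())
x_datas["Embarked"] = x_datas["Embarked"].fillna(x_datas["Embarked"].mode()[0])
#x_datas["Sex"] = pd.get_dummies(x_datas["Sex"])
# One-hot encode the categorical columns; dtype=float keeps them numeric for Keras
x_datas = pd.get_dummies(x_datas, columns=["Pclass", "Sex", "Embarked"], dtype=float)
# Crude scaling that shrinks Age and Fare toward the range of the 0/1 columns
x_datas["Age"] /= 100
x_datas["Fare"] /= 100
print(x_datas.isnull().sum())
print(x_datas.head())
#%%
# First 75% of the rows for training, the rest for testing (no shuffling)
seq = int(0.75 * len(x_datas))
X, Y = x_datas.iloc[:, 1:], x_datas.iloc[:, 0]  # Survived is column 0
X_train, Y_train, X_test, Y_test = X.iloc[:seq], Y.iloc[:seq], X.iloc[seq:], Y.iloc[seq:]
```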
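To check the resulting feature count without downloading the data, the same steps can be run on a handful of rows. The toy values below are made up for illustration, and wrapping the steps in a `preprocess()` helper is a hypothetical packaging choice, not the author's code. With all three classes, both sexes and all three ports present, one-hot encoding yields 12 feature columns, which matches the 832 parameters of `dense_1` in the model summary (12 × 64 weights + 64 biases).

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with the same columns as Kaggle's train.csv
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass":   [3, 1, 2, 3, 1],
    "Sex":      ["male", "female", "female", "male", "female"],
    "Age":      [22.0, np.nan, 26.0, 35.0, np.nan],
    "SibSp":    [1, 1, 0, 0, 0],
    "Parch":    [0, 0, 0, 0, 2],
    "Fare":     [7.25, 71.28, 7.93, 8.05, 30.0],
    "Embarked": ["S", "C", np.nan, "Q", "S"],
})

def preprocess(df):
    """Apply the post's fill / one-hot / scale steps to a copy of df."""
    cols = ["Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
    out = df[cols].copy()
    out["Age"] = out["Age"].fillna(out["Age"].mean())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    out = pd.get_dummies(out, columns=["Pclass", "Sex", "Embarked"], dtype=float)
    out["Age"] /= 100
    out["Fare"] /= 100
    return out

processed = preprocess(toy)
X, Y = processed.iloc[:, 1:], processed.iloc[:, 0]  # Survived stays in column 0
print(X.shape)  # (5, 12)
print(list(X.columns))
```

`get_dummies` keeps the untouched columns first and appends the dummy columns, which is why slicing off column 0 as the label still works after encoding.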
Building the model:
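The layer shapes printed by `model.summary()` in the training output (`dense_1` with 64 units, a dropout layer, `dense_2` with 16 units and a 2-unit `dense_3`) are enough to sketch a matching network with tf.keras. This is a hypothetical reconstruction: the ReLU and softmax activations, the 0.5 dropout rate, the Adam optimizer and the categorical cross-entropy loss are assumptions, not settings confirmed by the post. Only the layer widths and the 12-feature input are grounded in the printed parameter counts.

```python
from tensorflow import keras

# Hypothetical reconstruction: layer sizes match the printed summary,
# everything else (activations, dropout rate, optimizer, loss) is assumed.
N_FEATURES = 12  # X columns after one-hot encoding Pclass, Sex and Embarked

model = keras.Sequential([
    keras.Input(shape=(N_FEATURES,)),
    keras.layers.Dense(64, activation="relu"),    # 12*64 + 64 = 832 params
    keras.layers.Dropout(0.5),                    # assumed rate; no params
    keras.layers.Dense(16, activation="relu"),    # 64*16 + 16 = 1040 params
    keras.layers.Dense(2, activation="softmax"),  # 16*2 + 2 = 34 params
])
# A 2-unit softmax output pairs with one-hot labels and categorical cross-entropy
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Because the output layer has two softmax units, the 0/1 `Survived` labels need to be one-hot encoded (for example with `keras.utils.to_categorical`) before they are passed to `fit`.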
Training and evaluation:
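Assuming a model compiled like the one in the summary (2-unit softmax output, categorical cross-entropy, an accuracy metric), a training and evaluation step consistent with the log might look like the sketch below. The helper name `train_and_evaluate` is hypothetical; `epochs=100` comes from the log, while `validation_split=0.2` is inferred from the 534 samples per epoch (int(668 × 0.8) = 534) and is an assumption. The final `print` mirrors the format of the `test loss is ..., acc ...` line.

```python
import numpy as np
from tensorflow import keras

def train_and_evaluate(model, X_train, Y_train, X_test, Y_test,
                       epochs=100, validation_split=0.2):
    """Fit a 2-class softmax model and report test loss and accuracy.

    Assumes (hypothetically) that `model` was compiled with
    categorical_crossentropy and metrics=["accuracy"]; the 0/1 labels
    are one-hot encoded here. validation_split=0.2 is an inferred value.
    """
    y_train = keras.utils.to_categorical(np.asarray(Y_train), num_classes=2)
    y_test = keras.utils.to_categorical(np.asarray(Y_test), num_classes=2)
    # Keras holds out the last 20% of the training arrays for val_loss/val_acc
    history = model.fit(np.asarray(X_train, dtype="float32"), y_train,
                        epochs=epochs, validation_split=validation_split)
    # evaluate returns [loss, accuracy] when one metric is configured
    loss, acc = model.evaluate(np.asarray(X_test, dtype="float32"), y_test, verbose=0)
    print("test loss is %f, acc %f" % (loss, acc))
    return history, loss, acc
```

With the preprocessing above, it would be called as `train_and_evaluate(model, X_train, Y_train, X_test, Y_test)`.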
```text
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 64)                832
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0
_________________________________________________________________
dense_2 (Dense)              (None, 16)                1040
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 34
=================================================================
Total params: 1,906
Trainable params: 1,906
Non-trainable params: 0
_________________________________________________________________
...
Epoch 96/100
534/534 [==============================] - 0s 80us/step - loss: 0.3870 - acc: 0.8277 - val_loss: 0.5083 - val_acc: 0.7612
Epoch 97/100
534/534 [==============================] - 0s 80us/step - loss: 0.3921 - acc: 0.8352 - val_loss: 0.5070 - val_acc: 0.7687
Epoch 98/100
534/534 [==============================] - 0s 82us/step - loss: 0.3940 - acc: 0.8371 - val_loss: 0.5102 - val_acc: 0.7687
Epoch 99/100
534/534 [==============================] - 0s 78us/step - loss: 0.3996 - acc: 0.8277 - val_loss: 0.5106 - val_acc: 0.7687
Epoch 100/100
534/534 [==============================] - 0s 80us/step - loss: 0.3892 - acc: 0.8352 - val_loss: 0.5082 - val_acc: 0.7612
223/223 [==============================] - 0s 63us/step
test loss is 0.389338, acc 0.829596
```
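The numbers in this output can be checked by hand. A Dense layer with n inputs and m units has n × m weights plus m biases, and Dropout has none; the sample counts follow from the 75/25 split of the 891 rows in Kaggle's train.csv and from Keras holding out the last fraction of the training arrays when `validation_split` is used. The `validation_split = 0.2` value is an assumption inferred from the 534 samples per epoch rather than a value taken from the author's code, and `dense_params` is a hypothetical helper.

```python
# Dense layer parameters: one weight per (input, unit) pair plus one bias per unit
def dense_params(n_inputs, n_units):
    return n_inputs * n_units + n_units

n_features = 12  # columns of X after one-hot encoding
layer_shapes = [(n_features, 64), (64, 16), (16, 2)]  # Dropout adds no parameters
per_layer = [dense_params(i, u) for i, u in layer_shapes]
print(per_layer, sum(per_layer))  # [832, 1040, 34] 1906

# Sample counts shown in the log
n_rows = 891                               # rows in Kaggle's train.csv
seq = int(0.75 * n_rows)                   # 668 rows on the training side
n_test = n_rows - seq                      # 223 -> "223/223" in the evaluate line
validation_split = 0.2                     # ASSUMPTION: inferred, not from the post
n_fit = int(seq * (1 - validation_split))  # 534 -> "534/534" in every epoch
n_val = seq - n_fit                        # 134 rows behind val_loss / val_acc
print(seq, n_test, n_fit, n_val)  # 668 223 534 134
```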