Hands-On Deep Learning Projects You Can Build Yourself

Date: 2019-12-07


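Summary: There are countless deep learning projects out there. For beginners, a suitable and fun project can make a real difference. This article collects several interesting projects from the field of computer vision that interested readers can try hands-on.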

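Since AlexNet's landmark success in the ImageNet challenge in 2012, computer vision and deep learning have seen a renewed surge of research. Computer vision, taken literally, means giving computers and other machines something like human vision; the field studies how machines can perform tasks such as image classification and object detection. Over the past decade or so, progress in the field has been striking, and on some tasks reported results have surpassed human-level performance.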

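People who want to get started with or work in deep learning usually begin with computer vision. Plenty of material online covers the theory, but hands-on practice is just as essential. This article collects several practical projects in computer vision and deep learning so that readers can pick the ones that interest them and try them out.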

If you are unfamiliar with the terms above, you can learn more from the following articles:

Now for the projects themselves:

1. Hand movement tracking with OpenCV

Project: akshaybahadur21/HandMovementTracking

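To perform video tracking, an algorithm analyzes consecutive video frames and outputs the movement of the target between frames. There are many algorithms for this problem, each with its own strengths and weaknesses, so it is important to consider the actual application when choosing one. A visual tracking system has two main components: target representation and localization, and filtering and data association.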

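Video tracking is the process of locating a moving object (or multiple objects) over time using a camera. It has many uses, such as human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, medical imaging, and video editing.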
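To make the idea of "locating an object over time" concrete, here is a minimal sketch that tracks the centroid of a blob across a few synthetic binary frames and reports its frame-to-frame displacement. The helper names `centroid` and `make_frame` and the synthetic data are illustrative assumptions, not part of the project; a real tracker would obtain the mask from camera frames.

```python
import numpy as np

def centroid(mask):
    """Return the (x, y) centroid of the nonzero pixels, or None for an empty mask."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return (int(xs.mean()), int(ys.mean()))

def make_frame(cx, cy, size=64, half=3):
    """Hypothetical test helper: a binary frame with a square blob centered at (cx, cy)."""
    frame = np.zeros((size, size), dtype=np.uint8)
    frame[cy - half:cy + half + 1, cx - half:cx + half + 1] = 1
    return frame

# A blob that moves 5 px right and 2 px down in every frame.
frames = [make_frame(10 + 5 * t, 20 + 2 * t) for t in range(4)]
track = [centroid(f) for f in frames]

# Movement of the target between consecutive frames.
moves = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]
print(track)
print(moves)
```

The centroid here plays the role of the target representation; a fuller system would add filtering (for example smoothing the track) and data association when several objects are present.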

The project code is as follows:

import numpy as np
import cv2
import argparse
from collections import deque

cap=cv2.VideoCapture(0)

pts = deque(maxlen=64)

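# HSV bounds of the color to track. Despite the variable names, hue 110-130 on
# OpenCV's 0-179 hue scale is blue; adjust the bounds to match your marker.
Lower_green = np.array([110,50,50])
Upper_green = np.array([130,255,255])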
while True:
    ret, img=cap.read()
    if not ret:  # stop if no frame could be read from the camera
        break
    hsv=cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
    kernel=np.ones((5,5),np.uint8)
    mask=cv2.inRange(hsv,Lower_green,Upper_green)
    mask = cv2.erode(mask, kernel, iterations=2)
    mask=cv2.morphologyEx(mask,cv2.MORPH_OPEN,kernel)
    #mask=cv2.morphologyEx(mask,cv2.MORPH_CLOSE,kernel)
    mask = cv2.dilate(mask, kernel, iterations=1)
    res=cv2.bitwise_and(img,img,mask=mask)
    cnts,heir=cv2.findContours(mask.copy(),cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)[-2:]
    center = None

    if len(cnts) > 0:
        # treat the largest contour as the tracked object
        c = max(cnts, key=cv2.contourArea)
        ((x, y), radius) = cv2.minEnclosingCircle(c)
        M = cv2.moments(c)
        if M["m00"] > 0:  # avoid division by zero for degenerate contours
            center = (int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"]))

        if radius > 5 and center is not None:
            cv2.circle(img, (int(x), int(y)), int(radius),(0, 255, 255), 2)
            cv2.circle(img, center, 5, (0, 0, 255), -1)

    pts.appendleft(center)
    for i in range(1, len(pts)):  # xrange exists only in Python 2
        if pts[i-1]is None or pts[i] is None:
            continue
        thick = int(np.sqrt(len(pts) / float(i + 1)) * 2.5)
        cv2.line(img, pts[i-1],pts[i],(0,0,225),thick)

    cv2.imshow("Frame", img)
    cv2.imshow("mask",mask)
    cv2.imshow("res",res)

    k=cv2.waitKey(30) & 0xFF
    if k==32:  # key code 32 is the space bar: press it to quit
        break
# cleanup the camera and close any open windows
cap.release()
cv2.destroyAllWindows()

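A script of about 50 lines is enough to solve this problem. Simple, right? The project depends on the OpenCV package, so OpenCV must be installed on your computer before you can run it. Installation guides for macOS, Ubuntu, and Windows are listed below: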

macOS:
- How to install OpenCV 3 on macOS
Ubuntu:
- OpenCV: Install OpenCV-Python in Ubuntu
Windows:
- Install OpenCV-Python in Windows

2. Drowsiness detection with OpenCV

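Project: akshaybahadur21/Drowsiness_Detection
Drowsiness detection matters in settings such as long drives and factory work: a driver on a long trip can easily become drowsy, which may lead to an accident. When the user becomes drowsy, this code raises an alert by monitoring the eyes.
Dependencies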

CV2
imutils
DLIB
SciPy

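Algorithm

Each eye is represented by 6 (x, y) coordinates, starting at the left corner of the eye and then working clockwise around it:

Condition

The code checks 20 consecutive frames; if the eye aspect ratio stays below 0.25, an alert is generated.

Relationship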
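The relationship between the six eye points and the eye aspect ratio (EAR) is commonly written as EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|). The sketch below assumes that standard definition together with the 0.25 threshold and 20-frame window stated above; the function names and the sample coordinates are hypothetical, and in practice the points would come from a facial landmark detector such as dlib.

```python
import numpy as np

EAR_THRESHOLD = 0.25  # threshold quoted in the text
CONSEC_FRAMES = 20    # number of consecutive frames quoted in the text

def eye_aspect_ratio(eye):
    """eye: six (x, y) points p1..p6, starting at the left corner, going clockwise."""
    eye = np.asarray(eye, dtype=float)
    a = np.linalg.norm(eye[1] - eye[5])  # |p2 - p6|, first vertical distance
    b = np.linalg.norm(eye[2] - eye[4])  # |p3 - p5|, second vertical distance
    c = np.linalg.norm(eye[0] - eye[3])  # |p1 - p4|, horizontal distance
    return (a + b) / (2.0 * c)

def alert_frames(ears, threshold=EAR_THRESHOLD, consec=CONSEC_FRAMES):
    """Return the frame indexes at which an alert would be raised."""
    alerts, counter = [], 0
    for i, ear in enumerate(ears):
        # count consecutive low-EAR frames; any open-eye frame resets the count
        counter = counter + 1 if ear < threshold else 0
        if counter >= consec:
            alerts.append(i)
    return alerts

# Made-up landmark coordinates for an open and a nearly closed eye.
open_eye = [(0, 0), (1, 1), (3, 1), (4, 0), (3, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.2), (3, 0.2), (4, 0), (3, -0.2), (1, -0.2)]
print(eye_aspect_ratio(open_eye), eye_aspect_ratio(closed_eye))
```

Because the ratio divides by the eye width, it stays roughly constant as the face moves toward or away from the camera and drops only when the eyelids close.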

Implementation

3. Handwritten digit recognition with softmax regression

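Project: akshaybahadur21/Digit-Recognizer
This code uses softmax regression to classify digits. It depends on quite a few packages; installing Conda is a convenient way to get the dependencies commonly needed for machine learning.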

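Description

Softmax regression (also known as multinomial logistic regression, maximum entropy classifier, or multi-class logistic regression) is a generalization of logistic regression that can be used for multi-class classification, assuming the classes are mutually exclusive. The (standard) logistic regression model, by contrast, is used for binary classification tasks.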
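As a small illustration of that generalization, the sketch below implements a row-wise softmax and checks that, with two classes, it reduces to the logistic sigmoid of the score difference. The function here is a standalone example written for this explanation, not the `softmax` helper used in the project listings.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax for an array of shape (n_examples, n_classes)."""
    scores = np.asarray(scores, dtype=float)
    # subtracting the row maximum does not change the result but avoids overflow
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

probs = softmax([[2.0, 1.0, 0.1], [1000.0, 1000.0, 1000.0]])
print(probs.sum(axis=1))  # each row is a probability distribution

# With two classes and scores (z, 0), softmax equals the logistic sigmoid of z.
z = 0.7
two_class = softmax([[z, 0.0]])[0, 0]
sigmoid = 1.0 / (1.0 + np.exp(-z))
print(two_class, sigmoid)
```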

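Python implementation

The project uses the MNIST handwritten digit dataset, in which each image is 28×28 pixels. Three methods are tried for classifying the digits 0 to 9: logistic regression, a shallow neural network, and a deep neural network.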
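The listings in this section expect the data as a matrix of shape (features, examples) and the labels in one-hot form of shape (classes, examples). A minimal sketch of that preparation step follows; the `prepare` helper and the random stand-in images are assumptions made for illustration, and the project's own loading code may differ.

```python
import numpy as np

def prepare(images, labels, n_classes=10):
    """images: (m, 28, 28) pixel arrays, labels: (m,) integers.

    Returns X of shape (784, m) scaled to [0, 1] and one-hot Y of shape (n_classes, m).
    """
    images = np.asarray(images)
    labels = np.asarray(labels)
    m = images.shape[0]
    X = images.reshape(m, -1).T.astype(np.float64) / 255.0
    Y = np.zeros((n_classes, m))
    Y[labels, np.arange(m)] = 1.0  # one column per example
    return X, Y

# Random stand-in data with the same shape as MNIST images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
fake_labels = np.array([3, 0, 9, 3, 7])
X, Y = prepare(fake_images, fake_labels)
print(X.shape, Y.shape)
```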

Logistic regression:

import numpy as np
import matplotlib.pyplot as plt

def softmax(z):
    z -= np.max(z)
    sm = (np.exp(z).T / np.sum(np.exp(z), axis=1))
    return sm

def initialize(dim1, dim2):
    """
    :param dim1, dim2: shape of the weight matrix w, initialized with zeros
    :return:
    """
    w = np.zeros(shape=(dim1, dim2))
    b = np.zeros(shape=(10, 1))
    return w, b

def propagate(w, b, X, Y):
    """
    :param w: weights for w
    :param b: bias
    :param X: size of data(no of features, no of examples)
    :param Y: true label
    :return:
    """
    m = X.shape[1]  # number of examples (columns of X)

    # Forward Prop
    A = softmax((np.dot(w.T, X) + b).T)
    cost = (-1 / m) * np.sum(Y * np.log(A))

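    # Backward prop
    dw = (1 / m) * np.dot(X, (A - Y).T)
    # sum over examples only, so that each class keeps its own bias gradient
    # (summing over every element of A - Y always gives zero)
    db = (1 / m) * np.sum(A - Y, axis=1, keepdims=True)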

    cost = np.squeeze(cost)
    grads = {"dw": dw,
             "db": db}
    return grads, cost

def optimize(w, b, X, Y, num_iters, alpha, print_cost=False):
    """
    :param w: weights for w
    :param b: bias
    :param X: size of data(no of features, no of examples)
    :param Y: true label
    :param num_iters: number of iterations for gradient
    :param alpha:
    :return:
    """

    costs = []
    for i in range(num_iters):
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        w = w - alpha * dw
        b = b - alpha * db

        # Record the costs
        if i % 50 == 0:
            costs.append(cost)

        # Print the cost every 50 iterations
        if print_cost and i % 50 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs

def predict(w, b, X):
    """
    :param w:
    :param b:
    :param X:
    :return:
    """
    # m = X.shape[1]
    # y_pred = np.zeros(shape=(1, m))
    # w = w.reshape(X.shape[0], 1)

    y_pred = np.argmax(softmax((np.dot(w.T, X) + b).T), axis=0)
    return y_pred

def model(X_train, Y_train, Y,X_test,Y_test, num_iters, alpha, print_cost):
    """
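    :param X_train: training data, shape (no of features, no of examples)
    :param Y_train: one-hot training labels, shape (no of classes, no of examples)
    :param Y: integer training labels, used to compute the training accuracy
    :param X_test: test data, shape (no of features, no of examples)
    :param Y_test: integer test labels
    :param num_iters: number of gradient descent iterations
    :param alpha: learning rate
    :param print_cost: if True, print the cost during training
    :return: dict with the costs, predictions and learned parameters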
    """

    w, b = initialize(X_train.shape[0], Y_train.shape[0])
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iters, alpha, print_cost)

    w = parameters["w"]
    b = parameters["b"]

    y_prediction_train = predict(w, b, X_train)
    y_prediction_test = predict(w, b, X_test)
    print("Train accuracy: {} %".format(sum(y_prediction_train == Y) / (float(len(Y))) * 100))
    print("Test accuracy: {} %".format(sum(y_prediction_test == Y_test) / (float(len(Y_test))) * 100))

    d = {"costs": costs,
         "Y_prediction_test": y_prediction_test,
         "Y_prediction_train": y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": alpha,
         "num_iterations": num_iters}

    # Plot learning curve (with costs)
    #costs = np.squeeze(d['costs'])
    #plt.plot(costs)
    #plt.ylabel('cost')
    #plt.xlabel('iterations (per hundreds)')
    #plt.title("Learning rate =" + str(d["learning_rate"]))
    #plt.plot()
    #plt.show()
    #plt.close()

    #pri(X_test.T, y_prediction_test)
    return d

def pri(X_test, y_prediction_test):
    example = X_test[2, :]
    print("Prediction for the example is ", y_prediction_test[2])
    plt.imshow(np.reshape(example, [28, 28]))
    plt.plot()
    plt.show()

Shallow neural network:

import numpy as np
import matplotlib.pyplot as plt

def softmax(z):
    z -= np.max(z)
    sm = (np.exp(z).T / np.sum(np.exp(z),axis=1))
    return sm

def layers(X, Y):
    """
    :param X:
    :param Y:
    :return:
    """
    n_x = X.shape[0]
    n_y = Y.shape[0]
    return n_x, n_y

def initialize_nn(n_x, n_h, n_y):
    """
    :param n_x:
    :param n_h:
    :param n_y:
    :return:
    """
    np.random.seed(2)
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.random.rand(n_h, 1)
    W2 = np.random.rand(n_y, n_h)
    b2 = np.random.rand(n_y, 1)
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters

def forward_prop(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = softmax(Z2.T)

    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}

    return A2, cache

def compute_cost(A2, Y, parameters):
    m = Y.shape[1]
    W1 = parameters['W1']
    W2 = parameters['W2']
    logprobs = np.multiply(np.log(A2), Y)
    cost = - np.sum(logprobs) / m
    cost = np.squeeze(cost)

    return cost

def back_prop(parameters, cache, X, Y):
    m = Y.shape[1]
    W1 = parameters['W1']
    W2 = parameters['W2']
    A1 = cache['A1']
    A2 = cache['A2']

    dZ2 = A2 - Y
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)

    dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.square(A1))
    dW1 = (1 / m) * np.dot(dZ1, X.T)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}

    return grads

def update_params(parameters, grads, alpha):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']

    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters

def model_nn(X, Y,Y_real,test_x,test_y, n_h, num_iters, alpha, print_cost):
    np.random.seed(3)
    n_x,n_y = layers(X, Y)
    parameters = initialize_nn(n_x, n_h, n_y)
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    costs = []
    for i in range(0, num_iters):

        A2, cache = forward_prop(X, parameters)

        cost = compute_cost(A2, Y, parameters)
        grads = back_prop(parameters, cache, X, Y)
        if (i > 1500):
            alpha1 = 0.95*alpha
            parameters = update_params(parameters, grads, alpha1)
        else:
            parameters = update_params(parameters, grads, alpha)

        if i % 100 == 0:
            costs.append(cost)
        if print_cost and i % 100 == 0:
            print("Cost after iteration for %i: %f" % (i, cost))

    predictions = predict_nn(parameters, X)
    print("Train accuracy: {} %".format(sum(predictions == Y_real) / (float(len(Y_real))) * 100))
    predictions=predict_nn(parameters,test_x)
    print("Test accuracy: {} %".format(sum(predictions == test_y) / (float(len(test_y))) * 100))

    #plt.plot(costs)
    #plt.ylabel('cost')
    #plt.xlabel('iterations (per hundreds)')
    #plt.title("Learning rate =" + str(alpha))
    #plt.show()

    return parameters

def predict_nn(parameters, X):
    A2, cache = forward_prop(X, parameters)
    predictions = np.argmax(A2, axis=0)
    return predictions

Deep neural network:

import numpy as np
import matplotlib.pyplot as plt

def softmax(z):
    cache = z
    z -= np.max(z)
    sm = (np.exp(z).T / np.sum(np.exp(z), axis=1))
    return sm, cache

def relu(z):
    """
    :param z:
    :return:
    """
    s = np.maximum(0, z)
    cache = z
    return s, cache

def softmax_backward(dA, cache):
    """
    :param dA:
    :param activation_cache:
    :return:
    """
    z = cache
    z -= np.max(z)
    s = (np.exp(z).T / np.sum(np.exp(z), axis=1))
    dZ = dA * s * (1 - s)
    return dZ

def relu_backward(dA, cache):
    """
    :param dA:
    :param activation_cache:
    :return:
    """
    Z = cache
    dZ = np.array(dA, copy=True)  # just converting dz to a correct object.
    dZ[Z <= 0] = 0
    return dZ

def initialize_parameters_deep(dims):
    """
    :param dims:
    :return:
    """

    np.random.seed(3)
    params = {}
    L = len(dims)

    for l in range(1, L):
        params['W' + str(l)] = np.random.randn(dims[l], dims[l - 1]) * 0.01
        params['b' + str(l)] = np.zeros((dims[l], 1))
    return params

def linear_forward(A, W, b):
    """
    :param A:
    :param W:
    :param b:
    :return:
    """

    Z = np.dot(W, A) + b
    cache = (A, W, b)

    return Z, cache

def linear_activation_forward(A_prev, W, b, activation):
    """
    :param A_prev:
    :param W:
    :param b:
    :param activation:
    :return:
    """
    if activation == "softmax":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = softmax(Z.T)

    elif activation == "relu":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)

    cache = (linear_cache, activation_cache)

    return A, cache

def L_model_forward(X, params):
    """
    :param X:
    :param params:
    :return:
    """

    caches = []
    A = X
    L = len(params) // 2  # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev,
                                             params["W" + str(l)],
                                             params["b" + str(l)],
                                             activation='relu')
        caches.append(cache)

    A_last, cache = linear_activation_forward(A,
                                              params["W" + str(L)],
                                              params["b" + str(L)],
                                              activation='softmax')
    caches.append(cache)
    return A_last, caches

def compute_cost(A_last, Y):
    """
    :param A_last:
    :param Y:
    :return:
    """

    m = Y.shape[1]
    cost = (-1 / m) * np.sum(Y * np.log(A_last))
    cost = np.squeeze(cost)  # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    return cost

def linear_backward(dZ, cache):
    """
    :param dZ:
    :param cache:
    :return:
    """

    A_prev, W, b = cache
    m = A_prev.shape[1]

    dW = (1. / m) * np.dot(dZ, cache[0].T)
    db = (1. / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(cache[1].T, dZ)

    return dA_prev, dW, db

def linear_activation_backward(dA, cache, activation):
    """
    :param dA:
    :param cache:
    :param activation:
    :return:
    """

    linear_cache, activation_cache = cache

    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)

    elif activation == "softmax":
        dZ = softmax_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)

    return dA_prev, dW, db

def L_model_backward(A_last, Y, caches):
    """
    :param A_last:
    :param Y:
    :param caches:
    :return:
    """

    grads = {}
    L = len(caches)  # the number of layers
    m = A_last.shape[1]
    Y = Y.reshape(A_last.shape)  # after this line, Y is the same shape as A_last

    dA_last = - (np.divide(Y, A_last) - np.divide(1 - Y, 1 - A_last))
    current_cache = caches[-1]
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dA_last,
                                                                                                  current_cache,
                                                                                                  activation="softmax")

    for l in reversed(range(L - 1)):
        current_cache = caches[l]

        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 2)], current_cache,
                                                                    activation="relu")
        grads["dA" + str(l + 1)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp

    return grads

def update_params(params, grads, alpha):
    """
    :param params:
    :param grads:
    :param alpha:
    :return:
    """

    L = len(params) // 2  # number of layers in the neural network

    for l in range(L):
        params["W" + str(l + 1)] = params["W" + str(l + 1)] - alpha * grads["dW" + str(l + 1)]
        params["b" + str(l + 1)] = params["b" + str(l + 1)] - alpha * grads["db" + str(l + 1)]

    return params

def model_DL( X, Y, Y_real, test_x, test_y, layers_dims, alpha, num_iterations, print_cost):  # lr was 0.009
    """
    Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SOFTMAX.
    Arguments:
    X -- training data, numpy array of shape (number of features, number of examples)
    Y -- one-hot training labels, of shape (number of classes, number of examples)
    Y_real -- integer training labels, used to compute the training accuracy
    test_x, test_y -- test data and integer test labels
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    alpha -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps
    Returns:
    params -- params learnt by the model. They can then be used to predict.
    """

    np.random.seed(1)
    costs = []  # keep track of cost

    params = initialize_parameters_deep(layers_dims)

    for i in range(0, num_iterations):

        A_last, caches = L_model_forward(X, params)
        cost = compute_cost(A_last, Y)
        grads = L_model_backward(A_last, Y, caches)

        if (i > 800 and i<1700):
            alpha1 = 0.80 * alpha
            params = update_params(params, grads, alpha1)
        elif(i>=1700):
            alpha1 = 0.50 * alpha
            params = update_params(params, grads, alpha1)
        else:
            params = update_params(params, grads, alpha)

        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
            costs.append(cost)
    predictions = predict(params, X)
    print("Train accuracy: {} %".format(sum(predictions == Y_real) / (float(len(Y_real))) * 100))
    predictions = predict(params, test_x)
    print("Test accuracy: {} %".format(sum(predictions == test_y) / (float(len(test_y))) * 100))

    #plt.plot(np.squeeze(costs))
    #plt.ylabel('cost')
    #plt.xlabel('iterations (per tens)')
    #plt.title("Learning rate =" + str(alpha))
    #plt.show()

    return params

def predict(parameters, X):
    A_last, cache = L_model_forward(X, parameters)
    predictions = np.argmax(A_last, axis=0)
    return predictions
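The backward pass in the deep network listing multiplies `dA_last` by `s * (1 - s)` element-wise, which algebraically simplifies to `A - Y`, the usual gradient of the cross-entropy cost with respect to the softmax inputs. The sketch below checks that identity numerically against a finite-difference gradient. It is a standalone verification with its own column-wise `softmax` and randomly generated data, both assumptions made for this example rather than code from the project.

```python
import numpy as np

def softmax(z):
    """Column-wise softmax for z of shape (n_classes, m)."""
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def total_cost(z, Y):
    """Cross-entropy summed over all examples."""
    return -np.sum(Y * np.log(softmax(z)))

rng = np.random.default_rng(0)
n_classes, m = 4, 3
z = rng.normal(size=(n_classes, m))
Y = np.eye(n_classes)[:, rng.integers(0, n_classes, size=m)]  # one-hot columns

A = softmax(z)
# The expression formed by L_model_backward followed by softmax_backward.
dA = -(Y / A - (1 - Y) / (1 - A))
dZ_listing = dA * A * (1 - A)

# Central finite differences of the summed cost with respect to each entry of z.
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(n_classes):
    for j in range(m):
        z_plus, z_minus = z.copy(), z.copy()
        z_plus[i, j] += eps
        z_minus[i, j] -= eps
        numeric[i, j] = (total_cost(z_plus, Y) - total_cost(z_minus, Y)) / (2 * eps)

print(np.abs(dZ_listing - (A - Y)).max(), np.abs(numeric - (A - Y)).max())
```

The 1/m factor of the mean cost is applied later, in `linear_backward`, which is why the check uses the summed cost.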

Writing digits via the webcam

Run the code

python Dig-Rec.py

Showing images to the webcam

Run the code

python Digit-Recognizer.py

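Devanagari character recognition

Project: akshaybahadur21/Devanagiri-Recognizer
This code helps you classify different Devanagari characters using a CNN.

Techniques used

The project uses a convolutional neural network, with TensorFlow as the framework and the Keras API providing a high-level abstraction.

Network architecture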
CONV2D→MAXPOOL→CONV2D→MAXPOOL→FC→Softmax→Classification
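To get a feel for this pipeline, the sketch below traces how the tensor shape changes from layer to layer for a 32×32 input. All hyperparameters (5×5 kernels without padding, 32 and 64 filters, 2×2 pooling, 128 fully connected units, 46 output classes) are assumptions chosen for illustration and are not taken from the project.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution over a square input."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, pool):
    """Output width/height of non-overlapping max pooling."""
    return size // pool

def trace_shapes(size=32, n_classes=46):
    # Hypothetical hyperparameters, not taken from the project.
    shapes = []
    size = conv_out(size, kernel=5)
    shapes.append(("CONV2D", (size, size, 32)))
    size = pool_out(size, pool=2)
    shapes.append(("MAXPOOL", (size, size, 32)))
    size = conv_out(size, kernel=5)
    shapes.append(("CONV2D", (size, size, 64)))
    size = pool_out(size, pool=2)
    shapes.append(("MAXPOOL", (size, size, 64)))
    flattened = size * size * 64  # input width of the fully connected layer
    shapes.append(("FC", (128,)))
    shapes.append(("Softmax", (n_classes,)))
    return shapes, flattened

shapes, flattened = trace_shapes()
for name, shape in shapes:
    print(name, shape)
print("flattened features:", flattened)
```

Tracing shapes like this is a quick way to evaluate alternative architectures before training them.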

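Additional notes

1. You can try different network architectures;
2. Add regularization to prevent overfitting;
3. You can add more images to the training set to improve accuracy.

Python implementation

The project uses the DHCD (Devnagari Character Dataset) dataset, in which each image is 32 × 32 pixels. See the project repository for the full code.
Run the code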

python Dev-Rec.py

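4. Face recognition with FaceNet

Project: akshaybahadur21/Face-Recoinion
This code performs face recognition using FaceNet. The FaceNet concept was first introduced in a research paper; its main idea is a triplet loss function for comparing images of different people. To improve stability and detection, the project author added a few features of their own.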
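The triplet loss compares an anchor image with a positive (same person) and a negative (different person): it pushes the anchor-positive distance to be smaller than the anchor-negative distance by at least a margin. Below is a minimal NumPy sketch over precomputed embeddings. The margin of 0.2, the averaging over the batch, and the toy vectors are assumptions for illustration; the project's training code may define the loss differently.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean of max(|a - p|^2 - |a - n|^2 + margin, 0) over embeddings of shape (n, d)."""
    anchor, positive, negative = (np.asarray(v, dtype=float) for v in (anchor, positive, negative))
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)  # squared distance to same identity
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)  # squared distance to other identity
    return float(np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0)))

# Toy 2-D embeddings: the first triplet is already well separated, the second is not.
good = triplet_loss([[1.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]])
bad = triplet_loss([[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 0.0]])
print(good, bad)
```

A loss of zero means the triplet already satisfies the margin and contributes no gradient.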

This project depends on the following packages:

numpy 
matplotlib 
cv2 
keras 
dlib 
h5py 
scipy

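Description

A face recognition system is a technology that can identify or verify a person from a digital image or from a frame of a video source. Such systems work in various ways, but in general they compare selected facial features from a given image with the faces in a database.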
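The comparison against a database can be sketched as a nearest-neighbor lookup over face embeddings. In the example below, the `identify` function, the distance threshold of 0.7, and the three-dimensional toy vectors are all hypothetical; a real system would use embeddings produced by the network and tune the threshold on its own data.

```python
import numpy as np

def identify(embedding, database, threshold=0.7):
    """Return (name, distance) of the closest entry, or (None, distance) if nothing is close enough."""
    best_name, best_dist = None, float("inf")
    for name, reference in database.items():
        dist = float(np.linalg.norm(np.asarray(embedding) - reference))
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist  # unknown face
    return best_name, best_dist

# Toy database of embeddings keyed by made-up names.
database = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob": np.array([0.0, 1.0, 0.0]),
}
print(identify([0.9, 0.1, 0.0], database))
print(identify([0.0, 0.0, 1.0], database))
```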

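Added features
1. Detects faces only when the eyes are open (a security measure);
2. Uses dlib's face alignment for effective prediction during live streaming.

Python implementation

1. Network used - Inception Network;
2. Original paper - Google's FaceNet.

Procedure

1. To train the network yourself, run Train-inception.py. This is not required, however, because the model has already been trained and saved as face-rec_Google.h5, which only needs to be loaded at run time;
2. Next, place some images in the database. Check the /images folder; you can paste pictures there or capture them with the webcam;
3. Run rec-feat.py to start the application.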

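5. Emoji recognizer

Project: akshaybahadur21/Emojinator
This project recognizes and classifies different emojis. So far, however, it only supports emotions expressed through hand gestures.

Code dependencies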

numpy 
matplotlib 
cv2 
keras 
dlib 
h5py 
scipy

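Description

Emojis are ideograms and smileys used in electronic messages and web pages. They come in many kinds, including facial expressions, common objects, places, weather, and animals.

Features

1. A filter for detecting the hand;
2. A CNN for training the model.

Python implementation

1. Network used - convolutional neural network

Procedure

1. First, create a gesture database by running CreateGest.py. Try moving your hand around slightly within the frame so that the model does not overfit during training;
2. Repeat this for every gesture you need;
3. Run CreateCSV.py to convert the images to a CSV file;
4. Train the model by running TrainEmojinator.py;
5. Run Emojinator.py to test the model through the webcam.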
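Step 3 above turns the captured gesture images into a CSV file. One common layout, assumed here purely for illustration, stores one image per row with the label first and the flattened pixel values after it. The helper names, the 4×4 image size, and the random data below are hypothetical, and the project's CreateCSV.py may use a different format.

```python
import csv
import io
import numpy as np

def images_to_csv(images, labels, out):
    """Write one row per image: the label followed by the flattened pixel values."""
    writer = csv.writer(out)
    for img, label in zip(images, labels):
        writer.writerow([int(label)] + np.asarray(img).ravel().astype(int).tolist())

def csv_to_arrays(text, shape):
    """Inverse of images_to_csv: rebuild the image array and the label vector."""
    rows = list(csv.reader(io.StringIO(text)))
    labels = np.array([int(r[0]) for r in rows])
    pixels = np.array([[int(v) for v in r[1:]] for r in rows])
    return pixels.reshape((-1,) + shape), labels

# Random stand-in "gesture images"; an in-memory buffer replaces a real file.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 4, 4))
labels = [0, 2, 1]
buf = io.StringIO()
images_to_csv(images, labels, buf)
restored, restored_labels = csv_to_arrays(buf.getvalue(), (4, 4))
print(restored.shape, restored_labels)
```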