I recently decided to study deep learning and TensorFlow systematically, with face recognition as the goal.
Ten years ago I worked on some image-processing projects and research involving image retrieval. At the time I used SIFT feature extraction; that descriptor is very robust to image rotation, affine transformations, and similar changes. SIFT can fairly be called an outstanding piece of hand-crafted image feature engineering.
Nowadays, with deep learning, and especially since models such as CNNs and ResNet were invented, hand-crafted image feature engineering seems almost "unnecessary". Through multiple layers of representation, a deep neural network can describe an image's features more abstractly (this representation is called an embedding).
Face recognition has also benefited from deep learning, and facenet performs particularly well. facenet is trained with a triplet loss and outputs a 128-dimensional embedding. For training, prepare M people with N images each; the objective is that embeddings of different face images of the same person are as close together as possible, while embeddings of face images of different people are as far apart as possible.
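To make the triplet-loss idea concrete, here is a minimal NumPy sketch (not facenet's actual training code); the margin value and the example embeddings are made up for illustration.

import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # squared L2 distance between the anchor and a face of the same person
    pos_dist = np.sum(np.square(anchor - positive))
    # squared L2 distance between the anchor and a face of a different person
    neg_dist = np.sum(np.square(anchor - negative))
    # the loss pushes same-person pairs closer than different-person pairs by at least `margin`
    return max(pos_dist - neg_dist + margin, 0.0)

# hypothetical 128-d embeddings, normally produced by the facenet network
a = np.random.rand(128)
p = a + 0.01 * np.random.rand(128)   # same person, slightly different image
n = np.random.rand(128)              # different person
print(triplet_loss(a, p, n))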
This article describes face recognition with a Raspberry Pi 3B plus a Movidius stick as the hardware platform and TensorFlow facenet as the model. Later I plan to build a complete face-recognition system, such as an attendance system, on top of this edge-computing setup.
This article does not cover online face detection.
Current system:
pi@raspberrypi:~ $ uname -a
Linux raspberrypi 4.14.34-v7+ #1110 SMP Mon Apr 16 15:18:51 BST 2018 armv7l GNU/Linux
Related peripherals: a Movidius Neural Compute Stick (NCS) and a Raspberry Pi camera module.
First, install TensorFlow on the Raspberry Pi. The Pi currently ships with Python 2.7 and Python 3.5; we will use Python 3.5.
Download tensorflow-1.3.1-cp35-none-linux_armv7l.whl from https://github.com/lhelontra/tensorflow-on-arm/releases and install it:
pip3 install tensorflow-1.3.1-cp35-none-linux_armv7l.whl
A few other packages may also need to be installed with pip3 or apt-get:
# numpy issue
sudo apt-get install libatlas-base-dev
# opencv cv2
pip3 install opencv-python
sudo apt-get install libjpeg-dev libtiff5-dev libjasper-dev libpng12-dev
pip3 install sklearn
pip3 install scipy
# qt issue
sudo apt-get install libqtgui4 libqt4-test
Test:
pi@raspberrypi:~ $ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170124] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> tensorflow.__version__
'1.3.1'
With TensorFlow in place, we can check out facenet and run it on the Pi: https://github.com/davidsandberg/facenet/tree/tl_revisited
I ran compare.py with the 20170512-110547 model to compare the distances between the faces in several images, and found it very slow.
Specifically, the faces in an image are detected first by running the MTCNN network, and then inference is run through the facenet network. Timing the facenet inference alone took 20+ seconds (face images are 160x160 at inference time). By comparison, dlib takes about 2 seconds. Such performance is quite discouraging.
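For reference, the distance that compare.py reports is essentially the Euclidean distance between two embeddings. A minimal NumPy sketch, where the two embedding arrays are placeholders for real facenet outputs:

import numpy as np

# emb1 and emb2 stand in for two 128-d embeddings returned by the facenet network
emb1 = np.random.rand(128)
emb2 = np.random.rand(128)

# Euclidean distance; small values mean the two faces are likely the same person
dist = np.sqrt(np.sum(np.square(emb1 - emb2)))
print('distance: %1.4f' % dist)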
To see facenet through, I chose to accelerate it. The Movidius stick is a neural-compute workhorse with very fast inference.
Clone the code:
git clone -b ncsdk2 https://github.com/movidius/ncsdk.git
Since we already installed TensorFlow, edit ncsdk.conf so that TensorFlow is not installed again; Caffe is still required:
INSTALL_DIR=/opt/movidius
INSTALL_CAFFE=yes
CAFFE_FLAVOR=ssd
CAFFE_USE_CUDA=no
INSTALL_TENSORFLOW=no
INSTALL_TOOLKIT=yes
PIP_SYSTEM_INSTALL=no
VERBOSE=yes
USE_VIRTUALENV=no
#MAKE_NJOBS=1
make install
Clone the code:
git clone -b ncsdk2 https://github.com/movidius/ncappzoo.git
Under tensorflow/facenet, follow the README step by step to build the model. The end result is the facenet_celeb_ncs.graph file, the graph model file that the Movidius stick understands.
Online face detection is not considered here for now. Instead, prepare a photo, detect the face offline, and save the face image as the comparison target. A single face is used as the example; multiple face images work the same way.
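As one possible sketch of this offline step (OpenCV's Haar cascade is used here instead of the MTCNN detector from the facenet repo; the cascade path, source photo name, and output filename are assumptions):

import cv2

# assumption: haarcascade_frontalface_default.xml is available from the OpenCV data directory
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

image = cv2.imread('me.jpg')                       # hypothetical source photo
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if len(faces) > 0:
    x, y, w, h = faces[0]                          # take the first detected face
    face = image[y:y+h, x:x+w]
    cv2.imwrite('validated_images/my1.png', face)  # saved as the comparison target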
For online recognition, set the camera resolution fairly small, e.g. 280x280, and keep the face close to the camera, so that the captured frame can be treated as a face image. Alternatively, the face can be restricted to a given region on the display.
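If you prefer the fixed-region approach, a rough sketch is to crop a central square from each frame before running inference; the region size here is arbitrary.

def crop_center(frame, size=200):
    # take a size x size square from the middle of the frame;
    # the face is assumed to sit roughly inside this region
    h, w = frame.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return frame[top:top+size, left:left+size]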
Inference currently takes ~100 ms. I do not yet know the NCS very well; further optimization will wait until I have studied it more.
The code is below (saved under ncappzoo/tensorflow/facenet).
VALIDATED_IMAGES_DIR + '/my1.png' is a face image, saved as the result of the offline face detection step.

#! /usr/bin/env python3

import sys
sys.path.insert(0, "../../ncapi2_shim")
import mvnc_simple_api as mvnc

import numpy
import cv2
import os
from picamera.array import PiRGBArray
from picamera import PiCamera
import time

# initialize the camera and grab a reference to the raw camera capture
camera = PiCamera()
camera.resolution = (280, 280)
camera.framerate = 32
rawCapture = PiRGBArray(camera, size=(280, 280))

frame_name = ''
EXAMPLES_BASE_DIR = '../../'
IMAGES_DIR = './'
VALIDATED_IMAGES_DIR = IMAGES_DIR + 'validated_images/'
validated_image_filename = VALIDATED_IMAGES_DIR + 'my1.png'

GRAPH_FILENAME = "facenet_celeb_ncs.graph"

# name of the opencv window
CV_WINDOW_NAME = "FaceNet"

# the same face will return 0.0
# different faces return higher numbers
# this is NOT between 0.0 and 1.0
FACE_MATCH_THRESHOLD = 1.2


# Run an inference on the passed image and return the facenet embedding.
# image_to_classify is the face image on which an inference will be performed.
# facenet_graph is the Graph object from the NCAPI which will
#   be used to perform the inference.
def run_inference(image_to_classify, facenet_graph):

    # get a resized version of the image that is the dimensions
    # the facenet network expects
    resized_image = preprocess_image(image_to_classify)

    # ***************************************************************
    # Send the image to the NCS
    # ***************************************************************
    facenet_graph.LoadTensor(resized_image.astype(numpy.float16), None)

    # ***************************************************************
    # Get the result from the NCS
    # ***************************************************************
    output, userobj = facenet_graph.GetResult()

    return output


# overlays the label text and a match/no-match rectangle onto the display image.
# display_image is the image on which to overlay.
# image_info is a text string to overlay onto the image.
# matching is a Boolean specifying if the image was a match.
# returns None
def overlay_on_image(display_image, image_info, matching):
    rect_width = 10
    offset = int(rect_width/2)
    if (image_info != None):
        cv2.putText(display_image, image_info, (30, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)

    if (matching):
        # match, green rectangle
        cv2.rectangle(display_image, (0+offset, 0+offset),
                      (display_image.shape[1]-offset-1, display_image.shape[0]-offset-1),
                      (0, 255, 0), 10)
    else:
        # not a match, red rectangle
        cv2.rectangle(display_image, (0+offset, 0+offset),
                      (display_image.shape[1]-offset-1, display_image.shape[0]-offset-1),
                      (0, 0, 255), 10)


# whiten an image
def whiten_image(source_image):
    source_mean = numpy.mean(source_image)
    source_standard_deviation = numpy.std(source_image)
    std_adjusted = numpy.maximum(source_standard_deviation, 1.0 / numpy.sqrt(source_image.size))
    whitened_image = numpy.multiply(numpy.subtract(source_image, source_mean), 1 / std_adjusted)
    return whitened_image


# create a preprocessed image from the source image that matches the
# network expectations and return it
def preprocess_image(src):
    # scale the image
    NETWORK_WIDTH = 160
    NETWORK_HEIGHT = 160
    preprocessed_image = cv2.resize(src, (NETWORK_WIDTH, NETWORK_HEIGHT))

    # convert to RGB
    preprocessed_image = cv2.cvtColor(preprocessed_image, cv2.COLOR_BGR2RGB)

    # whiten
    preprocessed_image = whiten_image(preprocessed_image)

    # return the preprocessed image
    return preprocessed_image


# determine if two images are of matching faces based on the
# network output for both images.
def face_match(face1_output, face2_output):
    if (len(face1_output) != len(face2_output)):
        print('length mismatch in face_match')
        return False
    total_diff = 0
    for output_index in range(0, len(face1_output)):
        this_diff = numpy.square(face1_output[output_index] - face2_output[output_index])
        total_diff += this_diff
    print('Total Difference is: ' + str(total_diff))

    if (total_diff < FACE_MATCH_THRESHOLD):
        # the total difference between the two is under the threshold so
        # the faces match.
        return True

    # differences between faces was over the threshold above so
    # they didn't match.
    return False


# handles key presses
# raw_key is the return value from cv2.waitKey
# returns False if program should end, or True if should continue
def handle_keys(raw_key):
    ascii_code = raw_key & 0xFF
    if ((ascii_code == ord('q')) or (ascii_code == ord('Q'))):
        return False

    return True


# start the camera streaming and pass each frame
# from the camera to the facenet network for an inference.
# valid_output is the inference result for the valid image
# validated_image_filename is the name of the valid image file
# graph is the ncsdk Graph object initialized with the facenet graph file
#   which we will run the inference on.
# returns None
def run_camera(valid_output, validated_image_filename, graph):
    frame_count = 0
    cv2.namedWindow(CV_WINDOW_NAME)

    found_match = False

    for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
        # grab the raw NumPy array representing the image
        vid_image = frame.array
        test_output = run_inference(vid_image, graph)
        if (face_match(valid_output, test_output)):
            print('PASS! File ' + frame_name + ' matches ' + validated_image_filename)
            found_match = True
        else:
            found_match = False
            print('FAIL! File ' + frame_name + ' does not match ' + validated_image_filename)

        overlay_on_image(vid_image, frame_name, found_match)

        # check if the window is visible, this means the user hasn't closed
        # the window via the X button
        prop_val = cv2.getWindowProperty(CV_WINDOW_NAME, cv2.WND_PROP_ASPECT_RATIO)
        if (prop_val < 0.0):
            print('window closed')
            break

        # display the results and wait for user to hit a key
        cv2.imshow(CV_WINDOW_NAME, vid_image)
        raw_key = cv2.waitKey(1)
        if (raw_key != -1):
            if (handle_keys(raw_key) == False):
                print('user pressed Q')
                break

        # show the frame
        #cv2.imshow("Frame", image)
        key = cv2.waitKey(1) & 0xFF

        # clear the stream in preparation for the next frame
        rawCapture.truncate(0)

        # if the `q` key was pressed, break from the loop
        if key == ord("q"):
            break


# This function is called from the entry point to do
# all the work of the program
def main():
    # Get a list of ALL the sticks that are plugged in
    # we need at least one
    devices = mvnc.EnumerateDevices()
    if len(devices) == 0:
        print('No NCS devices found')
        quit()

    # Pick the first stick to run the network
    device = mvnc.Device(devices[0])

    # Open the NCS
    device.OpenDevice()

    # The graph file that was created with the ncsdk compiler
    graph_file_name = GRAPH_FILENAME

    # read in the graph file to memory buffer
    with open(graph_file_name, mode='rb') as f:
        graph_in_memory = f.read()

    # create the NCAPI graph instance from the memory buffer containing the graph file.
    graph = device.AllocateGraph(graph_in_memory)

    validated_image = cv2.imread(validated_image_filename)
    valid_output = run_inference(validated_image, graph)

    run_camera(valid_output, validated_image_filename, graph)

    # Clean up the graph and the device
    graph.DeallocateGraph()
    device.CloseDevice()


# main entry point for program. we'll call main() to do what needs to be done.
if __name__ == "__main__":
    sys.exit(main())
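To check the ~100 ms figure mentioned above, the LoadTensor/GetResult pair can be timed on its own. A rough sketch that reuses preprocess_image, numpy, and the graph object from the listing above:

import time

def timed_inference(image, facenet_graph):
    # resize/whiten exactly as in run_inference, then time only the NCS round trip
    resized_image = preprocess_image(image)
    start = time.perf_counter()
    facenet_graph.LoadTensor(resized_image.astype(numpy.float16), None)
    output, userobj = facenet_graph.GetResult()
    elapsed = time.perf_counter() - start
    print('inference took %.1f ms' % (elapsed * 1000))
    return output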