https://adeshpande3.github.io/adeshpande3.github.io/
https://blog.csdn.net/weiwei9363/article/details/79112872
https://blog.csdn.net/and_w/article/details/70336506
https://hackernoon.com/visualizing-parts-of-convolutional-neural-networks-using-keras-and-cats-5cc01b214e59
https://keras-cn.readthedocs.io/en/latest/other/visualization/
https://blog.keras.io/category/demo.html
https://stackoverflow.com/questions/39280813/visualization-of-convolutional-layer-in-keras-model
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb
https://blog.csdn.net/thystar/article/details/50662972
原始網頁:Visualizing parts of Convolutional Neural Networks using Keras and Cats
翻譯:卷積神經網絡實戰(可視化部分)——使用keras識別貓咪
Convolutional neural networks (CNNs or ConvNets) have been the source of many major breakthroughs in deep learning over the last few years, but most people find them rather unintuitive to reason about. I’ve always wanted to break down the parts of a ConvNet and see what an image looks like after each stage, and in this post I do just that!
在近些年,深度學習領域的卷積神經網絡(CNNs或ConvNets)在各行各業爲咱們解決了大量的實際問題。可是對於大多數人來講,CNN彷彿戴上了神祕的面紗。我常常會想,要是能將神經網絡的過程分解,看一看每個步驟是什麼樣的結果該有多好!這也就是這篇博客存在的意義。
First off, what are ConvNets good at? ConvNets are used primarily to look for patterns in an image. They do that by convolving a small window over the image and checking each region for patterns. In the first few layers of a CNN the network can identify lines and corners, but we can then pass these patterns down through the neural net and start recognizing more complex features as we get deeper. This property makes CNNs really good at identifying objects in images.
首先,咱們要了解一下卷積神經網絡擅長什麼。CNN主要被用來找尋圖片中的模式。這個過程主要有兩個步驟,首先要對圖片作卷積,而後找尋模式。在神經網絡中,前幾層是用來尋找邊界和角,隨着層數的增長,咱們就能識別更加複雜的特徵。這個性質讓CNN很是擅長識別圖片中的物體。
A CNN is a neural network that typically contains several types of layers: convolutional layers, pooling layers, and activation layers.
CNN是一種特殊的神經網絡,它包含卷積層、池化層和激活層。
To understand what a CNN is, you need to understand how convolutions work. Imagine you have an image represented as a 5x5 matrix of values, and you take a 3x3 matrix and slide that 3x3 window around the image. At each position the 3x3 window visits, you multiply each value in the window by the image value it currently covers and add up the results (an element-wise multiply and sum, not a matrix multiplication). This results in a single number that represents all the values in that window of the image. Here’s a pretty gif for clarity:
要想了解什麼是卷積神經網絡,你首先要知道卷積是怎麼工做的。想象你有一個5*5矩陣表示的圖片,而後你用一個3*3的矩陣在圖片中滑動。每當3*3矩陣通過的點就用原矩陣中被覆蓋的矩陣和這個矩陣相乘。這樣一來,咱們可使用一個值來表示當前窗口中的全部點。下面是一個過程的動圖:
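The sliding-window step described above can be sketched in a few lines of plain Python. This is only an illustration, assuming stride 1 and no padding; the function name `convolve2d` and the pixel and kernel values are made up for the example and are not taken from the original post or its gif.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over a 2D image (stride 1, no padding)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Multiply each kernel weight by the pixel under it, then sum.
            total = sum(
                kernel[i][j] * image[top + i][left + j]
                for i in range(k)
                for j in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative 5x5 image and 3x3 kernel (assumed values).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

Each of the nine outputs summarizes one 3x3 region of the image, which is why a 5x5 input shrinks to a 3x3 feature map here.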
As you can see, each item in the feature matrix corresponds to a section of the image. Note that the values of the kernel matrix are the red numbers in the corners of the cells in the gif.
正如你所見的那樣,特徵矩陣中的每個項都和原圖中的一個區域相關。
The "window" that moves over the image is called a kernel. Kernels are typically square, and 3x3 is a fairly common kernel size for small-ish images. The distance the window moves each time is called the stride. Also of note: images are sometimes padded with zeros around the perimeter when performing convolutions, which dampens the value of the convolutions around the edges of the image (the idea being that the center of a photo typically matters more).
在圖中像窗口同樣移動的叫作核。核通常都是方陣,對於小圖片來講,通常選用3*3的矩陣就能夠了。每次窗口移動的距離叫作步長。值得注意的是,一些圖片在邊界會被填充零,若是直接進行卷積運算的話會致使邊界處的數據變小(固然圖片中間的數據更重要)。
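Kernel size, stride, and zero padding together determine how large the output of a convolution is. A small sketch of the standard output-size arithmetic for one dimension follows; the helper name `conv_output_size` is hypothetical and chosen only for this example.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of window positions along one dimension of the input."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# 5-pixel side, 3x3 kernel, stride 1, no padding: the window fits 3 times.
no_padding = conv_output_size(5, 3)
# One ring of zeros around the image keeps the output at the input size.
zero_padded = conv_output_size(5, 3, padding=1)
# Moving the window two pixels at a time leaves only 2 positions.
strided = conv_output_size(5, 3, stride=2)

print(no_padding, zero_padded, strided)  # 3 5 2
```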
The goal of a convolutional layer is filtering. As we move over an image, we effectively check for patterns in that section of the image. This works because of filters: stacks of weights, represented as a vector, which are multiplied by the values output by the convolution. When training on images, these weights change, so when it is time to evaluate an image, a filter returns high values if it thinks it is seeing a pattern it has seen before. The combinations of high values from various filters let the network predict the content of an image. This is why in CNN architecture diagrams the convolution step is represented by a box, not by a rectangle; the third dimension represents the filters.
卷積層的主要目的是濾波。當咱們在圖片上操做時,咱們能夠很容易得檢查出那部分的模式,這是因爲咱們使用了濾波,咱們用權重向量乘以卷積以後的輸出。當訓練一張圖片時,這些權重會不斷改變,並且當遇到以前見過的模式時,相應的權值會提升。來自各類濾波器的高權重的組合讓網絡預測圖像的內容的能力。 這就是爲何在CNN架構圖中,卷積步驟由一個框而不是一個矩形表示; 第三維表明濾波器。
Pooling works very much like convolution: we take a kernel and move it over the image. The only difference is that the function applied to the kernel and the image window isn’t linear.
池化層和卷積層很相似,也是用一個卷積核在圖上移動。惟一的不一樣就是池化層中核和圖片窗口的操做再也不是線性的。
Max pooling and average pooling are the most common pooling functions. Max pooling takes the largest value from the window of the image currently covered by the kernel, while average pooling takes the average of all values in the window.
最大池化和平均池化是最多見的池化函數。最大池化選取當前核覆蓋的圖片窗口中最大的數,而平均池化則是選擇圖片窗口的均值。
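The two pooling functions can be compared side by side on a small grid. The sketch below assumes non-overlapping windows (stride equal to the window size), which is a common choice but not something the post specifies; `pool2d`, `average`, and the pixel values are illustrative names and data.

```python
def pool2d(image, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size window."""
    pooled = []
    for top in range(0, len(image) - size + 1, size):
        row = []
        for left in range(0, len(image[0]) - size + 1, size):
            window = [
                image[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def average(values):
    return sum(values) / len(values)


image = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]

max_pooled = pool2d(image, 2, max)
avg_pooled = pool2d(image, 2, average)
print(max_pooled)  # [[7, 8], [9, 6]]
print(avg_pooled)  # [[4.0, 5.0], [4.5, 3.0]]
```

Either way, a 2x2 window halves each side of the image, which is where the blocky look of pooled images comes from.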
Activation layers work exactly as in other neural networks: a value is passed through a function that squashes the value into a range. Here’s a bunch of common ones:
在CNN中,激活函數和其餘網絡同樣,函數將數值壓縮在一個範圍內。下面列出了一些常見的函數:
The most used activation function in CNNs is the relu (Rectified Linear Unit). There are a bunch of reasons that people like relus, but a big one is that they are really cheap to compute: if the number is negative, output zero; otherwise, output the number. Being cheap makes it faster to train networks.
在CNN中最經常使用的是relu(修正線性單元)。人們有許多喜歡relu的理由,可是最重要的一點就是它很是的易於實現,若是數值是負數則輸出0,不然輸出自己。這種函數運算簡單,因此訓練網絡也很是快。
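Written out directly from that rule, a relu is a one-line function. A minimal sketch, with made-up input values:

```python
def relu(x):
    """Rectified Linear Unit: zero for negative inputs, the input otherwise."""
    return 0.0 if x < 0 else x


values = [-2.0, -0.5, 0.0, 1.5, 3.0]
activated = [relu(v) for v in values]
print(activated)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

There is no exponential or division involved, only a comparison, which is the cheapness the paragraph above refers to.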
Before we get into what a CNN looks like, a little bit of background. The first successful application of ConvNets was by Yann LeCun in the ’90s: he created something called LeNet, which could be used to read handwritten digits. Since then, computing advancements and powerful GPUs have allowed researchers to be more ambitious. In 2010 the Stanford Vision Lab released ImageNet. ImageNet is a dataset of 14 million images with labels detailing the contents of the images. It has become one of the research world’s standards for comparing CNN models, with the current best models successfully detecting the objects in 94+% of the images. Every so often someone comes in and beats the all-time high score on ImageNet, and it’s a pretty big deal. In 2014 it was GoogLeNet and VGGNet; before that it was ZF Net. The first viable example of a CNN applied to ImageNet was AlexNet in 2012. Before that, researchers attempted to use traditional computer vision techniques, but AlexNet outperformed everything else up to that point by ~15%.
在咱們深刻了解CNN以前,讓咱們先補充一些背景知識。早在上世紀90年代,Yann LeCun就使用CNN作了一個手寫數字識別的程序。而隨着時代的發展,尤爲是計算機性能和GPU的改進,研究人員有了更加豐富的想象空間。 2010年斯坦福的機器視覺實驗室發佈了ImageNet項目。該項目包含1400萬帶有描述標籤的圖片。這個幾乎已經成爲了比較CNN模型的標準。目前,最好的模型在這個數據集上能達到94%的準確率。人們不斷的改善模型來提升準確率。在2014年GoogLeNet 和VGGNet成爲了最好的模型,而在此以前是ZFNet。CNN應用於ImageNet的第一個可行例子是AlexNet,在此以前,研究人員試圖使用傳統的計算機視覺技術,但AlexNet的表現要比其餘一切都高出15%。
Anyway, let’s look at LeNet:
讓咱們一塊兒看一下LeNet:
This diagram doesn’t show the activation functions, but the architecture is:
這個圖中並無顯示激活層,整個的流程是:
Input image → ConvLayer → Relu → MaxPooling → ConvLayer → Relu → MaxPooling → Hidden Layer → Softmax (activation) → Output layer
輸入圖片 →卷積層 →Relu → 最大池化→卷積層 →Relu→ 最大池化→隱藏層 →Softmax (activation)→輸出層。
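To get a feel for how data flows through that sequence, the sketch below traces the shape of the data after each stage. The concrete numbers (a 32x32 single-channel input, 5x5 kernels, 6 and 16 filters, 2x2 pooling, a 120-unit hidden layer, 10 outputs) are assumptions borrowed from the classic LeNet-5 design, not values given in this post, and the explicit flatten step is added here only to connect the pooled feature maps to the hidden layer. Relu and softmax do not change shapes, so they are folded into the neighboring stages.

```python
def conv(shape, filters, kernel):
    """Shape after a stride-1, unpadded convolution."""
    height, width, _ = shape
    return (height - kernel + 1, width - kernel + 1, filters)


def pool(shape, size):
    """Shape after non-overlapping size x size pooling."""
    height, width, channels = shape
    return (height // size, width // size, channels)


def flatten(shape):
    height, width, channels = shape
    return (height * width * channels,)


def dense(shape, units):
    return (units,)


# Assumed LeNet-5-style hyperparameters (not taken from the post).
stages = [
    ("ConvLayer + Relu", lambda s: conv(s, filters=6, kernel=5)),
    ("MaxPooling", lambda s: pool(s, 2)),
    ("ConvLayer + Relu", lambda s: conv(s, filters=16, kernel=5)),
    ("MaxPooling", lambda s: pool(s, 2)),
    ("Flatten", flatten),
    ("Hidden Layer", lambda s: dense(s, 120)),
    ("Softmax output", lambda s: dense(s, 10)),
]

shape = (32, 32, 1)
shapes = [shape]
for name, stage in stages:
    shape = stage(shape)
    shapes.append(shape)
    print(f"{name:<18} -> {shape}")
```

Under these assumptions the image shrinks from 32x32 to 5x5 while the number of feature maps grows from 1 to 16, and the final layer has one output per class.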
Here is an image of a cat:
下圖是一個貓的圖片:
Our picture of the cat has a height of 320px, a width of 400px, and 3 color channels (RGB).
這張圖長400像素寬320像素,有三個通道(rgb)的顏色。
So what does he look like after one layer of convolution?
那麼通過一層卷積運算以後會變成什麼樣子呢?
Here is the cat with a kernel size of 3x3 and 3 filters (with more than 3 filters we can’t plot the result as a 2D color image of the cat; higher-dimensional cats are notoriously tricky to deal with).
這是用一個3*3的卷積核和三個濾波器處理的效果(若是咱們有超過3個的濾波器,那麼我能夠畫出貓的2d圖像。更高維的話就很難處理)
As you can see, the cat is really noisy because all of our weights are randomly initialized and we haven’t trained the network. Oh, and the filter outputs are all drawn on top of each other, so even if there were detail in each one we wouldn’t be able to see it. But we can make out areas of the cat that were the same color, like the eyes and the background. What happens if we increase the kernel size to 10x10?
咱們能夠看到,圖中的貓很是的模糊,由於咱們使用了一個隨機的初始值,並且咱們尚未訓練網絡。他們都在彼此的頂端,即便每層都有細節,咱們將沒法看到它。但咱們能夠製做出與眼睛和背景相同顏色的貓的區域。若是咱們將內核大小增長到10x10,會發生什麼呢?
As we can see, we lost some of the detail because the kernel was too big. Also note that the image is slightly smaller because of the larger kernel: a bigger window fits into fewer positions across the image.
咱們能夠看到,因爲內核太大,咱們失去了一些細節。還要注意,從數學角度來看,卷積核越大,圖像的形狀會變得越小。
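The size difference can be checked with a little arithmetic on the 320px by 400px cat photo. This sketch assumes the window moves one pixel at a time and the image is not padded; the post does not state either setting, and the helper name `valid_output` is made up for the example.

```python
height, width = 320, 400  # the cat photo's dimensions from the post


def valid_output(size, kernel):
    """Window positions along one side (stride 1, no padding assumed)."""
    return size - kernel + 1


small_kernel = (valid_output(height, 3), valid_output(width, 3))
large_kernel = (valid_output(height, 10), valid_output(width, 10))
print(small_kernel)  # (318, 398)
print(large_kernel)  # (311, 391)
```

Under these assumptions the 10x10 kernel trims nine pixels from each dimension, against two for the 3x3 kernel.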
What happens if we squish it down a bit so we can see the color channels better?
若是咱們把它壓扁一點,咱們能夠更好的看到色彩通道會發生什麼?
Much better! Now we can see some of the things our filter is seeing. It looks like red is really liking the black bits of the nose and eyes, and blue is digging the light grey that outlines the cat. We can start to see how the layer captures some of the more important details in the photo.
這張看起來好多了!如今咱們能夠看到咱們的過濾器看到的一些事情。看起來紅色替換掉了黑色的鼻子和黑色眼睛,藍色替換掉了貓邊界的淺灰色。咱們能夠開始看到圖層如何捕獲照片中的一些更重要的細節。
If we increase the kernel size, it’s far more obvious now that we get less detail, and the image is also smaller than the other two.
若是咱們增長內核大小,咱們獲得的細節就會愈來愈明顯,固然圖像也比其餘兩個都小。
We get rid of a lot of the non-blue parts by adding a relu.
咱們經過添加一個relu,去掉了不少不是藍色的部分。
We add a pooling layer (dropping the activation, just to make it a bit easier to show).
咱們添加一個池化層(擺脫激活層最大限度地讓圖片更加更容易顯示)。
As expected, the cat is blockier, but we can go even blockier!
正如預期的那樣,貓咪變成了斑駁的,而咱們可讓它更加斑駁。
Notice how the image is now about a third the size of the original.
如今圖片大約成了原來的三分之一。
What do the cats look like if we put them through the convolutional and pooling sections of LeNet?
若是咱們將貓咪的圖片放到LeNet模型中作卷積和池化,那麼效果會怎麼樣呢?
ConvNets are powerful due to their ability to extract the core features of an image and use these features to identify images that contain similar features. Even with our two-layer CNN we can start to see that the network is paying a lot of attention to regions like the whiskers, nose, and eyes of the cat. These are the types of features that would allow the CNN to differentiate a cat from a bird, for example.
ConvNets功能強大,由於它們可以提取圖像的核心特徵,並使用這些特徵來識別包含其中的特徵的圖像。即便咱們的兩層CNN,咱們也能夠開始看到網絡正在對貓的晶須,鼻子和眼睛這樣的地區給予不少的關注。這些是讓CNN將貓與鳥區分開的特徵的類型。
CNNs are remarkably powerful, and while these visualizations aren’t perfect, I hope they can help people like myself who are still learning to reason about ConvNets a little better.
CNN是很是強大的,雖然這些可視化並不完美,但我但願他們可以幫助像我這樣正在嘗試更好地理解ConvNets的人。
All code is on Github: https://github.com/erikreppel/visualizing_cnns
Follow me on Twitter, I’m @programmer (yes, seriously).
A guide to convolution arithmetic for deep learning by Vincent Dumoulin and Francesco Visin