Visualization of filters in Keras: visualizing convolutional neural networks (CNNs) with Keras

https://adeshpande3.github.io/adeshpande3.github.io/

https://blog.csdn.net/weiwei9363/article/details/79112872

https://blog.csdn.net/and_w/article/details/70336506

https://hackernoon.com/visualizing-parts-of-convolutional-neural-networks-using-keras-and-cats-5cc01b214e59

https://keras-cn.readthedocs.io/en/latest/other/visualization/

https://blog.keras.io/category/demo.html

https://stackoverflow.com/questions/39280813/visualization-of-convolutional-layer-in-keras-model

http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb

https://blog.csdn.net/thystar/article/details/50662972

 

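Original article: Visualizing parts of Convolutional Neural Networks using Keras and Cats
Chinese translation: 卷積神經網絡實戰(可視化部分)——使用keras識別貓咪 (Convolutional Neural Networks in Practice, Visualization Part: Recognizing Cats with Keras)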

It is well known that convolutional neural networks (CNNs or ConvNets) have been the source of many major breakthroughs in deep learning over the last few years, but most people find them rather unintuitive to reason about. I’ve always wanted to break down the parts of a ConvNet and see what an image looks like after each stage, and in this post I do just that!

CNNs at a high level

First off, what are ConvNets good at? ConvNets are used primarily to look for patterns in an image. They do that by convolving a small window over the image and checking each region for a pattern. In the first few layers a CNN can identify lines and corners; these simple patterns are then passed down through the network, which starts recognizing more complex features as we get deeper. This property makes CNNs really good at identifying objects in images.

What is a CNN?

A CNN is a neural network that typically contains several types of layers: convolutional layers, pooling layers, and activation layers.

Convolutional Layer

To understand what a CNN is, you need to understand how convolutions work. Imagine you have an image represented as a 5x5 matrix of values, and you take a 3x3 matrix and slide that 3x3 window around the image. At each position the 3x3 window visits, you multiply each value in the window by the image value it currently covers and add up the nine products. This results in a single number that represents all the values in that window of the image. Here’s a pretty gif for clarity:

As you can see, each item in the feature matrix corresponds to a section of the image. Note that the kernel values are the red numbers shown in the corners in the gif.
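To make the sliding window concrete, here is a minimal pure-Python sketch of the operation described above. The 5x5 image and 3x3 kernel values are made up for illustration (they are not read off the gif), and the function name `convolve2d` is hypothetical. It assumes a stride of 1 and no padding. Strictly speaking this is cross-correlation (the kernel is not flipped), which is what deep learning libraries usually mean by "convolution".

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Multiply each kernel weight by the pixel under it, then sum the products.
            total = sum(
                kernel[i][j] * image[top + i][left + j]
                for i in range(k)
                for j in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative values only (an assumption, not taken from the article).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

Each of the nine numbers in the 3x3 result summarizes one 3x3 section of the 5x5 image, which is the correspondence between feature matrix and image sections mentioned above.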

The “window” that moves over the image is called a kernel. Kernels are typically square, and 3x3 is a fairly common kernel size for small-ish images. The distance the window moves each time is called the stride. Also of note: images are sometimes padded with zeros around the perimeter before convolving. Padding lets the kernel be centered on edge pixels, and because the padded values are zero it dampens the value of the convolutions around the edges of the image (the idea being that the center of a photo typically matters more).
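Kernel size, stride, and padding together decide how large the output is. The standard relationship along one dimension is floor((n + 2p - k) / s) + 1 for input size n, kernel size k, stride s, and p padded pixels on each side. Here is a small sketch of that arithmetic; the function name `conv_output_size` and the example numbers are my own choices for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension of a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# A 5-pixel-wide image and a 3-pixel-wide kernel, as in the example above.
no_padding = conv_output_size(5, 3)                # window fits in 3 positions
zero_padded = conv_output_size(5, 3, padding=1)    # one ring of zeros keeps the size
strided = conv_output_size(5, 3, stride=2)         # moving 2 pixels at a time

print(no_padding, zero_padded, strided)
```

With one ring of zero padding the output stays at the input size, which is the other common reason for padding besides the edge-dampening effect.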

The goal of a convolutional layer is filtering. As we move over an image we effectively check for patterns in that section of the image. This works because of filters: stacks of weights that are multiplied by the image values under the window. During training these weights change, so when it is time to evaluate an image, a filter returns a high value if it thinks it is seeing a pattern it has seen before. The combinations of high values from the various filters let the network predict the content of an image. This is why, in CNN architecture diagrams, the convolution step is drawn as a box rather than a flat rectangle; the third dimension represents the filters.

Architecture of AlexNet

Things to note:

  • The output of the convolution is smaller (in width and height) than the original image, unless the image is padded
  • A linear function is applied between the kernel and the image window that is under the kernel
  • Weights in the filters are learned by seeing lots of images
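To see why a layer with several filters produces a stack of outputs, here is a rough sketch that runs two hand-written filters over the same tiny image. Everything in it is an illustrative assumption: in a real CNN the kernel weights are learned rather than written by hand, and the image values and names are invented.

```python
def convolve2d(image, kernel):
    """Stride-1, unpadded sliding-window sum of products."""
    k = len(kernel)
    return [
        [
            sum(kernel[i][j] * image[top + i][left + j]
                for i in range(k) for j in range(k))
            for left in range(len(image[0]) - k + 1)
        ]
        for top in range(len(image) - k + 1)
    ]


# A 4x4 image that is dark on the left and bright on the right (a vertical edge).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

filters = {
    "vertical_edge": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "horizontal_edge": [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
}

# One feature map per filter: the stack of maps is the "third dimension".
feature_maps = {name: convolve2d(image, kernel) for name, kernel in filters.items()}
for name, fmap in feature_maps.items():
    print(name, fmap)
```

The vertical-edge filter responds strongly everywhere it overlaps the edge, while the horizontal-edge filter stays at zero, which is the sense in which a filter "returns high values" for the pattern it matches.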

Pooling Layers

Pooling works very much like convolving: we take a kernel and move it over the image. The only difference is that the function applied to the image window has no learned weights; it is a fixed summary of the window, and in the case of max pooling it isn’t linear.

Max pooling and average pooling are the most common pooling functions. Max pooling takes the largest value from the window of the image currently covered by the kernel, while average pooling takes the average of all values in the window.
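Here is a minimal sketch of both pooling functions in plain Python. It assumes non-overlapping windows (the stride equals the pool size), which is a common default but not something the text above specifies; the function names and the 4x4 values are made up.

```python
def pool2d(image, size, reduce_fn):
    """Summarize each non-overlapping size x size window with reduce_fn."""
    pooled = []
    for top in range(0, len(image) - size + 1, size):
        row = []
        for left in range(0, len(image[0]) - size + 1, size):
            window = [
                image[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def average(values):
    return sum(values) / len(values)


image = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]

max_pooled = pool2d(image, 2, max)
avg_pooled = pool2d(image, 2, average)
print(max_pooled)
print(avg_pooled)
```

Both versions shrink the 4x4 image to 2x2; they differ only in which single number stands in for each window.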

Activation Layers

Activation layers work exactly as in other neural networks: a value is passed through a function that squashes the value into a range. Here’s a bunch of common ones:

The most used activation function in CNNs is the relu (Rectified Linear Unit). There are a bunch of reasons people like relus, but a big one is that they are really cheap to compute: if the number is negative the output is zero, otherwise it is the number itself. Being cheap makes it faster to train networks.
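For reference, here is a short sketch of relu next to two other common activation functions, sigmoid and tanh. These three are my pick of "common ones" and are not necessarily the ones shown in the original figure.

```python
import math


def relu(x):
    # Negative inputs become zero; everything else passes through unchanged.
    return max(0.0, x)


def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # Squashes any real number into the open interval (-1, 1).
    return math.tanh(x)


for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), round(sigmoid(value), 4), round(tanh(value), 4))
```

Note that relu needs only a comparison, while sigmoid and tanh each need an exponential, which is the cost difference mentioned above.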

Recap

  • Three main types of layers in CNNs: Convolutional, Pooling, Activation
  • Convolutional layers multiply kernel values by the image window and optimize the kernel weights over time using gradient descent
  • Pooling layers describe a window of an image using a single value, which is the max or the average of that window
  • Activation layers squash the values into a range, typically [0,1] for sigmoid or [-1,1] for tanh; relu only clips negative values to zero

What does a CNN look like?

Before we get into what a CNN looks like, a little bit of background. One of the first successful applications of ConvNets was by Yann LeCun in the ’90s: he created LeNet, which could be used to read handwritten digits. Since then, computing advancements and powerful GPUs have allowed researchers to be more ambitious. In 2010 the Stanford Vision Lab released ImageNet. ImageNet is a data set of 14 million images with labels detailing the contents of the images. It has become one of the research world’s standards for comparing CNN models, with the current best models successfully detecting the objects in 94+% of the images. Every so often someone comes in and beats the all-time high score on ImageNet, and it’s a pretty big deal. In 2014 it was GoogLeNet and VGGNet; before that it was ZF Net. The first viable example of a CNN applied to ImageNet was AlexNet in 2012. Before that, researchers attempted to use traditional computer vision techniques, but AlexNet outperformed everything else up to that point by ~15%.

Anyway, let’s look at LeNet:

LeNet architecture

This diagram doesn’t show the activation functions, but the architecture is:

Input image → ConvLayer → Relu → MaxPooling → ConvLayer → Relu → MaxPooling → Hidden Layer → Softmax (activation) → output layer
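One way to get a feel for this pipeline is to trace how the image size changes through the convolution and pooling stages. The sketch below does only that bookkeeping. The 28x28 input, 5x5 kernels, and 2x2 pools are assumptions chosen for illustration (the text above does not give these numbers), and unpadded, stride-1 convolutions are assumed as well.

```python
def conv_size(size, kernel):
    # Unpadded, stride-1 convolution trims kernel - 1 pixels.
    return size - kernel + 1


def pool_size(size, pool):
    # Non-overlapping pooling divides the size by the pool width.
    return size // pool


def trace_shapes(height, width, layers):
    """Return the (layer, height, width) seen after each stage."""
    shapes = [("input", height, width)]
    for kind, size in layers:
        if kind == "conv":
            height, width = conv_size(height, size), conv_size(width, size)
        elif kind == "pool":
            height, width = pool_size(height, size), pool_size(width, size)
        # "relu" is element-wise, so it leaves the shape unchanged.
        shapes.append((kind, height, width))
    return shapes


# Hypothetical sizes for the conv/relu/pool part of the architecture above.
layers = [
    ("conv", 5), ("relu", None), ("pool", 2),
    ("conv", 5), ("relu", None), ("pool", 2),
]
shapes = trace_shapes(28, 28, layers)
for name, h, w in shapes:
    print(f"{name:>5}: {h}x{w}")
```

Whatever comes out of the last pooling stage is flattened and handed to the hidden layer and softmax, which turn the extracted features into class scores.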

On to the cats!

Here is an image of a cat:

That’s a good looking cat

Our picture of the cat has a height of 320px, a width of 400px, and 3 channels of color (RGB).

Convolutional Layer

So what does he look like after one layer of convolution?

1 convcat

Here is the cat with a kernel size of 3x3 and 3 filters (if we had more than 3 filters we couldn’t plot a 2D image of the cat. Higher dimensional cats are notoriously tricky to deal with).

As you can see, the cat is really noisy because all of our weights are randomly initialized and we haven’t trained the network. Oh, and the filter outputs are all on top of each other, so even if there were detail in each one we wouldn’t be able to see it. But we can make out areas of the cat that were the same color, like the eyes and the background. What happens if we increase the kernel size to 10x10?

As we can see, we lost some of the detail because the kernel was too big. Also note that the image is slightly smaller because of the larger kernel, and because math governs stuff (without padding and with a stride of 1, a kernel of size k trims k - 1 pixels from both the width and the height).
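The math in question is short enough to check by hand. Here is a quick sketch for the 320x400 cat, assuming unpadded convolutions with a stride of 1 (an assumption on my part; the padding mode is not stated above). The helper name `valid_conv_shape` is made up.

```python
def valid_conv_shape(height, width, kernel_size):
    """Height and width after an unpadded, stride-1 convolution."""
    return (height - kernel_size + 1, width - kernel_size + 1)


cat_height, cat_width = 320, 400
shapes = {k: valid_conv_shape(cat_height, cat_width, k) for k in (3, 10, 15)}

for kernel_size, (h, w) in shapes.items():
    print(f"{kernel_size}x{kernel_size} kernel -> {h}x{w} output")
```

Under these assumptions the shrinkage is only a handful of pixels, which matches the image looking "slightly smaller" rather than dramatically so.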

What happens if we squish it down a bit so we can see the color channels better?

Much better! Now we can see some of the things our filter is seeing. It looks like red is really liking the black bits of the nose and eyes, and blue is digging the light grey that outlines the cat. We can start to see how the layer captures some of the more important details in the photo.

3x3 Kernel convcat

Original

15x15 pixel kernel size

If we increase the kernel size, it’s far more obvious that we get less detail, and the image is also smaller than the other two.

Add an Activation layer

reluCat

We get rid of a lot of the not-blue-ness by adding a relu.

Adding a Pooling Layer

We add a pooling layer (getting rid of the activation just makes it a bit easier to show).

2x2 pool size

As expected, the cat is blockier, but we can go even blockier!

PoolCat with a 5x5 pool size. All your poolz belong to us

Notice how the image is now about a third the size of the original.

Activation and Max Pooling

LeNet Cats

What do the cats look like if we put them through the convolutional and pooling sections of LeNet?

1 filter for each conv layer

3 filters in first conv layer, 1 in second conv layer

3 filter layers in each convolution

Conclusion

ConvNets are powerful due to their ability to extract the core features of an image and use these features to identify images that contain features like them. Even with our two-layer CNN we can start to see the network is paying a lot of attention to regions like the whiskers, nose, and eyes of the cat. These are the types of features that would allow the CNN to differentiate a cat from a bird, for example.

CNNs are remarkably powerful, and while these visualizations aren’t perfect, I hope they can help people like myself who are still learning to reason about ConvNets a little better.

All code is on Github: https://github.com/erikreppel/visualizing_cnns

Follow me on Twitter, I’m @programmer (yes, seriously).

Further Resources

Andrej Karpathy’s cs231n

A guide to convolution arithmetic for deep learning by Vincent Dumoulin and Francesco Visin

 

 

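Visualizing convolutional neural networks

  • This article is compiled from Deep Learning with Python; the book's complete code is in section 5.4 here, with detailed comments.
  • Deep learning is often called a "black box" because its internal workings are not visible. Convolutional neural networks (CNNs), however, can be visualized, and visualization lets us understand how a CNN recognizes an image.
  • Three visualization methods are introduced:
    1. Visualizing intermediate convnet outputs (intermediate activations): visualizing the output of the convolution kernels after activation. This shows what an image looks like after convolution and helps in understanding what each kernel does.
    2. Visualizing convnet filters: helps us understand how a kernel perceives an image.
    3. Visualizing heatmaps of class activation in an image: the heatmap shows which parts of an image played the key role in its classification, and can also locate the object in the image.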

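Visualizing intermediate convnet outputs (intermediate activations)

  • The idea is simple: feed an image to the CNN, obtain the outputs of some of its convolutional layers, and visualize those outputs.
  • The code uses the cats_and_dogs_small_2.h5 model, which was trained in section 5.2 of the book; you can of course use a model from keras.applications instead, such as VGG16.
  • The visualization results are shown in the figures below.
    • [Figures: 1.png, 2.png, 3.png, 4.png]
  • Conclusions:
    • The first convolutional layer acts much like an edge detector; at this stage the kernels retain almost all of the information in the image.
    • As the layers get deeper, the kernel outputs become more and more abstract and retain less and less information.
    • The deeper the layer, the more outputs are blank; a blank output means that kernel did not find the feature it looks for in the input image.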

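Visualizing convnet filters

  • How exactly does a kernel recognize objects? One way to approach the question is to find out what image a kernel is most interested in. Convolution is a process of feature extraction, and each kernel represents one feature. The larger the response of a kernel to some region of an image, the more that region "looks like" the kernel.
  • Following this reasoning, if we find an image that maximizes the output of a given kernel, we have found the image that kernel is most interested in.
  • Concretely: start from an image I with random content, compute the gradient of the response of a kernel F with respect to the image, G = ∂F/∂I, and update the image iteratively by gradient ascent, I = I + η∗G, where η plays a role similar to a learning rate.
  • The code uses a pre-trained VGG16 model and visualizes its kernels. The results are as follows:
    • block1_conv1 [figure]
    • block2_conv1 [figure]
    • block3_conv1 [figure]
    • block4_conv1 [figure]
    • block5_conv1 [figure]
  • Conclusions:
    • Low-level kernels seem to be interested in color and edge information.
    • The higher the layer, the more abstract (quite psychedelic, really) and the more complex the content its kernels are interested in.
    • The preferred images of high-level kernels are harder and harder to obtain by gradient ascent (many of the block5_conv1 images are still random noise).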
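The gradient-ascent loop described above can be sketched on a toy problem without any deep learning library. In this sketch the "kernel" is a single 3x3 filter flattened to nine weights, its response F is a plain dot product with a nine-pixel "image", and the gradient G = ∂F/∂I is estimated numerically. All of this is a simplified stand-in chosen for illustration; it is not the VGG16 code from the book, and the names, step size η = 0.5, and iteration count are assumptions.

```python
import random


def filter_response(image, kernel):
    """F: response of one filter to a flattened 3x3 patch."""
    return sum(w * p for w, p in zip(kernel, image))


def numerical_gradient(image, kernel, eps=1e-4):
    """G = dF/dI, estimated by nudging one pixel at a time."""
    base = filter_response(image, kernel)
    grad = []
    for i in range(len(image)):
        bumped = list(image)
        bumped[i] += eps
        grad.append((filter_response(bumped, kernel) - base) / eps)
    return grad


random.seed(0)
kernel = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0, -1.0, 0.0, 1.0]  # vertical-edge filter
image = [random.uniform(-0.1, 0.1) for _ in kernel]        # random starting image

initial_response = filter_response(image, kernel)
eta = 0.5  # step size, playing the role of a learning rate
for _ in range(20):
    grad = numerical_gradient(image, kernel)
    image = [p + eta * g for p, g in zip(image, grad)]     # I = I + eta * G
final_response = filter_response(image, kernel)

print(initial_response, final_response)
```

After the loop the image has drifted toward the pattern the filter prefers (dark on the left, bright on the right), which is the small-scale version of the filter pictures listed above.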

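Visualizing heatmaps of class activation in an image

    • In an image classification problem, suppose the network says a picture is a "cat" with probability 0.9. I want to know how much the last convolutional layer contributed to that 0.9. In other words, if the last convolutional layer has 512 kernels, I want to know how many votes each of the 512 kernels cast for the picture being a "cat". The more votes a kernel casts, the more strongly it indicates that the picture is a "cat", because the features it extracted lean toward cat features.
    • In the code, a picture of an elephant is fed in, the heatmap of the last convolutional layer is computed, and the heatmap is overlaid on the original image to reveal the parts of the image that were decisive for the classification.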
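The "voting" idea can be sketched as a weighted average of the last layer's feature maps: each channel's map is scaled by its vote, the scaled maps are averaged, negative values are dropped, and the result is normalized to [0, 1]. The sketch below is a toy with two 2x2 channels and invented vote weights; in a real class-activation heatmap the weights would be derived from the gradient of the class score with respect to each channel, which is not computed here. The function name is hypothetical.

```python
def class_activation_heatmap(feature_maps, channel_weights):
    """Combine per-channel feature maps into one heatmap scaled to [0, 1]."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    channels = len(feature_maps)
    heatmap = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, channel_weights):
        for y in range(height):
            for x in range(width):
                # Each channel contributes in proportion to its vote.
                heatmap[y][x] += weight * fmap[y][x] / channels
    # Keep only positive evidence for the class, then normalize.
    heatmap = [[max(v, 0.0) for v in row] for row in heatmap]
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap


# Made-up activations: channel 0 fires on the right, channel 1 on the left.
feature_maps = [
    [[0.0, 2.0], [0.0, 4.0]],
    [[3.0, 0.0], [1.0, 0.0]],
]
channel_weights = [1.0, -0.5]  # invented votes: channel 0 for, channel 1 against

heatmap = class_activation_heatmap(feature_maps, channel_weights)
for row in heatmap:
    print(row)
```

The hot cells end up where the supporting channel was active, and overlaying such a map (resized to the input resolution) on the original picture gives the kind of visualization described above.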