3D Convolution: Spherical CNNs for Panoramic Images (Code)

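Convolutional neural networks (CNNs) handle problems on 2D planar images very well. However, there is a growing need to process spherical images, for example omnidirectional vision for drones, robots, and self-driving cars, molecular regression problems, and global weather and climate modelling.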

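Feeding a planar projection of a spherical signal into a CNN is a naive approach that is bound to fail. CNNs owe much of their success to weight sharing across local receptive fields: the multi-layer structure can find the same target in different rects (image regions) and respond to it. On a spherical image, however, a target is deformed differently depending on where it lands in the picture, so for a CNN to share weights directly, the local receptive fields would have to describe that deformation. As Figure 1 of the paper illustrates, the spatial distortion introduced by the planar projection makes it impossible for the CNN to share weights.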
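To get a feel for how strongly the distortion depends on position, here is a minimal sketch. It assumes an equirectangular (latitude/longitude) projection, which the text above does not name, and the function name `horizontal_stretch` is made up for this illustration. A circle of latitude has circumference proportional to cos(latitude) but is stretched to the full image width, so the horizontal stretch factor is 1 / cos(latitude).

```python
import math

def horizontal_stretch(latitude_deg: float) -> float:
    """Horizontal stretch of an equirectangular projection at a given latitude.

    Assumption: equirectangular projection, where every circle of latitude
    is mapped to a full-width image row.
    """
    return 1.0 / math.cos(math.radians(latitude_deg))

# The same object is drawn wider and wider as it moves towards the poles.
for lat in (0, 30, 60, 80):
    print(f"latitude {lat:2d} deg: stretched by a factor of {horizontal_stretch(lat):.2f}")
```

A filter learned at the equator therefore sees a differently shaped pattern at high latitudes, which is why plain translational weight sharing breaks down.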

        

We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.

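How can the 3D (spherical) image be recovered from a 2D image while coping with the position-dependent deformation? The classical FFT and the Lie group model are the bridge between the two.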

For an explanation of SO(3) as a rigid-body transformation, see: 半閒居士, notes on 視覺SLAM十四講 (3), 3D rigid-body motion - par..._CSDN blog

Wow, this outline is even more concise and clear: Gao Xiang, 《視覺SLAM十四講:從理論到實踐》 (Fourteen Lectures on Visual SLAM: From Theory to Practice).

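The key is the subtle difference between the plane and the sphere. The sphere S2 itself is a two-dimensional manifold, but the space of its moves, the rotation group SO(3), is a three-dimensional manifold and a Lie group. Rotations are sampled on a discrete grid over SO(3), and the higher layers of the CNN operate on signals defined on SO(3).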

As shown in Figure 1, there is no good way to use translational convolution or cross-correlation to analyze spherical signals. The most obvious approach, then, is to change the definition of cross-correlation by replacing filter translations by rotations. Doing so, we run into a subtle but important difference between the plane and the sphere: whereas the space of moves for the plane (2D translations) is itself isomorphic to the plane, the space of moves for the sphere (3D rotations) is a different, three-dimensional manifold called SO(3). It follows that the result of a spherical correlation (the output feature map) is to be considered a signal on SO(3), not a signal on the sphere, S2. For this reason, we deploy SO(3) group correlation in the higher layers of a spherical CNN (Cohen and Welling, 2016).
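Written out, replacing filter translations by rotations gives roughly the following definition. This is a sketch in notation assumed here, not quoted from the excerpt: f is a K-channel signal on the sphere, ψ is a filter with the same number of channels, and R is a rotation.

```latex
[\psi \star f](R) = \int_{S^2} \sum_{k=1}^{K} \psi_k(R^{-1} x)\, f_k(x)\, dx,
\qquad R \in SO(3)
```

The argument of the output is the rotation R, not a point x on the sphere, which is why the feature map is a signal on SO(3).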

The implementation of a spherical CNN (S2-CNN) involves two major challenges. Whereas a square grid of pixels has discrete translation symmetries, no perfectly symmetrical grids for the sphere exist. This means that there is no simple way to define the rotation of a spherical filter by one pixel. Instead, in order to rotate a filter we would need to perform some kind of interpolation. The other challenge is computational efficiency; SO(3) is a three-dimensional manifold, so a naive implementation of SO(3) correlation is O(n^6).

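The two difficulties of spherical CNNs are therefore: (1) discretization: the sphere has no perfectly symmetric grid, so rotating a filter requires interpolation, and the grid resolution determines how accurate the result is; (2) computational cost: SO(3) is a three-dimensional manifold, so a naive SO(3) correlation has time complexity O(n^6).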

........................................

Key points:

Fast correlation with a generalized FFT (G-FFT). It is well known that correlations and convolutions can be computed efficiently using the Fast Fourier Transform (FFT). This is a result of the Fourier theorem, which states that $\widehat{f * \psi} = \hat{f} \cdot \hat{\psi}$. Since the FFT can be computed in O(n log n) time and the product has linear complexity, implementing the correlation using FFTs is asymptotically faster than the naive O(n^2) spatial implementation.
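The classical version of this statement is easy to check numerically. The sketch below is only the 1D analogue on a circle (periodic signal), not the spherical G-FFT from the paper; the function names are made up for the illustration, and the signals are assumed to be real-valued. For a correlation rather than a convolution, the filter spectrum enters as a complex conjugate.

```python
import numpy as np

def correlate_naive(f, psi):
    """Circular cross-correlation in O(n^2): out[s] = sum_x psi[x - s] * f[x]."""
    n = len(f)
    out = np.zeros(n)
    for s in range(n):
        for x in range(n):
            out[s] += psi[(x - s) % n] * f[x]
    return out

def correlate_fft(f, psi):
    """The same correlation in O(n log n) via the Fourier theorem.

    Assumes real-valued f and psi, so the shifted filter turns into
    a complex conjugate in the frequency domain.
    """
    spectrum = np.fft.fft(f) * np.conj(np.fft.fft(psi))
    return np.fft.ifft(spectrum).real

rng = np.random.default_rng(0)
f = rng.standard_normal(64)
psi = rng.standard_normal(64)
print(np.allclose(correlate_naive(f, psi), correlate_fft(f, psi)))
```

The generalized FFT plays the same role for S2 and SO(3): transform, multiply in the spectral domain, transform back.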

       .................

   

    .......................................

Most importantly, the code is available at: https://github.com/jonas-koehler/s2cnn .


Experimental results:

Results: We evaluate by RMSE and compare our results to Montavon et al. (2012) and Raj et al. (2016) (see Table 3). Our learned representation outperforms all kernel-based approaches and an MLP trained on sorted Coulomb matrices. Superior performance could only be achieved for an MLP trained on randomly permuted Coulomb matrices. However, sufficient sampling of random permutations grows exponentially with N, so this method is unlikely to scale to large molecules.

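The paper defines cross-correlation on S2 and on SO(3), analyzes their properties, and implements a generalized FFT-based correlation algorithm. The numerical results confirm that the algorithm is stable and accurate, even in deep networks.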

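In short, weighing accuracy, scalability, and related factors together, this looks like one of the most promising 3D network architectures.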


Further improvements:

      For intrinsically volumetric tasks like 3D model recognition, we believe that further improvements can be attained by generalizing further beyond SO(3) to the roto-translation group SE(3). The development of Spherical CNNs is an important first step in this direction. Another interesting generalization is the development of a Steerable CNN for the sphere (Cohen and Welling, 2017), which would make it possible to analyze vector fields such as global wind directions, as well as other sections of vector bundles over the sphere.

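Generalizing the computation from SO(3) to the roto-translation group SE(3), so that the correlation covers translations as well as rotations, should bring further improvements on volumetric tasks.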


Appendix:

Lie groups and Lie algebras

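3D rotation matrices form the special orthogonal group SO(3), and transformation matrices form the special Euclidean group SE(3):

$SO(3) = \{ R \in \mathbb{R}^{3 \times 3} \mid R R^T = I, \ \det(R) = 1 \}$

$SE(3) = \left\{ T = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \in \mathbb{R}^{4 \times 4} \ \middle| \ R \in SO(3), \ t \in \mathbb{R}^3 \right\}$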

 

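Neither SO(3) nor SE(3) is closed under addition: the sum of two rotation matrices is in general no longer a rotation matrix. Both are closed under multiplication, however. A set together with a single operation of this kind is called a group.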

A group is written G = (A, ·), where A is the set and · is the operation. The operation must satisfy the following conditions:

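(1) Closure: for any a1, a2 in A, the product a1 · a2 is also in A.

(2) Associativity: (a1 · a2) · a3 = a1 · (a2 · a3).

(3) Identity: there is a special element a0 in A such that a0 · a = a · a0 = a for every a in A.

(4) Inverse: for every a in A there is an element a^-1 in A such that a · a^-1 = a0.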

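It can be shown that the set of rotation matrices with matrix multiplication forms a group, and that the set of transformation matrices with matrix multiplication forms a group as well.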
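These properties can be spot-checked numerically. The sketch below is a sanity check on a few sample rotations, not a proof; the helper names `rot_z`, `rot_x`, and `is_rotation` are made up for this illustration.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(theta):
    """Rotation by theta radians about the x axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def is_rotation(R, tol=1e-9):
    """True if R is orthogonal with determinant +1, i.e. R lies in SO(3)."""
    orthogonal = np.allclose(R @ R.T, np.eye(3), atol=tol)
    return bool(orthogonal and abs(np.linalg.det(R) - 1.0) < tol)

R1, R2, R3 = rot_z(0.3), rot_x(1.1), rot_z(-0.7)

print("sum is a rotation:    ", is_rotation(R1 + R2))       # not closed under addition
print("product is a rotation:", is_rotation(R1 @ R2))       # closure
print("associative:          ", np.allclose((R1 @ R2) @ R3, R1 @ (R2 @ R3)))
print("identity:             ", np.allclose(R1 @ np.eye(3), R1))
print("inverse (transpose):  ", np.allclose(R1 @ R1.T, np.eye(3)) and is_rotation(R1.T))
```

For rotation matrices the inverse is simply the transpose, which is why the last check uses `R1.T`.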

With the concept of a group in place, what is a Lie group?

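A Lie group is a group that is also continuous (smooth). The motion of a rigid body is continuous, so SO(3) and SE(3) are Lie groups.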

Every Lie group has a corresponding Lie algebra. So what is a Lie algebra?

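A Lie algebra describes the local structure of its Lie group near the identity: it is the tangent space of the group at the identity element. For SO(3), the Lie algebra so(3) consists of the 3×3 skew-symmetric matrices $\phi^\wedge$, each of which corresponds to a three-dimensional vector $\phi$.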

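The relationship between the Lie group and its Lie algebra is as follows:

$R = \exp(\phi^\wedge), \qquad \phi = \ln(R)^\vee$

So the two are related by the exponential and the logarithm.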

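So how is $\exp(\phi^\wedge)$ computed? It is the exponential of a matrix; in the context of Lie groups and Lie algebras it is called the exponential map. The exponential of any square matrix A can be written as a Taylor series, $\exp(A) = \sum_{n=0}^{\infty} \frac{1}{n!} A^n$. The series converges for every square matrix, and its result is again a matrix.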

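Likewise, for any element $\phi$ of so(3), we can define its exponential map in the same way:

$\exp(\phi^\wedge) = \sum_{n=0}^{\infty} \frac{1}{n!} (\phi^\wedge)^n$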

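Since $\phi$ is a three-dimensional vector, we can define its length $\theta$ and its unit direction vector $a$ such that $\phi = \theta a$. For $a^\wedge$, the following two identities can then be derived: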

Let a = (cos α, cos β, cos γ), the direction cosines of the axis; then (cos α)^2 + (cos β)^2 + (cos γ)^2 = 1, i.e. a is a unit vector.

(1) $a^\wedge a^\wedge = a a^T - I$

(2) $a^\wedge a^\wedge a^\wedge = -a^\wedge$
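Both identities are easy to confirm numerically for a concrete unit vector. In this small sketch the helper name `hat` is an assumption of the example; it builds the skew-symmetric matrix written as a^ in the text.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix v^, so that hat(v) @ w equals np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

a = np.array([1.0, 2.0, 2.0]) / 3.0   # unit vector, since 1 + 4 + 4 = 9
A = hat(a)

print(np.allclose(A @ A, np.outer(a, a) - np.eye(3)))   # identity (1)
print(np.allclose(A @ A @ A, -A))                       # identity (2)
```

Both identities rely on a having unit length; for a general vector v the first becomes hat(v) @ hat(v) = v v^T - (v·v) I.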

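These two identities reduce the second and third powers of $a^\wedge$ to simpler terms, so every higher power in the series collapses as well, which gives: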

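$$
\begin{aligned}
\exp(\phi^\wedge) &= \exp(\theta a^\wedge) = \sum_{n=0}^{\infty} \frac{1}{n!} (\theta a^\wedge)^n = \cdots \\
&= a^\wedge a^\wedge + I + \sin\theta \, a^\wedge - \cos\theta \, a^\wedge a^\wedge \\
&= (1 - \cos\theta) \, a^\wedge a^\wedge + I + \sin\theta \, a^\wedge \\
&= \cos\theta \, I + (1 - \cos\theta) \, a a^T + \sin\theta \, a^\wedge .
\end{aligned}
$$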
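The closed form can be compared against a truncated Taylor series. This is a minimal sketch: the function names, the number of series terms, and the small-angle cutoff are choices made for the illustration.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix v^ of a 3-vector v."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def exp_series(phi, terms=30):
    """Truncated Taylor series: sum over n < terms of (phi^)^n / n!."""
    A = hat(phi)
    result = np.eye(3)
    term = np.eye(3)
    for n in range(1, terms):
        term = term @ A / n          # builds (phi^)^n / n! incrementally
        result = result + term
    return result

def exp_closed_form(phi):
    """cos(theta) I + (1 - cos(theta)) a a^T + sin(theta) a^, with phi = theta * a."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:                # no rotation: avoid dividing by zero
        return np.eye(3)
    a = phi / theta
    return (np.cos(theta) * np.eye(3)
            + (1.0 - np.cos(theta)) * np.outer(a, a)
            + np.sin(theta) * hat(a))

phi = np.array([0.3, -0.5, 0.8])
R = exp_closed_form(phi)
print(np.allclose(R, exp_series(phi)))                                  # both agree
print(np.allclose(R @ R.T, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))  # R is in SO(3)
```

The closed form needs only one sine and one cosine, whereas the series needs many matrix products to reach the same accuracy.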

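Recall from the previous lecture that this is exactly Rodrigues' formula. This shows that so(3) is in fact the space of rotation vectors, and that the exponential map is Rodrigues' formula. Through it, any vector in so(3) is mapped to a rotation matrix in SO(3). Conversely, if we define the logarithm map, we can also map elements of SO(3) back to so(3):

$\phi = \ln(R)^\vee$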

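In practice, however, we usually solve for the rotation angle and the rotation axis separately using properties of the trace, which is less work: the angle follows from $\theta = \arccos\left(\frac{\mathrm{tr}(R) - 1}{2}\right)$, and the axis a is the vector left unchanged by the rotation, $R a = a$.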
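A minimal sketch of this shortcut follows. The angle comes from the trace; for the axis, instead of solving R a = a as an eigenvector problem, the code reads it off the skew-symmetric part R - R^T = 2 sin(θ) a^, which is a swapped-in but equivalent route for 0 < θ < π. The function names are made up for the illustration, and the sketch does not handle angles at or near π, where sin(θ) vanishes.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix v^ of a 3-vector v."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def exp_so3(phi):
    """Rodrigues' formula: rotation vector phi = theta * a -> rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    a = phi / theta
    return (np.cos(theta) * np.eye(3)
            + (1.0 - np.cos(theta)) * np.outer(a, a)
            + np.sin(theta) * hat(a))

def log_so3(R):
    """Rotation matrix -> rotation vector, valid for 0 <= theta < pi."""
    # tr(R) = 1 + 2 cos(theta); clip guards against rounding slightly outside [-1, 1].
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros(3)
    # R - R^T = 2 sin(theta) a^, so the axis sits in the off-diagonal differences.
    skew_part = np.array([R[2, 1] - R[1, 2],
                          R[0, 2] - R[2, 0],
                          R[1, 0] - R[0, 1]])
    return theta * skew_part / (2.0 * np.sin(theta))

phi = np.array([0.3, -0.5, 0.8])
print(np.allclose(log_so3(exp_so3(phi)), phi))   # round trip recovers phi
```

The round trip only recovers the original vector when its length is below π, because rotation vectors that differ by a full turn map to the same matrix.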

OK, now that we have the correspondence between Lie groups and Lie algebras, what is it good for?

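Mainly, it lets us optimize over a Lie group by working in its Lie algebra. For example, when two elements of the Lie group are combined, what happens to their corresponding Lie algebra elements?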

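First, when two elements of the Lie group are multiplied, how do the corresponding Lie algebra elements combine? Do they simply add, as the exponents of scalars do? Note that the elements of a Lie group are matrices, not scalars, and matrices generally do not commute, so ln(exp(A) exp(B)) = A + B does not hold in general. What is the relationship, then?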
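A quick numerical experiment shows the failure. This is a sketch with made-up helper names, using Rodrigues' formula for the exponential and a trace-based logarithm that is valid for angles below π. Two rotations about different axes do not add in the Lie algebra, while two rotations about the same axis do.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix v^ of a 3-vector v."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def exp_so3(phi):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    a = phi / theta
    return (np.cos(theta) * np.eye(3)
            + (1.0 - np.cos(theta)) * np.outer(a, a)
            + np.sin(theta) * hat(a))

def log_so3(R):
    """Rotation matrix -> rotation vector, valid for 0 <= theta < pi."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    skew_part = np.array([R[2, 1] - R[1, 2],
                          R[0, 2] - R[2, 0],
                          R[1, 0] - R[0, 1]])
    return theta * skew_part / (2.0 * np.sin(theta))

phi1 = np.array([0.5, 0.0, 0.0])   # rotation about x
phi2 = np.array([0.0, 0.5, 0.0])   # rotation about y

combined = log_so3(exp_so3(phi1) @ exp_so3(phi2))
print("ln(exp(A) exp(B)):", combined)
print("A + B:            ", phi1 + phi2)
print("equal:", np.allclose(combined, phi1 + phi2))

# Same axis: the two rotations commute, and the rotation vectors simply add.
same_axis = log_so3(exp_so3(phi1) @ exp_so3(0.4 * phi1))
print("same axis adds:", np.allclose(same_axis, 1.4 * phi1))
```

The discrepancy in the first case shows up mainly as an extra z component that neither input has. This correction comes from the non-commutativity of the two rotations and is what the Baker-Campbell-Hausdorff formula describes.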
