3D Convolution: Spherical CNNs for Panoramic Images (Code)

Date: 2019-12-14

Tags: 3D, panoramic images, spherical cnns, code

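Convolutional neural networks (CNNs) handle 2D planar images well. Demand for processing spherical images, however, keeps growing: for example, omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling.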

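Feeding a planar projection of a spherical signal into a CNN is a naive approach that is bound to fail. The success of CNNs comes from weight sharing across local receptive fields: the multi-layer structure finds the same object in different rectangles of the image and responds to it. In a spherical image, however, an object is deformed differently depending on where it appears in the picture, so for a CNN to share weights directly, the local receptive field would have to describe that transformation. As the figure illustrates, the spatial distortion introduced by the planar projection prevents the CNN from sharing weights.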
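A small numerical sketch of this distortion, assuming an equirectangular (longitude-latitude) projection, which the post does not name explicitly. A patch of fixed angular size on the sphere covers more pixel columns the closer it sits to a pole, by roughly a factor of 1/cos(latitude). The function name and the 512-pixel image width are illustrative choices, not taken from the paper.

```python
import math

def pixel_width_of_patch(angular_width_deg, latitude_deg, image_width=512):
    """Columns covered by a small patch of the given angular width
    centred at the given latitude in an equirectangular image.

    Assumption: the patch is small enough that the 1/cos(latitude)
    stretch can be treated as constant across it.
    """
    # One pixel column spans 360 / image_width degrees of longitude.
    deg_per_column = 360.0 / image_width
    # Circles of latitude shrink towards the poles, so a fixed arc length
    # covers a larger range of longitude there.
    longitude_span = angular_width_deg / math.cos(math.radians(latitude_deg))
    return longitude_span / deg_per_column

for lat in (0, 30, 60, 80):
    print(lat, round(pixel_width_of_patch(10, lat), 1))
```

The same object therefore needs a differently shaped filter at each latitude, which is why a filter shared by planar translation cannot match it everywhere.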

We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.

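How can a 3D image be reconstructed from 2D images while handling the position-dependent deformation? The classical FFT method and the Lie group model serve as the bridge.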

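For an explanation of SO(3) as rigid-body transformations, see 半閒居士's notes on 《視覺SLAM十四講》 (Fourteen Lectures on Visual SLAM), part 3, "Rigid-body motion in 3D space" - par..._CSDN blog.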

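Wow, this outline is even more concise and clear: 高翔 (Gao Xiang), 《視覺SLAM十四講：從理論到實踐》 (Fourteen Lectures on Visual SLAM: From Theory to Practice).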

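The point is to take the subtle difference between the sphere and the plane seriously: the rotations of the sphere form a three-dimensional manifold, the Lie group SO(3), which is discretized for computation, and the resulting SO(3) relationships are represented in the higher layers of the CNN.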

As shown in Figure 1, there is no good way to use translational convolution or cross-correlation to analyze spherical signals. The most obvious approach, then, is to change the definition of cross-correlation by replacing filter translations by rotations. Doing so, we run into a subtle but important difference between the plane and the sphere: whereas the space of moves for the plane (2D translations) is itself isomorphic to the plane, the space of moves for the sphere (3D rotations) is a different, three-dimensional manifold called SO(3). It follows that the result of a spherical correlation (the output feature map) is to be considered a signal on SO(3), not a signal on the sphere, S^2. For this reason, we deploy SO(3) group correlation in the higher layers of a spherical CNN (Cohen and Welling, 2016).
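For reference, the two correlations described in this passage can be written out as below. This is a sketch in notation I am assuming rather than quoting (ψ for the filter, f for the signal, K for the number of channels, R and Q for rotations); it should be checked against the paper before being relied on.

```latex
% Spherical correlation: signal and filter live on S^2, the output lives on SO(3)
[\psi \star f](R) = \int_{S^2} \sum_{k=1}^{K} \psi_k(R^{-1}x)\, f_k(x)\, dx,
\qquad R \in SO(3)

% SO(3) correlation for the higher layers: signal, filter and output all live on SO(3)
[\psi \star f](R) = \int_{SO(3)} \sum_{k=1}^{K} \psi_k(R^{-1}Q)\, f_k(Q)\, dQ
```

In both cases the filter is rotated by R instead of being translated, which is why the output is indexed by a rotation and not by a point on the sphere.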

The implementation of a spherical CNN (S2-CNN) involves two major challenges. Whereas a square grid of pixels has discrete translation symmetries, no perfectly symmetrical grids for the sphere exist. This means that there is no simple way to define the rotation of a spherical filter by one pixel. Instead, in order to rotate a filter we would need to perform some kind of interpolation. The other challenge is computational efficiency; SO(3) is a three-dimensional manifold, so a naive implementation of SO(3) correlation is O(n^6).

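So spherical CNNs face two difficulties. The first is the granularity of the spherical grid, i.e. how fine a discretization is needed to keep the reconstruction accurate. The second is the computational complexity on the three-dimensional manifold SO(3), where a naive implementation takes O(n^6) time.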
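The O(n^6) figure can be made concrete with a simple operation count. The sketch below assumes n samples along each of the three rotation angles, so n^3 output rotations, each needing a sum over n^3 input samples. The function name is hypothetical and nothing here comes from the s2cnn code.

```python
def naive_so3_correlation_ops(n):
    """Multiply-adds for a naive SO(3) correlation on an n x n x n grid.

    Assumption: n samples per rotation angle, so n**3 output rotations,
    each computed as a sum over n**3 input samples.
    """
    outputs = n ** 3
    terms_per_output = n ** 3
    return outputs * terms_per_output

for n in (8, 16, 32):
    print(n, naive_so3_correlation_ops(n))
```

Doubling the resolution multiplies the cost by 64, which is what makes the naive spatial implementation impractical.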

........................................

Key points:

Fast correlation and convolution using the generalized FFT (G-FFT). It is well known that correlations and convolutions can be computed efficiently using the Fast Fourier Transform (FFT). This is a result of the Fourier theorem, which states that $\widehat{f * \psi} = \hat{f} \cdot \hat{\psi}$. Since the FFT can be computed in O(n log n) time and the product has linear complexity, implementing the correlation using FFTs is asymptotically faster than the naive O(n^2) spatial implementation.
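To make the Fourier theorem concrete, here is a minimal sketch for the ordinary 1D circular case using NumPy. It is not the generalized (non-commutative) FFT on SO(3) that the paper uses, only the planar analogue, and the function names are my own.

```python
import numpy as np

def circular_correlation_naive(f, psi):
    """O(n^2) circular cross-correlation: out[t] = sum_x psi[x - t] * f[x]."""
    n = len(f)
    out = np.zeros(n)
    for t in range(n):
        for x in range(n):
            out[t] += psi[(x - t) % n] * f[x]
    return out

def circular_correlation_fft(f, psi):
    """Same result in O(n log n) via the Fourier theorem (real-valued inputs)."""
    # Correlation in the spatial domain becomes a pointwise product with a
    # complex conjugate in the frequency domain.
    return np.fft.ifft(np.fft.fft(f) * np.conj(np.fft.fft(psi))).real

rng = np.random.default_rng(0)
f = rng.standard_normal(64)
psi = rng.standard_normal(64)
print(np.allclose(circular_correlation_naive(f, psi),
                  circular_correlation_fft(f, psi)))
```

The two functions agree up to floating-point error. The spherical case follows the same pattern, with the FFT replaced by generalized Fourier transforms on S^2 and SO(3).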

.................

.......................................

Most importantly, the code is available at: https://github.com/jonas-koehler/s2cnn .

Experimental results:

Results: We evaluate by RMSE and compare our results to Montavon et al. (2012) and Raj et al. (2016) (see table 3). Our learned representation outperforms all kernel-based approaches and an MLP trained on sorted Coulomb matrices. Superior performance could only be achieved for an MLP trained on randomly permuted Coulomb matrices. However, sufficient sampling of random permutations grows exponentially with N, so this method is unlikely to scale to large molecules.

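The paper defines cross-correlations on S^2 and SO(3), analyzes their properties, and implements a correlation algorithm based on the generalized FFT. The numerical results confirm the stability and accuracy of the algorithm, even in deep networks.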

In short, judged on accuracy, scalability, and other criteria together, this looks like the most promising 3D network overall.

Further improvements:

For intrinsically volumetric tasks like 3D model recognition, we believe that further improvements can be attained by generalizing further beyond SO(3) to the roto-translation group SE(3). The development of Spherical CNNs is an important first step in this direction. Another interesting generalization is the development of a Steerable CNN for the sphere (Cohen and Welling, 2017), which would make it possible to analyze vector fields such as global wind directions, as well as other sections of vector bundles over the sphere.

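Carrying the computation over from SO(3) to SE(3), and turning the rotational correlation into translations in the tangent space, might bring a further speed-up.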

Appendix:

Lie groups and Lie algebras

3D rotation matrices form the special orthogonal group SO(3), while transformation matrices form the special Euclidean group SE(3).

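Neither SO(3) nor SE(3) is closed under addition: the sum of two rotation matrices is in general no longer a rotation matrix. Both are closed under multiplication, however, and a set of matrices like this is called a group. In other words, a group is a set equipped with a single operation.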

A group is written G = (A, ·), where A is a set and · is the operation. The operation must satisfy the following conditions:

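(1) Closure: for all a1, a2 in A, a1 · a2 is in A.

(2) Associativity: for all a1, a2, a3 in A, (a1 · a2) · a3 = a1 · (a2 · a3).

(3) Identity element: a special element a0 of the set such that a0 · a = a · a0 = a for every a in A.

(4) Inverse: for every a in A there is an element a^-1 in A such that a · a^-1 = a0.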

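One can show that the set of rotation matrices with matrix multiplication forms a group, and so does the set of transformation matrices with matrix multiplication.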
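The group conditions can be spot-checked numerically for rotation matrices. This is a sanity check on a few sample rotations, not a proof, and the helper names (`rot_z`, `rot_x`, `is_rotation`) are my own.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(theta):
    """Rotation by theta radians about the x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def is_rotation(m, tol=1e-9):
    """True if m is orthogonal with determinant +1, i.e. an element of SO(3)."""
    return bool(np.allclose(m @ m.T, np.eye(3), atol=tol)
                and abs(np.linalg.det(m) - 1.0) < tol)

a, b, c = rot_z(0.3), rot_x(1.1), rot_z(-0.7)

print("closure under product:", is_rotation(a @ b))
print("associativity:", np.allclose((a @ b) @ c, a @ (b @ c)))
print("identity:", is_rotation(np.eye(3)) and np.allclose(np.eye(3) @ a, a))
print("inverse is the transpose:", is_rotation(a.T) and np.allclose(a @ a.T, np.eye(3)))
print("closure under sum:", is_rotation(a + b))
```

The first four checks succeed, while the sum of two rotations fails the test, which matches the statement that these sets are closed under multiplication but not under addition.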

With the notion of a group in place, what is a Lie group?

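A Lie group is a group that is continuous (smooth). The motion of a rigid body is continuous, so rigid-body motions form a Lie group.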
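The appendix title mentions Lie algebras without spelling out the connection, so the following is an addition of mine and not something taken from the post. Assuming the standard Rodrigues formula for the exponential map, a rotation vector (axis times angle) maps to a rotation matrix, and scaling that vector continuously traces a smooth path inside SO(3). The names `hat` and `exp_so3` are illustrative.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector, so that hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: rotation vector (axis * angle) -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)  # the zero vector maps to the identity
    k = hat(w / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)

axis = np.array([1.0, 2.0, 2.0]) / 3.0  # unit rotation axis
for t in np.linspace(0.0, 1.0, 5):
    r = exp_so3(t * axis)  # one point on a smooth path starting at the identity
    print(round(float(t), 2),
          np.allclose(r @ r.T, np.eye(3)),
          round(float(np.linalg.det(r)), 6))
```

Every point on the path is orthogonal with determinant 1, so the path never leaves the group. That is the sense in which the rotations form a continuous group.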