Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration

Hybrid-distorted image restoration (HD-IR) is dedicated to restoring real distorted images that are degraded by multiple distortions. Existing HD-IR approaches usually ignore the inherent interference among hybrid distortions, which compromises the restoration performance.

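Research direction: hybrid-distorted image restoration, i.e., what this task is about.

Motivation: existing HD-IR methods generally ignore the inherent interference among hybrid distortions, which degrades restoration performance.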

To decompose such interference, we introduce the concept of Disentangled Feature Learning to achieve the feature-level divide-and-conquer of hybrid distortions.

Specifically, we propose the feature disentanglement module (FDM) to distribute feature representations of different distortions into different channels by revising gain-control-based normalization. We also propose a feature aggregation module (FAM) with channel-wise attention to adaptively filter out the distortion representations and aggregate useful content information from different channels for the construction of raw image.

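Strategy: to decompose this interference, the paper introduces disentangled feature learning to achieve a feature-level divide-and-conquer of hybrid distortions.

Method: 1. Feature disentanglement module (FDM): using a revised gain-control-based normalization, it distributes the feature representations of different distortions into different channels.

2. Feature aggregation module (FAM): combined with channel-wise attention, it adaptively filters out the distortion representations and aggregates useful content information from different channels to reconstruct the raw image.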

The effectiveness of the proposed scheme is verified by visualizing the correlation matrix of features and channel responses of different distortions.

Extensive experimental results also prove superior performance of our approach compared with the latest HD-IR schemes.

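Experimental conclusions: visualizing the correlation matrix of the features and the channel responses of different distortions verifies the effectiveness of the method.

Extensive experiments also confirm that the proposed method outperforms the latest HD-IR schemes.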

Introduction

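As always, an Introduction is an inverted triangle: it moves from broad to narrow, from general to specific. Two points deserve attention: 1. The broad part can be summarized in a few sentences and needs no lengthy exposition (the general background usually takes only a few sentences; the first paragraph typically narrows it down to the paper's main research area, and the second paragraph narrows further to the specific research content and problem). 2. The transitions from broad to narrow should be coherent.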

Nowadays, image restoration techniques have been applied in various fields, including streaming media, photo processing, video surveillance, and cloud storage. In the process of image acquisition and transmission, raw images are usually contaminated with various distortions due to capturing devices, high-ratio compression, transmission, post-processing, etc.

Previous image restoration methods focusing on a single distortion have been extensively studied [5,12,36,25,14,6,27,17,18,35] and have achieved satisfactory performance in the fields of super-resolution [22,7,21], deblurring [30,24,38], denoising [4,43,44], deraining [10,9,34], dehazing [45,1,41], and so on.

However, these works are usually designed to solve one specific distortion, which makes them difficult to apply to real-world applications, as shown in Fig. 1.

Fig. 1. Examples of hybrid-distorted image restoration. (I) Hybrid-distorted image including noise, blur, and JPEG artifacts. (II) Processed with cascaded single-distortion restoration networks including dejpeg, denoise, and deblurring. (III) Processed with our FDR-Net.

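The story opens with a brief background on image processing, which leads to the paper's main topic:

Real images suffer all kinds of contamination between acquisition and transmission.

Traditional methods handle only a single type of distortion.

These methods cannot cope with the mixed distortions found in real images.

{ This paragraph covers the high-level problem. The text then turns to a more detailed problem, which is also what sets this paper apart from other HD-IR methods. }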

Real-world images are typically affected by multiple distortions simultaneously. In addition, different distortions might interfere with each other, which makes it difficult to restore images. More details about the interference between different distortions can be found in the supplementary material.

Recently, some pioneering works on hybrid distortion have been proposed. For example, Yu et al. [37] pre-train several light-weight CNNs as tools for different distortions. They then use a Reinforcement Learning (RL) agent to learn to choose the best tool-chain for unknown hybrid distortions. Suganuma et al. [28] propose to use the attention mechanism to implement the adaptive selection of different operations for hybrid distortions.

However, these methods are designed without regard to the interference between hybrid distortions.

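This paragraph narrows the research problem further and focuses on a more specific issue:

Different distortions may interfere with each other.

Existing methods rely on reinforcement learning and attention mechanisms.

However, these methods are designed without considering the interference between hybrid distortions.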

Yu et al. [37] Crafting a toolchain for image restoration by deep reinforcement learning 2018 CVPR

Suganuma et al. [28] Attention-based Adaptive Selection of Operations for Image Restoration in the Presence of Unknown Combined Distortions 2019 CVPR

Previous literature [2] suggests that deep feature representations could be employed to efficiently characterize various image distortions. In other words, the feature representation extracted from hybrid-distorted images might be disentangled to characterize different distortions respectively. Based on the above, we propose to implement the feature-level divide-and-conquer of the hybrid distortions by learning disentangled feature representations.

On a separate note, Schwartz et al. [26] point out that a series of filters and gain-control-based normalization could achieve the decomposition of different filter responses. The convolution layer of a CNN is also composed of a series of basic filters/kernels. Inspired by this, we expand this theory and design a feature disentanglement module (FDM) to implement channel-wise feature decorrelation.

By such feature decorrelation, the feature representations of different distortions could be distributed across different channels respectively, as shown in Fig. 2.

Fig. 2. Illustration of how the feature disentanglement module (FDM) works. The FDM disentangles the input feature representation, which characterizes the hybrid-distorted image, into disentangled features. The responses of different distortions are distributed across different channels, as shown in the brighter regions. The visualizations of distortions are also displayed.

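At this point the inverted triangle of the Introduction has reached its tip, and the paper begins to present the strategy or method it adopts for the specific problem above.

To solve the specific problem raised in the previous paragraph, the paper's strategy is to use disentangled representations. Before that, however, its plausibility has to be considered.

To justify this, the authors argue from two angles.

1. Why the features of different distortions can be separated: previous work [2] shows that deep feature representations can effectively characterize various image distortions. In other words, the feature representation extracted from a hybrid-distorted image can be decomposed to characterize the different distortions separately.

2. How to decompose: Schwartz et al. [26] point out that a series of filters and gain-control-based normalization can decompose the responses of different filters. The convolution layers of a CNN are likewise composed of a series of basic filters/kernels. Inspired by this, the authors design a feature disentanglement module (FDM) to achieve channel-wise feature disentanglement.

With this feature decorrelation, the feature representations of different distortions can be distributed across different channels, as shown in Fig. 2.

The following two paragraphs state the experimental conclusions and the paper's contributions; they are skipped here.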

[2] Disentangling image distortions in deep feature space 2020

Schwartz et al. [26] Natural signal statistics and sensory gain control 2001 Nature neuroscience

Related Work

Image Restoration on Hybrid Distortion

Recently, with the wide application of image restoration, restoration methods specialized for a single distortion cannot meet the needs of real-world applications. Consequently, some works on hybrid distortion restoration have been proposed. Among them, RL-Restore [37] trains a policy to select the appropriate tools for single distortions from a pre-trained toolbox to restore the hybrid-distorted image. Suganuma et al. [28] then propose a simple framework that selects the proper operation with an attention mechanism. However, these works do not consider the interference between different distortions. In this paper, we propose FDR-Net for hybrid-distorted image restoration, which reduces the interference through feature disentanglement and thereby achieves superior performance.

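In Related Work, the point is not to simply list a pile of papers from top conferences and journals, but to explain how the cited works relate to this paper. The relation can be a theoretical foundation (this paper builds on theory provided by certain works), an inheritance (this paper further improves a method and solves a problem it had), or a contrast (previous methods cannot achieve something that the proposed method can). These works are the key clues, evidence, or theoretical basis behind the paper's motivation.

In short, everything must tie in with the motivation. Otherwise, work unrelated to the paper's core content should not be cited, even if it is a masterpiece by a big name.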

Approach

Primary Knowledge

To derive the models of sensory processing, gain-control-based normalization has been proposed to implement the nonlinear decomposition of natural signals [26]. Given a set of signals X, the signal decomposition first uses a set of filters <f1, f2, ..., fn> to extract different representations of signals <L1, L2, ..., Ln> as

$$L_i = f_i(X), \quad i = 1, 2, \ldots, n \qquad (1)$$

and then uses gain-control-based normalization as in Eq. 2 to further eliminate the dependency between <L1, L2, ..., Ln>:

$$R_i = \frac{L_i^2}{\sum_j w_{ji} L_j^2 + \sigma^2} \qquad (2)$$

where the independent response Ri can be generated based on suitable weights $w_{ji}$ and offset $\sigma ^2$ with the corresponding inputs Li. In this way, the input signals X can be decomposed into R1, R2, ..., Rn according to their statistical properties.

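This section introduces an important technique, gain-control-based normalization. Its purpose is to decompose the input signal X into R1, R2, ..., Rn and eliminate the dependencies among them.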
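As a rough illustration of the two steps, here is a minimal NumPy sketch. It assumes the normalization takes the divisive form $R_i = L_i^2 / (\sum_j w_{ji} L_j^2 + \sigma^2)$; the three filters, the weights `w`, and the offset `sigma2` are made-up values for illustration, not parameters from [26] or from the paper.

```python
import numpy as np

def gain_control_normalize(L, w, sigma2):
    """Divisive normalization: R_i = L_i^2 / (sum_j w[j, i] * L_j^2 + sigma2).

    L:      (n, m) array, n filter responses over m samples.
    w:      (n, n) non-negative weights; w[j, i] weights response j in the pool of i.
    sigma2: positive offset.
    """
    energy = L ** 2                  # squared responses, shape (n, m)
    pool = w.T @ energy + sigma2     # pool[i] = sum_j w[j, i] * energy[j] + sigma2
    return energy / pool

rng = np.random.default_rng(0)
X = rng.standard_normal(256)         # toy 1-D signal

# Hypothetical filters f_1..f_3 (simple difference kernels)
filters = [np.array([1.0, -1.0]),
           np.array([1.0, 0.0, -1.0]),
           np.array([1.0, -2.0, 1.0])]

# Step 1: L_i = f_i(X)
L = np.stack([np.convolve(X, f, mode="same") for f in filters])

# Step 2: gain-control-based normalization with assumed (not fitted) weights
w = np.full((3, 3), 0.5)
R = gain_control_normalize(L, w, sigma2=0.1)
print(R.shape)
```

In [26] the weights are fitted to signal statistics; here they are fixed constants, so the sketch only shows the mechanics of the normalization, not the resulting independence.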

Feature Disentanglement Module

Previous literature [2] has shown that different distortions have different deep feature representations and can be disentangled at the feature level. Based on this analysis, we design the FDM by extending the signal decomposition to channel-wise feature disentanglement to reduce the interference between hybrid distortions.

As shown in [26], the combination of diverse linear filters and divisive normalization is able to decompose the responses of filters. The feature representation of a CNN is also composed of responses from a series of basic filters/kernels. Based on this observation, we implement adaptive channel-wise feature decomposition by applying this algorithm in a learning-based framework.

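The paper's feature disentanglement rests on two theoretical foundations (pieces of prior knowledge).

First, previous work [2] has shown that different distortions have different deep feature representations and can be disentangled at the feature level.

Second, combining diverse linear filters with divisive normalization makes it possible to decompose the filter responses, and the feature representation of a CNN is likewise composed of the responses of a series of basic filters/kernels.

Only with these foundations does the paper's feature disentanglement model hold.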

Specifically, we use a convolution layer ($C_{in} \times C_{out} \times k \times k$) in the neural network to replace the traditional filters <f1, f2, ..., fn> of the signal decomposition in Section 3.1. Here, the number of input channels $C_{in}$ is the channel dimension of the input feature $F_{in}$, the number of output channels $C_{out}$ is the channel dimension of the output feature $F_{out}$, and k is the kernel size of the filters. In this way, the extraction results $S_1, S_2, ...S_{Cout}$ of the convolution layer are distributed in different channels as:

$$S_i = conv_i(F_{in}), \quad i = 1, 2, \ldots, C_{out} \qquad (3)$$

where $S_i$ represents the ith channel of the output feature and conv represents the convolution layer.

To introduce the gain-control-based normalization of Eq. 2 into the CNN, we modify Eq. 2 into Eq. 4:

$$D_i = \frac{S_i}{\sqrt{\sum_j w_{ji} S_j^2 + \sigma^2}} \qquad (4)$$

where $w_{ji}$ and the offset $\sigma^2$ can be learned by gradient descent. Si and Di represent the ith channel components of the features before and after gain control.

In Eq. 4, we make two major improvements to make it applicable to our task. One improvement is that the numerator and denominator are the square roots of those in Eq. 2, which makes gradient propagation easy to implement. The other is to replace the filter responses Li with the channel components Si of the features, which is suitable for channel-wise feature disentanglement.

To guide the learning of the parameters of the convolution layer, $w_{ji}$ and $\sigma^2$, we introduce the spectral value difference orthogonality regularization (SVDO) from [3] as a loss constraint. As a novel method to reduce feature correlations, SVDO can be expressed as Eq. 5:

$$L_F = \left\| \lambda_1(FF^T) - \lambda_2(FF^T) \right\|_2^2 \qquad (5)$$

where $\lambda _1(FF^T)$ and $\lambda _2(FF^T)$ denote the largest and smallest eigenvalues of $FF^T$, respectively. F denotes the feature maps and T denotes transposition.
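To make the SVDO term concrete, the sketch below computes the squared gap between the largest and smallest eigenvalues of $FF^T$ with NumPy. It assumes the regularizer is exactly this squared difference and that F is arranged with one flattened channel per row; the function name `svdo_loss` and the toy inputs are illustrative only. A training implementation would need a differentiable counterpart in a deep learning framework.

```python
import numpy as np

def svdo_loss(F):
    """Squared gap between the largest and smallest eigenvalues of F F^T.

    F: (C, N) matrix, one flattened feature channel per row (assumed layout).
    """
    gram = F @ F.T                    # (C, C), symmetric positive semi-definite
    eig = np.linalg.eigvalsh(gram)    # eigenvalues in ascending order
    return float((eig[-1] - eig[0]) ** 2)

rng = np.random.default_rng(0)

# Orthonormal rows: F F^T is the identity, all eigenvalues coincide, loss is ~0
Q, _ = np.linalg.qr(rng.standard_normal((64, 4)))
F_orth = Q.T                          # shape (4, 64)

# Four copies of the same unit-norm row: F F^T is all ones, eigenvalues are 4, 0, 0, 0
F_corr = np.tile(F_orth[:1], (4, 1))

print(svdo_loss(F_orth), svdo_loss(F_corr))
```

The two toy inputs show the intended behaviour: the term vanishes when channels are orthogonal with equal energy and grows when channels are redundant.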

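Concretely, the paper's feature disentanglement model works as follows:

1. Apply a convolution to the input feature; the output is $S_1, S_2, ...S_{Cout}$.

2. Apply the gain-control-based normalization of Eq. 4 to $S_1, S_2, ...S_{Cout}$; the output is $D_1, D_2, ...D_{Cout}$.

(Note: Eq. 4 differs from Eq. 2 in two ways. First, the numerator and denominator in Eq. 4 are square-rooted, which eases back-propagation. Second, the filter responses Li are replaced by the channel components Si of the feature, which suits channel-wise feature disentanglement.)

3. During training, the spectral value difference orthogonality regularization (SVDO) is introduced as a loss term.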
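Steps 1 and 2 can be sketched in NumPy as below. This is only an illustration under stated assumptions: the normalization is taken to be $D_i = S_i / \sqrt{\sum_j w_{ji} S_j^2 + \text{offset}}$, the convolution uses k = 1 so that it reduces to a per-pixel matrix product, and all weights are random or constant stand-ins rather than learned parameters. The name `fd_layer` is hypothetical.

```python
import numpy as np

def fd_layer(F_in, conv_w, w, offset):
    """One feature-disentanglement step on a (C_in, H, W) feature map.

    conv_w: (C_out, C_in) weights of a 1x1 convolution (k = 1 for brevity).
    w:      (C_out, C_out) non-negative normalization weights w[j, i].
    offset: positive scalar offset inside the square root.
    """
    # Step 1: S_i = conv_i(F_in); a 1x1 convolution is a per-pixel matrix product
    S = np.einsum("oc,chw->ohw", conv_w, F_in)
    # Step 2: D_i = S_i / sqrt(sum_j w[j, i] * S_j^2 + offset)
    pool = np.einsum("ji,jhw->ihw", w, S ** 2) + offset
    D = S / np.sqrt(pool)
    return S, D

rng = np.random.default_rng(0)
C_in, C_out = 4, 8
F_in = rng.standard_normal((C_in, 16, 16))      # toy input feature
conv_w = rng.standard_normal((C_out, C_in))     # untrained convolution weights
w = np.full((C_out, C_out), 0.25)               # assumed constant weights
S, D = fd_layer(F_in, conv_w, w, offset=1.0)
print(S.shape, D.shape)
```

Because the denominator is strictly positive, each channel keeps its sign and is only rescaled by the pooled energy of all channels, which is the cross-channel coupling that the learned $w_{ji}$ would control.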

Feature Aggregation Module

To further filter out the distortion representation and maintain the details of the raw image content, we utilize a channel-wise attention mechanism to adaptively select useful feature representations from the processed disentangled feature as in Eq. 6:

$$F^p_i = CA(PM(D_i)) \qquad (6)$$

where PM represents the process module and CA represents channel attention. $D_i$ is the ith channel of the disentangled feature, and $F^p_i$ is the ith channel of the output feature.

To construct the image, we obtain the inversion formula from Eq. 4 as:

$$F^c_i = F^p_i \sqrt{\sum_j w_{ji} (F^p_j)^2 + \sigma^2} \qquad (7)$$

where $F^c_i$ represents the output feature corresponding to the distribution of the clean image.

With this module, the processed image information can be aggregated back to the original feature space, which is suitable for reconstructing the restored image. The feature aggregation module (FAM) is designed as shown in Fig. 4.

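The feature aggregation module consists of two steps:

1. A channel attention model selects the useful channels.

2. The inverse transform of the gain-control-based normalization, whose purpose is to aggregate the processed image information back into the original feature space, which suits reconstruction of the restored image.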
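The two steps can be sketched as follows. Several things here are assumptions rather than details from the paper: channel attention is written in the common squeeze-and-excitation style, the process module PM is left out (its input is a random stand-in), and the inverse transform is taken to be $F^c_i = F^p_i \sqrt{\sum_j w_{ji} (F^p_j)^2 + \text{offset}}$, mirroring the forward normalization. All function names and weights are hypothetical.

```python
import numpy as np

def channel_attention(F, W1, W2):
    """Squeeze-and-excitation style gate (an assumed form of CA)."""
    z = F.mean(axis=(1, 2))                     # (C,) global average pooling
    h = np.maximum(W1 @ z, 0.0)                 # bottleneck with ReLU
    gate = 1.0 / (1.0 + np.exp(-(W2 @ h)))      # sigmoid, one weight per channel
    return F * gate[:, None, None]

def inverse_gain_control(Fp, w, offset):
    """Assumed inverse step: Fc_i = Fp_i * sqrt(sum_j w[j, i] * Fp_j^2 + offset)."""
    pool = np.einsum("ji,jhw->ihw", w, Fp ** 2) + offset
    return Fp * np.sqrt(pool)

rng = np.random.default_rng(0)
C = 8
D = rng.standard_normal((C, 16, 16))            # stand-in for processed disentangled features
W1 = rng.standard_normal((C // 2, C))           # untrained attention weights
W2 = rng.standard_normal((C, C // 2))
w = np.full((C, C), 0.25)                       # assumed normalization weights

Fp = channel_attention(D, W1, W2)               # step 1: select useful channels
Fc = inverse_gain_control(Fp, w, offset=1.0)    # step 2: map back to the original feature space
print(Fp.shape, Fc.shape)
```

Note that this inverse reuses the normalized features inside the square root, so it does not exactly undo the forward normalization for arbitrary weights; in a network, both directions are learned jointly.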

Auxiliary Module

In the process of feature disentanglement, the mutual information between different feature channels is reduced, which results in some loss of image information. To make up for this weakness of the feature disentanglement branch, we use the existing ResBlock in parallel to enhance the transmission of image information.

The authors argue that feature disentanglement reduces the mutual information between feature channels, which causes some loss of image information.

To compensate for this weakness of the feature disentanglement branch, a ResBlock is used as a parallel path to enhance the transmission of image information.

Overview of Whole Framework

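Points to note:

1. Several ResNet blocks sit between the FDM and the FAM.

2. There are skip connections between the FDM and the FAM, similar to those in U-Net.

3. Multi-phases, i.e., multiple stages.

4. Dual residual connections link the different phases, strengthening the mutual use of information across phases. (Note that there is a red skip connection above Phase 1, and another skip connection after the FDM.)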

Loss Function

L1 loss and feature orthogonality loss

β = 0.00001! Surprising, isn't it? Perhaps the feature orthogonality loss is itself large in magnitude?
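A small NumPy sketch of how the two terms might be combined, which also gives a feel for the magnitudes involved. It assumes a mean-absolute-error L1 term, an orthogonality term equal to the squared eigenvalue gap of $FF^T$, and a plain sum over feature maps; none of these details are confirmed by the notes above, and the random inputs are placeholders.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between restored and clean images."""
    return float(np.mean(np.abs(pred - target)))

def orthogonality_loss(F):
    """Squared gap between the extreme eigenvalues of F F^T, F of shape (C, N)."""
    eig = np.linalg.eigvalsh(F @ F.T)
    return float((eig[-1] - eig[0]) ** 2)

def total_loss(pred, target, feats, beta=1e-5):
    """L1 term plus beta times the summed orthogonality terms (assumed combination)."""
    return l1_loss(pred, target) + beta * sum(orthogonality_loss(F) for F in feats)

rng = np.random.default_rng(0)
pred = rng.uniform(0.0, 1.0, (63, 63))
target = rng.uniform(0.0, 1.0, (63, 63))
feats = [rng.standard_normal((32, 63 * 63))]    # 32 channels over a 63x63 patch

print(l1_loss(pred, target), orthogonality_loss(feats[0]), total_loss(pred, target, feats))
```

For unnormalized features, the eigenvalues of $FF^T$ scale with the number of spatial positions, so the raw orthogonality term can be several orders of magnitude larger than the L1 term. That is consistent with the guess above about why such a small β is used, though it remains a guess.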

Experiments

Dataset

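Datasets for training and testing on hybrid-distorted images:

DIV2K dataset:

1. 750 images for training and 50 for testing. The sample count is of course not that small: each image is cut into patches of size 63 × 63, giving 249,344 training patches and 3,584 testing patches.

2. Gaussian noise, Gaussian blur, and JPEG compression artifacts are added to these images synthetically. The parameters are:

The standard deviations of Gaussian blur and Gaussian noise are randomly chosen from the ranges [0, 10] and [0, 50], respectively.

The quality of JPEG compression is chosen from [10, 100].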
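The parameter sampling can be sketched as below. The sketch assumes uniform sampling over the stated ranges and a blur-then-noise order, neither of which is specified above. It applies blur and noise with plain NumPy on a grayscale patch; the sampled JPEG quality is only returned, since actual JPEG compression needs an image codec that is left out here. All function names are illustrative.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2-D image with edge padding."""
    if sigma < 1e-3:                              # treat tiny sigma as no blur
        return img.copy()
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    k /= k.sum()                                  # normalize so brightness is preserved

    def blur_1d(v):
        return np.convolve(np.pad(v, radius, mode="edge"), k, mode="valid")

    out = np.apply_along_axis(blur_1d, 0, img)    # blur columns
    return np.apply_along_axis(blur_1d, 1, out)   # then rows

def sample_distortion(rng):
    """Draw one set of distortion levels (uniform sampling is an assumption)."""
    return {
        "blur_sigma": rng.uniform(0.0, 10.0),
        "noise_sigma": rng.uniform(0.0, 50.0),
        "jpeg_quality": int(rng.integers(10, 101)),
    }

def distort(patch, params, rng):
    """Blur, then add Gaussian noise; JPEG at params['jpeg_quality'] would follow."""
    out = gaussian_blur(patch.astype(np.float64), params["blur_sigma"])
    out = out + rng.normal(0.0, params["noise_sigma"], size=out.shape)
    return np.clip(out, 0.0, 255.0)

rng = np.random.default_rng(0)
patch = rng.uniform(0.0, 255.0, (63, 63))         # stand-in for a 63x63 DIV2K patch
params = sample_distortion(rng)
distorted = distort(patch, params, rng)
print(params, distorted.shape)
```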

However, these images have low resolution, so some images from the DID-HY dataset are introduced as well.

DID-HY dataset: built by adding Gaussian blur, Gaussian noise, and JPEG compression artifacts to the DID-MDN dataset.

The training set contains 12,000 distorted/clean image pairs with a resolution of 512 × 512, and the testing set contains 1,200 distorted/clean image pairs.

Another very important question: if a real image really suffers from only one kind of distortion, does this hybrid-distortion restoration algorithm restore it better than algorithms designed for that single distortion? This check is very important!

Therefore, the paper also runs test experiments on single-distortion datasets, including:

GoPro dataset for deblurring. As the standard dataset for image deblurring, the GoPro dataset is produced by [14] and contains 3214 blurry/clean image pairs. The training and testing sets consist of 2103 and 1111 pairs, respectively.

DID-MDN dataset for deraining. The DID-MDN dataset is produced by [40]. Its training set consists of 12000 rain/clean image pairs, which are classified into three levels (light, medium, and heavy) based on rain density, with 4000 image pairs per level. The testing set consists of 1200 images, which contain rain streaks with different orientations and scales.

Comparison with State-of-the-Arts

First, the results on hybrid-distorted images.

1. DIV2K dataset.

2. DID-HY dataset

Unsurprisingly, the proposed method is the best on both datasets.

Next, the results on single-distortion images.

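1. GoPro test dataset: deblurring

2. DID-MDN dataset: deraining

Again unsurprisingly, the proposed method achieves the best results.

In other words, although the paper targets hybrid-distorted image restoration, the method also performs well when the mix contains only a single distortion. This is a very important conclusion.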

Interpretative Experiment

Fig. 8. (a) Visualization of the correlation matrix between channels of the feature before FDM. (b) Visualization of the correlation matrix between channels of the feature after FDM. (c) Channel responses corresponding to different distortion types after FDM. As shown in (a) and (b), FDM reduces the channel-wise correlations by disentangling the feature. From (c), feature disentanglement distributes the different distortions in different channels regardless of the distortion levels.

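The paper also includes a very important interpretive experiment.

(a) Visualization of the correlation matrix between channels of the feature before FDM.

(b) Visualization of the correlation matrix between channels of the feature after FDM.

Comparing (a) and (b) shows that FDM reduces channel-wise correlation by disentangling the features.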
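A channel correlation matrix like the ones in (a) and (b) can be computed with a few lines of NumPy. The sketch below uses synthetic features, not the paper's: one map whose channels share a common component and one with independent channels. The summary statistic `mean_offdiag_abs` is an illustrative choice, not a metric from the paper.

```python
import numpy as np

def channel_correlation(F):
    """Pearson correlation between the channels of a (C, H, W) feature map."""
    flat = F.reshape(F.shape[0], -1)      # one row per channel
    return np.corrcoef(flat)              # (C, C) matrix with ones on the diagonal

def mean_offdiag_abs(corr):
    """Average absolute correlation between distinct channels."""
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return float(np.abs(corr[mask]).mean())

rng = np.random.default_rng(0)
base = rng.standard_normal((1, 16, 16))
entangled = base + 0.3 * rng.standard_normal((8, 16, 16))   # channels share one component
independent = rng.standard_normal((8, 16, 16))              # channels drawn independently

corr_before = channel_correlation(entangled)
corr_after = channel_correlation(independent)
print(mean_offdiag_abs(corr_before), mean_offdiag_abs(corr_after))
```

Plotting `corr_before` and `corr_after` as heat maps gives pictures of the same kind as panels (a) and (b): a bright off-diagonal area for entangled channels and a mostly dark one for decorrelated channels.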

Ablation Studies

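The effect of the FDM and the auxiliary branch (the ResNet blocks branch):

The effect of FDlayers (the number of times the FD layer is repeated inside the FDM module, see Fig. 2; likewise FAlayers, see Fig. 3) and of the number of channels:

As can be seen, 3 FD layers suffice. Although more channels always help, the paper takes model size into account and chooses 32.

The effect of multi-phases:

More phases are better; the paper settles on 6.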
