[Paper Quick Review] Interesting Image Super-Resolution Algorithms at CVPR 2020 (9 Papers) (1/2)

Keywords: Unpaired; Pseudo-Supervision; Gradient Guidance; Texture Transformer Network; Deep Unfolding Network; Meta-Transfer; Zero-Shot; Super-Resolution

This post is a quick overview of the interesting (and important) SR papers at CVPR 2020, meant to give a fast sense of the latest directions in SR (what problem each paper tackles and what model it uses).

The review is split into two posts. This is the first; the link to the second is:

[Paper Quick Review] Important Image Super-Resolution Algorithms at CVPR 2020 (9 Papers) (2/2) [continuously updated]

 

Contents

————————  Part 1  ————————

[Paper Quick Review] Interesting Image Super-Resolution Algorithms at CVPR 2020 (9 Papers) (1/2)

Unpaired Image Super-Resolution Using Pseudo-Supervision

[pdf] [supp] [bibtex]

Abstract

Loss Functions

Network Architecture

Structure-Preserving Super Resolution With Gradient Guidance

[pdf] [supp] [bibtex]

Abstract 

Details in Architecture

Objective Functions

Learning Texture Transformer Network for Image Super-Resolution

[pdf] [supp] [bibtex]

Abstract

Texture Transformer

Cross-Scale Feature Integration

Loss Function

Deep Unfolding Network for Image Super-Resolution

[pdf] [bibtex]

Meta-Transfer Learning for Zero-Shot Super-Resolution

[pdf] [supp] [bibtex]

————————  Part 2  ————————

Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution

[pdf] [supp] [bibtex]

Residual Feature Aggregation Network for Image Super-Resolution

[pdf] [supp] [bibtex]

Correction Filter for Single Image Super-Resolution: Robustifying Off-the-Shelf Deep Super-Resolvers

[pdf] [supp] [bibtex]

Image Super-Resolution With Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining 

[pdf] [supp] [bibtex]



1.


Unpaired Image Super-Resolution Using Pseudo-Supervision 

[pdf] [supp] [bibtex]

Abstract 

In most studies on learning-based image super-resolution (SR), the paired training dataset is created by downscaling high-resolution (HR) images with a predetermined operation (e.g., bicubic). However, these methods fail to super-resolve real-world low-resolution (LR) images, for which the degradation process is much more complicated and unknown.

Motivation: conventional methods do not handle real-world low-resolution images well, whose degradation process is far more complicated and unknown.

In this paper, we propose an unpaired SR method using a generative adversarial network that does not require a paired/aligned training dataset. Our network consists of an unpaired kernel/noise correction network and a pseudo-paired SR network. The correction network removes noise and adjusts the kernel of the inputted LR image; then, the corrected clean LR image is upscaled by the SR network. In the training phase, the correction network also produces a pseudo-clean LR image from the inputted HR image, and then a mapping from the pseudo-clean LR image to the inputted HR image is learned by the SR network in a paired manner. Because our SR network is independent of the correction network, well-studied existing network architectures and pixel-wise loss functions can be integrated with the proposed framework.

Contributions of this paper:

1. An unpaired SR method based on a generative adversarial network that requires no paired/aligned training dataset.

2. Network structure: an unpaired kernel/noise correction network and a pseudo-paired SR network.

Correction network: removes noise and adjusts the kernel of the input LR image; during training, it also produces a pseudo-clean LR image from the input HR image.

SR network: upscales the corrected clean LR image; during training, it learns the mapping from the pseudo-clean LR image to the input HR image in a paired manner.

The figure below makes the data flow easy to follow.

Figure 3: Data-flow diagram of proposed method. SR network U_{Y_{\downarrow }Y} can be learned in a paired manner through \mathcal{L}_{rec}, even if the training dataset \{X, Y \} is not paired. The whole network is end-to-end trainable.
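To make the data flow concrete, below is a minimal PyTorch sketch of how the pseudo-paired reconstruction loss \mathcal{L}_{rec} could be formed, under the assumption that the pseudo-clean LR image is obtained by bicubic-downscaling the HR image and passing it through G_{Y↓X} and then the correction network G_{XY↓}. The tiny modules here are stand-ins, not the paper's architectures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-ins for the paper's networks (G_{Y↓X}, G_{XY↓}, U_{Y↓Y});
# the real models are much deeper (the paper uses reduced RCANs).
g_y2x = nn.Conv2d(3, 3, 3, padding=1)   # clean LR -> "real-world" LR domain
g_x2y = nn.Conv2d(3, 3, 3, padding=1)   # noisy LR -> clean LR (the correction network)
u_sr  = nn.Sequential(nn.Conv2d(3, 3 * 16, 3, padding=1), nn.PixelShuffle(4))  # clean LR -> HR (x4)

def pseudo_paired_step(y_hr, scale=4):
    """Sketch of the pseudo-paired reconstruction loss L_rec only
    (adversarial / cycle / identity / geometric terms omitted)."""
    # 1. Bicubic-downscale the clean HR image y to a clean LR image y↓.
    y_down = F.interpolate(y_hr, scale_factor=1.0 / scale, mode="bicubic", align_corners=False)
    # 2. Map y↓ into the real-world LR domain and correct it back,
    #    giving a pseudo-clean LR image that has passed through the correction network.
    pseudo_clean_lr = g_x2y(g_y2x(y_down))
    # 3. Super-resolve the pseudo-clean LR image and compare with the original HR image.
    loss_rec = F.l1_loss(u_sr(pseudo_clean_lr), y_hr)
    return loss_rec

loss = pseudo_paired_step(torch.rand(1, 3, 192, 192))
```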

 

Experiments on diverse datasets show that the proposed method is superior to existing solutions to the unpaired SR problem.

Conclusion

 

Loss Functions

The symbols in the formulas can be matched to those in Figure 3.

  • Adversarial loss

1. (the GAN in the middle of Figure 3)

The leftmost GAN is analogous.

2. (the GAN on the right of Figure 3; in the formula, the circle operator denotes …)

  • Cycle consistency loss

The normal CycleGAN learns one-to-one mappings because it imposes cycle consistency on both cycles (i.e., X \rightarrow Y \rightarrow X and Y \rightarrow X \rightarrow Y).

This term supervises the two generators G_{X,Y_{\downarrow}} and G_{Y_{\downarrow},X}.
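For reference, a minimal sketch of the standard CycleGAN cycle-consistency term over both cycles, as described in the quote above (an illustration of the general idea, not necessarily the exact form used in this paper):

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(x, y_down, g_x2y, g_y2x):
    """L1 cycle consistency over both cycles:
    X -> Y↓ -> X  and  Y↓ -> X -> Y↓ (generic CycleGAN-style form)."""
    loss_x = F.l1_loss(g_y2x(g_x2y(x)), x)            # X -> Y↓ -> X
    loss_y = F.l1_loss(g_x2y(g_y2x(y_down)), y_down)  # Y↓ -> X -> Y↓
    return loss_x + loss_y

# toy check with identity "generators"
print(cycle_consistency_loss(torch.rand(1, 3, 48, 48), torch.rand(1, 3, 48, 48),
                             lambda t: t, lambda t: t))  # tensor(0.)
```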

  • Identity mapping loss

This term supervises the generator G_{Y_{\downarrow},X}.

  • Geometric ensemble loss

Geometric consistency was introduced in the recent work [Geometry-consistent generative adversarial networks for one-sided unsupervised domain mapping]; it reduces the space of possible mappings while preserving the scene geometry.

The operators represent eight distinct patterns of flip and rotation.
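A minimal sketch of enumerating those eight flip/rotation operators (the dihedral group of the square); how they enter the geometric ensemble loss is not reproduced here:

```python
import torch

def dihedral_transforms(img):
    """Return the 8 flip/rotation variants of an (N, C, H, W) image tensor:
    4 rotations (0/90/180/270 degrees) x optional horizontal flip."""
    variants = []
    for flip in (False, True):
        base = torch.flip(img, dims=[3]) if flip else img
        for k in range(4):
            variants.append(torch.rot90(base, k, dims=[2, 3]))
    return variants

assert len(dihedral_transforms(torch.rand(1, 3, 8, 8))) == 8
```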

  • Full objective

 

Network Architecture

A few keywords are enough to get a rough idea of the structure.

The RCAN consists of 10 residual groups (RGs), where each RG contains 20 residual channel attention blocks (RCABs).

Our GXY↓ (UY↓Y ) is a reduced version of the RCAN consisting of five RGs with 10 (20) RCABs.

RCAN : Image super-resolution using very deep residual channel attention networks

use several residual blocks with 5×5 filters and several fusion layers with 1×1 filters, where each convolution layer is followed by batch normalization (BN) [16] and LeakyReLU.
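As a reminder of the building blocks, here is a minimal sketch of an RCAB and a residual group in the spirit of the RCAN paper; the channel count and reduction ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RCAB(nn.Module):
    """Residual Channel Attention Block (sketch following the RCAN paper)."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.ca = nn.Sequential(                       # channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        res = self.body(x)
        return x + res * self.ca(res)

class ResidualGroup(nn.Module):
    """A residual group (RG) stacks several RCABs plus a trailing conv and a skip;
    the reduced RCAN described above would stack 5 such groups."""
    def __init__(self, channels=64, n_blocks=10):
        super().__init__()
        self.blocks = nn.Sequential(*[RCAB(channels) for _ in range(n_blocks)],
                                    nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.blocks(x)

y = ResidualGroup()(torch.rand(1, 64, 24, 24))
```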

---------------------------------------------------------------------

This paper involves quite a few variables; the figure below shows how they look during training:

Figure 4: Intermediate images of proposed method. x is image "0886" from the DIV2K realistic-wild validation set, and y is image "0053" from the DIV2K training ground-truth set.

 


2.


Structure-Preserving Super Resolution With Gradient Guidance

[pdf] [supp] [bibtex]

Abstract 

Structures matter in single image super resolution (SISR). Recent studies benefiting from generative adversarial network (GAN) have promoted the development of SISR by recovering photo-realistic images. However, there are always undesired structural distortions in the recovered images.

Motivation: the recovered images always contain undesired structural distortions.

In this paper, we propose a structure-preserving super resolution method to alleviate the above issue while maintaining the merits of GAN-based methods to generate perceptual-pleasant details. Specifically, we exploit gradient maps of images to guide the recovery in two aspects. On the one hand, we restore high-resolution gradient maps by a gradient branch to provide additional structure priors for the SR process. On the other hand, we propose a gradient loss which imposes a second-order restriction on the super-resolved images. Along with the previous image-space loss functions, the gradient-space objectives help generative networks concentrate more on geometric structures. Moreover, our method is model-agnostic, which can be potentially used for off-the-shelf SR networks.

Contributions of this paper:

Gradient maps of the images are exploited to guide the recovery in two ways:

1. A gradient branch restores high-resolution gradient maps, providing additional structural priors for the SR process;

2. A gradient loss imposes a second-order restriction on the super-resolved images.

On top of the conventional image-space losses, the gradient-space objectives make the generative network concentrate more on geometric structures. Moreover, the method is model-agnostic and can potentially be applied to off-the-shelf SR networks.

Experimental results show that we achieve the best PI and LPIPS performance and meanwhile comparable PSNR and SSIM compared with state-of-the-art perceptual-driven SR methods. Visual results demonstrate our superiority in restoring structures while generating natural SR images.

Conclusion

 

Details in Architecture

Figure 2. Overall framework of our SPSR method. Our architecture consists of two branches, the SR branch and the gradient branch. The gradient branch aims to super-resolve LR gradient maps to the HR counterparts. It incorporates multi-level representations from the SR branch to reduce parameters and outputs gradient information to guide the SR process by a fusion block in turn. The final SR outputs are optimized by not only conventional image-space losses, but also the proposed gradient-space objectives.

  • Gradient Branch

The function M(·) in Figure 2 denotes the operation that extracts the gradient map, computed as:
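A minimal sketch of a typical gradient-map computation, assuming differences between neighboring pixels followed by the gradient magnitude (consistent with the description, though not necessarily the paper's exact definition):

```python
import torch
import torch.nn.functional as F

def gradient_map(img, eps=1e-6):
    """Approximate M(I): per-pixel gradient magnitude from horizontal/vertical
    neighboring-pixel differences (replicate padding keeps the spatial size)."""
    x = F.pad(img, (1, 1, 0, 0), mode="replicate")
    y = F.pad(img, (0, 0, 1, 1), mode="replicate")
    dx = x[..., :, 2:] - x[..., :, :-2]          # horizontal differences
    dy = y[..., 2:, :] - y[..., :-2, :]          # vertical differences
    return torch.sqrt(dx ** 2 + dy ** 2 + eps)   # eps keeps the sqrt differentiable at 0

gm = gradient_map(torch.rand(1, 3, 32, 32))      # same spatial size as the input
```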

 

As shown in Figure 2, the gradient branch takes in several intermediate-level representations from the SR branch. The motivation is that a well-designed SR branch already carries rich structural information.

Between every two incorporated intermediate features there is a gradient block, which can be any basic block used to extract higher-level features.

Once the SR gradient map is obtained by the gradient branch, the resulting gradient features are integrated back into the SR branch to guide SR reconstruction in turn.

The magnitude of the gradient map implicitly indicates whether a recovered region should be sharp or smooth.

In practice, the feature maps produced by the later layers of the gradient branch are fed into the SR branch; meanwhile, these feature maps also go through a 1×1 convolution layer to produce the output gradient map.

  • Structure-Preserving SR Branch

This branch consists of two parts.

The first part is a regular SR network composed of multiple generative blocks, and it can be of any architecture.

This paper adopts the Residual-in-Residual Dense Block (RRDB) proposed in ESRGAN [42]. The original model contains 23 RRDB blocks, and the feature maps of the 5th, 10th, 15th and 20th blocks are merged into the gradient branch. Since a regular SR model outputs images with only 3 channels, the last convolutional reconstruction layer is removed and the output features are fed into the subsequent part. The second part of the SR branch concatenates the SR gradient feature maps obtained from the gradient branch mentioned above; a fusion block then fuses the features from both branches to incorporate the structural information.

 

Objective Functions

  • Pixel-wise loss

  • Perceptual loss

where φi(.) denotes the ith layer output of the VGG model.

  • Adversarial loss

  • Gradient loss

Figure 3. An illustration of a simple 1-D case. The first row shows the pixel sequences and the second row shows their corresponding gradient maps.

Figure 3 clearly illustrates the motivation with a simple 1-D case. The ground-truth (HR) edge is shown in Figure 3(a) and the super-resolved (SR) edge in Figure 3(b). If the model only optimizes an L1 loss in image space, it fails to recover sharp edges, because it tends to output a statistical average of the HR solutions seen in the training data. In this case, if we compute and plot the gradient magnitudes of the two sequences, the SR gradient is flat with low values, whereas the HR gradient is a spike with high values.

This suggests that if a second-order gradient constraint is added to the optimization objective, the model can learn more from gradient space. It makes the model focus on the neighboring configuration, so the local intensity of sharpness can be inferred more accurately. Therefore, if the gradient information of Figure 3(f) is captured, the probability of recovering Figure 3(c) increases significantly. SR methods can benefit from such guidance to avoid over-smoothed or over-sharpened recovery. Geometric features are also easier to extract in gradient space, so geometric structures are better preserved, yielding more realistic SR images.

Here a gradient loss is proposed to achieve the above goals. Since the gradient map is an ideal tool for reflecting the structural information of an image, it can also serve as a second-order constraint that supervises the generator. The gradient loss is defined by reducing the distance between the gradient map extracted from the SR image and the one extracted from the corresponding HR image. With supervision in both the image and gradient domains, the generator not only learns fine appearance but also avoids detailed geometric distortions. Accordingly, two loss terms are designed to penalize the difference between the gradient maps (GM) of the SR and HR images. The first is a pixel-wise loss, shown below:

Pixel-wise gradient-map loss
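A minimal sketch of such a pixel-wise gradient loss, assuming an L1 distance between the gradient maps of the SR and HR images (the exact norm and weighting in the paper may differ):

```python
import torch
import torch.nn.functional as F

def gradient_map(img, eps=1e-6):
    """Same gradient-magnitude helper as in the gradient-branch sketch above."""
    x = F.pad(img, (1, 1, 0, 0), mode="replicate")
    y = F.pad(img, (0, 0, 1, 1), mode="replicate")
    dx = x[..., :, 2:] - x[..., :, :-2]
    dy = y[..., 2:, :] - y[..., :-2, :]
    return torch.sqrt(dx ** 2 + dy ** 2 + eps)

def gradient_pixel_loss(sr, hr):
    """L1 distance between the gradient maps (GM) of the SR and HR images."""
    return F.l1_loss(gradient_map(sr), gradient_map(hr))

loss = gradient_pixel_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```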

The second discriminates whether a gradient patch comes from an HR gradient map. To achieve this, another network, a gradient discriminator, is designed:

gradient discriminator network

The gradient discriminator can also supervise the generation of the SR results through adversarial learning:

Note that every step inside the operation M(·) is differentiable, so a model with the gradient loss can be trained end to end. Moreover, thanks to its concise formulation and strong transferability, the gradient loss can be conveniently adopted as additional guidance in any generative model.


3.


Learning Texture Transformer Network for Image Super-Resolution

[pdf] [supp] [bibtex]

Abstract

We study on image super-resolution (SR), which aims to recover realistic textures from a low-resolution (LR) image. Recent progress has been made by taking high-resolution images as references (Ref), so that relevant textures can be transferred to LR images. However, existing SR approaches neglect to use attention mechanisms to transfer high-resolution (HR) textures from Ref images, which limits these approaches in challenging cases.

Motivation: existing SR approaches neglect to use attention mechanisms to transfer high-resolution (HR) textures from the reference (Ref) image.

In this paper, we propose a novel Texture Transformer Network for Image Super-Resolution (TTSR), in which the LR and Ref images are formulated as queries and keys in a transformer, respectively. TTSR consists of four closely-related modules optimized for image generation tasks, including a learnable texture extractor by DNN, a relevance embedding module, a hard-attention module for texture transfer, and a soft-attention module for texture synthesis. Such a design encourages joint feature learning across LR and Ref images, in which deep feature correspondences can be discovered by attention, and thus accurate texture features can be transferred. The proposed texture transformer can be further stacked in a cross-scale way, which enables texture recovery from different levels (e.g., from 1x to 4x magnification).

Method of this paper:

This paper proposes a novel Texture Transformer Network for Image Super-Resolution (TTSR), in which the LR and Ref images are formulated as the query Q and key K of a transformer, respectively. TTSR consists of four modules: a learnable texture extractor based on a DNN, a relevance embedding module, a hard-attention module for texture transfer, and a soft-attention module for texture synthesis. This design encourages joint feature learning across the LR and Ref images, in which deep feature correspondences can be discovered by attention, so accurate texture features can be transferred. The texture transformer can further be stacked in a cross-scale way, enabling texture recovery at different levels (e.g., from 1× to 4× magnification).

Figure 2. The proposed texture transformer. Q, K and V are the texture features extracted from an up-sampled LR image, a sequentially down/up-sampled Ref image, and an original Ref image, respectively. H and S indicate the hard/soft attention map, calculated from relevance embedding. F is the LR features extracted from a DNN backbone, and is further fused with the transferred texture features T for generating the SR output.

Extensive experiments show that TTSR achieves significant improvements over state-of-the-art approaches on both quantitative and qualitative evaluations. 

Conclusion

 

Texture Transformer

The module is shown in Figure 2.

In Figure 2, LR, LR↑ and Ref represent the input image, the 4× bicubic-upsampled input image and the reference image, respectively.

There are four parts in the texture transformer: the learnable texture extractor (LTE), the relevance embedding module (RE), the hard-attention module for feature transfer (HA) and the soft-attention module for feature synthesis (SA). 

  • Learnable Texture Extractor

We design a learnable texture extractor whose parameters will be updated during end-to-end training. Such a design encourages a joint feature learning across the LR and Ref image, in which more accurate texture features can be captured. The process of texture extraction can be expressed as:

where LTE(·) denotes the output of our learnable texture extractor. The extracted texture features, Q (query), K (key), and V (value) indicate three basic elements of the attention mechanism inside a transformer and will be further used in our relevance embedding module.

特徵提取:就是把卷積層的名字改爲了 Learnable Texture Extractor,目的是爲了輔助本文的 texture 這個主線。

Note the Ref↓↑ input here: the Ref image is first bicubic-downsampled and then bicubic-upsampled, so that it stays in the same domain as LR↑ (i.e., both are obtained through bicubic transformations).
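A minimal sketch of how the three texture-extractor inputs could be prepared; the lte module is a single-layer stand-in for the learnable texture extractor, and a scale factor of 4 is assumed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

lte = nn.Conv2d(3, 64, 3, padding=1)   # stand-in for the learnable texture extractor (LTE)

def prepare_qkv(lr, ref, scale=4):
    """Q from the up-sampled LR, K from the down/up-sampled Ref, V from the original Ref."""
    up = lambda t, s: F.interpolate(t, scale_factor=s, mode="bicubic", align_corners=False)
    lr_up      = up(lr, scale)                    # LR↑ : 4x bicubic up-sampling
    ref_downup = up(up(ref, 1.0 / scale), scale)  # Ref↓↑ : matches the bicubic domain of LR↑
    return lte(lr_up), lte(ref_downup), lte(ref)  # Q, K, V

q, k, v = prepare_qkv(torch.rand(1, 3, 40, 40), torch.rand(1, 3, 160, 160))
```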

  • Relevance Embedding

Relevance embedding aims to embed the relevance between the LR and Ref image by estimating the similarity between Q and K. We unfold both Q and K into patches, denoted as q_i (i \in [1, H_{LR} \times W_{LR}]) and k_j (j \in [1, H_{Ref} \times W_{Ref}]). Then for each patch q_i in Q and k_j in K, we calculate the relevance r_{i,j} between these two patches by normalized inner product:

This computes the patch-wise relevance between Q and K. Note that the resulting r is an i \times j matrix, where i \in [1, H_{LR} \times W_{LR}] and j \in [1, H_{Ref} \times W_{Ref}].
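A minimal sketch of the relevance embedding: unfold both feature maps into patches, normalize them, and take all pairwise inner products (the 3×3 patch size and stride 1 are assumptions):

```python
import torch
import torch.nn.functional as F

def relevance_embedding(q, k, patch=3):
    """Normalized inner product r_{i,j} between all unfolded patches of Q and K.
    q: (N, C, H_lr, W_lr), k: (N, C, H_ref, W_ref) -> r: (N, H_lr*W_lr, H_ref*W_ref)."""
    q_unf = F.unfold(q, kernel_size=patch, padding=patch // 2)   # (N, C*p*p, H_lr*W_lr)
    k_unf = F.unfold(k, kernel_size=patch, padding=patch // 2)   # (N, C*p*p, H_ref*W_ref)
    q_unf = F.normalize(q_unf, dim=1)                            # unit-norm patches
    k_unf = F.normalize(k_unf, dim=1)
    return torch.bmm(q_unf.transpose(1, 2), k_unf)               # pairwise inner products

r = relevance_embedding(torch.rand(1, 64, 40, 40), torch.rand(1, 64, 40, 40))
print(r.shape)   # torch.Size([1, 1600, 1600])
```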

  • Hard-Attention

We propose a hard-attention module to transfer the HR texture features V from the Ref image. Traditional attention mechanism takes a weighted sum of V for each query q_i . However, such an operation may cause blur effect which lacks the ability of transferring HR texture features. Therefore, in our hard-attention module, we only transfer features from the most relevant position in V for each query q_i .

More specifically, we first calculate a hard-attention map H in which the i-th element h_i (i \in [1, H_{LR} \times W_{LR}]) is calculated from the relevance r_{i,j} :

The value of h_i can be regarded as a hard index, which represents the most relevant position in the Ref image to the i-th position in the LR image. To obtain the transferred HR texture features T from the Ref image, we apply an index selection operation to the unfolded patches of V using the hard-attention map as the index:

where t_i denotes the value of T in the i-th position, which is selected from the h_i-th position of V. As a result, we obtain a HR feature representation T for the LR image which will be further used in our soft-attention module.

A traditional attention mechanism takes a weighted sum of V for each query q_i. However, such an operation can cause a blurring effect (the weighted aggregation over positions cannot cleanly pick out the single most important element) and thus lacks the ability to transfer HR texture features. Therefore, in the hard-attention module, for each query q_i only the feature at the most relevant position in V is transferred.

More specifically, a hard-attention map H is computed first; following the formula above, h_i is the position index j at which r_{i,j} is maximal (i.e., the Ref position most relevant to position i of the LR image).

To obtain the transferred HR texture features T from the Ref image, an index-selection operation is applied to the unfolded patches of V, using the hard-attention map as the index: the value of T at position i is the unfolded patch of V at position h_i.
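A minimal sketch of the hard-attention transfer: take the argmax of the relevance over the Ref positions, gather the corresponding unfolded patches of V, and fold them back into a feature map (the patch handling details are assumptions):

```python
import torch
import torch.nn.functional as F

def hard_attention_transfer(r, v, out_size, patch=3):
    """r: (N, H_lr*W_lr, H_ref*W_ref) relevance; v: (N, C, H_ref, W_ref) Ref features.
    Returns transferred features T of spatial size out_size = (H_lr, W_lr) and the map S."""
    s, h = r.max(dim=2)                              # s_i = max_j r_ij, h_i = argmax_j r_ij
    v_unf = F.unfold(v, kernel_size=patch, padding=patch // 2)       # (N, C*p*p, H_ref*W_ref)
    idx = h.unsqueeze(1).expand(-1, v_unf.size(1), -1)               # (N, C*p*p, H_lr*W_lr)
    t_unf = torch.gather(v_unf, dim=2, index=idx)                    # pick the h_i-th patch of V
    # Fold the overlapping patches back to a feature map (divide by the overlap count).
    t = F.fold(t_unf, output_size=out_size, kernel_size=patch, padding=patch // 2)
    ones = F.fold(torch.ones_like(t_unf), output_size=out_size, kernel_size=patch, padding=patch // 2)
    return t / ones, s.view(s.size(0), 1, *out_size)                 # T and soft-attention map S

t, s_map = hard_attention_transfer(torch.rand(1, 1600, 1600), torch.rand(1, 64, 40, 40), (40, 40))
```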

  • Soft-Attention

We propose a soft-attention module to synthesize features from the transferred HR texture features T and the LR features F of the LR image from a DNN backbone. During the synthesis process, relevant texture transfer should be enhanced while the less relevant ones should be relived. To achieve that, a soft-attention map S is computed from r_{i,j} to represent the confidence of the transferred texture features for each position in T:

where s_i denotes the i-th position of the soft-attention map S. Instead of directly applying the attention map S to T, we first fuse the HR texture features T with the LR features F to leverage more information from the LR image. Such fused features are further element-wisely multiplied by the soft-attention map S and added back to F to get the final output of the texture transformer. This operation can be represented as:

where F_{out} indicates the synthesized output features. Conv and Concat represent a convolutional layer and the concatenation operation, respectively. The operator \odot denotes element-wise multiplication between feature maps.

The definition of s_i is interesting: it is the maximum of r_{i,j} over j, whereas h_i is the column index j at which that maximum is attained.

Instead of applying the attention map S directly to T, the HR texture features T are first fused with the LR features F so as to leverage more information from the LR image. The fused features are then element-wise multiplied by the soft-attention map S and added back to F to obtain the final output of the texture transformer.
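A minimal sketch of the soft-attention synthesis following the equation above, i.e. F_{out} = F + Conv(Concat(F, T)) \odot S:

```python
import torch
import torch.nn as nn

class SoftAttentionFusion(nn.Module):
    """F_out = F + Conv(Concat(F, T)) * S (element-wise)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, f_lr, t, s):
        fused = self.conv(torch.cat([f_lr, t], dim=1))  # fuse LR features with transferred textures
        return f_lr + fused * s                          # weight by the soft-attention map S

fuse = SoftAttentionFusion(64)
out = fuse(torch.rand(1, 64, 40, 40), torch.rand(1, 64, 40, 40), torch.rand(1, 1, 40, 40))
```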

In summary, the texture transformer can effectively transfer the relevant HR texture features from the Ref image into the LR features, improving the accuracy of texture generation.

 

Cross-Scale Feature Integration

Our texture transformer can be further stacked in a cross-scale way with a cross-scale feature integration module. The architecture is shown in Figure 3. Stacked texture transformers output the synthesized features for three resolution scales (1×, 2× and 4×), such that the texture features of different scales can be fused into the LR image. To learn a better representation across different scales, inspired by [25, 37], we propose a cross-scale feature integration module (CSFI) to exchange information among the features of different scales. A CSFI module is applied each time the LR feature is up-sampled to the next scale. For each scale inside the CSFI module, it receives the exchanged features from other scales by up/down-sampling, followed by a concatenation operation in the channel dimension. Then a convolutional layer will map the features into the original number of channels. In such a design, the texture features transferred from the stacked texture transformers are exchanged across each scale, which achieves a more powerful feature representation. This cross-scale feature integration module further improves the performance of our approach.

Figure 3. Architecture of stacking multiple texture transformers in a cross-scale way with the proposed cross-scale feature integration module (CSFI). RBs indicates a group of residual blocks.

The texture transformer can be further stacked in a cross-scale way with a cross-scale feature integration module. The architecture is shown in Figure 3. The stacked texture transformers output synthesized features for three resolution scales (1×, 2× and 4×), so that texture features at different scales can be fused into the LR image. To learn a better representation across scales, inspired by [25, 37], a cross-scale feature integration module (CSFI) is proposed to exchange information among features of different scales. A CSFI module is applied each time the LR feature is up-sampled to the next scale. For each scale inside the CSFI module, the exchanged features from the other scales are received via up/down-sampling, followed by concatenation along the channel dimension; a convolutional layer then maps the features back to the original number of channels. In this design, the texture features transferred by the stacked texture transformers are exchanged across all scales, yielding a more powerful feature representation.
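A minimal two-scale sketch of the CSFI idea: each scale receives the other scale's features via up/down-sampling, concatenates along the channel dimension, and a convolution maps back to the original channel count (the actual module operates over three scales):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CSFI2(nn.Module):
    """Two-scale cross-scale feature integration (sketch; TTSR uses up to three scales)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(2 * channels, channels, 3, padding=1)  # back to C channels at 1x
        self.conv2 = nn.Conv2d(2 * channels, channels, 3, padding=1)  # back to C channels at 2x

    def forward(self, f1, f2):
        # Exchange: 2x features down to 1x, 1x features up to 2x, then concat + conv per scale.
        f2_down = F.interpolate(f2, size=f1.shape[-2:], mode="bicubic", align_corners=False)
        f1_up   = F.interpolate(f1, size=f2.shape[-2:], mode="bicubic", align_corners=False)
        out1 = self.conv1(torch.cat([f1, f2_down], dim=1))
        out2 = self.conv2(torch.cat([f2, f1_up], dim=1))
        return out1, out2

o1, o2 = CSFI2(64)(torch.rand(1, 64, 40, 40), torch.rand(1, 64, 80, 80))
```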

Loss Function

Reconstruction loss

Adversarial loss

Perceptual loss

 

Deep Unfolding Network for Image Super-Resolution

[pdf] [bibtex]

Abstract

Learning-based single image super-resolution (SISR) methods are continuously showing superior effectiveness and efficiency over traditional model-based methods, largely due to the end-to-end training. However, different from model-based methods that can handle the SISR problem with different scale factors, blur kernels and noise levels under a unified MAP (maximum a posteriori) framework, learning-based methods generally lack such flexibility. To address this issue, this paper proposes an end-to-end trainable unfolding network which leverages both learningbased methods and model-based methods. Specifically, by unfolding the MAP inference via a half-quadratic splitting algorithm, a fixed number of iterations consisting of alternately solving a data subproblem and a prior subproblem can be obtained. The two subproblems then can be solved with neural modules, resulting in an end-to-end trainable, iterative network. As a result, the proposed network inherits the flexibility of model-based methods to super-resolve blurry, noisy images for different scale factors via a single model, while maintaining the advantages of learning-based methods. Extensive experiments demonstrate the superiority of the proposed deep unfolding network in terms of flexibility, effectiveness and also generalizability. 

For this paper, please refer to my other post: MyDLNote-Enhancment: [SR轉文] Deep Unfolding Network for Image Super-Resolution

 


Meta-Transfer Learning for Zero-Shot Super-Resolution

[pdf] [supp] [bibtex]

Abstract

Convolutional neural networks (CNNs) have shown dramatic improvements in single image super-resolution (SISR) by using large-scale external samples. Despite their remarkable performance based on the external dataset, they cannot exploit internal information within a specific image. Another problem is that they are applicable only to the specific condition of data that they are supervised. For instance, the low-resolution (LR) image should be a "bicubic" downsampled noise-free image from a high-resolution (HR) one. To address both issues, zero-shot super-resolution (ZSSR) has been proposed for flexible internal learning. However, they require thousands of gradient updates, i.e., long inference time. In this paper, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which leverages ZSSR. Precisely, it is based on finding a generic initial parameter that is suitable for internal learning. Thus, we can exploit both external and internal information, where one single gradient update can yield quite considerable results. With our method, the network can quickly adapt to a given image condition. In this respect, our method can be applied to a large spectrum of image conditions within a fast adaptation process.

For this paper, please refer to my other post: MyDLNote-Enhancement: [SR轉文] 2020CVPR: Meta-Transfer Learning for Zero-Shot Super-Resolution

 


Preview of Part 2:

Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution

[pdf] [supp] [bibtex]

Abstract

Deep neural networks have exhibited promising performance in image super-resolution (SR) by learning a nonlinear mapping function from low-resolution (LR) images to high-resolution (HR) images. However, there are two underlying limitations to existing SR methods. First, learning the mapping function from LR to HR images is typically an ill-posed problem, because there exist infinite HR images that can be downsampled to the same LR image. As a result, the space of the possible functions can be extremely large, which makes it hard to find a good solution. Second, the paired LR-HR data may be unavailable in real-world applications and the underlying degradation method is often unknown. For such a more general case, existing SR models often incur the adaptation problem and yield poor performance. To address the above issues, we propose a dual regression scheme by introducing an additional constraint on LR data to reduce the space of the possible functions. Specifically, besides the mapping from LR to HR images, we learn an additional dual regression mapping estimates the down-sampling kernel and reconstruct LR images, which forms a closed-loop to provide additional supervision. More critically, since the dual regression process does not depend on HR images, we can directly learn from LR images. In this sense, we can easily adapt SR models to real-world data, e.g., raw video frames from YouTube. Extensive experiments with paired training data and unpaired real-world data demonstrate our superiority over existing methods. 

 


Residual Feature Aggregation Network for Image Super-Resolution

[pdf] [supp] [bibtex]

Abstract

Recently, very deep convolutional neural networks (CNNs) have shown great power in single image super-resolution (SISR) and achieved significant improvements against traditional methods. Among these CNN-based methods, the residual connections play a critical role in boosting the network performance. As the network depth grows, the residual features gradually focused on different aspects of the input image, which is very useful for reconstructing the spatial details. However, existing methods neglect to fully utilize the hierarchical features on the residual branches. To address this issue, we propose a novel residual feature aggregation (RFA) framework for more efficient feature extraction. The RFA framework groups several residual modules together and directly forwards the features on each local residual branch by adding skip connections. Therefore, the RFA framework is capable of aggregating these informative residual features to produce more representative features. To maximize the power of the RFA framework, we further propose an enhanced spatial attention (ESA) block to make the residual features to be more focused on critical spatial contents. The ESA block is designed to be lightweight and efficient. Our final RFANet is constructed by applying the proposed RFA framework with the ESA blocks. Comprehensive experiments demonstrate the necessity of our RFA framework and the superiority of our RFANet over state-of-the-art SISR methods. 

 


Correction Filter for Single Image Super-Resolution: Robustifying Off-the-Shelf Deep Super-Resolvers

[pdf] [supp] [bibtex]

Abstract

The single image super-resolution task is one of the most examined inverse problems in the past decade. In the recent years, Deep Neural Networks (DNNs) have shown superior performance over alternative methods when the acquisition process uses a fixed known downscaling kernel---typically a bicubic kernel. However, several recent works have shown that in practical scenarios, where the test data mismatch the training data (e.g. when the downscaling kernel is not the bicubic kernel or is not available at training), the leading DNN methods suffer from a huge performance drop. Inspired by the literature on generalized sampling, in this work we propose a method for improving the performance of DNNs that have been trained with a fixed kernel on observations acquired by other kernels. For a known kernel, we design a closed-form correction filter that modifies the low-resolution image to match one which is obtained by another kernel (e.g. bicubic), and thus improves the results of existing pre-trained DNNs. For an unknown kernel, we extend this idea and propose an algorithm for blind estimation of the required correction filter. We show that our approach outperforms other super-resolution methods, which are designed for general downscaling kernels. 

 


Image Super-Resolution With Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining 

[pdf] [supp] [bibtex]

Abstract

Deep convolution-based single image super-resolution (SISR) networks embrace the benefits of learning from large-scale external image resources for local recovery, yet most existing works have ignored the long-range feature-wise similarities in natural images. Some recent works have successfully leveraged this intrinsic feature correlation by exploring non-local attention modules. However, none of the current deep models have studied another inherent property of images: cross-scale feature correlation. In this paper, we propose the first Cross-Scale Non-Local (CS-NL) attention module with integration into a recurrent neural network. By combining the new CS-NL prior with local and in-scale non-local priors in a powerful recurrent fusion cell, we can find more cross-scale feature correlations within a single low-resolution (LR) image. The performance of SISR is significantly improved by exhaustively integrating all possible priors. Extensive experiments demonstrate the effectiveness of the proposed CS-NL module by setting new state-of-the-arts on multiple SISR benchmarks. 
