Image Super-Resolution via Sparse Representation: Super-Resolution Reconstruction Based on Sparse Representation

Date: 2021-03-05


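This is a classic super-resolution reconstruction paper based on sparse representation. Below I first introduce sparse representation, then the paper's basic idea and optimization procedure, and finally run experiments in Python.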

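Sparse Representation

Sparse representation means expressing an element as a linear combination of a small number of vectors from an overcomplete dictionary. An overcomplete dictionary is a full-row-rank matrix with more columns than rows, which means there are infinitely many linear combinations of its columns that express any given point in its column space. Because the number of columns is usually much larger than the number of rows, a particular vector can be represented using only a small fraction of the columns; we call such a representation a sparse representation.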
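As a small illustration (not from the original post), the NumPy sketch below builds a random overcomplete dictionary and compares a 3-atom sparse code with the dense minimum-norm code for the same signal. The sizes `m = 8`, `K = 32` and the random Gaussian dictionary are assumptions chosen only for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

m, K = 8, 32                      # signal length and number of atoms (K > m)
A = rng.standard_normal((m, K))   # random overcomplete dictionary (illustrative only)

# A sparse code: only 3 of the 32 coefficients are non-zero.
alpha = np.zeros(K)
alpha[[2, 11, 27]] = [1.5, -0.7, 2.0]
y = A @ alpha                     # the signal built from those 3 atoms

# The minimum-norm least-squares code also reproduces y, but it is dense.
alpha_dense = np.linalg.pinv(A) @ y

rank = np.linalg.matrix_rank(A)
nnz_sparse = int(np.count_nonzero(alpha))
nnz_dense = int(np.sum(np.abs(alpha_dense) > 1e-8))
print(rank, nnz_sparse, nnz_dense)
```

Both codes reconstruct `y` exactly, which is the non-uniqueness the paragraph above describes; sparsity is the extra criterion that picks one of them.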

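So how do we obtain this dictionary? Its values are specific to the task at hand. Much like "alchemy" (the usual joke about training models), we first train this matrix on a large amount of data so that it extracts features that can sparsely represent that data, and thereby gains the ability to sparsely represent other similar data.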

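The process of training an overcomplete dictionary is called sparse coding (also known as dictionary learning). Let the training data set be the matrix $X=(x_1,x_2,...,x_n)\in R^{m\times n}$, the matrix to be trained be $A\in R^{m\times K}$, and the representation weights for the data be $\alpha = (\alpha_1,\alpha_2,...,\alpha_n)\in R^{K\times n}$. We solve the following optimization problem: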

\begin{align} \min\limits_{A,\alpha}\|\alpha\|_0\;\;\;s.t. \; \|A\alpha - X\|_2^2\le \epsilon \end{align}

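This is easy to understand: it optimizes the sparsity of the representation weights $\alpha$ while constraining the difference between the reconstructed vectors and the original vectors. Once the required overcomplete dictionary has been obtained through this optimization, we can use it to sparsely represent new data. For a given vector $y$, a similar optimization yields its sparse representation in the dictionary: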

\begin{align} \min\limits_{\alpha}\|\alpha\|_0\;\;\;s.t. \; \|A\alpha - y\|_2^2\le \epsilon \end{align}

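Because the zero norm is discrete, the optimization above is NP-hard and cannot be solved with ordinary optimization algorithms. The formulation is therefore usually adjusted to make it tractable. Under a linear constraint, minimizing the L1 norm also promotes sparsity, so the problem can be converted into an L1-norm minimization. Then, by the method of Lagrange multipliers (so it is said, though I do not think this conversion is strictly equivalent), the inequality constraint can be moved into the objective: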

\begin{align} \min\limits_{\alpha} \lambda \|\alpha\|_1 + \|A\alpha - y\|_2^2 \end{align}

Similarly, equation $(1)$ can be converted in the same way.
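One standard way to solve the L1-regularized problem $(3)$ is iterative soft-thresholding (ISTA). The sketch below is a minimal NumPy version, not the method used in this post or necessarily in the paper; the function names `soft_threshold`, `ista` and `objective`, the iteration count, and the random test data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink every entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, y, alpha, lam):
    """Value of lam * ||alpha||_1 + ||A alpha - y||_2^2, as in equation (3)."""
    return lam * np.abs(alpha).sum() + np.sum((A @ alpha - y) ** 2)

def ista(A, y, lam, n_iter=500):
    """Minimise the objective above by proximal gradient descent."""
    # Lipschitz constant of the gradient of ||A alpha - y||^2 is 2 * sigma_max(A)^2.
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    alpha = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ alpha - y)
        alpha = soft_threshold(alpha - grad / L, lam / L)
    return alpha

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 32))          # illustrative random dictionary
alpha_true = np.zeros(32)
alpha_true[[2, 11, 27]] = [1.5, -0.7, 2.0]
y = A @ alpha_true

lam = 0.1
alpha_hat = ista(A, y, lam)
obj_start = objective(A, y, np.zeros(32), lam)
obj_end = objective(A, y, alpha_hat, lam)
print(obj_start, obj_end)
```

With step size $1/L$ the objective never increases from one iteration to the next, which makes this a convenient baseline to compare against plain gradient descent on the same objective.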

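That is the workflow of sparse representation. This process of extracting features and then reconstructing brings principal component analysis (PCA) to mind. PCA lets us conveniently find a "complete" set of basis vectors, but here we want to find an "overcomplete" set of basis vectors to represent the input vectors. The advantage of an overcomplete basis is that it can more effectively capture the structures and patterns hidden in the input data. Unlike with PCA, however, for an overcomplete basis the coefficients $\alpha$ are no longer uniquely determined by the input vector $y$. In sparse coding algorithms we therefore add an extra criterion, "sparsity", to resolve the degeneracy caused by overcompleteness.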

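The paragraph above is quoted from Baidu Baike. In my view, an overcomplete dictionary can be compared with a neural network: this very "wide" matrix to be trained can be seen as a network with a large number of parameters. We know that when the parameter count is large and training data is insufficient, a model overfits easily, and to prevent overfitting we add a regularization term so that the parameters focus on learning more general features. The sparsity above can be understood as this kind of regularization, which makes the dictionary's column vectors express more "distinctive" information.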

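Paper Principles and Implementation Workflow

Basic Idea

In the training stage, the paper simultaneously trains two overcomplete dictionaries $D_l,D_h$ on the LR training set $Y= (y_1,y_2,...,y_n)$ and the corresponding HR training set $X = (x_1,x_2,...,x_n)$, such that each LR sample $y_i$ and its corresponding HR sample $x_i$ can be represented by $D_l$ and $D_h$ respectively using the same sparse code $\alpha_i$. That is,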

$\left\{ \begin{aligned} &D_l\alpha_i \approx y_i \\ &D_h\alpha_i \approx x_i \end{aligned} \right.$

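In the test stage, we already have the trained $D_l$ and the corresponding $D_h$. For a test image $y_t$, we first obtain its sparse representation $\alpha_t$ in $D_l$ through optimization, so that $D_l\alpha_t \approx y_t$. We then use this representation to map out the corresponding SR image through $D_h$, that is, $\hat{x}_t=D_h\alpha_t$.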

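Training Procedure

Training means training the overcomplete dictionary pair described above. For performance reasons we cannot sparse-code an entire image directly, so the paper splits images into square patches and encodes those. The paired training data is therefore not whole LR-HR image pairs but the pairs of patches cut from all the images. Denote all patches of the LR training set as $Y=(y_1,y_2,...,y_n)\in R^{M\times n}$ and the corresponding HR patches as $X = (x_1,x_2,...,x_n)\in R^{N\times n}$. If the upscaling factor is $t$, then $N=t^2M$.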
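The patch layout can be sketched with a reshape-based helper. The function `image_to_patches` below is a hypothetical illustration for a single image (it is not the loop-based function used later in this post, and its column order differs from that one); the sizes 64, 256, 4 and 16 match the experiment described further down.

```python
import numpy as np

def image_to_patches(img, ps):
    """Split an (H, W, C) image into non-overlapping ps x ps patches, one per column."""
    H, W, C = img.shape
    rows, cols = H // ps, W // ps
    blocks = img[:rows * ps, :cols * ps].reshape(rows, ps, cols, ps, C)
    blocks = blocks.transpose(0, 2, 1, 3, 4)            # (rows, cols, ps, ps, C)
    return blocks.reshape(rows * cols, ps * ps * C).T   # (ps*ps*C, rows*cols)

t = 4                                                   # upscaling factor
lr = np.arange(64 * 64 * 3).reshape(64, 64, 3)          # dummy LR image
hr = np.arange(256 * 256 * 3).reshape(256, 256, 3)      # dummy HR image

Y_patches = image_to_patches(lr, 4)                     # LR patches, M rows
X_patches = image_to_patches(hr, 4 * t)                 # HR patches, N rows
M, N = Y_patches.shape[0], X_patches.shape[0]
print(Y_patches.shape, X_patches.shape, N == t * t * M)
```

Column `r * 16 + c` of `Y_patches` holds the flattened LR patch at grid position `(r, c)`, and the same column of `X_patches` holds the HR patch covering the same image region.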

The optimization objective is straightforward:

\begin{align} \min\limits_{D_l,D_h,\alpha}\frac{1}{N}\|X - D_h\alpha\|_2^2+\frac{1}{M}\|Y - D_l\alpha\|_2^2 + \lambda \|\alpha\|_1 \end{align}

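where $D_l\in R^{M\times K},D_h\in R^{N\times K}$ are the LR and HR dictionaries to be trained, $K$ is the number of atoms in each dictionary, $\alpha\in R^{K\times n}$ is the representation matrix, and $\lambda$ is the coefficient balancing sparsity against reconstruction fidelity. The first two terms penalize the reconstruction error under a shared representation, and the third term promotes sparsity of the representation. Merging the first two terms gives: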

\begin{aligned} \min\limits_{D_c,\alpha}\|X_c - D_c\alpha\|_2^2 + \lambda \|\alpha\|_1 \end{aligned}

where

$X_c = \left[ \begin{aligned} &\frac{1}{\sqrt{N}}X\\ &\frac{1}{\sqrt{M}}Y \end{aligned} \right], D_c = \left[ \begin{aligned} &\frac{1}{\sqrt{N}}D_h\\ &\frac{1}{\sqrt{M}}D_l \end{aligned} \right]$
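The merge can be checked numerically. The sketch below uses random matrices with small, made-up sizes for `K` and `n` (only `M = 48` and `N = 768` come from the experiment later in the post) and confirms that the two weighted reconstruction terms equal the single stacked term.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K, n = 48, 768, 20, 5        # K and n are small illustrative values

X = rng.standard_normal((N, n))    # HR patches
Y = rng.standard_normal((M, n))    # LR patches
Dh = rng.standard_normal((N, K))
Dl = rng.standard_normal((M, K))
alpha = rng.standard_normal((K, n))

# Two separate reconstruction terms, weighted by 1/N and 1/M.
lhs = np.sum((X - Dh @ alpha) ** 2) / N + np.sum((Y - Dl @ alpha) ** 2) / M

# Stacked form: scale each block by 1/sqrt(dim) and concatenate vertically.
Xc = np.vstack([X / np.sqrt(N), Y / np.sqrt(M)])
Dc = np.vstack([Dh / np.sqrt(N), Dl / np.sqrt(M)])
rhs = np.sum((Xc - Dc @ alpha) ** 2)

print(lhs, rhs)
```

Because the stacked problem has exactly the single-dictionary form, any single-dictionary sparse coding routine can in principle be applied to $X_c$ and $D_c$ directly.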

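The paper says that jointly optimizing $D_c$ and $\alpha$ is non-convex, but that with one of them fixed, the optimization over the other is convex. Convex optimization can therefore be alternated between the two, eventually reaching a local optimum. I nevertheless chose plain gradient descent without much thought. Of course SGD could be used, randomly selecting a subset of patches at each iteration; the benefit is that the added randomness increases the chance of escaping a local optimum. Notably, when running SGD the weights being optimized are $D_c$ and $\alpha$, but since only some patches are selected each time, only the corresponding part of $\alpha$ can be updated, which is somewhat like Dropout. However, in the later experiments SGD turned out to perform worse than GD, so I did not split the data into mini-batches.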
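For reference, the alternating scheme can be written as two convex subproblems in the notation of the merged objective. This is only a sketch: the iteration index $k$ and the unit-norm bound on the atoms $d_j$ (the columns of $D_c$) are assumptions borrowed from common dictionary-learning practice, and the bound does not appear in equation $(4)$ as written in this post.

\begin{aligned} &\text{fix } D_c:\quad \alpha^{(k+1)} = \arg\min\limits_{\alpha}\|X_c - D_c^{(k)}\alpha\|_2^2 + \lambda \|\alpha\|_1 \\ &\text{fix } \alpha:\quad D_c^{(k+1)} = \arg\min\limits_{D_c}\|X_c - D_c\alpha^{(k+1)}\|_2^2 \;\;\;s.t. \; \|d_j\|_2\le 1,\; j=1,2,...,K \end{aligned}

The first step is an L1-regularized least-squares problem of the same form as $(3)$ and separates over the columns of $\alpha$; the second is a constrained least-squares problem. Without some bound on the atoms, scaling $D_c$ up by a factor $c>1$ and $\alpha$ down by $1/c$ leaves the reconstruction unchanged while shrinking the L1 term, which is one reason such a constraint is commonly added.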

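Inference Procedure

Once $D_l$ and $D_h$ have been obtained, they can be used to reconstruct LR images. The paper uses a scanning approach, super-resolving the test image patch by patch, from top to bottom and left to right. In addition, to make adjacent patches match each other and avoid color conflicts, consecutive patches overlap, and when a patch is reconstructed its overlap region must agree with the previously reconstructed patch. The optimization is as follows.

First, the test image $y$ is divided in order into $m$ patches $(p_1,p_2,...,p_m)$, and their representations in $D_l$ are denoted $(\alpha_1,\alpha_2,...,\alpha_m)$. The optimization is run on all patches in order; for the $i$-th patch the objective is: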

\begin{align} \min\limits_{\alpha_i} \lambda\|\alpha_i\|_1 + \|FD_l\alpha_i - Fp_i\|_2^2+ \|PD_h\alpha_i - w\|_2^2 \end{align}

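where $w$ is the overlap between the already reconstructed patches and the current patch, and $P$ is the matrix that maps the current patch to its overlap region. As for $F$, the paper describes it as a linear feature extractor that improves the perceptual quality of the result. In its experiments the paper uses first- and second-order derivative filters but does not explain clearly how they are applied, so I did not use it in my experiments.

The meaning of the expression is clear: the first term enforces sparsity of the representation, the second term enforces reconstruction consistency with the original LR image, and the third term enforces consistency in the overlap regions of adjacent SR patches. After the sparse representations of all SR patches are obtained, left-multiply them by $D_h$ and stitch the patches together to get the final SR image.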
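To make $P$ concrete, the sketch below builds it as a row-selection matrix over a flattened 16x16x3 patch, picking the top and left overlap strips, and then shows that the second and third terms of $(5)$ collapse into one least-squares term. The sizes, the 4-pixel overlap, taking $F$ as the identity, and the helper name `overlap_mask` are all assumptions for illustration.

```python
import numpy as np

ps, ov, C = 16, 4, 3                      # HR patch size, overlap width, channels

def overlap_mask(ps, ov, C, top=True, left=True):
    """Boolean mask over a flattened (ps, ps, C) patch marking the overlap pixels."""
    mask = np.zeros((ps, ps, C), dtype=bool)
    if top:
        mask[:ov, :, :] = True            # strip shared with the patch above
    if left:
        mask[:, :ov, :] = True            # strip shared with the patch to the left
    return mask.reshape(-1)

mask = overlap_mask(ps, ov, C)
P = np.eye(ps * ps * C)[mask]             # selected rows of the identity matrix

rng = np.random.default_rng(0)
K = 10                                    # tiny dictionary, illustrative only
Dh = rng.standard_normal((ps * ps * C, K))
Dl = rng.standard_normal((4 * 4 * C, K))
alpha = rng.standard_normal(K)
p = rng.standard_normal(4 * 4 * C)        # current LR patch (F taken as identity)
w = rng.standard_normal(P.shape[0])       # already reconstructed overlap values

# Stack both data terms into a single least-squares term.
D_tilde = np.vstack([Dl, P @ Dh])
y_tilde = np.concatenate([p, w])
two_terms = np.sum((Dl @ alpha - p) ** 2) + np.sum((P @ Dh @ alpha - w) ** 2)
stacked = np.sum((D_tilde @ alpha - y_tilde) ** 2)
print(P.shape, two_terms, stacked)
```

With this stacking, $(5)$ takes the same form as $(3)$, with `D_tilde` and `y_tilde` in place of $A$ and $y$.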

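Besides the steps above, the paper adds a so-called global reconstruction constraint. It uses a back-projection method that iterates so that the SR image, once degraded, becomes more similar to the original image. Since the description is quite unclear, I leave it out here; I also do not consider it the main contribution of the paper.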
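For readers curious about the general idea, here is a generic iterative back-projection sketch in NumPy. It is not the paper's exact procedure: block averaging stands in for the blur-and-downsample operator, nearest-neighbour repetition stands in for the back-projection kernel, and the names `downsample`, `upsample` and `back_project` are hypothetical.

```python
import numpy as np

def downsample(img, s):
    """Block-average an (H, W, C) image by factor s (stand-in degradation model)."""
    H, W, C = img.shape
    return img.reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))

def upsample(img, s):
    """Nearest-neighbour upsampling by factor s (stand-in back-projection kernel)."""
    return img.repeat(s, axis=0).repeat(s, axis=1)

def back_project(sr, lr, s, n_iter=10, step=1.0):
    """Repeatedly push the LR residual back into the SR estimate."""
    x = sr.astype(np.float64).copy()
    for _ in range(n_iter):
        residual = lr - downsample(x, s)      # mismatch measured in LR space
        x = x + step * upsample(residual, s)  # correction applied in SR space
    return x

rng = np.random.default_rng(0)
lr = rng.random((8, 8, 3))                    # dummy LR observation
sr0 = rng.random((32, 32, 3))                 # dummy initial SR estimate
sr1 = back_project(sr0, lr, s=4)

err_before = np.abs(downsample(sr0, 4) - lr).max()
err_after = np.abs(downsample(sr1, 4) - lr).max()
print(err_before, err_after)
```

With these particular stand-in operators, downsampling the upsampled residual returns the residual itself, so the refined image matches the LR input after degradation; a realistic blur kernel would converge more gradually.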

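In addition, before running inference the paper subtracts from the image the mean of its own pixels, so that the model can focus on reconstructing texture, and adds the mean back to the reconstructed SR image. But this strategy is mentioned only briefly in the inference section, and the paper does not say whether normalized images are used when training $D_l,D_h$.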

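Experiments and Analysis

The experiments use the bedroom category of the LSUN dataset as the training set. I select 1024 images whose width and height both exceed 256 pixels and center-crop them to 256x256 to obtain the HR training set. The HR images are then downscaled by a factor of 4 with bicubic interpolation to 64x64 to obtain the LR training set. LR patches are 4x4 and HR patches are 16x16, so every image is divided into 16x16 patches. I set the number of atoms (column vectors) of $D_l$ and $D_h$ to 2560 (compute is limited), and since color images have three channels, the column vectors of $D_l$ and $D_h$ have lengths $4\times 4 \times 3$ and $16\times 16 \times 3$ respectively.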

Training

To summarize, the parameters in equation $(4)$ are:

\begin{equation} \left\{ \begin{aligned} &X \in R^{(16\times 16\times 3)\times (1024\times 16^2)}\\ &Y \in R^{(4\times 4\times 3)\times (1024\times 16^2)}\\ &D_h \in R^{(16\times 16\times 3)\times 2560}\\ &D_l \in R^{(4\times 4\times 3)\times 2560}\\ &\alpha \in R^{2560\times (1024\times 16^2)}\\ \end{aligned} \right. \end{equation}
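These shapes translate into a sizeable memory footprint. The arithmetic below is a rough estimate assuming float32 storage (4 bytes per element); the variable names are made up for this sketch.

```python
n_images = 1024
patches_per_image = (256 // 16) ** 2      # 16 x 16 = 256 patches per image
n = n_images * patches_per_image          # total number of patch pairs
K = 2560                                  # atoms per dictionary
N = 16 * 16 * 3                           # HR patch length
M = 4 * 4 * 3                             # LR patch length

BYTES_F32 = 4
GIB = 2 ** 30
alpha_gib = K * n * BYTES_F32 / GIB       # representation matrix
X_gib = N * n * BYTES_F32 / GIB           # HR patch matrix
Y_gib = M * n * BYTES_F32 / GIB           # LR patch matrix
print(n, alpha_gib, X_gib, Y_gib)
```

The representation matrix $\alpha$ dominates. During training its gradient and the RMSProp running average each add another tensor of the same size, so a GPU with plenty of memory is needed for this full-batch setup.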

Setting $\lambda=0.1$ in equation $(4)$ and optimizing it with RMSProp, the PyTorch code for the parameters listed above is as follows:

#%%
import torch,os
import numpy as np
import matplotlib.pyplot as plt
from torch import optim,cuda

# Read the images (pixel values scaled to [0, 1])
LR_path = r'E:\DataSets\SRTest\LR'
HR_path = r'E:\DataSets\SRTest\HR'
LR_imgs = np.zeros([1024,64,64,3])
HR_imgs = np.zeros([1024,256,256,3])

# sorted() keeps the LR/HR pairing deterministic; this assumes matching file names in both folders
for i, j in zip(sorted(os.listdir(LR_path)),range(1024)):
  img_path = os.path.join(LR_path,i)
  LR_imgs[j] = plt.imread(img_path)/255
for i, j in zip(sorted(os.listdir(HR_path)),range(1024)):
  img_path = os.path.join(HR_path,i)
  HR_imgs[j] = plt.imread(img_path)/255
 
# Define the variables
def imgs2patches(imgs, patch_size):
  # Convert a set of images into a set of patches (one flattened patch per column)
  imgs_n = len(imgs)
  patch_n = int(imgs.shape[1]/patch_size)
  patches = np.zeros([imgs_n*patch_n**2, patch_size*patch_size*3]) 
  for i in range(patch_n): 
    for j in range(patch_n):
      t = imgs[:,i*patch_size:(i+1)*patch_size,j*patch_size:(j+1)*patch_size,:]
      t = np.reshape(t,[imgs_n,-1]) 
      now = i * patch_n + j
      patches[imgs_n*now:imgs_n*(now+1),:] = t
  return patches.T

atom_n = 2560

X = torch.tensor(imgs2patches(HR_imgs, 16),device='cuda')*255 # training-set pixel values lie in [0,255]
Y = torch.tensor(imgs2patches(LR_imgs, 4), device='cuda')*255
Dh = torch.normal(0,1,[16*16*3,atom_n],device='cuda') 
Dl = torch.normal(0,1,[4*4*3,atom_n],device='cuda') 
 
alpha = torch.normal(0,1,[atom_n,1024*16*16],device='cuda') 
Dh.requires_grad_(True)
Dl.requires_grad_(True)
alpha.requires_grad_(True)
opt = optim.RMSprop([Dh,Dl,alpha]) 
#%%
# Train the model
from torch.utils.tensorboard import SummaryWriter 

writer = SummaryWriter('logs/')
def iter_one_epoch(lamb=0.01):
  patch_n = alpha.shape[1]
  term1 = torch.sum((X - torch.matmul(Dh,alpha))**2)/256/patch_n
  term2 = torch.sum((Y - torch.matmul(Dl,alpha))**2)/16/patch_n
  term3 = lamb * torch.sum(torch.abs(alpha))/patch_n
  loss = term1 + term2 + term3
  opt.zero_grad()
  loss.backward()
  opt.step()
  return term1,term2,term3,loss
 
for i in range(1, 1500): 
  term1,term2,term3,loss = iter_one_epoch(lamb=0.1)
  print(i,loss.cpu().detach().numpy())  
  writer.add_scalar('term1', term1, int(i))
  writer.add_scalar('term2', term2, int(i))
  writer.add_scalar('term3', term3, int(i))
  writer.add_scalar('loss', loss, int(i))
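  if i % 700 == 0: 
    for group in opt.param_groups:  # halve the learning rate every 700 iterations
      group['lr'] = group['lr']*0.5 
print("Saving dictionaries")
os.makedirs('dictionaries', exist_ok=True)  # make sure the output folder exists
torch.save(Dl,'dictionaries/Dic_LR')  # save the two dictionaries
torch.save(Dh,'dictionaries/Dic_HR') 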
#%%
# Reconstruct the HR images with Dh to check the training result
def get_recon_LR_HR(n): 
  print(Dh.shape,Dl.shape)
 
  recon_LR = torch.matmul(Dl, alpha)
  recon_HR = torch.matmul(Dh, alpha) 
  LR = torch.zeros([1024,64,64,3],device='cuda')
  HR = torch.zeros([1024,256,256,3],device='cuda') 
  
  for i in range(n): 
    print(i)
    for j in range(16):
      for k in range(16): 
        LR[i,4*j:4*(j+1),4*k:4*(k+1),:] = recon_LR[:,i+(j*16+k)*1024].reshape([4,4,3])
        HR[i,16*j:16*(j+1),16*k:16*(k+1),:] = recon_HR[:,i+(j*16+k)*1024].reshape([16,16,3])
  return LR,HR 
lr,hr = get_recon_LR_HR(100) 
n = 10
fig = plt.figure(figsize=(15,15))
ax1,ax2,ax3,ax4 = fig.add_subplot(221),fig.add_subplot(222),fig.add_subplot(223),fig.add_subplot(224)
ax1.imshow(LR_imgs[n])
ax2.imshow(lr[n].cpu().detach()/255)
ax3.imshow(HR_imgs[n])
ax4.imshow(hr[n].cpu().detach()/255)
ax1.set_title('LR image',fontsize=20)
ax2.set_title('Reconstructed LR image',fontsize=20)
ax3.set_title('HR image',fontsize=20)
ax4.set_title('Reconstructed HR image',fontsize=20)
plt.show()

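I also compared Adam and plain GD; neither iterated as fast as RMSProp, and RMSProp also reached the lowest loss once it stabilized. The algorithm ran for 1500 iterations in total using RMSProp's default learning rate, halved every 700 iterations. The whole run took 10 minutes on a 3090; the loss curves are as follows:

Below is the reconstruction of a training-set LR image and its corresponding HR image; the difference is almost invisible: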

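Inference

Sequential Patch Optimization

The paper's inference strategy reconstructs the patches of the SR image in order while constraining the overlap regions of adjacent patches to be similar. I set the overlap between adjacent LR patches to 1 pixel, so adjacent SR patches overlap by 4 pixels, and the image can be divided into 21x21 patches. The sizes of the variables in equation $(5)$ are then: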

\begin{equation} \left\{ \begin{aligned} &\alpha_{ij}\in R^{2560\times 1},\;\;i,j=1,2,...,21\\ &p_{ij}\in R^{(4\times 4\times 3) \times 1},\;\;i,j=1,2,...,21\\ &D_l\in R^{(4\times 4\times 3) \times2560}\\ &D_h\in R^{(16\times 16\times 3) \times 2560}\\ \end{aligned} \right. \end{equation}
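The 21x21 grid follows from simple stride arithmetic, sketched below. The helper name `patch_grid` is hypothetical.

```python
def patch_grid(size, patch, overlap):
    """Return (stride, count, start offsets) for overlapping patches along one axis."""
    stride = patch - overlap
    count = (size - patch) // stride + 1
    starts = [k * stride for k in range(count)]
    return stride, count, starts

lr_stride, lr_count, lr_starts = patch_grid(64, 4, 1)     # LR: 4x4 patches, 1 px overlap
sr_stride, sr_count, sr_starts = patch_grid(256, 16, 4)   # SR: 16x16 patches, 4 px overlap
print(lr_stride, lr_count, lr_starts[-1] + 4)
print(sr_stride, sr_count, sr_starts[-1] + 16)
```

The strides 3 and 12 are the multipliers that appear in the slicing expressions of the code that follows (`i*3` for the LR image and `12*i` for the SR image), and the last patch ends exactly at the image border in both cases.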

The optimization code is as follows:

#%%
import torch
import matplotlib.pyplot as plt
from torch import optim
from torch.nn import functional as F


path_lr = r'E:\DataSets\SRTest\TestImages\LR\0003.jpg'
path_hr = r'E:\DataSets\SRTest\TestImages\HR\0003.jpg'
img_lr = plt.imread(path_lr) 
img_hr = plt.imread(path_hr) 

# detach() so that no gradients are tracked for the fixed, already trained dictionaries
Dl = torch.load('dictionaries/Dic_LR').detach().reshape([4,4,3,2560])
Dh = torch.load('dictionaries/Dic_HR').detach().reshape([16,16,3,2560])
def img_SR(LR_img, lambda1=0.5,lambda2=1,lambda3=1,lambda4=1,epoch=100):
  '''
  LR_img must have values in [0,255] and shape [64,64,3]
  ''' 
  LR_img = torch.tensor(LR_img,device='cuda',dtype=torch.float32,requires_grad=False) 
  SR_img = torch.zeros([256,256,3],device='cuda',requires_grad=False) 
  
  alpha_array = [] 
  for i in range(21):
    al = []
    for j in range(21): 
      al.append(torch.normal(0,1,[2560],device='cuda',requires_grad=True)) 
    alpha_array.append(al)
  
  def SRcompat_loss(patch, i, j):
    loss = 0
    if i > 0:
      loss += torch.mean(torch.abs(SR_img[12*i:12*i+4,12*j:12*j+16] - patch[:4]))
    if j > 0:
      loss += torch.mean(torch.abs(SR_img[12*i:12*i+16,12*j:12*j+4] - patch[:,:4]))
    return loss

  # Compute the SR patches one by one, in scan order
  for i in range(21):
    for j in range(21): 
      alpha = alpha_array[i][j]
      opt = optim.RMSprop([alpha])
      for k in range(1, epoch): 
        L1 = torch.mean(torch.abs(alpha))
        L2 = torch.mean(torch.abs(torch.matmul(Dl,alpha) - LR_img[i*3:i*3+4,j*3:j*3+4]))
        L3 = SRcompat_loss(torch.matmul(Dh,alpha), i, j)
        down_SR = F.interpolate(torch.matmul(Dh,alpha).reshape([1,16,16,3]).permute([0,3,1,2]),size=[4,4], mode='bicubic')
        L4 = torch.mean(torch.abs(down_SR[0].permute([1,2,0]) - LR_img[i*3:i*3+4,j*3:j*3+4]))  # extra downsampling-consistency term (not part of equation (5))
        Loss = lambda1 * L1 + lambda2 * L2 + lambda3 * L3 + lambda4 * L4
        if k%800 ==0:
          print(k,Loss)
          for l in opt.param_groups:  
            l['lr'] *= 0.5
        opt.zero_grad()
        Loss.backward()
        opt.step()
        if Loss < 5:
          print(k,Loss)
          break 
      SR_img[12*i:12*i+16,12*j:12*j+16] = torch.matmul(Dh,alpha).detach()
    plt.imshow(SR_img.detach().cpu()/255)
    plt.show()
  return SR_img

img_sr = img_SR(img_lr,0.01,1,1,1,5000).cpu()/255   
fig = plt.figure(figsize=(15,15))
ax1,ax2,ax3 = fig.add_subplot(131),fig.add_subplot(132),fig.add_subplot(133)
ax1.imshow(img_lr),ax1.set_title('LR',fontsize=20)
ax2.imshow(img_sr),ax2.set_title('SR',fontsize=20)
ax3.imshow(img_hr),ax3.set_title('HR',fontsize=20)
plt.show()

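After iterating for a long time, the result is as follows:

There is a lot of noise. The cause is probably that I used neither the back-projection refinement mentioned in the paper nor the $F$ operator.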

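An Alternative Inference Method

Because the optimization above proceeds sequentially, later patches may sacrifice accuracy to "accommodate" the SR patches already obtained so that the overlap regions agree. The paper therefore adds a global consistency constraint after this step to adjust the SR image already obtained, using the so-called back-projection method, but it is not explained clearly. So I tried a method that optimizes all patches simultaneously.

First, set the overlap between adjacent LR patches to 2 pixels, half of a 4x4 patch. Adjacent SR patches then overlap by 8 pixels, likewise half of a patch. With this setup, the compatibility of adjacent SR patches can be constrained by creating two auxiliary "patching" images, as shown in the figure below:

That is, three SR images are optimized simultaneously. The first is the final SR result $x$, the second constrains the match between horizontally adjacent patches of $x$, and the third constrains the match between vertically adjacent patches of $x$. In other words, the tile formed by joining one half from each of two adjacent patches in the first image must agree with the corresponding patch in the second or third image.

To summarize, for a test LR image $y$, remove strips 2 pixels wide from the left and right edges, and 2 pixels high from the top and bottom edges, to obtain $y_r$ and $y_c$ used for the matching constraints. Then define the corresponding representations $\alpha,\alpha_r,\alpha_c$. With these definitions, the sizes of the parameters are as follows (LR image patches are flattened into vectors):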

\begin{equation} \left\{ \begin{aligned} &\alpha\in R^{2560\times (16\times 16)}\\ &\alpha_r\in R^{2560\times (16\times 15)}\\ &\alpha_c\in R^{2560\times (15\times 16)}\\ &y\in R^{(4\times 4\times 3) \times(16\times 16)}\\ &y_r\in R^{(4\times 4\times 3) \times(16\times 15)}\\ &y_c\in R^{(4\times 4\times 3) \times(15\times 16)}\\ &D_l\in R^{(4\times 4\times 3) \times2560}\\ &D_h\in R^{(16\times 16\times 3) \times 2560}\\ \end{aligned} \right. \end{equation}
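The half-patch shift can be verified on a dummy image. The sketch below crops 2 pixels from the left and right (or top and bottom) of a 64x64 LR array and confirms that each 4x4 patch of the cropped image is made of two adjacent half patches of the original. The array contents and the indices `i`, `j` are arbitrary choices for the demo.

```python
import numpy as np

lr = np.arange(64 * 64 * 3).reshape(64, 64, 3)   # dummy LR image
lr_r = lr[:, 2:-2]                               # 64 x 60: shifted horizontally
lr_c = lr[2:-2, :]                               # 60 x 64: shifted vertically

i, j = 3, 5                                      # an arbitrary patch position

# Patch (i, j) of lr_r straddles patches (i, j) and (i, j+1) of lr.
shifted_r = lr_r[4*i:4*i+4, 4*j:4*j+4]
right_half = lr[4*i:4*i+4, 4*j+2:4*j+4]          # right half of patch (i, j)
left_half = lr[4*i:4*i+4, 4*(j+1):4*(j+1)+2]     # left half of patch (i, j+1)
joined_r = np.concatenate([right_half, left_half], axis=1)

# Patch (i, j) of lr_c straddles patches (i, j) and (i+1, j) of lr.
shifted_c = lr_c[4*i:4*i+4, 4*j:4*j+4]
bottom_half = lr[4*i+2:4*i+4, 4*j:4*j+4]         # bottom half of patch (i, j)
top_half = lr[4*(i+1):4*(i+1)+2, 4*j:4*j+4]      # top half of patch (i+1, j)
joined_c = np.concatenate([bottom_half, top_half], axis=0)

grid_r = (lr_r.shape[0] // 4, lr_r.shape[1] // 4)
grid_c = (lr_c.shape[0] // 4, lr_c.shape[1] // 4)
print(grid_r, grid_c)
```

The two patch grids, 16x15 and 15x16, match the column counts of $\alpha_r$ and $\alpha_c$ listed above.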

The sparsity constraint, reconstruction-consistency constraint, patch-matching constraint, and the final objective are then:

\begin{equation} \begin{aligned} &L_{spars} = \|\alpha\|_1+\|\alpha_r\|_1+\|\alpha_c\|_1 \\ &L_{recon} = \|D_l\alpha - y\|_1+\|D_l\alpha_r - y_r\|_1+\|D_l\alpha_c - y_c\|_1 \\ &L_{comp} = \|P_1D_h\alpha - D_h\alpha_r\|_1+\|P_2D_h\alpha - D_h\alpha_c\|_1\\ &\min\limits_{\alpha,\alpha_r,\alpha_c}Loss = \lambda_1 L_{spars}+\lambda_2L_{recon}+\lambda_3L_{comp} \end{aligned} \end{equation}

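$P_1,P_2$ denote the operations that map the SR image to the corresponding matching-constraint image, and $\lambda_1,\lambda_2,\lambda_3$ balance the three constraints. The $L1$ norm is used because it produces images with less noise. Also, to keep the gradients from becoming too large, each norm computed in the code is divided by the number of elements, which the formulas do not show. The code is as follows: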

#%%
import torch
import matplotlib.pyplot as plt
from torch import optim
from torch.utils.tensorboard import SummaryWriter 


path_lr = r'E:\DataSets\SRTest\TestImages\LR\0003.jpg'
path_hr = r'E:\DataSets\SRTest\TestImages\HR\0003.jpg'
img_lr = plt.imread(path_lr) 
img_hr = plt.imread(path_hr) 

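# detach() so that no gradients are tracked for the fixed, already trained dictionaries
Dl = torch.load('dictionaries/Dic_LR').detach()
Dh = torch.load('dictionaries/Dic_HR').detach()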
def img_SR(LR_img, lambda1=0.5,lambda2=1,lambda3=1,epoch=100):
  '''
  LR_img must have values in [0,255] and shape [64,64,3]
  ''' 
  LR_img = torch.tensor(LR_img,device='cuda',dtype=torch.float32)  
  LR_img_r = LR_img[:,2:-2]
  LR_img_c = LR_img[2:-2,:] 
  def img2patches(img):  
    patch_r = int(img.shape[0]/4)
    patch_c = int(img.shape[1]/4)
    patches = torch.zeros([4*4*3, patch_r*patch_c],device='cuda') 
    for i in range(patch_r): 
      for j in range(patch_c): 
        patches[:,i * patch_c + j] = torch.flatten(img[i*4:(i+1)*4,j*4:(j+1)*4])  
    return patches
  def patches2img(patches,row,col,ps=16):
    img = torch.zeros([row*ps, col*ps, 3],device='cuda')
    for i in range(row):
      for j in range(col):
        img[i*ps:(i+1)*ps, j*ps:(j+1)*ps] = patches[:,i*col+j].reshape([ps,ps,3])
    return img
   
  alpha = torch.normal(0,1,[2560,16*16],requires_grad=True,device='cuda') 
  alpha_r = torch.normal(0,1,[2560,16*15],requires_grad=True,device='cuda') 
  alpha_c = torch.normal(0,1,[2560,15*16],requires_grad=True,device='cuda') 
  y = img2patches(LR_img)
  y_r = img2patches(LR_img_r)
  y_c = img2patches(LR_img_c) 
 
  opt = optim.RMSprop([alpha,alpha_r,alpha_c])

  writer = SummaryWriter('InferLogs2/')
  for i in range(1, epoch):
    l_alpha = torch.mean(torch.abs(alpha))
    l_alpha_r = torch.mean(torch.abs(alpha_r))
    l_alpha_c = torch.mean(torch.abs(alpha_c))
    L1 = l_alpha + l_alpha_r + l_alpha_c

    l_rec1 = torch.mean(torch.abs(torch.matmul(Dl,alpha)-y))
    l_rec2 = torch.mean(torch.abs(torch.matmul(Dl,alpha_r)-y_r))
    l_rec3 = torch.mean(torch.abs(torch.matmul(Dl,alpha_c)-y_c))
    L2 = l_rec1 + l_rec2 + l_rec3

    l_comp1 = torch.mean(torch.abs(patches2img(torch.matmul(Dh,alpha),16,16,16)[:,8:-8] - patches2img(torch.matmul(Dh,alpha_r),16,15,16)))
    l_comp2 = torch.mean(torch.abs(patches2img(torch.matmul(Dh,alpha),16,16,16)[8:-8,:] - patches2img(torch.matmul(Dh,alpha_c),15,16,16)))
    L3 = l_comp1 + l_comp2

    Loss = lambda1 * L1 + lambda2 * L2 + lambda3 * L3

    opt.zero_grad()
    Loss.backward()
    opt.step()

    writer.add_scalar('L1',L1,i)
    writer.add_scalar('L2',L2,i)
    writer.add_scalar('L3',L3,i)
    writer.add_scalar('Loss',Loss,i)
    if i % 50 == 0:
      print(i, Loss)
      plt.imshow(patches2img(torch.matmul(Dh,alpha),16,16,16).detach().cpu()/255)
      plt.show()
      plt.imshow(patches2img(torch.matmul(Dl,alpha),16,16,4).detach().cpu()/255)
      plt.show()
    if i % 300 == 0:
      for group in opt.param_groups:  # halve the learning rate every 300 iterations
        group['lr'] *= 0.5 

  return patches2img(torch.matmul(Dl,alpha),16,16,4),patches2img(torch.matmul(Dh,alpha),16,16,16)
recon_LR_img,SR_img = img_SR(img_lr,100,1,1,epoch=1500) 
fig = plt.figure(figsize=(15,15))
ax1,ax2,ax3,ax4 = fig.add_subplot(221),fig.add_subplot(222),fig.add_subplot(223),fig.add_subplot(224)
ax1.imshow(recon_LR_img.detach().cpu()/255),ax1.set_title('Reconstructed LR',fontsize=20)
ax2.imshow(SR_img.detach().cpu()/255),ax2.set_title('SR',fontsize=20)
ax3.imshow(img_lr),ax3.set_title('LR',fontsize=20)
ax4.imshow(img_hr),ax4.set_title('HR',fontsize=20)
plt.show()

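However, the actual result is not very good either:

Summary

In summary, using sparse representation alone for SR gives unsatisfying results. The paper adds other methods and tricks to reduce noise in the reconstructed image, which my experiments omit; moreover, the paper's experiments upscale by a factor of 3 with 3x3 patches, which differs from my setup, so I could not reproduce the paper's results. Another possible cause is the optimization algorithm: the paper uses convex optimization (there may be some way to compute a closed-form solution, but I gave up as soon as I saw the L1 norm), whereas I use gradient descent.

Mostly, I just do not want to work on this any more... The paper's authors did not release source code, and writing the code plus this blog post took me 4 days. I want to move on to other papers.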