[TOC]
Paper link: Generative Adversarial Nets
Problem: the data $x$ follows the distribution $P_{data}(x)$, and we have samples $\{x_1, x_2, \dots, x_m\}$. We now have a generator $G$ and want it to assign the highest possible probability to these samples. The likelihood is
$L = \prod_{i=1}^{m}P_G(x_i;\theta)$, where $\theta$ denotes the parameters of $G$.
Maximum likelihood estimation: $\theta^* = \arg\ \underset{\theta}{\max}\prod_{i=1}^{m}P_G(x_i;\theta)$
The fourth line assumes the samples are i.i.d.; the larger $m$ is, the better the approximation.
The fifth line adds a term that does not depend on $\theta$, corresponding to $D_{KL}(p\|q) = H(p,q) - H(p)$; strictly speaking this is not necessary.
So the likelihood is maximized when $P_{data}(x) = P_G(x)$.
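A quick sketch of the derivation these "lines" refer to (the standard argument; the intermediate steps are not reproduced in these notes):

$$\theta^* = \arg\ \underset{\theta}{\max}\sum_{i=1}^{m}\log P_G(x_i;\theta) \approx \arg\ \underset{\theta}{\max}\ \mathbb{E}_{x\sim P_{data}}[\log P_G(x;\theta)] = \arg\ \underset{\theta}{\min}\ KL(P_{data}\,\|\,P_G),$$

where the approximation is the i.i.d. / large-$m$ step ("line 4") and the last equality follows from adding the $\theta$-independent term $\mathbb{E}_{x\sim P_{data}}[\log P_{data}(x)]$ ("line 5").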
Next, GAN enters the picture.
Fix $G$ and solve for the optimal $D$.
So when $P_{data} = P_G$, $G$ is optimal; the original maximum likelihood estimation has turned into the GAN formulation.
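For reference, the two results from the GAN paper that this summarizes: with $G$ fixed, the optimal discriminator is

$$D^*_G(x) = \frac{P_{data}(x)}{P_{data}(x) + P_G(x)},$$

and substituting it back into the value function gives

$$\underset{D}{\max}\, V(G, D) = -\log 4 + 2\, JSD(P_{data}\,\|\,P_G),$$

which is minimized exactly when $P_{data} = P_G$.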
The above is all theory; understanding it is enough.
The concrete algorithm:
However, $\log(1-D(x))$ is too flat near $D(x)=0$ and only becomes steep when $D(x)$ approaches 1. At the start of training $G$ is weak, so $D(G(z))$ is close to 0 and the gradient vanishes; replacing it with $-\log(D(x))$ is just right.
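In other words, the generator in the PyTorch snippets below minimizes the non-saturating loss

$$L_G = -\mathbb{E}_{z}\big[\log D(G(z))\big],$$

which is exactly what `adversarial_loss(discriminator(gen_imgs), valid)` with all-ones targets computes (BCE with target 1 reduces to $-\log$ of the prediction).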
An intuitive description of the training process:
```python
# PyTorch implementation of the GAN losses
adversarial_loss = torch.nn.BCELoss()

# Generator loss: valid is an all-ones tensor, fake is an all-zeros tensor
g_loss = adversarial_loss(discriminator(gen_imgs), valid)

# Discriminator loss: detach() keeps gradients from flowing back into G here
real_loss = adversarial_loss(discriminator(real_imgs), valid)
fake_loss = adversarial_loss(discriminator(gen_imgs.detach()), fake)
d_loss = (real_loss + fake_loss) / 2
```
BCELoss: $$\ell(x, y) = \operatorname{mean}(L), \quad L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right]$$
The class of the generated images can be controlled by embedding the label.
First layer of G: `self.label_emb = nn.Embedding(opt.n_classes, opt.latent_dim)`
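A minimal sketch of how this embedding is typically fused with the noise vector (element-wise product, as in the PyTorch-GAN ACGAN implementation; other repos concatenate instead, so treat the details as an assumption):

```python
import torch
import torch.nn as nn

n_classes, latent_dim, batch_size = 10, 100, 8

label_emb = nn.Embedding(n_classes, latent_dim)      # one latent-sized vector per class
z = torch.randn(batch_size, latent_dim)              # noise vector
labels = torch.randint(0, n_classes, (batch_size,))  # class labels to condition on

# Fuse the label information into the noise before the generator body
gen_input = torch.mul(label_emb(labels), z)          # shape: (batch_size, latent_dim)
```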
$D$ is trained to maximize $L_S + L_C$ while $G$ is trained to maximize $L_C − L_S$. AC-GANs learn a representation for $z$ that is independent of class label
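For reference, the two terms as defined in the AC-GAN paper:

$$L_S = E[\log P(S = real \mid X_{real})] + E[\log P(S = fake \mid X_{fake})]$$
$$L_C = E[\log P(C = c \mid X_{real})] + E[\log P(C = c \mid X_{fake})]$$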
```python
# ACGAN generator loss: adversarial term plus auxiliary classification term
validity, pred_label = discriminator(gen_imgs)
g_loss = 0.5 * (adversarial_loss(validity, valid) + auxiliary_loss(pred_label, gen_labels))
```
```python
# Loss for real images
real_pred, real_aux = discriminator(real_imgs)
d_real_loss = (adversarial_loss(real_pred, valid) + auxiliary_loss(real_aux, labels)) / 2

# Loss for fake images
fake_pred, fake_aux = discriminator(gen_imgs.detach())
d_fake_loss = (adversarial_loss(fake_pred, fake) + auxiliary_loss(fake_aux, gen_labels)) / 2

# Total discriminator loss
d_loss = (d_real_loss + d_fake_loss) / 2
```
CrossEntropyLoss: $$ \text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right) = -x[class] + \log\left(\sum_j \exp(x[j])\right)$$
```python
encoded_imgs = encoder(real_imgs)
decoded_imgs = decoder(encoded_imgs)

# Loss measures generator's ability to fool the discriminator
g_loss = 0.001 * adversarial_loss(discriminator(encoded_imgs), valid) + \
         0.999 * pixelwise_loss(decoded_imgs, real_imgs)
```
```python
z = Variable(Tensor(np.random.normal(0, 1, (imgs.shape[0], opt.latent_dim))))

# Measure discriminator's ability to classify real from generated samples
real_loss = adversarial_loss(discriminator(z), valid)
fake_loss = adversarial_loss(discriminator(encoded_imgs.detach()), fake)
d_loss = 0.5 * (real_loss + fake_loss)
```
```python
def boundary_seeking_loss(y_pred, y_true):
    """
    Boundary seeking loss.
    Reference: https://wiseodd.github.io/techblog/2017/03/07/boundary-seeking-gan/
    """
    return 0.5 * torch.mean((torch.log(y_pred) - torch.log(1 - y_pred)) ** 2)

g_loss = boundary_seeking_loss(discriminator(gen_imgs), valid)
```
Suitable for discrete data.
BEGAN: Boundary Equilibrium Generative Adversarial Networks
Two contributions:
1. Use an autoencoder as $D$.
$\mathcal{L} : \mathbb{R}^{N_x} \to \mathbb{R}^+$, the loss for training a pixel-wise autoencoder, is defined as:
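For completeness, the definition from the BEGAN paper (the formula itself appears to be missing from these notes):

$$\mathcal{L}(v) = |v - D(v)|^{\eta}, \quad \eta \in \{1, 2\},$$

where $D$ is the autoencoder; the code below uses $\eta = 1$, i.e. an L1 reconstruction error.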
A Wasserstein-style loss is used to measure the gap between the distributions of the real and fake reconstruction losses.
Option (b) above is chosen because a good $D$ should be "friendly" to real samples, i.e. reconstruct them with low loss.
```python
# BEGAN generator loss: L1 reconstruction error of generated images under the autoencoder D
g_loss = torch.mean(torch.abs(discriminator(gen_imgs) - gen_imgs))
```
2. Uses an equilibrium term.
At convergence we would like $E[\mathcal{L}(x)] = E[\mathcal{L}(G(z))]$, but we can relax this condition:
$\gamma = \frac{E[\mathcal{L}(G(z))]}{E[\mathcal{L}(x)]}$, $\gamma\in[0,1]$
(Because $G$ is not strong, the autoencoder can easily reconstruct the images $G$ generates, so $\mathcal{L}(G(z))$ is small.)
Proportional control theory is used to maintain $E[\mathcal{L}(G(z))] = \gamma E[\mathcal{L}(x)]$ (via the variable $k$ in the code below).
$M_{global} = \mathcal{L}(x) + |\gamma \mathcal{L}(x) - \mathcal{L}(G(z_G))|$ is used to measure whether training has converged.
```python
d_real = discriminator(real_imgs)
d_fake = discriminator(gen_imgs.detach())

d_loss_real = torch.mean(torch.abs(d_real - real_imgs))
d_loss_fake = torch.mean(torch.abs(d_fake - gen_imgs.detach()))
d_loss = d_loss_real - k * d_loss_fake

diff = torch.mean(gamma * d_loss_real - d_loss_fake)

# Update weight term for fake samples
k = k + lambda_k * diff.item()
k = min(max(k, 0), 1)  # Constraint to interval [0, 1]

# Update convergence metric
M = (d_loss_real + torch.abs(diff)).item()
```
After training, use $G$: given A, fine-tuning $z$ is enough to generate different images.
Nothing special here; just pay attention to the order of parameter updates during training.
Uses mixed discrete-continuous sampling of the latent code to balance clustering and interpolation.
In the loss function, $q(x)$ can be taken as either $q(x)=\log(x)$ or $q(x)=x$ (WGAN).
$D$ needs $y$ as an input because $D$ must know the condition; otherwise $G$ could generate arbitrary high-quality images that ignore it.
$\underset{G}{\min}\ \underset{D}{\max} V(D, G) = E_{x\sim p_{data}(x)}[\log D(x|y)] + E_{z\sim p_z(z)}[\log(1 - D(G(z|y)))].$
G loss
```python
z = Variable(FloatTensor(np.random.normal(0, 1, (batch_size, opt.latent_dim))))
gen_labels = Variable(LongTensor(np.random.randint(0, opt.n_classes, batch_size)))

# Generate a batch of images
gen_imgs = generator(z, gen_labels)

# Loss measures generator's ability to fool the discriminator
validity = discriminator(gen_imgs, gen_labels)
g_loss = adversarial_loss(validity, valid)
```
D loss
```python
validity_real = discriminator(real_imgs, labels)
d_real_loss = adversarial_loss(validity_real, valid)

# Loss for fake images
validity_fake = discriminator(gen_imgs.detach(), gen_labels)
d_fake_loss = adversarial_loss(validity_fake, fake)

# Total discriminator loss
d_loss = (d_real_loss + d_fake_loss) / 2
```
SEMI-SUPERVISED LEARNING WITH CONTEXT-CONDITIONAL GENERATIVE ADVERSARIAL NETWORKS
The second version places more emphasis on judging fakes.
The third version:
The idea of semi-supervised learning:
Treat $D$ as a classifier: labelled data $(x, y)$ is classified normally, generated fakes are treated as class $k+1$, and for unlabelled real images $D$ judges the probability that they do not belong to class $k+1$.
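A minimal, self-contained sketch of that $(k{+}1)$-class discriminator loss (toy tensors stand in for the network outputs; this illustrates the idea rather than any particular repo's code):

```python
import torch
import torch.nn.functional as F

# Toy shapes: a (k+1)-way classifier head; the last index is the "fake" class.
k = 10
logits_labeled = torch.randn(8, k + 1)              # D outputs on labelled real images
y_labeled = torch.randint(0, k, (8,))               # their class labels
logits_unlab = torch.randn(8, k + 1)                # D outputs on unlabelled real images
logits_fake = torch.randn(8, k + 1)                 # D outputs on generated images

# Supervised part: labelled real data is classified into its true class (never k+1)
loss_sup = F.cross_entropy(logits_labeled, y_labeled)

# Unsupervised part: real images should NOT be class k+1, fakes SHOULD be class k+1
p_fake_unlab = F.softmax(logits_unlab, dim=1)[:, k]   # P(class k+1 | unlabelled real)
p_fake_fake = F.softmax(logits_fake, dim=1)[:, k]     # P(class k+1 | generated)
loss_unsup = -(torch.log(1 - p_fake_unlab + 1e-8).mean()
               + torch.log(p_fake_fake + 1e-8).mean())

d_loss = loss_sup + loss_unsup
```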
Basically the same as the paper above; nothing more to say.
Through parameter sharing, the joint distribution of the two domains can be learned from the two marginal distributions.
Another formulation: the form below can also be used, or parameter sharing, cycle consistency, etc. can be added to keep the intermediate latent-space distributions consistent.
```python
# Set model input
real_A = Variable(batch["A"].type(Tensor))
real_B = Variable(batch["B"].type(Tensor))

# Adversarial ground truths
valid = Variable(Tensor(np.ones((real_A.size(0), *D_A.output_shape))), requires_grad=False)
fake = Variable(Tensor(np.zeros((real_A.size(0), *D_A.output_shape))), requires_grad=False)

# ------------------
#  Train Generators
# ------------------

G_AB.train()
G_BA.train()

optimizer_G.zero_grad()

# Identity loss
loss_id_A = criterion_identity(G_BA(real_A), real_A)
loss_id_B = criterion_identity(G_AB(real_B), real_B)
loss_identity = (loss_id_A + loss_id_B) / 2

# GAN loss
fake_B = G_AB(real_A)
loss_GAN_AB = criterion_GAN(D_B(fake_B), valid)
fake_A = G_BA(real_B)
loss_GAN_BA = criterion_GAN(D_A(fake_A), valid)
loss_GAN = (loss_GAN_AB + loss_GAN_BA) / 2

# Cycle loss
recov_A = G_BA(fake_B)
loss_cycle_A = criterion_cycle(recov_A, real_A)
recov_B = G_AB(fake_A)
loss_cycle_B = criterion_cycle(recov_B, real_B)
loss_cycle = (loss_cycle_A + loss_cycle_B) / 2

# Total loss
loss_G = loss_GAN + opt.lambda_cyc * loss_cycle + opt.lambda_id * loss_identity

loss_G.backward()
optimizer_G.step()
```
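The discriminator update is not shown above; a sketch in the same style, reusing the names from the snippet (`D_A`, `fake_A`, `criterion_GAN`, `valid`, `fake`) and assuming a separate `optimizer_D_A` (D_B is handled symmetrically):

```python
# -----------------------
#  Train Discriminator A (sketch; D_B is updated the same way)
# -----------------------
optimizer_D_A.zero_grad()

# Real loss: D_A should label real_A as valid
loss_real = criterion_GAN(D_A(real_A), valid)
# Fake loss: detach() so gradients do not flow back into the generators
loss_fake = criterion_GAN(D_A(fake_A.detach()), fake)

loss_D_A = (loss_real + loss_fake) / 2
loss_D_A.backward()
optimizer_D_A.step()
```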
$G$'s loss has three parts: the traditional fool-D loss, a reconstruction loss, and an encoder loss that ensures the domain translation.
Essentially the same thing as CycleGAN; it also uses two discriminators.
Exactly the same as DiscoGAN.
ENERGY-BASED GENERATIVE ADVERSARIAL NETWORKS
$[\cdot]^+ = \max(0, \cdot)$.
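For context, the objectives in which this notation is used (as given in the EBGAN paper):

$$L_D(x, z) = D(x) + [m - D(G(z))]^+, \qquad L_G(z) = D(G(z)),$$

where $D(v) = \|Dec(Enc(v)) - v\|$ is the autoencoder reconstruction energy and $m$ is a margin.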
Once $D(G(z))$ exceeds the margin $m$, the hinge term $[m - D(G(z))]^+$ becomes zero, so that part of the loss is no longer used.
EBGAN-PT: $L_G(z) = D_{img}(G(z)) + \text{pullaway}(D_{embedding}(G(z)))$,
keeping the model from producing samples that are clustered in one or only few modes of $p_{data}$.
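The pull-away term itself, as defined in the EBGAN paper, acts on the batch of encoder embeddings $S$:

$$f_{PT}(S) = \frac{1}{N(N-1)} \sum_{i} \sum_{j \neq i} \left( \frac{S_i^\top S_j}{\|S_i\|\,\|S_j\|} \right)^2,$$

penalizing pairs of embeddings that point in similar directions.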
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
The generator loss has three parts. First, the perceptual loss, which uses the hidden features from the first 35 layers of VGG (features before the activation layers carry more information) to measure the gap between fake and real. Second, the adversarial loss for fooling $D$. Third, an L1 loss between fake and real.
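A hedged sketch of how these three parts are combined; the toy modules, tensors and weight values below are illustrative stand-ins (following the conventions of the earlier snippets), not the notes' original code:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs; in ESRGAN these are real networks.
feature_extractor = nn.Conv2d(3, 8, 3, padding=1)   # stands in for the truncated VGG
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1), nn.Sigmoid())
adversarial_loss = nn.BCELoss()
criterion_content = nn.L1Loss()   # perceptual loss on (pre-activation) VGG features
criterion_pixel = nn.L1Loss()     # pixel-wise L1 loss
lambda_adv, lambda_pixel = 5e-3, 1e-2   # illustrative weights

gen_hr = torch.rand(4, 3, 32, 32)    # generator output (super-resolved image)
imgs_hr = torch.rand(4, 3, 32, 32)   # ground-truth high-resolution image
valid = torch.ones(4, 1)

# 1) Perceptual loss on features, 2) adversarial loss, 3) pixel-wise L1
loss_content = criterion_content(feature_extractor(gen_hr), feature_extractor(imgs_hr).detach())
loss_GAN = adversarial_loss(discriminator(gen_hr), valid)
loss_pixel = criterion_pixel(gen_hr, imgs_hr)

loss_G = loss_content + lambda_adv * loss_GAN + lambda_pixel * loss_pixel
```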
Similar to ACGAN.
Maximize the mutual information between the code $c$ and the generated $x$, so that the generated image can be controlled by controlling $c$.
An auxiliary distribution is used (a variational lower bound), and $H(c)$ can be treated as a constant.
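A small self-contained sketch of the resulting mutual-information loss: the `pred_*` tensors stand in for the outputs of the auxiliary Q head, and the weights are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the Q head of the discriminator (the auxiliary distribution Q(c|x)).
n_classes, code_dim, batch = 10, 2, 8
pred_label = torch.randn(batch, n_classes)           # Q's logits for the categorical code
pred_code = torch.randn(batch, code_dim)             # Q's estimate of the continuous code
gt_labels = torch.randint(0, n_classes, (batch,))    # categorical code fed into G
code = torch.rand(batch, code_dim) * 2 - 1           # continuous code fed into G

categorical_loss = nn.CrossEntropyLoss()
continuous_loss = nn.MSELoss()
lambda_cat, lambda_con = 1.0, 0.1   # illustrative weights

# Maximizing the variational lower bound on I(c; G(z, c)) amounts to
# making Q recover the codes that were fed into the generator.
info_loss = lambda_cat * categorical_loss(pred_label, gt_labels) \
            + lambda_con * continuous_loss(pred_code, code)
```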
The loss replaces the cross-entropy loss with a least-squares loss. The benefit is that it pushes generated samples toward the decision boundary, i.e. closer to $G$'s target.
BGAN only uses the least-squares loss for $G$, while LSGAN uses it for both $G$ and $D$.
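In code the change is tiny; a sketch following the convention of the earlier snippets (where `valid`/`fake` are all-ones/all-zeros targets):

```python
import torch

# LSGAN: simply replace the BCE adversarial loss with mean squared error
adversarial_loss = torch.nn.MSELoss()

# The rest of the training loop is unchanged, e.g.
# g_loss = adversarial_loss(discriminator(gen_imgs), valid)
# d_loss = (adversarial_loss(discriminator(real_imgs), valid)
#           + adversarial_loss(discriminator(gen_imgs.detach()), fake)) / 2
```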
Seems to just be an extension of CycleGAN: multiple domains, multiple G's and D's...
The relativistic discriminator: a key element missing from standard GAN
Semi-Supervised Learning with Generative Adversarial Networks
$D$ is a classifier, and fakes are the $(N+1)$-th class.
Same as CoGAN.
Problems with GANs
The better the discriminator, the worse the generator's vanishing-gradient problem.
Theorem: when the supports of $p_{data}$ and $p_g$ are low-dimensional manifolds in a high-dimensional space, with probability 1 the overlap of $p_{data}$ and $p_g$ has measure 0.
The modified second form of the loss is also unreasonable.
From Eq. 7 you can see that KL and JS both measure the distance between distributions, yet they appear with opposite signs, which is contradictory.
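The "Eq. 7" referred to here is the generator objective under the optimal discriminator when the $-\log D$ loss is used (a known result from the WGAN analysis):

$$\mathbb{E}_{z\sim p_z}\big[-\log D^*(G(z))\big] = KL(p_g\,\|\,p_{data}) - 2\,JS(p_{data}\,\|\,p_g) + \text{const},$$

so minimizing it pulls the KL term down while pushing the JS term up.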
The KL divergence is asymmetric: when $p_g\rightarrow 0$ and $p_{data}\rightarrow 1$, $KL(p_g\|p_{data})\rightarrow 0$, but in the opposite case $KL(p_g\|p_{data}) \rightarrow \infty$. Intuitively, generating an implausible sample carries a huge penalty, while failing to generate a real sample carries almost none. As a result, GANs tend to produce repetitive, low-penalty "safe" samples rather than diverse ones, leading to a lack of diversity (mode collapse).
Use the Wasserstein distance instead of the KL divergence.
This opens up the issue of the Lipschitz constraint; the weight-clipping method in the paper is just a crude way to satisfy it.
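A hedged sketch of the WGAN losses with weight clipping, reusing the `discriminator`/`gen_imgs`/`real_imgs` naming of the earlier snippets; `clip_value` and `optimizer_D` are assumed to exist (clip_value is typically 0.01):

```python
# Critic (D) loss: maximize D(real) - D(fake), i.e. minimize the negative
optimizer_D.zero_grad()
d_loss = -torch.mean(discriminator(real_imgs)) + torch.mean(discriminator(gen_imgs.detach()))
d_loss.backward()
optimizer_D.step()

# Weight clipping: the crude way the paper enforces the Lipschitz constraint
for p in discriminator.parameters():
    p.data.clamp_(-clip_value, clip_value)

# Generator loss: make the critic score generated samples highly
g_loss = -torch.mean(discriminator(gen_imgs))
```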
Uses the gradient-penalty (GP) method.
Since it is impossible to sample the whole sample space to compute the gradient of $D(x)$, points are instead sampled on the interpolation segment between a real sample and a generated sample, and the constraint pushes the gradient norm toward 1 there. Empirically this works very well, but it lacks theoretical support, and problems remain.
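A sketch of that interpolation-based gradient penalty (assuming 4-D image batches; `lambda_gp`, typically 10, is applied by the caller):

```python
import torch
from torch import autograd

def compute_gradient_penalty(D, real_samples, fake_samples, device="cpu"):
    """Gradient penalty on points interpolated between real and fake samples (WGAN-GP sketch)."""
    # Random interpolation coefficient, one per sample
    alpha = torch.rand(real_samples.size(0), 1, 1, 1, device=device)
    interpolates = (alpha * real_samples + (1 - alpha) * fake_samples).requires_grad_(True)
    d_interpolates = D(interpolates)
    grad_outputs = torch.ones_like(d_interpolates)
    # Gradient of D w.r.t. the interpolated inputs
    gradients = autograd.grad(
        outputs=d_interpolates,
        inputs=interpolates,
        grad_outputs=grad_outputs,
        create_graph=True,
        retain_graph=True,
        only_inputs=True,
    )[0]
    gradients = gradients.view(gradients.size(0), -1)
    # Push the gradient norm toward 1
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()

# Usage sketch:
# d_loss = -D(real).mean() + D(fake).mean() + lambda_gp * compute_gradient_penalty(D, real, fake)
```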