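Deep Fun (深度有趣) | 18 Anime Avatar Generation

Introduction

As in the CelebA installment, we use ACGAN, this time to generate anime avatars, with multiple conditions controlling the output.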

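Data

The images were crawled from getchu.com/, a Japanese anime game website with a large collection of character illustrations; 31,970 images were collected in total.

Figure: anime avatar examples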

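Avatar cropping

dlib, introduced in an earlier installment, can extract faces, but it does not work on anime avatars.

Instead, use OpenCV to crop the avatar region from each image, with the help of the following project: github.com/nagadomi/lb…

The detected region is enlarged somewhat so that the crop includes more of the character's details.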

# -*- coding: utf-8 -*-

import cv2

cascade = cv2.CascadeClassifier('lbpcascade_animeface.xml')

image = cv2.imread('imgs/二次元頭像示例.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(64, 64))
for i, (x, y, w, h) in enumerate(faces):
    cx = x + w // 2
    cy = y + h // 2
    x0 = cx - int(0.75 * w)
    x1 = cx + int(0.75 * w)
    y0 = cy - int(0.75 * h)
    y1 = cy + int(0.75 * h)
    if x0 < 0:
        x0 = 0
    if y0 < 0:
        y0 = 0
    if x1 >= image.shape[1]:
        x1 = image.shape[1] - 1
    if y1 >= image.shape[0]:
        y1 = image.shape[0] - 1
    w = x1 - x0
    h = y1 - y0
    if w > h:
        x0 = x0 + w // 2 - h // 2
        x1 = x1 - w // 2 + h // 2
        w = h
    else:
        y0 = y0 + h // 2 - w // 2
        y1 = y1 - h // 2 + w // 2
        h = w
    face = image[y0: y0 + h, x0: x0 + w, :]
    face = cv2.resize(face, (128, 128))
    cv2.imwrite('face_%d.jpg' % i, face)

Figure: example avatar cropping results
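
The enlarge-and-square logic in the script above can be factored into a small helper. This is only a sketch, not the author's code: the function name `expand_to_square` and the `scale` parameter are hypothetical, while the arithmetic mirrors the script (enlarge each half-side to 0.75 of the detected size, clamp to the image, then trim the longer side).

```python
def expand_to_square(x, y, w, h, img_w, img_h, scale=0.75):
    """Enlarge a detection box around its centre, clamp it to the image,
    then trim the longer side so the crop is square.

    Returns (x0, y0, side) for slicing image[y0:y0 + side, x0:x0 + side].
    """
    cx, cy = x + w // 2, y + h // 2
    # Half-width/half-height of the enlarged box (0.75 gives 1.5x the detection).
    x0, x1 = cx - int(scale * w), cx + int(scale * w)
    y0, y1 = cy - int(scale * h), cy + int(scale * h)
    # Clamp to the image bounds.
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, img_w - 1), min(y1, img_h - 1)
    bw, bh = x1 - x0, y1 - y0
    # Trim the longer side symmetrically so that the crop is square.
    if bw > bh:
        x0 += bw // 2 - bh // 2
        side = bh
    else:
        y0 += bh // 2 - bw // 2
        side = bw
    return x0, y0, side


# A 100x100 detection near the left edge of a 640x480 image.
print(expand_to_square(10, 200, 100, 100, 640, 480))  # (0, 183, 135)
```

Keeping the box arithmetic in one function means the single-image demo and the batch preprocessing loop could share it instead of repeating the same block.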

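Tag extraction

Use Illustration2Vec to extract rich tags from anime images: github.com/rezoo/illus…

Illustration2Vec depends on the deep learning framework chainer and a few other libraries; install any that are missing: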

pip install chainer Pillow scikit-image

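Illustration2Vec provides three functions:

  • Represent each image as a 4096-dimensional vector
  • Given a threshold, extract the tags whose probability exceeds it
  • Given a set of tags, return the probability of each

For example, extract all plausible tags using a threshold of 0.5: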

# -*- coding: utf-8 -*-

import i2v
from imageio import imread

illust2vec = i2v.make_i2v_with_chainer('illust2vec_tag_ver200.caffemodel', 'tag_list.json')

img = imread('imgs/二次元頭像示例.jpg')
tags = illust2vec.estimate_plausible_tags([img], threshold=0.5)
print(tags)
tags = illust2vec.estimate_specific_tags([img], ['blue eyes', 'red hair'])
print(tags)

Specific tags can also be queried for their probabilities; for the two tags requested above, the result is:

[{'blue eyes': 0.9488178491592407, 'red hair': 0.0025324225425720215}]

Preprocessing

Process all the images on a server: crop the avatars and extract the tags.

Load the libraries

# -*- coding: utf-8 -*-

import i2v
import cv2
import glob
import os
from imageio import imread
from tqdm import tqdm
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pickle

Read the image paths

images = glob.glob('characters/*.jpg')
print(len(images))

Load the two models

illust2vec = i2v.make_i2v_with_chainer('illust2vec_tag_ver200.caffemodel', 'tag_list.json')
cascade = cv2.CascadeClassifier('lbpcascade_animeface.xml')
OUTPUT_DIR = 'faces/'
if not os.path.exists(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)

Extract all avatars; 27,772 were detected in total

num = 0
for img_path in tqdm(images):  # iterate over paths directly; the face loop below rebinds x
    image = cv2.imread(img_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(64, 64))
    for (x, y, w, h) in faces:
        cx = x + w // 2
        cy = y + h // 2
        x0 = cx - int(0.75 * w)
        x1 = cx + int(0.75 * w)
        y0 = cy - int(0.75 * h)
        y1 = cy + int(0.75 * h)
        if x0 < 0:
            x0 = 0
        if y0 < 0:
            y0 = 0
        if x1 >= image.shape[1]:
            x1 = image.shape[1] - 1
        if y1 >= image.shape[0]:
            y1 = image.shape[0] - 1
        w = x1 - x0
        h = y1 - y0
        if w > h:
            x0 = x0 + w // 2 - h // 2
            x1 = x1 - w // 2 + h // 2
            w = h
        else:
            y0 = y0 + h // 2 - w // 2
            y1 = y1 - h // 2 + w // 2
            h = w
        
        face = image[y0: y0 + h, x0: x0 + w, :]
        face = cv2.resize(face, (128, 128))
        cv2.imwrite(os.path.join(OUTPUT_DIR, '%d.jpg' % num), face)
        num += 1
print(num)

The 34 tags of interest are:

  • 13 hair colors: blonde hair, brown hair, black hair, blue hair, pink hair, purple hair, green hair, red hair, silver hair, white hair, orange hair, aqua hair, grey hair
  • 5 hairstyles: long hair, short hair, twintails, drill hair, ponytail
  • 10 eye colors: blue eyes, red eyes, brown eyes, green eyes, purple eyes, yellow eyes, pink eyes, aqua eyes, black eyes, orange eyes
  • 6 other attributes: blush, smile, open mouth, hat, ribbon, glasses

For hair color, hairstyle, and eye color, only the most probable tag in each group is kept; each of the other attributes is treated as present when its probability exceeds 0.25.

fw = open('face_tags.txt', 'w')
tags = ['blonde hair', 'brown hair', 'black hair', 'blue hair', 'pink hair', 'purple hair', 'green hair', 
        'red hair', 'silver hair', 'white hair', 'orange hair', 'aqua hair', 'grey hair',
        'long hair', 'short hair', 'twintails', 'drill hair', 'ponytail',
        'blue eyes', 'red eyes', 'brown eyes', 'green eyes', 'purple eyes', 'yellow eyes', 'pink eyes', 
        'aqua eyes', 'black eyes', 'orange eyes',
        'blush', 'smile', 'open mouth', 'hat', 'ribbon', 'glasses']
fw.write('id,' + ','.join(tags) + '\n')

images = glob.glob(os.path.join(OUTPUT_DIR, '*.jpg'))
for x in tqdm(range(len(images))):
    img_path = images[x]
    image = imread(img_path)
    result = illust2vec.estimate_specific_tags([image], tags)[0]
    
    hair_colors = [[h, result[h]] for h in tags[0:13]]
    hair_colors.sort(key=lambda x:x[1], reverse=True)
    for h in tags[0:13]:
        if h == hair_colors[0][0]:
            result[h] = 1
        else:
            result[h] = 0
    
    hair_styles = [[h, result[h]] for h in tags[13:18]]
    hair_styles.sort(key=lambda x:x[1], reverse=True)
    for h in tags[13:18]:
        if h == hair_styles[0][0]:
            result[h] = 1
        else:
            result[h] = 0
    
    eye_colors = [[h, result[h]] for h in tags[18:28]]
    eye_colors.sort(key=lambda x:x[1], reverse=True)
    for h in tags[18:28]:
        if h == eye_colors[0][0]:
            result[h] = 1
        else:
            result[h] = 0
    
    for h in tags[28:]:
        if result[h] > 0.25:
            result[h] = 1
        else:
            result[h] = 0
        
    fw.write(img_path + ',' + ','.join([str(result[t]) for t in tags]) + '\n')
    
fw.close()

This yields 27,772 anime avatars, each with its 34 tag values.
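
The labeling rule (one winner per exclusive group, an independent threshold for everything else) can be written compactly. The sketch below is illustrative only: `binarize_tags` is a hypothetical helper, and the probabilities are made-up values for a reduced tag set rather than Illustration2Vec output.

```python
def binarize_tags(probs, exclusive_groups, other_tags, threshold=0.25):
    """Turn tag probabilities into 0/1 labels.

    probs: dict mapping tag name -> probability.
    exclusive_groups: list of tag-name lists; exactly one tag per group gets 1.
    other_tags: tags set to 1 independently when their probability > threshold.
    """
    labels = {}
    for group in exclusive_groups:
        best = max(group, key=lambda t: probs[t])  # most probable tag wins
        for t in group:
            labels[t] = 1 if t == best else 0
    for t in other_tags:
        labels[t] = 1 if probs[t] > threshold else 0
    return labels


# Made-up probabilities, for illustration only.
probs = {'blonde hair': 0.7, 'red hair': 0.2,
         'long hair': 0.4, 'short hair': 0.5,
         'smile': 0.3, 'hat': 0.1}
labels = binarize_tags(probs,
                       [['blonde hair', 'red hair'], ['long hair', 'short hair']],
                       ['smile', 'hat'])
print(labels)
```

With the full tag list, the three exclusive groups would be the slices `tags[0:13]`, `tags[13:18]`, and `tags[18:28]`, and the other attributes `tags[28:]`.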

Get the 4096-dimensional vector representation of each avatar

illust2vec = i2v.make_i2v_with_chainer("illust2vec_ver200.caffemodel")
img_all = []
vec_all = []
for x in tqdm(range(len(images))):
    img_path = images[x]
    image = imread(img_path)
    vector = illust2vec.extract_feature([image])[0]
    img_all.append(image / 255.)
    vec_all.append(vector)
img_all = np.array(img_all)
vec_all = np.array(vec_all)

Randomly select 2,000 avatars and visualize them with t-SNE dimensionality reduction

from sklearn.manifold import TSNE
from imageio import imsave
data_index = np.arange(img_all.shape[0])
np.random.shuffle(data_index)
data_index = data_index[:2000]

tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
two_d_vectors = tsne.fit_transform(vec_all[data_index, :])
puzzles = np.ones((6400, 6400, 3))
xmin = np.min(two_d_vectors[:, 0])
xmax = np.max(two_d_vectors[:, 0])
ymin = np.min(two_d_vectors[:, 1])
ymax = np.max(two_d_vectors[:, 1])

for i, vector in enumerate(two_d_vectors):
    x, y = two_d_vectors[i, :]
    x = int((x - xmin) / (xmax - xmin) * (6400 - 128) + 64)
    y = int((y - ymin) / (ymax - ymin) * (6400 - 128) + 64)
    puzzles[y - 64: y + 64, x - 64: x + 64, :] = img_all[data_index[i]]
imsave('二次元頭像降維可視化.png', puzzles)

The visualization is shown below; similar avatars are indeed clustered together.

Figure: t-SNE visualization of the anime avatars
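
The step that places each 128x128 avatar on the 6400x6400 canvas is a min-max rescaling of the 2-D embedding. The following sketch isolates it; `to_canvas_centres` is a hypothetical name, and random points stand in for the actual t-SNE output.

```python
import numpy as np


def to_canvas_centres(points, canvas=6400, tile=128):
    """Map 2-D points to integer tile centres so every tile fits on the canvas."""
    points = np.asarray(points, dtype=np.float64)
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    unit = (points - lo) / (hi - lo)               # each coordinate now in [0, 1]
    centres = unit * (canvas - tile) + tile // 2   # leave half a tile of margin
    return centres.astype(int)


rng = np.random.RandomState(0)
fake_embedding = rng.randn(2000, 2)                # stand-in for the t-SNE output
centres = to_canvas_centres(fake_embedding)
print(centres.min(), centres.max())                # 64 6336
```

Because centres stay between 64 and 6336, the slice `[c - 64: c + 64]` never leaves the canvas. Avatars that land on nearby coordinates simply overwrite one another.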

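Model

The model uses the ACGAN structure, but unlike the DCGAN used for CelebA, G and D are implemented here with deeper, more complex networks modeled on SRGAN, arxiv.org/abs/1609.04…

The generator is structured as follows:

  • It uses 16 residual blocks, i.e. the shortcut idea from ResNet
  • It uses Sub-pixel CNN in place of deconvolution, arxiv.org/abs/1609.05…

Figure: generator network architecture

Sub-pixel CNN works as follows: it rearranges several channels into a single one, which increases height and width while reducing depth.

Figure: how Sub-pixel CNN works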
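
To make the rearrangement concrete, here is a NumPy sketch of the operation. It assumes NHWC layout and the channel ordering used by `tf.depth_to_space`; the function below is an illustration, not TensorFlow's implementation.

```python
import numpy as np


def depth_to_space(x, block_size):
    """Rearrange an NHWC array from (N, H, W, C*r*r) to (N, H*r, W*r, C)."""
    n, h, w, c = x.shape
    r = block_size
    assert c % (r * r) == 0, 'channels must be divisible by block_size**2'
    c_out = c // (r * r)
    x = x.reshape(n, h, w, r, r, c_out)   # split channels into (row offset, col offset, C)
    x = x.transpose(0, 1, 3, 2, 4, 5)     # interleave to (N, H, r, W, r, C)
    return x.reshape(n, h * r, w * r, c_out)


x = np.arange(4, dtype=np.float32).reshape(1, 1, 1, 4)   # one pixel, four channels
y = depth_to_space(x, 2)
print(y.shape)          # (1, 2, 2, 1)
print(y[0, :, :, 0])    # the four channels laid out as a 2x2 patch
```

In the generator, each upsampling stage applies this with a block size of 2 to a 256-channel feature map, so 16x16x256 becomes 32x32x64; three stages take 16x16 up to 128x128.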

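The discriminator is structured as follows: it uses 10 residual blocks, and its output has two branches that handle discrimination and classification respectively.

Figure: discriminator network architecture

Implementation

Load the libraries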

# -*- coding: utf-8 -*-

import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import os
from imageio import imread, imsave, mimsave
import glob
from tqdm import tqdm

Load the images

images = glob.glob('faces/*.jpg')
print(len(images))

Load the tags

tags = pd.read_csv('face_tags.txt')
tags.index = tags['id']
tags.head()

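Define the constants, network tensors, and helper functions. A power of 2 is a suitable batch size, and 64 is used here; the learning rate decays over the course of training.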

batch_size = 64
z_dim = 128
WIDTH = 128
HEIGHT = 128
LABEL = 34
LAMBDA = 0.05
BETA = 3

OUTPUT_DIR = 'samples'
if not os.path.exists(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)

X = tf.placeholder(dtype=tf.float32, shape=[batch_size, HEIGHT, WIDTH, 3], name='X')
X_perturb = tf.placeholder(dtype=tf.float32, shape=[batch_size, HEIGHT, WIDTH, 3], name='X_perturb')
Y = tf.placeholder(dtype=tf.float32, shape=[batch_size, LABEL], name='Y')
noise = tf.placeholder(dtype=tf.float32, shape=[batch_size, z_dim], name='noise')
noise_y = tf.placeholder(dtype=tf.float32, shape=[batch_size, LABEL], name='noise_y')
is_training = tf.placeholder(dtype=tf.bool, name='is_training')

global_step = tf.Variable(0, trainable=False)
add_global = global_step.assign_add(1)
initial_learning_rate = 0.0002
learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step=global_step, decay_steps=20000, decay_rate=0.5)

def lrelu(x, leak=0.2):
    return tf.maximum(x, leak * x)

def sigmoid_cross_entropy_with_logits(x, y):
    return tf.nn.sigmoid_cross_entropy_with_logits(logits=x, labels=y)

def conv2d(inputs, kernel_size, filters, strides, padding='same', use_bias=True):
    return tf.layers.conv2d(inputs=inputs, kernel_size=kernel_size, filters=filters, strides=strides, padding=padding, use_bias=use_bias)
    
def batch_norm(inputs, is_training=is_training, decay=0.9):
    return tf.contrib.layers.batch_norm(inputs, is_training=is_training, decay=decay)
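
For reference, the schedule produced by these settings can be reproduced in plain Python. This sketch assumes the default `staircase=False` behaviour of `tf.train.exponential_decay`, which decays continuously as `initial * decay_rate ** (step / decay_steps)`; the function name is hypothetical.

```python
def decayed_learning_rate(step, initial=0.0002, decay_steps=20000, decay_rate=0.5):
    """Continuous exponential decay: initial * decay_rate ** (step / decay_steps)."""
    return initial * decay_rate ** (step / decay_steps)


# The learning rate halves every 20,000 steps.
for step in (0, 20000, 40000, 60000):
    print(step, decayed_learning_rate(step))
```

Over the 60,000 training steps used later, the rate therefore falls from 0.0002 to one eighth of that value.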

Discriminator

def d_block(inputs, filters):
    h0 = lrelu(conv2d(inputs, 3, filters, 1))
    h0 = conv2d(h0, 3, filters, 1)
    h0 = lrelu(tf.add(h0, inputs))
    return h0

def discriminator(image, reuse=None):
    with tf.variable_scope('discriminator', reuse=reuse):
        h0 = image
        
        f = 32
        for i in range(5):
            if i < 3:
                h0 = lrelu(conv2d(h0, 4, f, 2))
            else:
                h0 = lrelu(conv2d(h0, 3, f, 2))
            h0 = d_block(h0, f)
            h0 = d_block(h0, f)
            f = f * 2
        
        h0 = lrelu(conv2d(h0, 3, f, 2))
        h0 = tf.contrib.layers.flatten(h0)
        Y_ = tf.layers.dense(h0, units=LABEL)
        h0 = tf.layers.dense(h0, units=1)
        return h0, Y_

Generator

def g_block(inputs):
    h0 = tf.nn.relu(batch_norm(conv2d(inputs, 3, 64, 1, use_bias=False)))
    h0 = batch_norm(conv2d(h0, 3, 64, 1, use_bias=False))
    h0 = tf.add(h0, inputs)
    return h0

def generator(z, label):
    with tf.variable_scope('generator', reuse=None):
        d = 16
        z = tf.concat([z, label], axis=1)
        h0 = tf.layers.dense(z, units=d * d * 64)
        h0 = tf.reshape(h0, shape=[-1, d, d, 64])
        h0 = tf.nn.relu(batch_norm(h0))
        shortcut = h0
        
        for i in range(16):
            h0 = g_block(h0)
            
        h0 = tf.nn.relu(batch_norm(h0))
        h0 = tf.add(h0, shortcut)
        
        for i in range(3):
            h0 = conv2d(h0, 3, 256, 1, use_bias=False)
            h0 = tf.depth_to_space(h0, 2)
            h0 = tf.nn.relu(batch_norm(h0))
        
        h0 = tf.layers.conv2d(h0, kernel_size=9, filters=3, strides=1, padding='same', activation=tf.nn.tanh, name='g', use_bias=True)
        return h0

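Loss functions. The gp (gradient penalty) term here comes from DRAGAN, arxiv.org/abs/1705.07…. WGAN (in its gradient-penalty variant) interpolates between real and generated samples, whereas DRAGAN interpolates between real samples and perturbed copies of them.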

g = generator(noise, noise_y)
d_real, y_real = discriminator(X)
d_fake, y_fake = discriminator(g, reuse=True)

loss_d_real = tf.reduce_mean(sigmoid_cross_entropy_with_logits(d_real, tf.ones_like(d_real)))
loss_d_fake = tf.reduce_mean(sigmoid_cross_entropy_with_logits(d_fake, tf.zeros_like(d_fake)))
loss_g_fake = tf.reduce_mean(sigmoid_cross_entropy_with_logits(d_fake, tf.ones_like(d_fake)))

loss_c_real = tf.reduce_mean(sigmoid_cross_entropy_with_logits(y_real, Y))
loss_c_fake = tf.reduce_mean(sigmoid_cross_entropy_with_logits(y_fake, noise_y))

loss_d = loss_d_real + loss_d_fake + BETA * loss_c_real
loss_g = loss_g_fake + BETA * loss_c_fake

alpha = tf.random_uniform(shape=[batch_size, 1, 1, 1], minval=0., maxval=1.)
interpolates = alpha * X + (1 - alpha) * X_perturb
grad = tf.gradients(discriminator(interpolates, reuse=True)[0], [interpolates])[0]
slop = tf.sqrt(tf.reduce_sum(tf.square(grad), axis=[1, 2, 3]))  # per-sample L2 norm over height, width and channels
gp = tf.reduce_mean((slop - 1.) ** 2)
loss_d += LAMBDA * gp

vars_g = [var for var in tf.trainable_variables() if var.name.startswith('generator')]
vars_d = [var for var in tf.trainable_variables() if var.name.startswith('discriminator')]
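
The gradient penalty can be checked in isolation with NumPy. The sketch below is not the training code: it replaces the discriminator with a toy linear function whose gradient is known in closed form, and it takes one L2 norm per sample over all non-batch axes. All names are hypothetical.

```python
import numpy as np


def gradient_penalty(grad):
    """Mean of (||grad_i||_2 - 1)^2, with one norm per sample.

    grad has shape (batch, height, width, channels); the norm is taken over
    every axis except the batch axis.
    """
    norms = np.sqrt(np.sum(np.square(grad), axis=(1, 2, 3)))
    return np.mean((norms - 1.0) ** 2)


rng = np.random.RandomState(0)
batch, h, w, c = 4, 8, 8, 3
x_real = rng.uniform(-1, 1, size=(batch, h, w, c))
# Perturbed copies of the real batch (DRAGAN-style), then random interpolation.
x_perturb = x_real + 0.5 * x_real.std() * rng.random_sample(x_real.shape)
alpha = rng.uniform(0, 1, size=(batch, 1, 1, 1))
interpolates = alpha * x_real + (1 - alpha) * x_perturb

# Toy linear "discriminator" D(x) = sum(weights * x): its gradient with respect
# to x equals `weights` at every point, including the interpolates.
weights = rng.randn(h, w, c)
grad = np.broadcast_to(weights, interpolates.shape)
print(gradient_penalty(grad))              # positive, since ||weights|| != 1

unit = weights / np.linalg.norm(weights)
unit_penalty = gradient_penalty(np.broadcast_to(unit, interpolates.shape))
print(unit_penalty)                        # essentially 0 for unit-norm gradients
```

The penalty vanishes exactly when every per-sample gradient has unit norm, which is the constraint the term pushes the discriminator toward around the real data.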

Define the optimizers

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer_d = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=0.5).minimize(loss_d, var_list=vars_d)
    optimizer_g = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=0.5).minimize(loss_g, var_list=vars_g)

Function for tiling images into a single grid

def montage(images):    
    if isinstance(images, list):
        images = np.array(images)
    img_h = images.shape[1]
    img_w = images.shape[2]
    n_plots = int(np.ceil(np.sqrt(images.shape[0])))
    if len(images.shape) == 4 and images.shape[3] == 3:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1, 3)) * 0.5
    elif len(images.shape) == 4 and images.shape[3] == 1:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1, 1)) * 0.5
    elif len(images.shape) == 3:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1)) * 0.5
    else:
        raise ValueError('Could not parse image shape of {}'.format(images.shape))
    for i in range(n_plots):
        for j in range(n_plots):
            this_filter = i * n_plots + j
            if this_filter < images.shape[0]:
                this_img = images[this_filter]
                m[1 + i + i * img_h:1 + i + (i + 1) * img_h,
                  1 + j + j * img_w:1 + j + (j + 1) * img_w] = this_img
    return m

Prepare the data

X_all = []
Y_all = []
for i in tqdm(range(len(images))):
    image = imread(images[i])
    image = (image / 255. - 0.5) * 2
    X_all.append(image)
    
    y = list(tags.loc[images[i]])
    Y_all.append(y[1:])

X_all = np.array(X_all)
Y_all = np.array(Y_all)
print(X_all.shape, Y_all.shape)

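Define a function that generates random tags. The tag distribution in the raw data is uneven, but we want G to learn every tag, so each kind of tag is generated uniformly.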

def get_random_tags():
    y = np.random.uniform(0.0, 1.0, [batch_size, LABEL]).astype(np.float32)
    y[y > 0.75] = 1
    y[y <= 0.75] = 0
    for i in range(batch_size):
        hc = np.random.randint(0, 13)
        hs = np.random.randint(13, 18)
        ec = np.random.randint(18, 28)
        y[i, :28] = 0
        y[i, hc] = 1 # hair color
        y[i, hs] = 1 # hair style
        y[i, ec] = 1 # eye color
    return y

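Train the model. In CelebA the male/female ratio is balanced, so drawing a random batch at each iteration was enough. Here the tags in the raw data are unevenly distributed, so training iterates over the complete dataset, epoch by epoch.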

sess = tf.Session()
sess.run(tf.global_variables_initializer())
z_samples = np.random.uniform(-1.0, 1.0, [batch_size, z_dim]).astype(np.float32)
y_samples = get_random_tags()
for i in range(batch_size):
    y_samples[i, :28] = 0
    y_samples[i, i // 8 % 13] = 1 # hair color
    y_samples[i, i // 8 % 5 + 13] = 1 # hair style
    y_samples[i, i // 8 % 10 + 18] = 1 # eye color
samples = []
loss = {'d': [], 'g': []}

offset = 0
for i in tqdm(range(60000)):
    if offset + batch_size > X_all.shape[0]:
        offset = 0
    if offset == 0:
        data_index = np.arange(X_all.shape[0])
        np.random.shuffle(data_index)
        X_all = X_all[data_index, :, :, :]
        Y_all = Y_all[data_index, :]
    X_batch = X_all[offset: offset + batch_size, :, :, :]
    Y_batch = Y_all[offset: offset + batch_size, :]
    X_batch_perturb = X_batch + 0.5 * X_batch.std() * np.random.random(X_batch.shape)
    offset += batch_size
    
    n = np.random.uniform(-1.0, 1.0, [batch_size, z_dim]).astype(np.float32)
    ny = get_random_tags()
    _, d_ls = sess.run([optimizer_d, loss_d], feed_dict={X: X_batch, X_perturb: X_batch_perturb, Y: Y_batch, noise: n, noise_y: ny, is_training: True})    
    
    n = np.random.uniform(-1.0, 1.0, [batch_size, z_dim]).astype(np.float32)
    ny = get_random_tags()
    _, g_ls = sess.run([optimizer_g, loss_g], feed_dict={noise: n, noise_y: ny, is_training: True})
    
    loss['d'].append(d_ls)
    loss['g'].append(g_ls)
    
    _, lr = sess.run([add_global, learning_rate])
    
    if i % 500 == 0:
        print(i, d_ls, g_ls, lr)
        gen_imgs = sess.run(g, feed_dict={noise: z_samples, noise_y: y_samples, is_training: False})
        gen_imgs = (gen_imgs + 1) / 2
        imgs = [img[:, :, :] for img in gen_imgs]
        gen_imgs = montage(imgs)
        plt.axis('off')
        plt.imshow(gen_imgs)
        imsave(os.path.join(OUTPUT_DIR, 'sample_%d.jpg' % i), gen_imgs)
        plt.show()
        samples.append(gen_imgs)

plt.plot(loss['d'], label='Discriminator')
plt.plot(loss['g'], label='Generator')
plt.legend(loc='upper right')
plt.savefig('Loss.png')
plt.show()
mimsave(os.path.join(OUTPUT_DIR, 'samples.gif'), samples, fps=10)
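
The epoch-wise iteration in the loop above (reshuffle at the start of each pass, skip the final partial batch) can be expressed as a small generator. This is a sketch with hypothetical names; unlike the training loop, it yields index arrays rather than reordering `X_all` and `Y_all` themselves, which avoids copying the whole dataset every epoch.

```python
import numpy as np


def epoch_batches(num_samples, batch_size, num_steps, seed=0):
    """Yield index arrays for full-size batches, reshuffling at every epoch start."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(num_samples)
    offset = 0
    for _ in range(num_steps):
        if offset + batch_size > num_samples:
            # Not enough samples left for a full batch: start a new epoch.
            order = rng.permutation(num_samples)
            offset = 0
        yield order[offset: offset + batch_size]
        offset += batch_size


# 10 samples, batches of 4: two full batches per epoch, the remainder is skipped.
batches = list(epoch_batches(10, 4, num_steps=4))
for b in batches:
    print(b)
```

A batch would then be selected with `X_all[idx]` and `Y_all[idx]` for each yielded `idx`.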

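The generated anime avatars are shown below. Within each row, hair color, hairstyle, and eye color are the same, while the other attributes are random. A small share of the results are poor, possibly because of particular noise vectors or conditions.

Figure: generated anime avatars

Save the model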

saver = tf.train.Saver()
saver.save(sess, './anime_acgan', global_step=60000)

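Load the model on a local machine and try the following three things:

  • Randomly generate samples following the original tag distribution
  • Generate samples with specified tags
  • Fix the noise and generate samples following the original tag distribution

# -*- coding: utf-8 -*-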

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from imageio import imsave

batch_size = 64
z_dim = 128
LABEL = 34

def montage(images):    
    if isinstance(images, list):
        images = np.array(images)
    img_h = images.shape[1]
    img_w = images.shape[2]
    n_plots = int(np.ceil(np.sqrt(images.shape[0])))
    if len(images.shape) == 4 and images.shape[3] == 3:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1, 3)) * 0.5
    elif len(images.shape) == 4 and images.shape[3] == 1:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1, 1)) * 0.5
    elif len(images.shape) == 3:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1)) * 0.5
    else:
        raise ValueError('Could not parse image shape of {}'.format(images.shape))
    for i in range(n_plots):
        for j in range(n_plots):
            this_filter = i * n_plots + j
            if this_filter < images.shape[0]:
                this_img = images[this_filter]
                m[1 + i + i * img_h:1 + i + (i + 1) * img_h,
                  1 + j + j * img_w:1 + j + (j + 1) * img_w] = this_img
    return m

def get_random_tags():
    y = np.random.uniform(0.0, 1.0, [batch_size, LABEL]).astype(np.float32)
    p_other = [0.6, 0.6, 0.25, 0.04488882, 0.3, 0.05384738]
    for i in range(batch_size):
        for j in range(len(p_other)):
            if y[i, j + 28] < p_other[j]:
                y[i, j + 28] = 1
            else:
                y[i, j + 28] = 0

    phc = [0.15968645, 0.21305391, 0.15491921, 0.10523116, 0.07953927, 0.09508879, 0.03567429, 0.07733163, 0.03157895, 0.01833307, 0.02236442, 0.00537514, 0.00182371]
    phs = [0.52989922,  0.37101264,  0.12567589,  0.00291153,  0.00847864]
    pec = [0.28350664, 0.15760678, 0.17862742, 0.13412254, 0.14212126, 0.0543913, 0.01020637, 0.00617501, 0.03167493, 0.00156775]
    for i in range(batch_size):
        y[i, :28] = 0

        hc = np.random.random()
        for j in range(len(phc)):
            if np.sum(phc[:j]) < hc < np.sum(phc[:j + 1]):
                y[i, j] = 1
                break

        hs = np.random.random()
        for j in range(len(phs)):
            if np.sum(phs[:j]) < hs < np.sum(phs[:j + 1]):
                y[i, j + 13] = 1
                break

        ec = np.random.random()
        for j in range(len(pec)):
            if np.sum(pec[:j]) < ec < np.sum(pec[:j + 1]):
                y[i, j + 18] = 1
                break
    return y

sess = tf.Session()
sess.run(tf.global_variables_initializer())

saver = tf.train.import_meta_graph('./anime_acgan-60000.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))

graph = tf.get_default_graph()
g = graph.get_tensor_by_name('generator/g/Tanh:0')
noise = graph.get_tensor_by_name('noise:0')
noise_y = graph.get_tensor_by_name('noise_y:0')
is_training = graph.get_tensor_by_name('is_training:0')

# Randomly generate samples
z_samples = np.random.uniform(-1.0, 1.0, [batch_size, z_dim]).astype(np.float32)
y_samples = get_random_tags()
gen_imgs = sess.run(g, feed_dict={noise: z_samples, noise_y: y_samples, is_training: False})
gen_imgs = (gen_imgs + 1) / 2
imgs = [img[:, :, :] for img in gen_imgs]
gen_imgs = montage(imgs)
gen_imgs = np.clip(gen_imgs, 0, 1)
imsave('1_二次元頭像隨機生成.jpg', gen_imgs)

# Generate samples with specified tags
all_tags = ['blonde hair', 'brown hair', 'black hair', 'blue hair', 'pink hair', 'purple hair', 'green hair', 'red hair', 'silver hair', 'white hair', 'orange hair', 'aqua hair', 'grey hair', 'long hair', 'short hair', 'twintails', 'drill hair', 'ponytail', 'blue eyes', 'red eyes', 'brown eyes', 'green eyes', 'purple eyes', 'yellow eyes', 'pink eyes', 'aqua eyes', 'black eyes', 'orange eyes', 'blush', 'smile', 'open mouth', 'hat', 'ribbon', 'glasses']
for i, tags in enumerate([['blonde hair', 'twintails', 'blush', 'smile', 'ribbon', 'red eyes'], ['silver hair', 'long hair', 'blush', 'smile', 'open mouth', 'blue eyes']]):
    z_samples = np.random.uniform(-1.0, 1.0, [batch_size, z_dim]).astype(np.float32)
    y_samples = np.zeros([1, LABEL])
    for tag in tags:
        y_samples[0, all_tags.index(tag)] = 1
    y_samples = np.repeat(y_samples, batch_size, 0)
    gen_imgs = sess.run(g, feed_dict={noise: z_samples, noise_y: y_samples, is_training: False})
    gen_imgs = (gen_imgs + 1) / 2
    imgs = [img[:, :, :] for img in gen_imgs]
    gen_imgs = montage(imgs)
    gen_imgs = np.clip(gen_imgs, 0, 1)
    imsave('%d_二次元頭像指定標籤.jpg' % (i + 2), gen_imgs)

# Fixed noise, random tags
z_samples = np.random.uniform(-1.0, 1.0, [1, z_dim]).astype(np.float32)
z_samples = np.repeat(z_samples, batch_size, 0)
y_samples = get_random_tags()
gen_imgs = sess.run(g, feed_dict={noise: z_samples, noise_y: y_samples, is_training: False})
gen_imgs = (gen_imgs + 1) / 2
imgs = [img[:, :, :] for img in gen_imgs]
gen_imgs = montage(imgs)
gen_imgs = np.clip(gen_imgs, 0, 1)
imsave('4_二次元頭像固定噪音.jpg', gen_imgs)
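
Drawing one tag per group from the empirical frequencies can also be done with NumPy's weighted choice. The sketch below is an alternative to the cumulative-sum loops in the script, not the author's method, and `sample_one_hot` is a hypothetical helper. It normalizes the probabilities first, because the hairstyle list `phs` as given sums to about 1.04 rather than 1; with cumulative sums compared against a number below 1, the last entries of such a list can never be selected.

```python
import numpy as np


def sample_one_hot(probs, batch_size, rng):
    """Draw one category per row according to probs; return one-hot rows."""
    p = np.asarray(probs, dtype=np.float64)
    p = p / p.sum()                       # normalize in case the list does not sum to 1
    picks = rng.choice(len(p), size=batch_size, p=p)
    one_hot = np.zeros((batch_size, len(p)), dtype=np.float32)
    one_hot[np.arange(batch_size), picks] = 1
    return one_hot


rng = np.random.RandomState(0)
# Hairstyle frequencies copied from the script above.
phs = [0.52989922, 0.37101264, 0.12567589, 0.00291153, 0.00847864]
hair_style = sample_one_hot(phs, 64, rng)
print(hair_style.sum(axis=1))    # every row has exactly one active hairstyle
print(hair_style.mean(axis=0))   # empirical frequencies within this batch
```

The resulting block could be written into columns 13 to 17 of the label matrix, with the hair-color and eye-color groups handled the same way.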

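Random samples generated following the original tag distribution

Figure: randomly generated anime avatars

Avatars generated with blonde hair, twintails, blush, smile, ribbon, and red eyes

Figure: anime avatars with specified tags

Avatars generated with silver hair, long hair, blush, smile, open mouth, and blue eyes

Figure: anime avatars with specified tags

With fixed noise and random tags, the overall subject of the avatar stays roughly the same while the details vary

Figure: anime avatars with fixed noise

With the above in hand, you can also train an ACGAN on CelebA controlled by its 40 binary (0/1) attributes, which is somewhat simpler than the anime avatar case.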

References

Video lecture course

Deep Fun (深度有趣), Part 1
