Convolutional GAN with MNIST data not converging
I have been working on getting a convolutional GAN working on MNIST data (which should be the easiest thing in the world), but for some reason I am having convergence issues. If the discriminator and generator are fully-connected NNs I have no problem with convergence, but when I change these functions to use conv. nets I suddenly get bad convergence issues (the discriminator loss is driven to 0 rapidly, while the generator loss tends to infinity).
I cannot for the life of me figure out what is going wrong at the moment, so I am wondering if anyone here can help me pinpoint the issue (and if you see anything else wrong, don't hesitate to let me know anyway).
Here is the code for the discriminator and generator:
################## discriminator ###################
with tf.name_scope("weights_discriminator"):
    d_w1 = tf.get_variable(initializer=xavier_init([2, 2, 1, 128]), name='d_w1')
    d_w2 = tf.get_variable(initializer=xavier_init([2, 2, 128, 256]), name='d_w2')
    d_w3 = tf.get_variable(initializer=xavier_init([7*7*256, 1]), name='d_w3')

theta_d = [d_w1, d_w2, d_w3]  # stuff to be optimised

def discriminator(x, smpl_size):
    x = tf.reshape(x, [smpl_size, 28, 28, 1])  # make sure input is the correct size
    # size of: [-1, 28, 28, 1] going in
    conv1 = tf.nn.conv2d(input=x, filter=d_w1, strides=[1, 1, 1, 1], padding="SAME")
    conv1 = lrelu(conv1)
    conv1 = tf.nn.max_pool(value=conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
    # size of: [-1, 14, 14, 128] going in
    conv2 = tf.nn.conv2d(input=conv1, filter=d_w2, strides=[1, 1, 1, 1], padding="SAME")
    conv2 = lrelu(conv2)
    conv2 = tf.nn.max_pool(value=conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
    # size of: [-1, 7, 7, 256] going in
    out3 = tf.reshape(conv2, [smpl_size, 7*7*256])
    out3 = tf.matmul(out3, d_w3)
    return out3

####################### generator ###############################
with tf.name_scope("weights_generator"):
    g_w0 = tf.get_variable(initializer=xavier_init([dim_z, 4*4*1024]), name='g_w0')  # 4*4*1024 = 16384 units
    # initialised as: [filter height, filter width, output channels, input channels]
    g_w1 = tf.get_variable(initializer=xavier_init([2, 2, 256, 1024]), name='g_w1')
    g_w2 = tf.get_variable(initializer=xavier_init([2, 2, 128, 256]), name='g_w2')
    g_w3 = tf.get_variable(initializer=xavier_init([2, 2, 1, 128]), name='g_w3')

with tf.name_scope("biases_generator"):
    g_b0 = tf.get_variable(initializer=tf.zeros(shape=[4*4*1024]), name='g_b0')
    g_b1 = tf.get_variable(initializer=tf.zeros(shape=[7, 7, 256]), name='g_b1')
    g_b2 = tf.get_variable(initializer=tf.zeros(shape=[14, 14, 128]), name='g_b2')

theta_g = [g_w0, g_w1, g_w2, g_b0, g_b1, g_b2]  # stuff to be optimised

def generator(z, smpl_size):
    g_h0 = tf.nn.relu(tf.matmul(z, g_w0) + g_b0)  # linear transform
    reshaped = tf.reshape(g_h0, [smpl_size, 4, 4, 1024])
    g_conv1 = tf.nn.conv2d_transpose(value=reshaped, filter=g_w1, strides=[1, 2, 2, 1], output_shape=[smpl_size, 7, 7, 256])
    g_h1 = tf.nn.relu(g_conv1 + g_b1)
    g_conv2 = tf.nn.conv2d_transpose(value=g_h1, filter=g_w2, strides=[1, 2, 2, 1], output_shape=[smpl_size, 14, 14, 128])
    g_h2 = tf.nn.relu(g_conv2 + g_b2)
    g_conv3 = tf.nn.conv2d_transpose(value=g_h2, filter=g_w3, strides=[1, 2, 2, 1], output_shape=[smpl_size, 28, 28, 1])
    g_out = tf.nn.tanh(g_conv3)
    return g_out

###################### gan loss #########################
# loss function definitions
g_sample_imgs = generator(z, smpling_size)  # generator used purely to save images (cheap cop-out for a non-dynamic size variable)
g_sample = generator(z, mb_size)
d_real_vals = discriminator(x, mb_size)
d_fake_vals = discriminator(g_sample, mb_size)

# stable gan loss (because we can use the sigmoid_cross_entropy fn.)
d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=d_real_vals, labels=tf.ones_like(d_real_vals)))
d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=d_fake_vals, labels=tf.zeros_like(d_fake_vals)))
d_loss = d_loss_real + d_loss_fake
g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=d_fake_vals, labels=tf.ones_like(d_fake_vals)))

# solvers
d_solver = tf.train.AdamOptimizer().minimize(d_loss, var_list=theta_d)
g_solver = tf.train.AdamOptimizer().minimize(g_loss, var_list=theta_g)
##########################################################
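For completeness, the code above calls two helpers, xavier_init and lrelu, that aren't shown. A minimal sketch of how such helpers are typically defined in the usual GAN tutorials (the exact definitions here are an assumption, not my original code):

# Sketches of the two helpers, assuming the usual GAN-tutorial
# definitions (the originals may differ slightly)
import tensorflow as tf

def xavier_init(size):
    # Xavier-style initial values: zero-mean Gaussian scaled by fan-in
    in_dim = size[0]
    xavier_stddev = 1. / tf.sqrt(in_dim / 2.)
    return tf.random_normal(shape=size, stddev=xavier_stddev)

def lrelu(x, alpha=0.2):
    # leaky ReLU: identity for positive values, alpha*x for negatives
    return tf.maximum(x, alpha * x)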
Now, I know the loss function is correct because it works if the discriminator and generator are re-coded as fully-connected NNs. As I said earlier, when I re-write them as convolutional networks the loss doesn't converge, and the output images show a black-and-white checkerboard pattern and don't move from there.
Also, using TensorBoard I've had a look at the gradient magnitudes: in the discriminator they reach ~10^4 in magnitude, and in the generator they reach ~10^2 in magnitude.
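The gradient summaries were produced roughly like this (a sketch, assuming tf.gradients over each loss; the summary names are illustrative):

# Sketch of how the gradient magnitudes were inspected in TensorBoard
# (':0' is stripped because ':' is illegal in summary names)
d_grads = tf.gradients(d_loss, theta_d)
g_grads = tf.gradients(g_loss, theta_g)

for var, grad in zip(theta_d, d_grads):
    tf.summary.histogram(var.name.split(':')[0] + '_grad', grad)
for var, grad in zip(theta_g, g_grads):
    tf.summary.histogram(var.name.split(':')[0] + '_grad', grad)

merged = tf.summary.merge_all()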
Lastly, if it helps, I can copy-paste the entire code here if anyone feels they could analyse the issue that way. Let me know; I am keen to get to the bottom of this.
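In the meantime, the training loop is the usual alternating scheme, roughly along these lines (a minimal sketch, assuming placeholders x and z defined elsewhere, the standard tensorflow.examples MNIST loader, and a uniform-noise helper; not my exact code):

# Sketch of the alternating training loop (assumptions as noted above)
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('../../MNIST_data', one_hot=True)

def sample_z(m, n):
    # uniform noise in [-1, 1] as the generator input
    return np.random.uniform(-1., 1., size=[m, n])

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for it in range(100000):
    x_mb, _ = mnist.train.next_batch(mb_size)
    # one discriminator step (needs both real images and noise,
    # since d_loss depends on g_sample), then one generator step
    _, d_loss_curr = sess.run([d_solver, d_loss],
                              feed_dict={x: x_mb, z: sample_z(mb_size, dim_z)})
    _, g_loss_curr = sess.run([g_solver, g_loss],
                              feed_dict={z: sample_z(mb_size, dim_z)})
    if it % 10 == 0:
        print('iter: {} d loss: {:.4} g_loss: {:.4}'.format(it, d_loss_curr, g_loss_curr))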
Here is an example of the output of the code:
Extracting ../../MNIST_data/train-images-idx3-ubyte.gz
Extracting ../../MNIST_data/train-labels-idx1-ubyte.gz
Extracting ../../MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ../../MNIST_data/t10k-labels-idx1-ubyte.gz
iter: 0 d loss: 71.64 g_loss: 346.9
iter: 10 d loss: 3.604e-33 g_loss: 1.046e+03
iter: 20 d loss: 0.0 g_loss: 1.155e+03