machine learning - Convolutional GAN with MNIST data not converging


I have been trying to get a convolutional GAN working on MNIST data (which should be the easiest thing in the world), but for some reason I am having convergence issues. If the discriminator and generator are fully connected NNs, I have no problem with convergence, but when I change these functions to use conv nets, all of a sudden I get bad convergence issues (the discriminator loss is driven to 0 rapidly, and the generator loss tends to infinity).

I cannot for the life of me figure out what is going wrong at the moment, so I was wondering if anyone on here could help me pinpoint the issue (and if you see anything else wrong, don't hesitate to let me know anyway).

Here is the code for the discriminator and generator:

    import tensorflow as tf  # TensorFlow 1.x API

    ################## discriminator ###################
    with tf.name_scope("weights_discriminator"):
        d_w1 = tf.get_variable(initializer=xavier_init([2, 2, 1, 128]), name='d_w1')
        d_w2 = tf.get_variable(initializer=xavier_init([2, 2, 128, 256]), name='d_w2')
        d_w3 = tf.get_variable(initializer=xavier_init([7*7*256, 1]), name='d_w3')

    theta_d = [d_w1, d_w2, d_w3]  # stuff to be optimised

    def discriminator(x, smpl_size):
        x = tf.reshape(x, [smpl_size, 28, 28, 1])  # make sure it is the correct size

        # size of: [-1,28,28,1] going in
        conv1 = tf.nn.conv2d(input=x, filter=d_w1, strides=[1, 1, 1, 1], padding="SAME")
        conv1 = lrelu(conv1)
        conv1 = tf.nn.max_pool(value=conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

        # size of: [-1,14,14,128] going in
        conv2 = tf.nn.conv2d(input=conv1, filter=d_w2, strides=[1, 1, 1, 1], padding="SAME")
        conv2 = lrelu(conv2)
        conv2 = tf.nn.max_pool(value=conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

        # size of: [-1,7,7,256] going in
        out3 = tf.reshape(conv2, [smpl_size, 7*7*256])
        out3 = tf.matmul(out3, d_w3)

        return out3

    ####################### generator ###############################
    with tf.name_scope("weights_generator"):
        g_w0 = tf.get_variable(initializer=xavier_init([dim_z, 4*4*1024]), name='g_w0')  # 4*4*1024 = 16384 units
        # initialised as: [stride height, stride width, output, input]
        g_w1 = tf.get_variable(initializer=xavier_init([2, 2, 256, 1024]), name='g_w1')
        g_w2 = tf.get_variable(initializer=xavier_init([2, 2, 128, 256]), name='g_w2')
        g_w3 = tf.get_variable(initializer=xavier_init([2, 2, 1, 128]), name='g_w3')

    with tf.name_scope("biases_generator"):
        g_b0 = tf.get_variable(initializer=tf.zeros(shape=[4*4*1024]), name='g_b0')
        g_b1 = tf.get_variable(initializer=tf.zeros(shape=[7, 7, 256]), name='g_b1')
        g_b2 = tf.get_variable(initializer=tf.zeros(shape=[14, 14, 128]), name='g_b2')

    theta_g = [g_w0, g_w1, g_w2, g_b0, g_b1, g_b2]  # stuff to be optimised

    def generator(z, smpl_size):
        g_h0 = tf.nn.relu(tf.matmul(z, g_w0) + g_b0)  # linear transform
        reshaped = tf.reshape(g_h0, [smpl_size, 4, 4, 1024])

        g_conv1 = tf.nn.conv2d_transpose(value=reshaped, filter=g_w1, strides=[1, 2, 2, 1], output_shape=[smpl_size, 7, 7, 256])
        g_h1 = tf.nn.relu(g_conv1 + g_b1)

        g_conv2 = tf.nn.conv2d_transpose(value=g_h1, filter=g_w2, strides=[1, 2, 2, 1], output_shape=[smpl_size, 14, 14, 128])
        g_h2 = tf.nn.relu(g_conv2 + g_b2)

        g_conv3 = tf.nn.conv2d_transpose(value=g_h2, filter=g_w3, strides=[1, 2, 2, 1], output_shape=[smpl_size, 28, 28, 1])

        g_out = tf.nn.tanh(g_conv3)
        return g_out

    ###################### gan loss #########################
    # loss function definitions
    g_sample_imgs = generator(z, smpling_size)  # generator used purely to save images (cheap cop-out for a non-dynamic size variable)
    g_sample = generator(z, mb_size)
    d_real_vals = discriminator(x, mb_size)
    d_fake_vals = discriminator(g_sample, mb_size)

    # stable gan loss (because we can use the sigmoid_cross_entropy fn.)
    d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=d_real_vals, labels=tf.ones_like(d_real_vals)))
    d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=d_fake_vals, labels=tf.zeros_like(d_fake_vals)))
    d_loss = d_loss_real + d_loss_fake
    g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=d_fake_vals, labels=tf.ones_like(d_fake_vals)))

    # solvers
    d_solver = tf.train.AdamOptimizer().minimize(d_loss, var_list=theta_d)
    g_solver = tf.train.AdamOptimizer().minimize(g_loss, var_list=theta_g)
    ##########################################################
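The layer sizes in the comments above can be checked by hand. The following is a minimal sketch in plain Python, assuming TensorFlow's rule that "SAME" padding produces an output of size ceil(input / stride), and that an `output_shape` for `conv2d_transpose` is valid when the corresponding forward convolution would map it back to the input size. The helper names `same_out` and `transpose_shape_ok` are made up for this illustration and are not part of my code.

```python
import math

def same_out(size, stride):
    """Spatial output size of a conv/pool layer with "SAME" padding."""
    return math.ceil(size / stride)

def transpose_shape_ok(in_size, out_size, stride):
    """A transposed conv may produce out_size from in_size if the
    forward "SAME" conv would shrink out_size back to in_size."""
    return same_out(out_size, stride) == in_size

# Discriminator: the stride-1 convs keep the size, each 2x2 max-pool halves it.
disc_sizes = [28]
for _ in range(2):
    disc_sizes.append(same_out(disc_sizes[-1], 2))
print(disc_sizes)

# Generator: (input size, requested output size) of each transposed conv.
gen_steps = [(4, 7), (7, 14), (14, 28)]
gen_ok = [transpose_shape_ok(i, o, 2) for i, o in gen_steps]
print(gen_ok)
```

Under those assumptions the shapes are consistent (28 -> 14 -> 7 in the discriminator, 4 -> 7 -> 14 -> 28 in the generator), so a plain shape mismatch does not look like the cause. This check says nothing about the convergence behaviour itself.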

Now, I know the loss function is correct because it works if the discriminator and generator functions are re-coded as fully connected NNs. As I said earlier, when I tried to rewrite them as convolutional networks, the losses no longer converge, and the output images turn into a black-and-white checkerboard pattern and don't move from there.

Also, using TensorBoard I've had a look at the gradient magnitudes: in the discriminator they reach ~10^4 in magnitude, and in the generator they reach ~10^2 in magnitude.

Lastly, I can copy-paste the entire code here if you feel it would help you analyse the issue in any way. Just let me know. I am keen to get to the bottom of this issue.

Here is an example of the output of the code:
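To make the checkerboard observation concrete, here is a small 1-D sketch in plain Python of a transposed convolution whose kernel size equals its stride (2 and 2, as in my generator). It is an illustration under assumptions, not a diagnosis: the function `conv_transpose_1d`, the kernel values, and the gain of 50 are all invented for the example. With kernel size equal to stride, every input value simply stamps a scaled copy of the kernel into its own block of the output, so a flat input produces a strictly periodic pattern, and a saturated `tanh` turns that into pure black and white.

```python
import math

def conv_transpose_1d(values, kernel, stride):
    """Unpadded 1-D transposed convolution: each input value adds a
    scaled copy of the kernel at offset i * stride in the output."""
    out = [0.0] * ((len(values) - 1) * stride + len(kernel))
    for i, v in enumerate(values):
        for k, w in enumerate(kernel):
            out[i * stride + k] += v * w
    return out

flat_input = [1.0, 1.0, 1.0, 1.0]   # hypothetical constant feature map
kernel = [0.9, -0.9]                # hypothetical kernel weights

row = conv_transpose_1d(flat_input, kernel, stride=2)
print(row)   # alternates 0.9, -0.9 across the whole row

# Large pre-activations saturate tanh, giving only +1 / -1 pixels.
pixels = [math.tanh(50.0 * v) for v in row]
print(pixels)
```

In 2-D the same thing happens with 2x2 tiles, which would look like the pattern I am seeing. Whether this is what actually happens in my network is an open question.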

    extracting ../../mnist_data/train-images-idx3-ubyte.gz
    extracting ../../mnist_data/train-labels-idx1-ubyte.gz
    extracting ../../mnist_data/t10k-images-idx3-ubyte.gz
    extracting ../../mnist_data/t10k-labels-idx1-ubyte.gz
    iter: 0
    d loss: 71.64
    g_loss: 346.9

    iter: 10
    d loss: 3.604e-33
    g_loss: 1.046e+03

    iter: 20
    d loss: 0.0
    g_loss: 1.155e+03
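The numbers above can be related to the discriminator's raw outputs. The sketch below re-implements, in plain Python, the numerically stable formula that the TensorFlow documentation gives for `sigmoid_cross_entropy_with_logits`, namely max(x, 0) - x * z + log(1 + exp(-|x|)). The logit value of -1046 is a hypothetical single value chosen to match the `g_loss` printed at iteration 10; the actual per-sample logits are not known from the log.

```python
import math

def sigmoid_xent(logit, label):
    """Stable sigmoid cross-entropy for one logit x and one label z:
    max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

# Hypothetical discriminator output on a generated image.
fake_logit = -1046.0

g_loss = sigmoid_xent(fake_logit, 1.0)       # generator wants label 1
d_loss_fake = sigmoid_xent(fake_logit, 0.0)  # discriminator wants label 0

print(g_loss)
print(d_loss_fake)
```

For a strongly negative logit the generator loss is roughly the magnitude of the logit, while the discriminator's loss on fakes underflows to zero. So a `g_loss` around 10^3 next to a `d loss` of 0.0 would be consistent with the discriminator emitting logits of about -10^3 on generated images, rather than with a mistake in the loss function.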

