neural network - TensorFlow post-LSTM fully connected layer outputs return the same values as each other


I'm trying to train a sequence-to-sequence LSTM model on a dataset with 3 labels: [1, 0] for detection of class 1, [0, 1] for detection of class 2, and [0, 0] for detection of nothing. After getting the outputs from the LSTM network, I applied a fully connected layer to each cell's output the following way:

```python
import tensorflow as tf  # TensorFlow 1.x API

outputs, state = tf.nn.dynamic_rnn(cell, input, dtype=tf.float32)
# shape of outputs: [batch_size, n_time_steps, n_hidden]

# matmul works on matrices, so reshape to fold the time dimension into the batch dimension
outputs = tf.reshape(outputs, [-1, n_hidden])  # shape: [batch_size * n_time_steps, n_hidden]

w = tf.Variable(tf.truncated_normal(shape=[n_hidden, 2], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[2]))
logit = tf.add(tf.matmul(outputs, w), b, name='logit')

# reshape back to [batch_size, n_time_steps, 2]
logit = tf.reshape(logit, [batch_size, -1, 2])
```
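As a sanity check, the reshape → matmul → reshape pattern above should compute exactly what a dense layer applied independently at every time step computes. The sketch below uses NumPy as a stand-in for the TensorFlow ops; the sizes, the random "LSTM outputs", and the use of a plain normal (instead of truncated normal) initializer are all hypothetical choices for illustration. It compares the reshape approach against an explicit per-time-step loop and against a `tensordot` over the last axis.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, n_time_steps, n_hidden, n_classes = 4, 5, 3, 2

rng = np.random.default_rng(0)
# Stand-in for the dynamic_rnn outputs: [batch_size, n_time_steps, n_hidden]
outputs = rng.standard_normal((batch_size, n_time_steps, n_hidden))
# Stand-ins for w and b (normal, not truncated normal, for simplicity)
w = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b = np.full(n_classes, 0.1)

def dense_via_reshape(outputs, w, b):
    """Mirror the question's approach: fold time into batch, matmul, unfold."""
    flat = outputs.reshape(-1, outputs.shape[-1])            # [batch*time, n_hidden]
    logit = flat @ w + b                                     # [batch*time, n_classes]
    return logit.reshape(outputs.shape[0], -1, w.shape[1])   # [batch, time, n_classes]

def dense_per_step(outputs, w, b):
    """Apply the same weights to every time step explicitly."""
    steps = [outputs[:, t, :] @ w + b for t in range(outputs.shape[1])]
    return np.stack(steps, axis=1)                           # [batch, time, n_classes]

logit_a = dense_via_reshape(outputs, w, b)
logit_b = dense_per_step(outputs, w, b)
logit_c = np.tensordot(outputs, w, axes=1) + b               # contract over the last axis

print(logit_a.shape)                   # (4, 5, 2)
print(np.allclose(logit_a, logit_b))   # True
print(np.allclose(logit_a, logit_c))   # True
```

If the shapes going into the reshape are what the comments claim, the three results agree, so the layer itself is per-time-step equivalent to a dense layer on the last axis.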

On this output I apply `tf.nn.sigmoid_cross_entropy_with_logits` and take the mean. The model seems to work fine, achieving high accuracy and recall, except for the fact that the outputs are almost always either [0, 0] or [1, 1]. The 2 logit outputs of the fully connected layer have very similar values (but not exactly the same). This puts a hard cap of 50% on precision, which the model converges to (and never exceeds by even a fraction of a percent).
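To make the symptom concrete, here is a small pure-Python sketch of the loss and of how two nearly equal logits turn into [1, 1] predictions. The per-element formula is the numerically stable form given in the TensorFlow documentation for `tf.nn.sigmoid_cross_entropy_with_logits`; the helper names, the example logits, and the choice to compute precision over the two label positions of a single time step are assumptions made only for illustration.

```python
import math

def sigmoid_cross_entropy_with_logits(labels, logits):
    """Per-element loss, using the numerically stable form
    max(x, 0) - x * z + log(1 + exp(-|x|)) for logit x and label z."""
    return [max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
            for z, x in zip(labels, logits)]

def predict(logits):
    """Threshold each label independently: sigmoid(x) > 0.5 exactly when x > 0."""
    return [1 if x > 0.0 else 0 for x in logits]

def precision(y_true, y_pred):
    """Fraction of predicted positives that are true positives."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(y_pred)
    return true_pos / predicted_pos if predicted_pos else 0.0

# Hypothetical time step whose true label is "class 1" ([1, 0]),
# with two nearly equal logits like those described above.
labels = [1.0, 0.0]
logits = [0.31, 0.29]

losses = sigmoid_cross_entropy_with_logits(labels, logits)
print(sum(losses) / len(losses))           # mean loss over the two labels
print(predict(logits))                     # [1, 1]
print(precision([1, 0], predict(logits)))  # 0.5
```

Each of the two columns gets its own independent loss term, so the loss alone does not tie the two logits together; when they come out nearly equal, every positive prediction is [1, 1] and at most one of its two positives can be correct, which is one way to see the 50% ceiling.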

Now, intuition tells me that something must be wrong in the training step and that both fully connected outputs are being trained on the same data, but curiously enough, when I replace my own implementation with the prepackaged one from `tf.contrib`:

```python
outputs, state = tf.nn.dynamic_rnn(cell, input, dtype=tf.float32)
logit = tf.contrib.layers.fully_connected(outputs, 2, activation_fn=None)
```

without changing a single other thing, the model starts training properly. Now, the obvious solution is to just use that implementation, but why doesn't the first one work?

