TensorFlow: post-LSTM fully connected layer outputs return the same values as each other
I am trying to train a sequence-to-sequence LSTM model on a dataset with 3 labels: [1, 0] for detection of class 1, [0, 1] for detection of class 2, and [0, 0] for detection of nothing. After getting the outputs from the LSTM network, I applied a fully connected layer to each cell's output in the following way:
```python
outputs, state = tf.nn.dynamic_rnn(cell, input)

# Shape of outputs is [batch_size, n_time_steps, n_hidden].
# matmul works on matrices, so fold the time dimension into the batch dimension.
outputs = tf.reshape(outputs, [-1, n_hidden])  # shape [batch_size * n_time_steps, n_hidden]

w = tf.Variable(tf.truncated_normal(shape=[n_hidden, 2], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[2]))

logit = tf.add(tf.matmul(outputs, w), b, name='logit')

# Reshape back to [batch_size, n_time_steps, 2]
logit = tf.reshape(logit, [batch_size, -1, 2])
```
On the output, I apply tf.nn.sigmoid_cross_entropy_with_logits and reduce the mean. The model seems to work fine, achieving high accuracy and recall, except for the fact that in almost all cases it outputs either [0, 0] or [1, 1]: the two logit outputs of the fully connected layer always have very similar (but not identical) values. This puts a hard cap of 50% on precision, which the model converges to (but does not exceed by even a fraction of a percent).
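For concreteness, the loss step looks roughly like the sketch below; the `labels` placeholder (its name and exact shape) is an illustrative assumption, not copied verbatim from my code:

```python
# Minimal sketch of the loss described above. The `labels` placeholder name
# and shape are assumptions for illustration only.
labels = tf.placeholder(tf.float32, shape=[batch_size, None, 2])

# Element-wise sigmoid cross-entropy on the per-timestep logits,
# averaged over batch, time steps and the two output units.
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logit)
loss = tf.reduce_mean(cross_entropy)
```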
Now, my intuition tells me that something must be wrong with the training step and that both fully connected outputs are being trained on the same data, but curiously enough, when I replace my own implementation with the prepackaged one from tf.contrib:
```python
outputs, state = tf.nn.dynamic_rnn(cell, input)
logit = tf.contrib.layers.fully_connected(outputs, 2, activation_fn=None)
```
without changing a single other thing, the model starts training properly. Now, the obvious solution would be to just use that implementation, but why doesn't the first one work?