Tensorflow: Recurrent neural network training pairs & the effect on the loss function
I am looking at the code for an RNN language model. I am confused about 1) how the training pairs (x, y) are constructed and, subsequently, 2) how the loss is computed. The code borrows from the TensorFlow RNN tutorial (the reader module).
Within the reader module, a generator, ptb_iterator, is defined. It takes in the data as one sequence and yields x, y pairs in accordance with the batch size and the number of steps you wish to 'unroll' the RNN for. It is best to look at the entire definition, but the part that confused me is this:
    for i in range(epoch_size):
        x = data[:, i*num_steps:(i+1)*num_steps]
        y = data[:, i*num_steps+1:(i+1)*num_steps+1]
        yield (x, y)
which is documented as:
*Yields: Pairs of the batched data, each a matrix of shape [batch_size, num_steps]. The second element of the tuple is the same data time-shifted to the right by one.*
So if I understand correctly, for the data as the sequence [1 2 3 4 5 6] and num_steps = 2, then for stochastic gradient descent (i.e. batch_size = 1), the following pairs are generated:
- x=[1,2], y=[2,3]
- x=[3,4], y=[4,5]
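To check this, here is a minimal NumPy sketch of the same slicing (the name toy_iterator and the reshape bookkeeping are my additions, not part of the reader module):

    import numpy as np

    def toy_iterator(raw_data, batch_size, num_steps):
        # Lay the flat token sequence out as batch_size rows,
        # truncating any tokens that do not fit evenly.
        data = np.array(raw_data)
        batch_len = len(data) // batch_size
        data = data[:batch_size * batch_len].reshape(batch_size, batch_len)
        epoch_size = (batch_len - 1) // num_steps
        for i in range(epoch_size):
            # Same slicing as ptb_iterator: y is x shifted right by one.
            x = data[:, i*num_steps:(i+1)*num_steps]
            y = data[:, i*num_steps+1:(i+1)*num_steps+1]
            yield (x, y)

    for x, y in toy_iterator([1, 2, 3, 4, 5, 6], batch_size=1, num_steps=2):
        print(x, y)
    # [[1 2]] [[2 3]]
    # [[3 4]] [[4 5]]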
1) Is this the correct way to do it? Should it not instead be done so that the pairs are:
- x=[1,2], y=[2,3]
- x=[2,3], y=[3,4] ... # this allows more data points (see the sketch after the lists)
or
- x=[1,2], y=[3]
- x=[2,3], y=[4] ... # this ensures predictions are made with a context length of num_steps
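For comparison, the first alternative (overlapping windows) could be implemented roughly as below — a hypothetical sketch of my own, not anything from the tutorial:

    import numpy as np

    def overlapping_iterator(raw_data, num_steps):
        # Slide the window one token at a time instead of num_steps at a
        # time, yielding roughly num_steps times as many (x, y) pairs.
        data = np.array(raw_data)
        for i in range(len(data) - num_steps):
            x = data[i:i+num_steps]
            y = data[i+1:i+num_steps+1]
            yield (x, y)

    for x, y in overlapping_iterator([1, 2, 3, 4, 5, 6], num_steps=2):
        print(x, y)
    # [1 2] [2 3]
    # [2 3] [3 4]
    # [3 4] [4 5]
    # [4 5] [5 6]

Note that this replicates each token up to num_steps times, which is the cost the answer below refers to.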
2) Lastly, given the pairs generated in the reader module, when it comes to training, will the loss that is computed not then reflect the RNN's performance over a range of unrolled steps rather than over the num_steps specified?
For example, the model will make a prediction for x=3 (from x=[3,4]) without considering the 2 that came before (i.e. unrolling the RNN one step instead of two).
Re (1), the goal is a sequence size much bigger than 2, and you don't want to replicate the entire dataset n times, since that doesn't add statistical power. Re (2), it's an approximation used at training time; at prediction time you should predict the entire sequence.
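To make the approximation in (2) concrete, here is a minimal sketch of a per-step sequence loss in plain NumPy (the names sequence_loss, logits, and targets are mine, not the tutorial's):

    import numpy as np

    def sequence_loss(logits, targets):
        # logits: [num_steps, vocab_size], targets: [num_steps].
        # Cross-entropy is averaged over every unrolled step, so the
        # prediction at the first step of a window (made with little or
        # no context, e.g. x=3 above) counts as much as the later ones.
        shifted = logits - logits.max(axis=1, keepdims=True)  # for stability
        probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
        return -np.mean(np.log(probs[np.arange(len(targets)), targets]))

    logits = np.random.randn(2, 10)  # 2 unrolled steps, vocabulary of 10
    targets = np.array([4, 5])       # e.g. y = [4, 5]
    print(sequence_loss(logits, targets))

That is the sense in which the windowed loss is only an approximation of the model's performance on the full sequence.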