Tensorflow: Recurrent neural network training pairs & the effect on the loss function


I am looking at code for an RNN language model. I am confused about 1) how the training pairs (x, y) are constructed and, subsequently, 2) how the loss is computed. The code borrows from the TensorFlow RNN tutorial (the reader module).

Within the reader module, a generator, ptb_iterator, is defined. It takes the data in as one sequence and yields x, y pairs in accordance with the batch size and the number of steps you wish to 'unroll' the RNN. It is best to look at the entire definition, but the part that confused me is this:

for i in range(epoch_size):
    x = data[:, i*num_steps:(i+1)*num_steps]
    y = data[:, i*num_steps+1:(i+1)*num_steps+1]
    yield (x, y)

which is documented as:

*Yields: Pairs of the batched data, each a matrix of shape [batch_size, num_steps]. The second element of the tuple is the same data time-shifted to the right by one.*

So, if I understand correctly, for the data sequence [1 2 3 4 5 6], num_steps = 2 and stochastic gradient descent (i.e. batch_size = 1), the following pairs are generated (a runnable check of this follows the list):

  1. x=[1,2], y=[2,3]
  2. x=[3,4], y=[4,5]
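
To check this, here is a minimal stand-alone sketch in the spirit of ptb_iterator (it lays the flat sequence out as [batch_size, batch_len] and slices out disjoint windows; the names and details here are my own simplification), run on the toy sequence with batch_size = 1:

import numpy as np

def toy_ptb_iterator(raw_data, batch_size=1, num_steps=2):
    # Simplified stand-in for reader.ptb_iterator: reshape the flat
    # sequence into [batch_size, batch_len], then slice disjoint windows.
    raw_data = np.array(raw_data, dtype=np.int32)
    batch_len = len(raw_data) // batch_size
    data = raw_data[:batch_size * batch_len].reshape(batch_size, batch_len)
    epoch_size = (batch_len - 1) // num_steps
    for i in range(epoch_size):
        x = data[:, i*num_steps:(i+1)*num_steps]
        y = data[:, i*num_steps+1:(i+1)*num_steps+1]
        yield (x, y)

for x, y in toy_ptb_iterator([1, 2, 3, 4, 5, 6]):
    print(x, y)
# prints: [[1 2]] [[2 3]]
#         [[3 4]] [[4 5]]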

1) Is this the correct way to do it? Should it not be done so that the pairs are:

  1. x=[1,2], y=[2,3]
  2. x=[2,3], y=[3,4] ... # this allows for more data points

or

  1. x=[1,2], y=[3]
  2. x=[2,3], y=[4] ... # this ensures every prediction is made with a context of length num_steps (see the sketch of both alternatives below)
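
For reference, a small hypothetical sketch (plain Python, not part of the reader module) of the two alternative pairing schemes described above:

def sliding_pairs(seq, num_steps):
    # First alternative: a window starting at every position, with the
    # target still being the input shifted right by one.
    return [(seq[i:i+num_steps], seq[i+1:i+num_steps+1])
            for i in range(len(seq) - num_steps)]

def full_context_pairs(seq, num_steps):
    # Second alternative: predict only the single token that follows each
    # window, so every prediction has a full num_steps of context.
    return [(seq[i:i+num_steps], [seq[i+num_steps]])
            for i in range(len(seq) - num_steps)]

seq = [1, 2, 3, 4, 5, 6]
print(sliding_pairs(seq, 2))
# [([1, 2], [2, 3]), ([2, 3], [3, 4]), ([3, 4], [4, 5]), ([4, 5], [5, 6])]
print(full_context_pairs(seq, 2))
# [([1, 2], [3]), ([2, 3], [4]), ([3, 4], [5]), ([4, 5], [6])]

Note that both schemes walk over (almost) every token num_steps times instead of once.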

2) Lastly, given the way the pairs are generated in the reader module, when it comes to training, won't the loss that is computed reflect the RNN's performance over a range of unrolled step counts rather than the num_steps specified?

For example, the model will make a prediction for x=3 (from x=[3,4]) without considering that 2 came before it (i.e. unrolling the RNN one step instead of two).
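
To put the concern in code, here is a schematic (NumPy-only, not the tutorial's actual TensorFlow graph) of a per-window loss that averages the cross-entropy over all num_steps target positions, so the prediction made from a single step of context counts just as much as the one made from the full window:

import numpy as np

def window_loss(step_probs, targets):
    # Schematic stand-in for a sequence loss: mean negative log-likelihood
    # of the true next token at each of the num_steps unrolled positions.
    return -np.mean([np.log(step_probs[t][targets[t]])
                     for t in range(len(targets))])

rng = np.random.default_rng(0)
step_probs = rng.dirichlet(np.ones(10), size=2)  # fake predictions for 2 steps
print(window_loss(step_probs, targets=[4, 5]))   # the x=[3,4], y=[4,5] window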

Re (1): the goal is for the sequence size to be a lot bigger than 2, and you don't want to replicate the entire dataset n times since you wouldn't gain any statistical power from doing so. Re (2): it is an approximation to use at training time; at prediction time you should predict with the entire sequence.
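
As an illustration of "predict with the entire sequence", here is a minimal hypothetical sketch, where step_fn stands in for a trained RNN cell (it is not anything from the tutorial): the whole history is fed token by token and the state is carried forward, so every prediction is conditioned on everything that came before.

import numpy as np

def predict_sequence(step_fn, init_state, tokens):
    # step_fn(state, token) -> (new_state, next_token_probs); hypothetical.
    state, preds = init_state, []
    for token in tokens:
        state, probs = step_fn(state, token)
        preds.append(int(np.argmax(probs)))  # greedy next-token prediction
    return preds

# Toy stand-in for a trained cell: keeps a running sum as its "state".
def dummy_step(state, token, vocab_size=10):
    state = (state + token) % vocab_size
    probs = np.full(vocab_size, 1.0 / vocab_size)
    return state, probs

print(predict_sequence(dummy_step, 0, [1, 2, 3, 4, 5]))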

