Hi,

I have tried running the copy task with the default parameters (controller_size=100, controller_layers=1, num_heads=1, sequence_width=8, sequence_min_len=1, sequence_max_len=20, memory_n=128, memory_m=20, batch_size=1), and the result is similar to the one in the notebook. However, when I changed the sequence length to a smaller range (sequence_min_len=1, sequence_max_len=5), convergence is really slow (like the figure below), which is unexpected, since smaller sequences should be learned faster. Do you have any idea why this happens and how to train on smaller sequences properly? Any suggestion is welcome.
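For reference, here is a minimal sketch of how a copy-task batch can be generated from those parameters. The function name and the extra delimiter channel are my assumptions (following the NTM paper's copy-task setup), not necessarily this repo's actual code:

```python
import torch

def copy_task_batch(batch_size=1, seq_width=8, min_len=1, max_len=5):
    """Sample one copy-task batch: a random binary sequence plus a delimiter.

    Illustrative sketch only; names mirror the parameters in the question,
    not the repository's real API.
    """
    seq_len = torch.randint(min_len, max_len + 1, (1,)).item()
    seq = torch.bernoulli(torch.full((seq_len, batch_size, seq_width), 0.5))
    # The input carries one extra channel whose only "on" bit marks
    # the end of the sequence, telling the NTM to start recalling.
    inp = torch.zeros(seq_len + 1, batch_size, seq_width + 1)
    inp[:seq_len, :, :seq_width] = seq
    inp[seq_len, :, seq_width] = 1.0  # delimiter bit
    return inp, seq  # the target is the original sequence itself

inp, target = copy_task_batch(min_len=1, max_len=5)
print(inp.shape, target.shape)  # e.g. torch.Size([4, 1, 9]) torch.Size([3, 1, 8])
```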
Interesting question. IMHO, there are a few things that may affect this behavior.
First of all, the NTM is trying to learn a "for loop", and showing it only short examples makes it hard for the NTM to generalize.
Second, with larger sequence lengths the network's parameters are used more frequently per example, yielding more stable gradients and making it harder for the network to converge to a local minimum that basically memorizes the patterns.
Third, the capacity of the network (the number of parameters) plays a role as well. In the extreme case of a sequence length of 1 and a large capacity, the easiest thing for the network to learn is to memorize the inputs instead of learning the rule.
I increased the batch size and decreased the NTM's memory size, and the network converged on sequences of length 1 to 5 in fewer than 30K training samples. There are still fluctuations, since the gradients are not stable enough.
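For anyone wanting to reproduce that change, the adjusted configuration looks roughly like this. The exact batch size and memory size used aren't stated above, so the two changed values below are placeholders, not the ones actually used:

```python
# Hypothetical overrides of the copy-task defaults from the question;
# only batch_size and memory_n differ. The concrete values are
# illustrative, since the comment above doesn't state the exact ones.
params = dict(
    controller_size=100,
    controller_layers=1,
    num_heads=1,
    sequence_width=8,
    sequence_min_len=1,
    sequence_max_len=5,
    memory_n=64,    # decreased memory size (default: 128) to cut capacity
    memory_m=20,
    batch_size=16,  # increased batch size (default: 1) for more stable gradients
)
```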