Question about masking in the decoding process #7

yuenoble · 2020-09-25T01:39:58Z

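Hello. While reading the code, I noticed that during training, the decoder's multi-head self-attention only uses the mask for future words and does not use a mask for the padded positions. I don't quite understand this; could you please explain?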

def transformer_prepare_decoder(targets_l2r, targets_r2l, hparams):
  """Prepare one shard of the model for the decoder.
  """
  decoder_self_attention_bias = (
      common_attention.attention_bias_lower_triangle(tf.shape(targets_l2r)[1]))  ## [1, 1, length, length]
  decoder_input_l2r = common_layers.shift_left_3d(targets_l2r)
  decoder_input_r2l = common_layers.shift_left_3d(targets_r2l)
  if hparams.pos == "timing":
    decoder_input_l2r = common_attention.add_timing_signal_1d(decoder_input_l2r)
    decoder_input_r2l = common_attention.add_timing_signal_1d(decoder_input_r2l)
  decoder_input = tf.concat([tf.expand_dims(decoder_input_l2r, 0), tf.expand_dims(decoder_input_r2l, 0)], axis=0)  ## [2, batch, length, hidden_size]
  return (decoder_input, decoder_self_attention_bias)

The decoder_self_attention_bias returned by this function appears to be only the future-word mask, and it is what gets passed to the multi-head self-attention in the decoder.
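
To make the question concrete, here is a minimal pure-Python sketch, not the repository's code, that compares a future-word (lower-triangle) bias with the same bias plus a padding bias. It assumes targets are right-padded, which the code above does not confirm. The names lower_triangle_bias, add_padding_bias, blocked_keys, and the is_pad example are hypothetical and only mimic the [length, length] slice of the bias for a single sequence.

```python
NEG_INF = -1e9  # added to attention logits to block a position


def lower_triangle_bias(length):
    """[length, length] bias: 0.0 where key <= query, NEG_INF for future keys."""
    return [[0.0 if key <= query else NEG_INF for key in range(length)]
            for query in range(length)]


def add_padding_bias(bias, is_pad):
    """Additionally block every padded key, for all queries."""
    return [[value + (NEG_INF if is_pad[key] else 0.0)
             for key, value in enumerate(row)]
            for row in bias]


def blocked_keys(bias_row):
    """Indices of the keys a query cannot attend to."""
    return [key for key, value in enumerate(bias_row) if value < 0.0]


# Hypothetical right-padded target: 3 real tokens followed by 2 pads.
is_pad = [False, False, False, True, True]

causal_only = lower_triangle_bias(len(is_pad))
causal_and_pad = add_padding_bias(causal_only, is_pad)

for query, pad in enumerate(is_pad):
    same = blocked_keys(causal_only[query]) == blocked_keys(causal_and_pad[query])
    print(f"query {query} (pad={pad}): same blocked keys = {same}")
```

Under the right-padding assumption, the rows for real tokens block exactly the same keys with or without the padding bias, because every padded key lies in the future of every real query. The two biases differ only in the rows of padded queries, so whether the extra mask matters would depend on whether those positions are excluded from the loss, which this snippet does not show.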
