
head dim >64 #16

Open
JlexZhong opened this issue Sep 1, 2024 · 2 comments

@JlexZhong

RuntimeError: FlashAttention backward for head dim > 64 requires A100 or H100 GPUs as the implementation needs a large amount of shared memory.
Is it referring to the head dimension of vicuna-7b being more than 64?
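
For context, the per-head dimension can be read off the model config. A minimal sketch, assuming a standard Hugging Face LLaMA-style config (the path `lmsys/vicuna-7b-v1.5` is just an example):

```python
# Minimal sketch: compute the per-head dimension from a Hugging Face config.
# Assumes a LLaMA-style config; "lmsys/vicuna-7b-v1.5" is an example model path.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("lmsys/vicuna-7b-v1.5")
head_dim = config.hidden_size // config.num_attention_heads
print(f"hidden_size={config.hidden_size}, heads={config.num_attention_heads}, head_dim={head_dim}")
# For LLaMA-7B-based models this typically gives 4096 // 32 = 128, i.e. > 64.
```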

@JlexZhong
Author

My environment: 8 × A40 GPUs.

@ChenRunjin
Collaborator

I used an A6000 for training and didn't have this issue. Which flash-attention version are you using?
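
A quick way to collect the details that usually matter for this error (a small sketch, assuming `flash_attn` exposes `__version__` as recent releases do):

```python
# Minimal sketch: print the installed flash-attn and torch versions plus the GPU name.
import torch
import flash_attn

print("flash-attn:", flash_attn.__version__)
print("torch:", torch.__version__)
print("GPU:", torch.cuda.get_device_name(0))
```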
