weird criterion to decide if needed to adjust the padding size #35599

hyusterr · 2025-01-09T18:08:47Z

transformers/src/transformers/trainer_pt_utils.py

Line 93 in 241c04d

if len(tensor1.shape) == 1 or tensor1.shape[1] == tensor2.shape[1]:

When using Trainer with a model that returns TokenClassifierOutput, the Trainer collects outputs batch by batch in evaluation_loop using nested_concat.
The criterion for deciding whether the padding needs to be adjusted is tensor.shape[1].
However, if the user does not exclude attentions via ignore_keys beforehand, shape[1] of the attentions tensor (the number of heads) is always the same across batches, so no padding is applied and torch.cat raises an error when the sequence lengths differ.
I think this design is kind of weird, especially because this behavior is not mentioned in the token classification tutorial: https://huggingface.co/docs/transformers/tasks/token_classification
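To make the failure mode concrete, here is a minimal sketch of the shape logic only. It does not call transformers or torch; it mirrors the condition quoted above on plain shape tuples. The helper names and the example shapes are hypothetical, and the attentions layout (batch, num_heads, seq_len, seq_len) is assumed.

```python
# Sketch of the shape logic only; no tensors are created.
# Helper names and example shapes are hypothetical.

def takes_direct_cat_path(shape1, shape2):
    """Mirror the quoted criterion: skip padding if 1-D or if dim 1 matches."""
    return len(shape1) == 1 or shape1[1] == shape2[1]


def cat_dim0_is_valid(shape1, shape2):
    """Concatenating along dim 0 requires every other dimension to match."""
    return len(shape1) == len(shape2) and shape1[1:] == shape2[1:]


# Two batches padded to different sequence lengths (32 and 48).
# Logits: (batch, seq_len, num_labels)
logits_a, logits_b = (8, 32, 9), (8, 48, 9)
# Attentions: (batch, num_heads, seq_len, seq_len), assumed layout
attn_a, attn_b = (8, 12, 32, 32), (8, 12, 48, 48)

for name, a, b in [("logits", logits_a, logits_b), ("attentions", attn_a, attn_b)]:
    direct = takes_direct_cat_path(a, b)
    valid = cat_dim0_is_valid(a, b)
    print(f"{name}: direct cat = {direct}, cat along dim 0 valid = {valid}")
```

For the logits, dim 1 is the sequence length, so the mismatch is detected and padding is applied. For the attentions, dim 1 is the head count, so the check passes and the concatenation is attempted even though the last two dimensions differ.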

Rocketknight1 · 2025-01-10T14:33:14Z

cc @SunMarc @muellerzr