Target is not converted to 'Long' error #302

Open
mrT23 opened this issue Nov 19, 2019 · 7 comments

Comments

@mrT23

mrT23 commented Nov 19, 2019

I am trying to run a simple training on the PETS dataset.

When I use the following simple learner with label smoothing:
learn = cnn_learner(dbunch, resnet34, metrics=[accuracy, top_k_accuracy], loss_func=LabelSmoothingCrossEntropy())
learn.fit_one_cycle(4)

I get the following error as soon as training starts:

  File "...\fastai2\layers.py", line 254, in forward
    return loss*self.eps/c + (1-self.eps) * F.nll_loss(log_preds, target, reduction=self.reduction)
  File "D:\Anaconda3\envs\torch_conda\lib\site-packages\torch\nn\functional.py", line 1824, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target'

If I use the CrossEntropyLossFlat loss instead, I get an error during the validation phase:

  File "...\fastai2\metrics.py", line 76, in accuracy
    return (pred == targ).float().mean()
  File "...\fastai2\torch_core.py", line 176, in _f
    res = getattr(super(TensorBase, self), fn)(*args, **kwargs)
RuntimeError: Expected object of scalar type Int but got scalar type Long for argument #2 'other'

Thanks

@muellerzr
Contributor

How are you making your databunch?

@mrT23
Author

mrT23 commented Nov 19, 2019

```python
import os

def get_y_fun(input):
    # File name only, e.g. 'staffordshire_bull_terrier_54.jpg'
    im_name = os.path.basename(input)
    # Drop the trailing '_<index>.jpg' to get the class name
    class_name = im_name[:im_name.rfind('_')]
    return class_name

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=get_y_fun)

dbunch = pets.databunch(untar_data(URLs.PETS) / "images", item_tfms=Resize(args.input_size),
                        batch_tfms=aug_transforms(), num_workers=args.num_workers)
```

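By the way, the regex
pat = r'/([^/]+)_\d+.jpg$'
is not cross-platform: it assumes `/` as the path separator, so it does not match Windows paths that use backslashes.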
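
One way to sidestep the separator problem is to match against the file name alone, so the path separator never enters the pattern. The sketch below is only an illustration: `NAME_PAT` and `label_from_name` are hypothetical names, not part of fastai2, and the example paths are made up.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical pattern: applied to the file name only, so it contains no path separator.
NAME_PAT = re.compile(r'^(.+)_\d+\.jpg$')

def label_from_name(file_name):
    """Return the class name encoded in a file name like 'breed_name_54.jpg'."""
    match = NAME_PAT.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name!r}")
    return match.group(1)

# Made-up example paths in both styles; the Pure*Path classes work on any OS.
posix_path = PurePosixPath('/data/images/staffordshire_bull_terrier_54.jpg')
windows_path = PureWindowsPath(r'D:\data\images\staffordshire_bull_terrier_54.jpg')

posix_label = label_from_name(posix_path.name)
windows_label = label_from_name(windows_path.name)
print(posix_label, windows_label)
```

Both calls give the same label because `.name` strips the directory part before the regex is applied.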

@mrT23
Author

mrT23 commented Nov 20, 2019

I did some testing.
This happens on Windows, but not on Linux.

It looks like a dataloader issue: nowhere in the dataloader are the targets explicitly cast to int64.

@sgugger
Contributor

sgugger commented Nov 20, 2019

Are you sure you are using PyTorch 1.3? The type-promotion should get rid of those errors. On Linux, I can do x==y with a tensor of type Int and a tensor of type Long.

@sgugger
Contributor

sgugger commented Nov 20, 2019

The first error should be fixed now btw.

@mrT23
Author

mrT23 commented Nov 21, 2019

@sgugger, thanks for the feedback.

> Are you sure you are using PyTorch 1.3? The type-promotion should get rid of those errors. On Linux, I can do x==y with a tensor of type Int and a tensor of type Long.

I upgraded my PyTorch to 1.3.1 (the requirements currently specify 1.2.0).
The first problem remains.
I will pull the latest version of fastai v2 with your commit that fixes the label smoothing.

Maybe it would be better to explicitly convert the targets to int64 in the collate function?
This is common practice in some repositories, for example:
https://github.com/NVIDIA/apex/blob/master/examples/imagenet/main_amp.py

def fast_collate(batch):
    imgs = [img[0] for img in batch]
    targets = torch.tensor([target[1] for target in batch], dtype=torch.int64)
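
To make the suggestion concrete, here is a minimal sketch of a collate function that casts targets to int64. It is not the apex implementation nor anything in fastai2: `collate_with_long_targets` is a hypothetical name, and it assumes each batch item is an `(image_tensor, integer_label)` pair with images of equal shape.

```python
import torch

def collate_with_long_targets(batch):
    """Stack images and build an int64 target tensor (illustrative sketch only)."""
    # Assumption: every item is (image_tensor, integer_label), images share one shape.
    imgs = torch.stack([img for img, _ in batch])
    # Explicit dtype so the targets are int64 regardless of platform defaults.
    targets = torch.tensor([label for _, label in batch], dtype=torch.int64)
    return imgs, targets

# Tiny made-up batch of two 3x4x4 "images" with labels 1 and 0.
example_batch = [(torch.zeros(3, 4, 4), 1), (torch.ones(3, 4, 4), 0)]
example_imgs, example_targets = collate_with_long_targets(example_batch)
print(example_imgs.shape, example_targets.dtype)
```

A function like this could be passed as `collate_fn` to a plain `torch.utils.data.DataLoader`; whether it fits fastai2's own data pipeline is not something this thread establishes.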

@sgugger
Contributor

sgugger commented Nov 21, 2019

We don't want to automatically convert to int64 tensors for users because it takes twice the space in GPU memory and sometimes they don't need the int64.

I can't reproduce the error on Windows with PyTorch 1.3.1. When computing the accuracy between a tensor of type Int (int32) and a tensor of type Long (int64), I don't get any error.
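
The "twice the space" point above comes down to element size: 4 bytes for int32 against 8 bytes for int64. As a rough illustration, the arithmetic can be sketched as follows. The target shape is a made-up example (a batch of 8 dense 512x512 integer targets), not a figure from this thread, and `target_bytes` is a hypothetical helper.

```python
import struct

# Standard sizes in bytes ('=' selects platform-independent standard sizes).
INT32_BYTES = struct.calcsize('=i')
INT64_BYTES = struct.calcsize('=q')

def target_bytes(num_elements, bytes_per_element):
    """Raw storage needed for an integer target tensor, ignoring any overhead."""
    return num_elements * bytes_per_element

# Hypothetical target tensor: 8 items, each a 512x512 grid of integer labels.
num_elements = 8 * 512 * 512

int32_total = target_bytes(num_elements, INT32_BYTES)
int64_total = target_bytes(num_elements, INT64_BYTES)
print(int32_total, int64_total)
```

For a handful of class labels per batch the difference is negligible, but it grows linearly with the number of target elements.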
