Hi,
Really great and helpful code!
I was trying to run train.py on the nimrod-uk-1km-test data and encountered the following error: "RuntimeError: Serialization of parametrized modules is only supported through state_dict()." I searched on torch's website and found an earlier commit, so I downgraded torch to v1.12.0, but the error did not go away.
Torch Link: pytorch/pytorch#69413
Can you guys help debug this issue? I am planning to use this on another dataset.
**To Reproduce**
Steps to reproduce the behavior:
1. Install the dependencies
2. Execute train/run.py

The above error then shows in the terminal.
Hi, are you using multiple GPUs? By default, run.py tries to use 6 GPUs, although that should be changed to 1. The spectrally normalized layers in PyTorch don't seem to work in a multi-GPU setting, as far as I have been able to get them to. So if you change it to 1 GPU, it should start training.
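For reference, the error message itself names `state_dict()` as the supported way to serialize a parametrized module. Below is a minimal sketch of that route on a single spectrally normalized layer. It assumes a torch version that provides `torch.nn.utils.parametrizations.spectral_norm`; the layer shape is arbitrary and is not taken from this repository, and this is not how run.py saves its model.

```python
import io

import torch
from torch import nn
from torch.nn.utils.parametrizations import spectral_norm

# Hypothetical layer for illustration; the shape is not taken from the repository.
layer = spectral_norm(nn.Conv2d(1, 4, kernel_size=3))

# Save only the tensors (weights and buffers), not the module object itself.
buffer = io.BytesIO()
torch.save(layer.state_dict(), buffer)

# Rebuild the same architecture, then load the saved tensors into it.
buffer.seek(0)
restored = spectral_norm(nn.Conv2d(1, 4, kernel_size=3))
restored.load_state_dict(torch.load(buffer))
```

This only helps where you control the saving code yourself. If the serialization happens inside the training framework when it launches several GPU processes, switching to 1 GPU as suggested above is the simpler fix.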
I was using CPUs earlier. To work around the issue I switched to 1 GPU, but training fills virtual memory up to 200 GB (my system's limit) and the dataloader worker is killed. Can you suggest a way around this?
I ran into the same issue: memory keeps increasing up to 256 GB during data loading until the process is killed by the system. Is there any solution?
Update: My problem was solved by setting streaming=True in TFDataset for my own dataset. With this setting, the data are not loaded into memory first.
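A minimal sketch of what streaming looks like, assuming TFDataset reads its data through Hugging Face `datasets.load_dataset` (an assumption, since the original snippet is not shown here). The helper name `stream_examples` and its parameters are hypothetical.

```python
from typing import Any, Iterator


def stream_examples(path: str, name: str, split: str) -> Iterator[Any]:
    """Yield examples one at a time instead of loading the whole split into memory."""
    # Imported here so the dependency is only needed once iteration starts.
    from datasets import load_dataset

    # streaming=True returns an iterable dataset that fetches records on demand.
    dataset = load_dataset(path, name, split=split, streaming=True)
    yield from dataset
```

Iterating with `for example in stream_examples(path, name, "train")` then holds only the current record in memory. The trade-off is that a streamed dataset is iterated rather than indexed, so random access by index is not available.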