
Trying to execute run.py in the train folder raises an error #45

Open
bhardwaj-garvit opened this issue on Feb 1, 2023 · 4 comments
Labels: bug (Something isn't working)

Comments

@bhardwaj-garvit

Hi,
Really great and helpful code!
I was trying to run train.py on the nimrod-uk-1km test data and encountered the following error: "RuntimeError: Serialization of parametrized modules is only supported through state_dict()." I searched PyTorch's issue tracker and found an earlier related report, so I downgraded torch to v1.12.0, but the error did not go away.
Torch link: pytorch/pytorch#69413

Can you help debug this issue? I am planning to use this code on another dataset.

[Screenshot of the traceback, 2023-02-01 at 5:01 PM]

**To Reproduce** Steps to reproduce the behavior:
1. Install the dependencies.
2. Execute train/run.py; the above error appears in the terminal.
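(For reference, this error is not specific to this repository: pickling any module that uses a PyTorch parametrization, such as spectral norm, raises it, while saving the state_dict() works. A minimal standalone sketch, with an arbitrary file name:)

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

layer = spectral_norm(nn.Conv2d(1, 8, 3))  # a module wrapped with a spectral-norm parametrization

# Pickling the whole module triggers the error:
# torch.save(layer, "layer.pt")  # RuntimeError: Serialization of parametrized
#                                # modules is only supported through state_dict()

# Saving and restoring through state_dict() works:
torch.save(layer.state_dict(), "layer.pt")
restored = spectral_norm(nn.Conv2d(1, 8, 3))
restored.load_state_dict(torch.load("layer.pt"))
```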
@bhardwaj-garvit added the bug (Something isn't working) label on Feb 1, 2023
@jacobbieker
Member

Hi, are you using multiple GPUs? By default run.py tries to use 6 GPUs, although it should be changed to 1. The spectrally normalized layers in PyTorch don't seem to work in a multi-GPU setting, as far as I have been able to tell. If you change it to 1 GPU, training should start.
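In case it helps, a minimal sketch of that change, assuming run.py constructs a PyTorch Lightning Trainer (the exact argument name depends on the installed Lightning version):

```python
from pytorch_lightning import Trainer

# Older Lightning releases select devices via the `gpus` argument; setting it
# to 1 avoids pickling the spectral-norm layers across multiple GPU processes.
trainer = Trainer(gpus=1)

# Newer releases (>= 1.7) express the same thing with accelerator/devices:
# trainer = Trainer(accelerator="gpu", devices=1)
```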

@bhardwaj-garvit
Author

I was earlier using CPUs; to work around the issue I switched to 1 GPU, but training fills virtual memory up to 200 GB (my system's limit) and the dataloader worker is killed. Can you suggest a way to bypass this?

@Chevolier

I hit the same issue: memory keeps increasing up to 256 GB during data loading until the process is killed by the system. Is there a solution for this?

@Chevolier

Update: my problem is solved by setting streaming=True in TFDataset, as follows, for my own dataset; this way the data are not loaded into memory up front.

```python
import torch
from datasets import load_dataset

class TFDataset(torch.utils.data.dataset.Dataset):
    def __init__(self, data_path, split):
        super().__init__()
        # self.reader = load_dataset(
        #     "openclimatefix/nimrod-uk-1km", "sample", split=split, streaming=True
        # )
        # streaming=True loads samples lazily instead of reading everything into memory first.
        self.reader = load_dataset(data_path, split=split, streaming=True)
```
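For completeness, a quick way to confirm the streaming behaviour outside the Dataset class (a sketch using the Hugging Face datasets API; the dataset name and config come from the commented-out line above, and the "train" split name is an assumption):

```python
from datasets import load_dataset

# streaming=True returns an IterableDataset: examples are fetched lazily as you
# iterate, so memory usage stays bounded instead of growing with the dataset.
reader = load_dataset(
    "openclimatefix/nimrod-uk-1km", "sample", split="train", streaming=True
)

first = next(iter(reader))  # pulls a single example over the network
print(first.keys())
```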
