
PytorchDatasetCaching

This repository provides a wrapper for any PyTorch dataset that caches augmented samples. It speeds up training when augmentations are compute intensive, for example with 3D images or heavy augmentation pipelines.

What it does: For each sample, the wrapper applies the dataset's transformations and saves the results to a cache directory, up to a configurable number of augmented versions per sample. Once that number is reached, it randomly loads one of the saved versions instead of applying the transformations again. This limits the augmentation variety per sample, but with a large enough number of cached versions this should not be a problem.
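A minimal sketch of how such a wrapper could work is shown below. The actual implementation lives in cache_dataset.py; the file naming and internals here are illustrative assumptions, not the repository's code.

# Illustrative sketch only; see cache_dataset.py for the real implementation.
import os
import random
import torch
from torch.utils.data import Dataset

class CacheDataset(Dataset):
    """Cache up to `augmentations_per_sample` augmented versions of each sample on disk."""

    def __init__(self, dataset, augmentations_per_sample, cache_dir="./tmp"):
        self.dataset = dataset
        self.augmentations_per_sample = augmentations_per_sample
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _cache_path(self, index, version):
        return os.path.join(self.cache_dir, f"{index}_{version}.pt")

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        # Versions of this sample that have already been cached.
        cached = [v for v in range(self.augmentations_per_sample)
                  if os.path.exists(self._cache_path(index, v))]
        if len(cached) < self.augmentations_per_sample:
            # Cache not full yet: run the (expensive) transform inside the
            # wrapped dataset and store the augmented sample.
            sample = self.dataset[index]
            torch.save(sample, self._cache_path(index, len(cached)))
            return sample
        # Cache full: load one of the saved augmentations at random and skip
        # the transform entirely.
        return torch.load(self._cache_path(index, random.choice(cached)))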

How to use:

Clone this repository into your project

git clone https://github.com/hhroberthdaniel/PytorchDatasetCaching.git

or just copy the code from cache_dataset.py

import torch
from cache_dataset import CacheDataset

trainset = None  # set this to your own PyTorch Dataset
cached_trainset = CacheDataset(trainset, augmentations_per_sample=AUGMENTATIONS_PER_SAMPLE, cache_dir="./tmp")
trainloader = torch.utils.data.DataLoader(cached_trainset, batch_size=batch_size, shuffle=True, num_workers=NUM_WORKERS)
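For example, wrapping a torchvision dataset could look like the following; the dataset, transform, and constant values are placeholders for illustration, not part of this repository.

import torch
import torchvision
import torchvision.transforms as transforms
from cache_dataset import CacheDataset

AUGMENTATIONS_PER_SAMPLE = 20  # cached augmented versions per sample (placeholder value)
NUM_WORKERS = 4                # placeholder value

# Stand-in for a compute-intensive augmentation pipeline.
transform = transforms.Compose([
    transforms.RandomResizedCrop(32),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

trainset = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
cached_trainset = CacheDataset(trainset, augmentations_per_sample=AUGMENTATIONS_PER_SAMPLE, cache_dir="./tmp")
trainloader = torch.utils.data.DataLoader(cached_trainset, batch_size=64, shuffle=True, num_workers=NUM_WORKERS)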
