
Multiscale Generation for Beginners

ProGamerGov edited this page Nov 22, 2019 · 9 revisions

Multiscale Generation

When starting out, you want to maximize quality by using the L-BFGS optimizer and a non-pruned Visual Geometry Group (VGG) model. Once the image size grows large enough to run out of memory, you switch to the Adam optimizer. When that also runs out of memory, you switch to the channel-pruned or NIN models with Adam.

If you want to repeat the multiscale generation script a few times to enhance the output, go back to step 1 (starting again with L-BFGS and your non-pruned VGG model) once they run out of memory. This helps maximize output image quality.


The Basics

For a simple example, this is what you are essentially doing with multires:

python3 neural_style.py -output_image out1.png -image_size 512

python3 neural_style.py -output_image out2.png -init image -init_image out1.png -image_size 720

python3 neural_style.py -output_image out3.png -init image -init_image out2.png -image_size 1024

python3 neural_style.py -output_image out4.png -init image -init_image out3.png -image_size 1536

Broadly, the closer the image size is to the size of the images used to train the model, the more change will occur in the output image. If you start at the maximum possible image size, the result will look like a weak filter. So you start closer to the training image size (e.g. 224 or 512 pixels), then slowly increase it, so that details can form properly. By the time you reach your maximum image size, changes are only happening at a small scale relative to the rest of the image, and you may have to zoom in to see the difference. You can also experiment with how many steps you use and what image size each step has.
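The step-up pattern above can be written as a loop. The following sketch only prints the commands it would run (the filenames and size schedule are just the ones from this example), so you can inspect the schedule before executing anything:

```shell
#!/bin/sh
# Print a coarse-to-fine schedule of neural_style.py commands.
# Each step after the first initializes from the previous step's output.
prev=""
i=1
for size in 512 720 1024 1536; do
  cmd="python3 neural_style.py -output_image out${i}.png -image_size ${size}"
  if [ -n "$prev" ]; then
    cmd="${cmd} -init image -init_image ${prev}"
  fi
  echo "${cmd}"
  prev="out${i}.png"
  i=$((i + 1))
done
```

Piping the output to `sh` would execute the steps in order.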

  • You can find the other models that show up in others' multires scripts for neural-style, converted to PyTorch for neural-style-pt, here.

So, multires is essentially running a style transfer script repeatedly to incrementally grow the output image to the desired size.


Avoiding Out of Memory Errors

For this example, imagine that you run out of memory at -image_size 1024. To resolve this, you switch the optimizer to Adam, which lets you use the 1024px image size. Then you run out of memory again at -image_size 1536. To resolve this a second time, you switch to a VGG-16 model, which requires fewer resources. However, you still want to reach 1920px, so you eventually end up using the NIN model, because it uses the fewest resources. It's also important to note that you have to specify layers when using NIN, because its layer names are different. You can find a list of all of NIN's layers here. Note that you may also have to adjust other parameters, like the content and style weights, when changing the model or optimizer.

python3 neural_style.py -output_image out1.png -image_size 512

python3 neural_style.py -output_image out2.png -init image -init_image out1.png -image_size 720

python3 neural_style.py -output_image out3.png -init image -init_image out2.png -image_size 1024 -optimizer adam

python3 neural_style.py -output_image out4.png -init image -init_image out3.png -image_size 1536 -optimizer adam -model_file models/vgg16-00b39a1b.pth

python3 neural_style.py -output_image out5.png -init image -init_image out4.png -image_size 1920 -optimizer adam -model_file models/nin_imagenet.pth -content_layers relu0,relu3,relu7,relu12 -style_layers relu0,relu3,relu7,relu12

When you run the code yourself, the points where you run out of memory may be different.
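One way to keep longer scripts tidy is a helper that maps an image size to the flags used at that size. This is only a sketch: the thresholds below mirror the example above, and yours will differ depending on where your own GPU runs out of memory.

```shell
#!/bin/sh
# Map an image size to optimizer/model flags. The thresholds are the
# ones from the example above; adjust them to your GPU's memory limits.
flags_for_size() {
  if [ "$1" -lt 1024 ]; then
    echo ""                                    # L-BFGS + default model
  elif [ "$1" -lt 1536 ]; then
    echo "-optimizer adam"
  elif [ "$1" -lt 1920 ]; then
    echo "-optimizer adam -model_file models/vgg16-00b39a1b.pth"
  else
    echo "-optimizer adam -model_file models/nin_imagenet.pth -content_layers relu0,relu3,relu7,relu12 -style_layers relu0,relu3,relu7,relu12"
  fi
}

# Example: print the final 1920px step with the right flags appended.
echo "python3 neural_style.py -output_image out5.png -image_size 1920 $(flags_for_size 1920)"
```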

For the above example, you could use the pruned VGG-16 model instead of the NIN model. The pruned VGG-16 uses less memory than the other VGG-16 models, and seems to have an output quality a bit higher than the NIN model (though still lower than the non-pruned models). The pruned VGG-16 model also uses the same layer names as a standard VGG-16 model.

python3 neural_style.py -output_image out5.png -init image -init_image out4.png -image_size 1920 -optimizer adam -model_file models/channel_pruning.pth


Multiple Runs

Sometimes, I like to run my multires scripts multiple times like this, because I find it makes the results look better:

python3 neural_style.py -output_image out1.png -image_size 512

python3 neural_style.py -output_image out2.png -init image -init_image out1.png -image_size 720

python3 neural_style.py -output_image out3.png -init image -init_image out2.png -image_size 1024

python3 neural_style.py -output_image out4.png -init image -init_image out3.png -image_size 1536

Then followed by:

python3 neural_style.py -output_image out1.png -init image -init_image out4.png -image_size 512

python3 neural_style.py -output_image out2.png -init image -init_image out1.png -image_size 720

python3 neural_style.py -output_image out3.png -init image -init_image out2.png -image_size 1024

python3 neural_style.py -output_image out4.png -init image -init_image out3.png -image_size 1536

The second, third, fourth, etc. runs use the previous run's output image as the initialization image for step 1. The most times I've run an output image through a multires script is 7. This seems to help make smaller details look better and more closely resemble the smaller details of the style image.
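Chaining passes like this can be scripted as well. The sketch below prints the commands for each full pass (the pass count, sizes, and filenames are just example values), feeding each pass's final output into step 1 of the next pass:

```shell
#!/bin/sh
# Print commands for several full multires passes. Each pass after the
# first starts from out4.png, the previous pass's final output.
passes=2
pass=1
init=""
while [ "$pass" -le "$passes" ]; do
  prev="$init"
  i=1
  for size in 512 720 1024 1536; do
    cmd="python3 neural_style.py -output_image out${i}.png -image_size ${size}"
    [ -n "$prev" ] && cmd="${cmd} -init image -init_image ${prev}"
    echo "${cmd}"
    prev="out${i}.png"
    i=$((i + 1))
  done
  init="$prev"
  pass=$((pass + 1))
done
```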



Histogram Matching

It was also discovered that matching the histograms of the input images improves style transfer results. You can find histogram matching scripts here: https://github.com/ProGamerGov/Neural-Tools

On Linux you can download the histogram matching scripts via:

wget -c https://raw.githubusercontent.com/ProGamerGov/Neural-Tools/master/linear-color-transfer.py

wget -c https://raw.githubusercontent.com/ProGamerGov/Neural-Tools/master/lum-transfer.py

You can do histogram matching after every style transfer script like this:

python3 neural_style.py -output_image out1.png -image_size 512

python linear-color-transfer.py --target_image out1.png --source_image style_image.png --output_image out1_hist.png

python3 neural_style.py -output_image out2.png -init image -init_image out1_hist.png -image_size 720

python linear-color-transfer.py --target_image out2.png --source_image style_image.png --output_image out2_hist.png

python3 neural_style.py -output_image out3.png -init image -init_image out2_hist.png -image_size 1024

python linear-color-transfer.py --target_image out3.png --source_image style_image.png --output_image out3_hist.png

python3 neural_style.py -output_image out4.png -init image -init_image out3_hist.png -image_size 1536

And you can also do histogram matching on your input images like this:

python linear-color-transfer.py --target_image content_image.jpg --source_image style_image.png --output_image content_hist.png

python3 neural_style.py -content_image content_hist.png -style_image style_image.png -output_image out1.png -image_size 512

python3 neural_style.py -content_image content_hist.png -style_image style_image.png -output_image out2.png -init image -init_image out1.png -image_size 720

python3 neural_style.py -content_image content_hist.png -style_image style_image.png -output_image out3.png -init image -init_image out2.png -image_size 1024

python3 neural_style.py -content_image content_hist.png -style_image style_image.png -output_image out4.png -init image -init_image out3.png -image_size 1536

You can also combine the two ways of histogram matching, like this:

python linear-color-transfer.py --target_image content_image.jpg --source_image style_image.png --output_image content_hist.png

python3 neural_style.py -content_image content_hist.png -style_image style_image.png -output_image out1.png -image_size 512

python linear-color-transfer.py --target_image out1.png --source_image style_image.png --output_image out1_hist.png

python3 neural_style.py -content_image content_hist.png -style_image style_image.png -output_image out2.png -init image -init_image out1_hist.png -image_size 720

python linear-color-transfer.py --target_image out2.png --source_image style_image.png --output_image out2_hist.png

python3 neural_style.py -content_image content_hist.png -style_image style_image.png -output_image out3.png -init image -init_image out2_hist.png -image_size 1024
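The combined approach can likewise be looped. This sketch prints the alternating style-transfer and histogram-matching commands (the filenames match the example above; the sizes are just examples):

```shell
#!/bin/sh
# Match the content image's histogram to the style image once, then
# alternate style transfer with histogram matching of each
# intermediate output before it seeds the next step.
echo "python linear-color-transfer.py --target_image content_image.jpg --source_image style_image.png --output_image content_hist.png"
prev=""
i=1
for size in 512 720 1024; do
  cmd="python3 neural_style.py -content_image content_hist.png -style_image style_image.png -output_image out${i}.png -image_size ${size}"
  [ -n "$prev" ] && cmd="${cmd} -init image -init_image ${prev}"
  echo "${cmd}"
  echo "python linear-color-transfer.py --target_image out${i}.png --source_image style_image.png --output_image out${i}_hist.png"
  prev="out${i}_hist.png"
  i=$((i + 1))
done
```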

Note that the above examples are meant to be simple so that you can see how things work. They omit useful parameters like -backend cudnn and -cudnn_autotune, which lower GPU memory usage and speed things up.


Other Examples

You can find other examples of multiscale generation scripts at these links: