Problem when using NCPNormalOutput with mini-batches #415

Open
krasserm opened this issue Sep 27, 2020 · 0 comments
Environment

I installed Edward2 with tf-nightly as a dependency:

pip install edward2[tf-nightly]@"git+https://github.com/google/edward2.git#egg=edward2"

which set up the following dependencies:

tensorflow==2.4.0.dev20200926
tensorflow-probability==0.12.0.dev20200926
edward2==0.0.2
...

Python version is 3.7.9.

Problem

I tried to run the NCP example that is part of the documentation in noise.py (with minor additions to get a runnable program):

import edward2 as ed
import tensorflow as tf

batch_size, dataset_size = 128, 1000

# some random data
features = tf.random.normal((dataset_size, 25))
labels = tf.random.normal((dataset_size, 1))

inputs = tf.keras.layers.Input(shape=(25,))
x = ed.layers.NCPNormalPerturb()(inputs)  # double input batch
x = tf.keras.layers.Dense(64, activation='relu')(x)
x = tf.keras.layers.Dense(64, activation='relu')(x)
means = ed.layers.DenseVariationalDropout(1, activation=None)(x)  # get mean
means = ed.layers.NCPNormalOutput(labels)(means)  # halve input batch
stddevs = tf.keras.layers.Dense(1, activation='softplus')(x[:batch_size])
outputs = tf.keras.layers.Lambda(lambda x: ed.Normal(x[0], x[1]))([means, stddevs])
model = tf.keras.Model(inputs=inputs, outputs=outputs)

optimizer = tf.optimizers.Adam(learning_rate=1e-3)

# Run training loop.
num_steps = 1000
for _ in range(num_steps):
    with tf.GradientTape() as tape:
        predictions = model(features)
        loss = -tf.reduce_mean(predictions.distribution.log_prob(labels))
        loss += model.losses[0] / dataset_size  # KL regularizer for output layer
        loss += model.losses[-1]

    trainable_vars = model.trainable_variables
    gradients = tape.gradient(loss, trainable_vars)
    optimizer.apply_gradients(zip(gradients, trainable_vars))

and ran into:

ValueError: Arguments `loc` and `scale` must have compatible shapes; loc.shape=(1000, 1), scale.shape=(128, 1).
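The two shapes in this message can be traced by following the batch dimension through the model. The sketch below is plain Python and does not call Edward2; it assumes, based only on the comments in the example, that NCPNormalPerturb doubles the batch and NCPNormalOutput halves it again. `trace_batch_dims` is a hypothetical helper introduced for illustration.

```python
def trace_batch_dims(input_rows, slice_rows=128):
    """Return (rows of means, rows of stddevs) for a given input batch size.

    Assumptions taken from the comments in the example above, not from the
    Edward2 source: the perturbation layer doubles the batch and the output
    layer halves it.
    """
    perturbed = 2 * input_rows            # NCPNormalPerturb: "double input batch"
    means = perturbed // 2                # NCPNormalOutput: "halve input batch"
    stddevs = min(slice_rows, perturbed)  # x[:batch_size] keeps at most 128 rows
    return means, stddevs


print(trace_batch_dims(1000))  # full batch: (1000, 128), matching loc vs. scale
print(trace_batch_dims(128))   # mini-batch of 128: (128, 128)
print(trace_batch_dims(104))   # trailing batch of 1000 % 128 rows: (104, 128)
```

Under the same assumptions, the final batch of 104 rows produced by `.batch(128)` over 1000 samples would mismatch again, so passing `drop_remainder=True` to `batch` may be needed when the slice size is hard-coded.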

This error is expected: the training loop runs full-batch updates, while stddevs = tf.keras.layers.Dense(1, activation='softplus')(x[:batch_size]) uses only the first batch_size rows. It is not related to my main question, though. Changing the training loop to use mini-batches

...
ds = tf.data.Dataset.from_tensor_slices((features, labels)).batch(batch_size)

# Run training loop.
num_steps = 1000
for i in range(num_steps):
    print(i)
    for features_batch, labels_batch in ds:
        with tf.GradientTape() as tape:
            predictions = model(features_batch)
            loss = -tf.reduce_mean(predictions.distribution.log_prob(labels_batch))  # negative log-likelihood, as in the full-batch loop
            ...

fixes the above problem but introduces a new problem in NCPNormalOutput:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1000,1] vs. [128,1] [Op:SquaredDifference]

That is, the labels passed as a constructor argument to NCPNormalOutput have shape [1000, 1], which is incompatible with the mini-batch shape [128, 1]. Does NCPNormalOutput (when centering at the labels) not support mini-batch updates at the moment?
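The incompatibility itself can be checked without Edward2. The sketch below is a plain-Python version of the usual NumPy-style broadcasting rule, which TensorFlow's element-wise ops such as SquaredDifference also follow. `broadcast_shape` is a hypothetical helper, not a library function.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of shapes `a` and `b`, or None if incompatible."""
    result = []
    # Compare dimensions from the trailing axis backwards; missing axes count as 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            return None  # two different sizes, neither equal to 1
    return tuple(reversed(result))


full_labels = (1000, 1)  # labels tensor fixed at layer construction time
batch_means = (128, 1)   # means computed for one mini-batch

print(broadcast_shape(full_labels, batch_means))  # None
print(broadcast_shape((128, 1), batch_means))     # (128, 1)
```

Because the labels tensor is fixed when the layer is constructed, its leading dimension only matches when the model is called on the full dataset.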

Semi-related: the NCPNormalPerturb and NCPNormalOutput layers are only needed during training. At test/prediction time they seem to have no influence on the result. So why is NCP-related functionality designed as layers at all? Shouldn't this be a concern of the loss function only?

Edit: OK, I see that NCPNormalOutput creates a distribution from its input and samples from that distribution, and hence does influence the result. Nevertheless, this behavior doesn't seem to be related to NCPs, so my previous question remains.
