-
I'm interested in computing a stochastic gradient estimate with one of Optax's optimizers (and potentially adding control variates). However, I want access to both the gradient estimates AND the evaluated loss, similar to `jax.value_and_grad`. I saw that under the hood some stochastic gradient estimate methods use Jacobians. Any help would be immensely appreciated!
Replies: 3 comments 4 replies
-
Hello @gil2rok
-
Thanks so much for the fast answer! With vanilla jax we can get the loss and gradient:

```python
# mean squared error loss
loss_fn = lambda y_pred, y_true: jnp.mean((y_pred - y_true) ** 2)
loss_val, grads = jax.value_and_grad(loss_fn)(model(X), y)
```

But I have a stochastic loss function (used in variational inference):

```python
true_dist = dist_builder(true_params)  # true dist with intractable sample method

def neg_elbo(approx_params):
    approx_dist = dist_builder(approx_params)
    sample = approx_dist.sample(key)  # generates a single sample
    log_q = approx_dist.logdensity(sample)
    log_p = true_dist.logdensity(sample)
    return log_q - log_p  # negative ELBO (single-sample estimate)
```

I can compute the gradient with Monte Carlo estimates, e.g. using pathwise Jacobians:

```python
# contains expectation w.r.t. approx_params
jacobians = optax.monte_carlo.pathwise_jacobians(
    function=neg_elbo,
    params=approx_params,
    dist_builder=dist_builder,
    rng=jax.random.key(0),
    num_samples=100,
)
grads = jnp.mean(jacobians, axis=0)
```

But how do I get the loss value that generated these Jacobians/grads? Is this what Polyak SGD does? Also, how would I add control variates to this? I couldn't find any examples in the docs.
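For concreteness, here is a self-contained toy version of what I am after, with a hypothetical Gaussian approximation standing in for `dist_builder`: using the reparameterization trick, a single `jax.value_and_grad` call yields both the Monte Carlo loss estimate and its pathwise gradient.

```python
import jax
import jax.numpy as jnp

def neg_elbo(approx_params, eps):
    # Toy setup: q = N(mu, sigma^2) approximating a standard normal
    # target p. Reparameterize the sample so gradients flow through it
    # (the pathwise / reparameterization trick).
    mu, log_sigma = approx_params
    sigma = jnp.exp(log_sigma)
    sample = mu + sigma * eps
    log_q = -0.5 * ((sample - mu) / sigma) ** 2 - log_sigma - 0.5 * jnp.log(2 * jnp.pi)
    log_p = -0.5 * sample ** 2 - 0.5 * jnp.log(2 * jnp.pi)
    return jnp.mean(log_q - log_p)  # Monte Carlo negative-ELBO estimate

eps = jax.random.normal(jax.random.key(0), (100,))  # 100 MC samples
approx_params = (jnp.array(0.5), jnp.array(0.0))
# One call returns both the loss estimate and its pathwise gradient.
loss_val, grads = jax.value_and_grad(neg_elbo)(approx_params, eps)
```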
-
If anyone else needs this or is curious about alternatives, since
I don't think there's currently a way to do that using the methods from `optax.monte_carlo`. As a side note, we're deprecating that module (#1076).
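On the control-variates question: one can be wired in by hand. A minimal sketch (illustrative only, not an Optax API) of a score-function (REINFORCE) estimator with a simple mean baseline as the control variate, for q = N(mu, 1) and the toy integrand f(x) = x²/2:

```python
import jax
import jax.numpy as jnp

def reinforce_with_baseline(key, mu, num_samples=1000):
    # REINFORCE: d/dmu E_q[f(x)] = E_q[f(x) * d/dmu log q(x)],
    # where d/dmu log N(x; mu, 1) = x - mu. Subtracting a baseline b
    # (here the sample mean of f) reduces variance while leaving the
    # estimator's expectation essentially unchanged, since
    # E_q[d/dmu log q] = 0.
    samples = mu + jax.random.normal(key, (num_samples,))
    f = 0.5 * samples ** 2          # example integrand
    score = samples - mu            # score function of q
    baseline = jnp.mean(f)          # simple mean baseline (control variate)
    grad_est = jnp.mean((f - baseline) * score)
    loss_est = jnp.mean(f)          # the loss estimate comes for free
    return grad_est, loss_est
```

For x ~ N(mu, 1), E[x²/2] = (mu² + 1)/2, so at mu = 1 the true loss and gradient are both 1; the estimates should land close to that.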