Revisit sweep feature #41

Open
lassepe opened this issue Aug 10, 2024 · 6 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@lassepe
Contributor

lassepe commented Aug 10, 2024

It may be worth re-evaluating the sweep feature. My understanding is that it was previously dropped due to segfaults, but those seem likely to be fixed with #40.

@avik-pal avik-pal added help wanted Extra attention is needed good first issue Good for newcomers labels Aug 13, 2024
@RomeoV

RomeoV commented Jan 16, 2025

I'm also interested in this. Can you elaborate on what you mean by "revisit"? Is there prior work or an overview of what would need to get done?

@avik-pal
Owner

It would entail checking whether Wandb.wandb.sweep(...) works without segfaulting Julia and, if it does, replacing https://github.com/avik-pal/Wandb.jl/blob/main/src/sweep.jl with direct API calls.

@RomeoV

RomeoV commented Jan 16, 2025

There still seem to be problems with segfaults, although occasionally I am able to make it work (but not when running any complex code).

My basic setup is:

using Wandb, Wandb.PythonCall, Logging

cfg = @pyeval `
{
    "name": "sweepdemo",
    "method": "grid",
    "metric": {"goal": "minimize", "name": "validation_loss"},
    "parameters": {
        "batch_size": {"values": [16, 32, 64]},
        "epochs": {"values": [5, 10, 15]},
    },
}
`

function foo()
    # do nothing for now
end

sweep_id = Wandb.wandb.sweep(cfg, project="Wandb.jl")

Wandb.wandb.agent(sweep_id, foo)

This worked once for me (with foo not doing anything), but after I changed foo to include logging it stopped working, and even going back to this version doesn't work anymore. Probably some connection is not being cleaned up.

For now I will stick to the example in https://avik-pal.github.io/Wandb.jl/v0.5.6/examples/hparams.
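For reference, a "grid" sweep over the config above visits every combination of the listed parameter values. The following is a minimal sketch in plain Python of that expansion; `grid_points` is a hypothetical helper written for illustration, not part of wandb or Wandb.jl, and the order in which a real sweep hands out combinations may differ.

```python
from itertools import product

# Same sweep config as in the setup above, as a plain Python dict.
sweep_config = {
    "name": "sweepdemo",
    "method": "grid",
    "metric": {"goal": "minimize", "name": "validation_loss"},
    "parameters": {
        "batch_size": {"values": [16, 32, 64]},
        "epochs": {"values": [5, 10, 15]},
    },
}


def grid_points(config):
    # Hypothetical helper: expand each parameter's "values" list into
    # one dict per combination (the Cartesian product of all lists).
    params = config["parameters"]
    names = list(params)
    value_lists = [params[name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


points = grid_points(sweep_config)
print(len(points))  # 3 batch sizes x 3 epoch counts = 9
print(points[0])
```

So the agent function would be expected to run nine times for this config, once per combination.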

@RomeoV

RomeoV commented Jan 16, 2025

In general though, I think something like this could work:

function foo()
    run = Wandb.wandb.run
    lg = WandbLogger(run, 1, 0, Info)
    global_logger(lg)
    @info "metrics" validation_loss=1
    close(lg)
end

@lassepe
Contributor Author

lassepe commented Jan 16, 2025

It would be worth filing an issue with PythonCall for any remaining segfaults.

@RomeoV

RomeoV commented Jan 16, 2025

Yeah. To shine a bit more light, it seems that calling Wandb.wandb.agent(sweep_id, function) ultimately gets passed within Python to multiprocessing.Process here.

So internally Python spawns a new process, which in turn calls the Julia function, where we try to access properties of the process such as logging through the run. It's quite a convoluted setup, so I'm not sure PythonCall is really to blame here.
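To illustrate the mechanism described above, here is a minimal pure-Python sketch of a callback being handed to multiprocessing.Process. It only shows that the callback executes in a different OS process than the caller; it does not reproduce wandb's agent code. The names `callback`, `_child`, and `run_in_child` are hypothetical, and the "fork" start method is an assumption made to keep the sketch self-contained on Unix-like systems, not a claim about what wandb uses.

```python
import multiprocessing as mp
import os


def callback():
    # Hypothetical stand-in for the function handed to the agent.
    return {"validation_loss": 1}


def _child(fn, queue):
    # Runs inside the child process: call the callback and send back
    # the child's pid together with the callback's return value.
    queue.put((os.getpid(), fn()))


def run_in_child(fn):
    # Assumption: "fork" start method (Unix only), chosen for simplicity.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=_child, args=(fn, queue))
    proc.start()
    # Read the result before joining so the child is never blocked on
    # a full pipe while the parent waits for it to exit.
    child_pid, result = queue.get(timeout=30)
    proc.join()
    return os.getpid(), child_pid, result


if __name__ == "__main__":
    parent_pid, child_pid, result = run_in_child(callback)
    print("parent pid:", parent_pid)
    print("child pid:", child_pid)
    print("callback result:", result)
```

The two pids differ, so any state the callback relies on (in the Julia case, the embedded runtime and the current run) has to be valid inside the child process, which may be part of why the setup is fragile.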


3 participants