Revisit sweep feature #41

lassepe · 2024-08-10T13:51:25Z

It may be worth re-evaluating the sweep feature. My understanding is that it was previously dropped due to segfaults, but those seem likely to be fixed by #40.

RomeoV · 2025-01-16T14:33:09Z

I'm also interested in this. Can you elaborate on what you mean by "revisit"? Is there prior work, or an overview of what would need to get done?

avik-pal · 2025-01-16T15:12:05Z

It would entail checking whether Wandb.wandb.sweep(...) works without segfaulting Julia and, if it does, replacing https://github.com/avik-pal/Wandb.jl/blob/main/src/sweep.jl with direct API calls.

RomeoV · 2025-01-16T21:39:47Z

There still seem to be problems with segfaults, although occasionally I am able to make it work (but not when running any complex code).

My basic setup is:

using Wandb, Wandb.PythonCall, Logging

cfg = @pyeval `
       {
           "name": "sweepdemo",
           "method": "grid",
           "metric": {"goal": "minimize", "name": "validation_loss"},
           "parameters": {
               "batch_size": {"values": [16, 32, 64]},
               "epochs": {"values": [5, 10, 15]},
           },
       }
`

function foo()
    # do nothing for now
end

sweep_id = Wandb.wandb.sweep(cfg, project="Wandb.jl")

Wandb.wandb.agent(sweep_id, foo)

This worked once for me (although foo isn't doing anything), but after changing foo to try to include logging, it stopped working, and even reverting to this version doesn't work anymore. Probably some connection is not being cleaned up.

For now I will stick to the example in https://avik-pal.github.io/Wandb.jl/v0.5.6/examples/hparams.

RomeoV · 2025-01-16T21:41:19Z

In general though, I think something like this could work:

function foo()
    run = Wandb.wandb.run
    lg = WandbLogger(run, 1, 0, Info)
    global_logger(lg)
    @info "metrics" validation_loss=1
    close(lg)
end

lassepe · 2025-01-16T21:57:07Z

It would be worth filing an issue with PythonCall for any remaining segfaults.

RomeoV · 2025-01-16T22:02:08Z

Yeah. To shed a bit more light: it seems that calling Wandb.wandb.agent(sweep_id, function) ultimately gets passed within Python to multiprocessing.Process here.

So internally, Python spawns a new process, which in turn calls the Julia function, where we try to access properties of the process, such as logging through the run. It's quite a convoluted setup, so I'm not sure if PythonCall is really to blame here.
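
To make the process boundary concrete, here is a minimal pure-Python sketch. It is not wandb's code: the `state` dict is a hypothetical stand-in for per-process state such as an active run, and the "fork" start method (POSIX only) is an assumption made purely to keep the example short. It only shows that a function launched through multiprocessing.Process runs under a different PID and works on its own copy of the parent's state.

```python
import multiprocessing as mp
import os

# Hypothetical stand-in for per-process state, e.g. a handle to the active run.
state = {"run": None}


def _child(queue):
    # Executes in the child process and mutates the child's copy of `state`.
    state["run"] = "set-in-child"
    queue.put((os.getpid(), state["run"]))


def run_in_child():
    # Assumption: "fork" start method (POSIX only), chosen for brevity.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=_child, args=(queue,))
    proc.start()
    child_pid, child_run = queue.get(timeout=10)
    proc.join()
    # The parent's copy of `state` is untouched by the child's mutation.
    return os.getpid(), child_pid, child_run, state["run"]


if __name__ == "__main__":
    parent_pid, child_pid, child_run, parent_run = run_in_child()
    print("different processes:", parent_pid != child_pid)
    print("child saw:", child_run, "| parent still has:", parent_run)
```

If the agent really does cross a process boundary like this, the Julia callback would be invoked from a process other than the one in which Julia and PythonCall were initialized. That would be consistent with the instability reported above, but it remains a hypothesis.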

avik-pal added the "help wanted" and "good first issue" labels on Aug 13, 2024
