Hi,
I have been trying to reproduce the results of the paper using the OpenOOD scripts and saved checkpoints, and have run into some strange behavior: I can't reliably reproduce the post-processing OOD results (FPR@95, AUROC, etc.).
I am testing the ResNets and took the following steps:

1. Clone OpenOOD.
2. Set up the Python env.
3. Run the download script:

```bash
python scripts/download/download.py --contents 'datasets' 'checkpoints' \
    --datasets 'ood_v1.5' \
    --checkpoints 'ood_v1.5' \
    --save_dir './data' './results' \
    --dataset_mode 'benchmark'
```

4. Run a benchmark (in this case ResNet18 OOD with ASH):

```bash
python main.py --config configs/datasets/cifar10/cifar10.yml \
    configs/datasets/cifar10/cifar10_ood.yml \
    configs/networks/resnet18_32x32.yml \
    configs/pipelines/test/test_ood.yml \
    configs/preprocessors/base_preprocessor.yml \
    configs/postprocessors/ash.yml \
    --num_workers 8 \
    --network.checkpoint 'results/cifar10_resnet18_32x32_base_e100_lr0.1_default/s0/best.ckpt' \
    --mark 1
```

5. Repeat the process with all three checkpoints.
After running this, I get values that are much smaller than the reported ones. For example, for the farood split, the three runs (one row per checkpoint) report:
| FPR@95 | AUROC | AUPR_IN | AUPR_OUT | ACC |
| ------ | ----- | ------- | -------- | ----- |
| 40.41  | 91.80 | 79.26   | 94.30    | 95.22 |
| 35.82  | 91.98 | 81.84   | 94.43    | 94.63 |
| 48.16  | 89.98 | 71.78   | 94.25    | 95.32 |
Interestingly, if I follow the same commands but add `--seed n` to the arguments (n being the seed of the saved checkpoint), the values become closer to the reported ones.
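For concreteness, here is a sketch of that re-run with the seed pinned for each checkpoint; the seed-to-directory mapping (`s0` → seed 0, `s1` → 1, `s2` → 2) is my assumption based on the checkpoint folder names:

```bash
# Re-run the step-4 benchmark with --seed pinned per checkpoint.
# Assumption: checkpoint directory sN was trained with seed N.
for n in 0 1 2; do
    python main.py --config configs/datasets/cifar10/cifar10.yml \
        configs/datasets/cifar10/cifar10_ood.yml \
        configs/networks/resnet18_32x32.yml \
        configs/pipelines/test/test_ood.yml \
        configs/preprocessors/base_preprocessor.yml \
        configs/postprocessors/ash.yml \
        --num_workers 8 \
        --network.checkpoint "results/cifar10_resnet18_32x32_base_e100_lr0.1_default/s${n}/best.ckpt" \
        --seed "${n}" \
        --mark 1
done
```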
Any ideas as to what I'm doing wrong or what is happening?
In general, a few methods (those with random components in their design) can indeed be sensitive to random seeds. But if I remember correctly, this shouldn't be the case for ASH.
Also, it's interesting that the results you have shown here are actually a lot higher (rather than lower) than what we report for ASH on CIFAR-10. For example, per this full table, the farood AUROC (averaged over the three checkpoints) is only 78.49.
I cannot think of a cause for this. Would you mind trying the new evaluation interface, eval_ood.py, to see if you can reproduce the results? Nearly all of the numbers reported in OpenOOD v1.5 were obtained by running that file rather than the old interface (`python main.py --config ...`).
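For reference, a sketch of what that call might look like, following the invocation pattern in the OpenOOD README; the exact flags may differ between versions, so please check `python scripts/eval_ood.py --help` first:

```bash
# Sketch: v1.5 evaluation interface for the same ASH / CIFAR-10 setup.
# Flag names follow the OpenOOD README pattern and may vary by version.
python scripts/eval_ood.py \
    --id-data cifar10 \
    --root ./results/cifar10_resnet18_32x32_base_e100_lr0.1_default \
    --postprocessor ash \
    --save-score --save-csv
```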