
Postprocessing method reproduction issue #266
Open · markplagge opened this issue Dec 10, 2024 · 1 comment

@markplagge
Hi,
I have been trying to reproduce the results of the paper using the OpenOOD scripts and saved checkpoints, and have run into a strange behavior. I can't seem to reproduce the post-processing OOD results (FPR@95, AUROC, etc.) reliably.

I am testing the ResNets and followed these steps:

1. Clone OpenOOD.
2. Set up the Python env.
3. Run the download script:

```bash
python scripts/download/download.py --contents 'datasets' 'checkpoints' \
    --datasets 'ood_v1.5' \
    --checkpoints 'ood_v1.5' \
    --save_dir './data' './results' \
    --dataset_mode 'benchmark'
```

4. Run a benchmark (in this case ResNet18 OOD with ASH):

```bash
python main.py --config configs/datasets/cifar10/cifar10.yml \
    configs/datasets/cifar10/cifar10_ood.yml \
    configs/networks/resnet18_32x32.yml \
    configs/pipelines/test/test_ood.yml \
    configs/preprocessors/base_preprocessor.yml \
    configs/postprocessors/ash.yml \
    --num_workers 8 \
    --network.checkpoint 'results/cifar10_resnet18_32x32_base_e100_lr0.1_default/s0/best.ckpt' \
    --mark 1
```

5. Repeat the process with all three checkpoints.
After running this, I get values that are much smaller than the reported values. For example, on the farood metrics, these three runs report:
| FPR@95 | AUROC | AUPR_IN | AUPR_OUT | ACC   |
|--------|-------|---------|----------|-------|
| 40.41  | 91.80 | 79.26   | 94.30    | 95.22 |
| 35.82  | 91.98 | 81.84   | 94.43    | 94.63 |
| 48.16  | 89.98 | 71.78   | 94.25    | 95.32 |

Interestingly, if I follow the same commands but add `--seed n` to the arguments (n being the seed of the saved checkpoint), the values become closer to the reported ones; see the sketch below.
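For concreteness, the seeded re-run is the command from step 4 plus a `--seed` flag, something like this (a sketch; pairing `--seed 0` with the `s0` checkpoint is an assumption based on the checkpoint directory naming):

```bash
# Sketch of the seeded re-run. Pairing --seed 0 with the s0 checkpoint is
# an assumption from the s0/s1/s2 checkpoint directory names; the s1/s2
# checkpoints would use --seed 1 and --seed 2 respectively.
python main.py --config configs/datasets/cifar10/cifar10.yml \
    configs/datasets/cifar10/cifar10_ood.yml \
    configs/networks/resnet18_32x32.yml \
    configs/pipelines/test/test_ood.yml \
    configs/preprocessors/base_preprocessor.yml \
    configs/postprocessors/ash.yml \
    --num_workers 8 \
    --network.checkpoint 'results/cifar10_resnet18_32x32_base_e100_lr0.1_default/s0/best.ckpt' \
    --seed 0 \
    --mark 1
```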

Any ideas as to what my mistake is, or what is happening here?

@zjysteven (Collaborator) commented Dec 11, 2024

In general, a few methods (those with random components in their design) can indeed be sensitive to random seeds. But that shouldn't be the case for ASH, if I remember correctly.

Also, it's interesting that the results you've shown here are actually a lot higher (rather than lower) than what we report for ASH on CIFAR-10. For example, as you can see from this full table, the farood AUROC (averaged over three checkpoints) is only 78.49.

I cannot think of a cause for this. Would you mind trying the new evaluation interface, eval_ood.py, to see if you can reproduce the results? Nearly all numbers reported in OpenOOD v1.5 were obtained by running that file rather than the old interface (`python main.py --config ...`).
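The invocation would look something like this (a sketch based on the example in the OpenOOD README; the `--root` path is an assumption matching the checkpoint directory used above):

```bash
# Sketch: newer OpenOOD v1.5 evaluation interface. Flag names follow the
# repo README; the --root path is assumed to match the setup above.
python scripts/eval_ood.py \
    --id-data cifar10 \
    --root ./results/cifar10_resnet18_32x32_base_e100_lr0.1_default \
    --postprocessor ash \
    --save-score --save-csv
```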
