could you provide the training script for the other datasets of pwcnet? #50

Open
minghuiwsw opened this issue Apr 26, 2023 · 11 comments

@minghuiwsw

Hi @hurjunhwa, I noticed you provide a training script for the baseline, PWC-Net, but I wonder whether it is for the FlyingChairsOcc dataset only or applicable to other datasets as well. That is, if I want to train PWC-Net on other datasets and reach the performance reported in the paper, do I need to change the hyperparameters in the config file pwcnet.sh? If so, could you provide those files? Thanks a lot!

@hurjunhwa
Collaborator

Yes, it's possible to train on other datasets.

Please specify the directory of the custom dataset here as well as its name.
Write a custom dataset file here and define the name of the dataset here.

Maybe it's easier to look at the existing example and start from there.
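To make the registration steps above concrete, here is a bare-bones sketch of what such a custom dataset file could look like. The class name, directory layout, file extensions, and dict keys are assumptions for illustration only; in the repo the class would subclass torch.utils.data.Dataset and actually decode the images and .flo files:

```python
import os
from glob import glob

class MyFlowDataset:
    """Hypothetical custom optical-flow dataset (names are illustrative).

    In the irr repo this would subclass torch.utils.data.Dataset and
    return image/flow tensors; here it only pairs up file paths.
    """
    def __init__(self, root):
        images = sorted(glob(os.path.join(root, "*.png")))
        flows = sorted(glob(os.path.join(root, "*.flo")))
        # one ground-truth flow per consecutive image pair
        self._samples = list(zip(images[:-1], images[1:], flows))

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        im1, im2, flo = self._samples[idx]
        # real code would decode the files and apply augmentations here
        return {"input1": im1, "input2": im2, "target1": flo}
```

The dataset name then has to be registered in the datasets module so that the --training_dataset flag can resolve it.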

@minghuiwsw
Author

Thanks a lot! Actually I tried exactly what you said yesterday, and the training is ongoing. I have another question: if I want to use a different GPU for training instead of the default cuda:0, what should I do? I tried adding --cuda 5 on the command line, but it did not work. And what should I do if I want to use multiple GPUs for training? I noticed you commented out some code around line 47 of main.py; is that for this purpose?

@minghuiwsw
Author

It is here:

irr/main.py, line 47 in dacd07b:

# # Multi-GPU automation

@hurjunhwa
Collaborator

Yes, you could uncomment those lines:

irr/main.py, lines 47 to 53 in dacd07b:

# # Multi-GPU automation
# with logger.LoggingBlock("Multi GPU", emph=True):
#     if torch.cuda.device_count() > 1:
#         logging.info("Let's use %d GPUs!" % torch.cuda.device_count())
#         model_and_loss._model = torch.nn.DataParallel(model_and_loss._model)
#     else:
#         logging.info("Let's use %d GPU!" % torch.cuda.device_count())

and run the script with CUDA_VISIBLE_DEVICES.
If you would like to use 4 GPUs on your machine, the command would be:
CUDA_VISIBLE_DEVICES=0,1,2,3 ./IRR-FlowNet_flyingChairsOcc.sh
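As background on why a --cuda flag alone does not help: the process-level CUDA_VISIBLE_DEVICES variable controls which physical GPUs the CUDA runtime exposes, and the visible ones are renumbered from 0 inside the process, so unchanged code that asks for cuda:0 transparently lands on the selected GPU. A small pure-Python sketch (the parser below is a simplification of what the CUDA runtime does, shown only to illustrate the remapping):

```python
import os

def visible_gpus(env_value):
    """Simplified parse of CUDA_VISIBLE_DEVICES: a comma-separated
    list of physical GPU ids, with no spaces around the commas."""
    return [int(tok) for tok in env_value.split(",") if tok]

# Restrict the process to physical GPU 5; inside the process that
# GPU is then addressed as cuda:0, so no code change is needed.
os.environ["CUDA_VISIBLE_DEVICES"] = "5"
logical_to_physical = visible_gpus(os.environ["CUDA_VISIBLE_DEVICES"])
# logical device 0 now maps to physical GPU 5
```

Note that the real CUDA runtime is strict about the format: spaces inside the list can cause devices after the space to be silently ignored, which is why the command above must use 0,1,2,3 with no spaces.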

@minghuiwsw
Author

Thanks!
But I have two more questions.
1. Currently I am reproducing your PWCNet-irr network and training it on the Sintel dataset. There are Sintel results in your paper, but in the provided code I only find the training script for flow_occ_v5. Considering that for other models like IRR-PWC the Sintel training script has two stages, unlike the FlyingChairs script, what is the Sintel training script for PWCNet-irr?
2. The IRR-PWC Sintel training script is two-stage, but I find that in the second stage you did not use the checkpoint obtained from the first stage; you still use the original checkpoint. Why? In my opinion, the second stage is finetuning based on the first stage's result, so the checkpoint in the second stage should inherit from the first stage.
Thanks again!

@minghuiwsw
Author

Hi Jun @hurjunhwa,
I tried to train on Sintel with PWCNet-irr using the pwc-irr.sh script, changing only the dataset-related items and nothing in the training strategy. In the end I get a best_epe_avg of 5.5894, which I think is not good enough compared to the result in your paper. What went wrong: the training strategy, or the training/validation datasets?
python ../main.py \
    --batch_size=$SIZE_OF_BATCH \
    --batch_size_val=$SIZE_OF_BATCH \
    --checkpoint=$CHECKPOINT \
    --lr_scheduler=MultiStepLR \
    --lr_scheduler_gamma=0.5 \
    --lr_scheduler_milestones="[108, 144, 180]" \
    --model=$MODEL \
    --num_workers=4 \
    --optimizer=Adam \
    --optimizer_lr=1e-4 \
    --optimizer_weight_decay=4e-4 \
    --save=$SAVE_PATH \
    --total_epochs=216 \
    --training_augmentation=RandomAffineFlowOccSintel \
    --training_augmentation_crop="[384,768]" \
    --training_dataset=SintelTrainingCombFull \
    --training_dataset_photometric_augmentations=True \
    --training_dataset_root=$SINTEL_HOME \
    --training_key=total_loss \
    --training_loss=$EVAL_LOSS \
    --validation_dataset=SintelTrainingCombValid \
    --validation_dataset_photometric_augmentations=False \
    --validation_dataset_root=$SINTEL_HOME \
    --validation_key=epe \
    --validation_loss=$EVAL_LOSS
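As a side note, the scheduler flags in this script (base lr 1e-4, gamma 0.5, milestones [108, 144, 180]) give a step-wise decay. A small sketch of what MultiStepLR does with these values (this mirrors the documented PyTorch semantics, not code from the repo):

```python
def lr_at_epoch(epoch, base_lr=1e-4, gamma=0.5, milestones=(108, 144, 180)):
    """Step decay as in PyTorch's MultiStepLR: the learning rate is
    multiplied by gamma each time a milestone epoch is passed."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

# with the flags above: 1e-4 until epoch 108, then halved at 108, 144, 180
```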

@hurjunhwa
Collaborator

Hi,

In the paper, we first train the model on the FlyingChairsOcc dataset from scratch. This is a pretraining step.

Then we finetune the model on Sintel or KITTI. This finetuning consists of two steps: (1) train the model on the train & valid split to determine the number of iteration steps for finetuning, and (2) train the model on all the images for the number of steps found in (1).

Did you first pretrain the model on FlyingChairsOcc, or did you train on Sintel from scratch?

@minghuiwsw
Author

Yes, I trained on Sintel from scratch at the beginning. These days, after first training on FlyingChairs and then finetuning on Sintel, it performs better, so I think that was the reason. But I have two more questions:
1. How did you get the checkpoint_best.ckpt in the folder /saved_check_point/PWCNet-irr? Currently I am using your PWCNet-irr rather than IRR-PWC, but I found there is only one ckpt. Is this ckpt the best one for FlyingChairs, for Sintel, or for all datasets? And how did you train the model to get it?
2. I understand your two-step training strategy for Sintel, but why didn't you train the model on all the images directly? The model will find the best ckpt and save it as long as you set the iteration steps large enough.
I am a newcomer to this field and my opinion may well be wrong, so I am very glad that you have been so patient in replying to me so many times. Thanks a lot!

@hurjunhwa
Collaborator

Oh, actually the full training pipeline was FlyingChairs -> FlyingThings3D -> and then Sintel or KITTI finetuning.

I think the PWCNet-irr checkpoint is trained on FlyingChairs only, but you could double-check by running inference on Sintel/KITTI and comparing with the numbers in the paper.

If you train on all the images directly, it's hard to know when the model overfits, because it always keeps minimizing the loss on the data it sees. So the first stage is about finding the stopping point where the validation EPE is lowest.
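The two-stage recipe described here can be sketched as follows (toy numbers and a hypothetical helper, purely to illustrate the logic):

```python
def find_stopping_epoch(val_epe_per_epoch):
    """Stage 1: train on the train split only, validate every epoch,
    and return the (1-indexed) epoch with the lowest validation EPE."""
    best_idx = min(range(len(val_epe_per_epoch)),
                   key=val_epe_per_epoch.__getitem__)
    return best_idx + 1

# toy validation curve: EPE drops, then rises again as the model overfits
curve = [4.1, 3.2, 2.8, 2.6, 2.7, 2.9]
n_epochs = find_stopping_epoch(curve)
# Stage 2: retrain on ALL images (train + valid) for exactly n_epochs,
# since no held-out split is left to signal overfitting.
```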

@minghuiwsw
Author

Thanks for your timely reply! I got it.
Sorry to bother you again, but I have two more questions.
1. I saw that one difference between pwcnet.py and pwcnet_irr.py is that pwcnet_irr.py adds two rescale_flow operations. Why do we need to rescale the flow before and after the flow estimation module?
2. How did you test the model on a test dataset like SintelTestClean? I followed the validation procedure, i.e. I only changed the dataset name in the .sh file, but it did not work; it seems there is no target flow to compare against for the test dataset. Maybe I need to save the inference results and submit them to the Sintel official website? Can you show me the detailed process for this?
Waiting for your reply.
Best

@minghuiwsw
Author

I did some experiments. When I delete the two rescale_flow operations, the training still converges. But if I additionally reduce the search range from 4 to 2, the training does not converge: specifically, the training EPE is nearly unchanged after 40 epochs (the progress right now, still ongoing). Is the rescale operation related to the search range?
The rescale operation I mentioned is this:
flow = rescale_flow(flow, self._div_flow, width_im, height_im, to_local=True)
Best
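For context on what a call like rescale_flow(flow, self._div_flow, width_im, height_im, to_local=True) might be doing, here is a rough, self-consistent sketch. This is an assumption for illustration, not the repo's actual implementation: it supposes the network works in coordinates normalized by image size and scaled by div_flow (0.05 in PWC-Net), and that to_local=False is the exact inverse of to_local=True:

```python
def rescale_flow_sketch(u, v, div_flow, width_im, height_im, to_local):
    """Illustrative only: convert flow between global pixel units and
    a hypothetical 'local' space normalized by image size and div_flow."""
    if to_local:
        # pixels -> normalized, div_flow-scaled units
        return u * div_flow / width_im, v * div_flow / height_im
    # normalized units -> pixels (exact inverse of the branch above)
    return u * width_im / div_flow, v * height_im / div_flow
```

Under this assumption a round trip recovers the input flow, and the point of working in the local space would be to keep the correlation layer's fixed search range meaningful at every pyramid level.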
