could you provide the training script for the other datasets of pwcnet? #50

Open
minghuiwsw opened this issue Apr 26, 2023 · 11 comments

@minghuiwsw

Hi @hurjunhwa, I noticed you provide a training script for the baseline, PWC-Net, but I wonder whether it is for the FlyingChairsOcc dataset only or applicable to other datasets as well. That is, if I want to train PWC-Net on other datasets and reach the performance reported in the paper, do I need to change the hyperparameters in the config file pwcnet.sh? If so, could you provide those files? Thanks a lot!

@hurjunhwa
Collaborator

Yes, it's possible to train on other datasets.

Please specify the directory of the custom dataset here as well as its name.
Write a custom dataset file here and define the name of the dataset here.

Maybe it's easier to look at the existing example and start from there.
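To make the registration steps above concrete, here is a bare-bones sketch of what such a custom dataset file could look like. The class name, directory layout, file extensions, and dict keys are assumptions for illustration only; in the repo the class would subclass torch.utils.data.Dataset and actually decode the images and .flo files:

```python
import os
from glob import glob

class MyFlowDataset:
    """Hypothetical custom optical-flow dataset (names are illustrative).

    In the irr repo this would subclass torch.utils.data.Dataset and
    return image/flow tensors; here it only pairs up file paths.
    """
    def __init__(self, root):
        images = sorted(glob(os.path.join(root, "*.png")))
        flows = sorted(glob(os.path.join(root, "*.flo")))
        # one ground-truth flow per consecutive image pair
        self._samples = list(zip(images[:-1], images[1:], flows))

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        im1, im2, flo = self._samples[idx]
        # real code would decode the files and apply augmentations here
        return {"input1": im1, "input2": im2, "target1": flo}
```

The dataset name then has to be registered in the datasets module so that the --training_dataset flag can resolve it.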

@minghuiwsw
Author

Thanks a lot! Actually I tried exactly what you said yesterday, and the training is ongoing. I have another question: if I want to use a different GPU for training instead of the default cuda:0, what should I do? I tried adding --cuda 5 on the command line, but it did not work. And what should I do if I want to use multiple GPUs for training? I noticed you commented out some code around line 47 of main.py; is that for this purpose?

@minghuiwsw
Author

It is here:

irr/main.py, line 47 in dacd07b:

# # Multi-GPU automation

@hurjunhwa
Collaborator

Yes, you could uncomment those lines:

irr/main.py, lines 47 to 53 in dacd07b:

# # Multi-GPU automation
# with logger.LoggingBlock("Multi GPU", emph=True):
#     if torch.cuda.device_count() > 1:
#         logging.info("Let's use %d GPUs!" % torch.cuda.device_count())
#         model_and_loss._model = torch.nn.DataParallel(model_and_loss._model)
#     else:
#         logging.info("Let's use %d GPU!" % torch.cuda.device_count())

and run the script with CUDA_VISIBLE_DEVICES.
If you would like to use 4 GPUs on your machine, the command would be:
CUDA_VISIBLE_DEVICES=0,1,2,3 ./IRR-FlowNet_flyingChairsOcc.sh
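As background on why a --cuda flag alone does not help: the process-level CUDA_VISIBLE_DEVICES variable controls which physical GPUs the CUDA runtime exposes, and the visible ones are renumbered from 0 inside the process, so unchanged code that asks for cuda:0 transparently lands on the selected GPU. A small pure-Python sketch (the parser below is a simplification of what the CUDA runtime does, shown only to illustrate the remapping):

```python
import os

def visible_gpus(env_value):
    """Simplified parse of CUDA_VISIBLE_DEVICES: a comma-separated
    list of physical GPU ids, with no spaces around the commas."""
    return [int(tok) for tok in env_value.split(",") if tok]

# Restrict the process to physical GPU 5; inside the process that
# GPU is then addressed as cuda:0, so no code change is needed.
os.environ["CUDA_VISIBLE_DEVICES"] = "5"
logical_to_physical = visible_gpus(os.environ["CUDA_VISIBLE_DEVICES"])
# logical device 0 now maps to physical GPU 5
```

Note that the real CUDA runtime is strict about the format: spaces inside the list can cause devices after the space to be silently ignored, which is why the command above must use 0,1,2,3 with no spaces.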

@minghuiwsw
Author

Thanks!
But I have two more questions.
1. Currently I am reproducing your PWCNet-irr network and training it on the Sintel dataset. There are Sintel results in your paper, but in the provided code I only find the training script for flow_occ_v5. Considering that for other models like IRR-PWC the Sintel training script has two stages, unlike the FlyingChairs script, what is the Sintel training script for PWCNet-irr?
2. The IRR-PWC Sintel training script is two-stage, but I find that in the second stage you did not use the checkpoint obtained from the first stage; you still use the original checkpoint. Why? In my opinion, the second stage is finetuning based on the first stage's result, so the checkpoint in the second stage should inherit from the first stage.
Thanks again!

@minghuiwsw
Author

Hi Jun @hurjunhwa,
I tried to train on Sintel with PWCNet-irr using the pwc-irr.sh script, changing only the dataset-related items and nothing in the training strategy. In the end I get a best_epe_avg of 5.5894, which I think is not good enough compared to the result in your paper. What went wrong: the training strategy, or the training/validation datasets?
python ../main.py \
    --batch_size=$SIZE_OF_BATCH \
    --batch_size_val=$SIZE_OF_BATCH \
    --checkpoint=$CHECKPOINT \
    --lr_scheduler=MultiStepLR \
    --lr_scheduler_gamma=0.5 \
    --lr_scheduler_milestones="[108, 144, 180]" \
    --model=$MODEL \
    --num_workers=4 \
    --optimizer=Adam \
    --optimizer_lr=1e-4 \
    --optimizer_weight_decay=4e-4 \
    --save=$SAVE_PATH \
    --total_epochs=216 \
    --training_augmentation=RandomAffineFlowOccSintel \
    --training_augmentation_crop="[384,768]" \
    --training_dataset=SintelTrainingCombFull \
    --training_dataset_photometric_augmentations=True \
    --training_dataset_root=$SINTEL_HOME \
    --training_key=total_loss \
    --training_loss=$EVAL_LOSS \
    --validation_dataset=SintelTrainingCombValid \
    --validation_dataset_photometric_augmentations=False \
    --validation_dataset_root=$SINTEL_HOME \
    --validation_key=epe \
    --validation_loss=$EVAL_LOSS
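As a side note, the scheduler flags in this script (base lr 1e-4, gamma 0.5, milestones [108, 144, 180]) give a step-wise decay. A small sketch of what MultiStepLR does with these values (this mirrors the documented PyTorch semantics, not code from the repo):

```python
def lr_at_epoch(epoch, base_lr=1e-4, gamma=0.5, milestones=(108, 144, 180)):
    """Step decay as in PyTorch's MultiStepLR: the learning rate is
    multiplied by gamma each time a milestone epoch is passed."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

# with the flags above: 1e-4 until epoch 108, then halved at 108, 144, 180
```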

@hurjunhwa
Collaborator

Hi,

In the paper, we first train the model on the FlyingChairsOcc dataset from scratch. This is a pretraining step.

Then we finetune the model on Sintel or KITTI. This finetuning consists of two steps: (1) train the model on the train & valid split to determine the number of iteration steps for finetuning, and (2) train the model on all the images for the number of steps found in (1).

Did you first pretrain the model on FlyingChairsOcc, or did you train on Sintel from scratch?

@minghuiwsw
Author

Yes, I trained on Sintel from scratch at the beginning. These days, after first training on FlyingChairs and then finetuning on Sintel, it performs better, so I think that was the reason. But I have two more questions:
1. How did you get the checkpoint_best.ckpt in the folder /saved_check_point/PWCNet-irr? Currently I am using your PWCNet-irr rather than IRR-PWC, but I found there is only one ckpt. Is this ckpt the best one for FlyingChairs, for Sintel, or for all datasets? And how did you train the model to get it?
2. I understand your two-step training strategy for Sintel, but why didn't you train the model on all the images directly? The model will find the best ckpt and save it as long as you set the iteration steps large enough.
I am a newcomer to this field and my opinion may well be wrong, so I am very glad that you have been so patient in replying to me so many times. Thanks a lot!

@hurjunhwa
Collaborator

Oh, actually the full training pipeline was FlyingChairs -> FlyingThings3D -> and then Sintel or KITTI finetuning.

I think the PWCNet-irr checkpoint is trained on FlyingChairs only, but you could double-check by running inference on Sintel/KITTI and comparing with the numbers in the paper.

If you train on all the images directly, it's hard to know when the model overfits, because it always keeps minimizing the loss on the data it sees. So the first stage is about finding the stopping point where the validation EPE is lowest.
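The two-stage recipe described here can be sketched as follows (toy numbers and a hypothetical helper, purely to illustrate the logic):

```python
def find_stopping_epoch(val_epe_per_epoch):
    """Stage 1: train on the train split only, validate every epoch,
    and return the (1-indexed) epoch with the lowest validation EPE."""
    best_idx = min(range(len(val_epe_per_epoch)),
                   key=val_epe_per_epoch.__getitem__)
    return best_idx + 1

# toy validation curve: EPE drops, then rises again as the model overfits
curve = [4.1, 3.2, 2.8, 2.6, 2.7, 2.9]
n_epochs = find_stopping_epoch(curve)
# Stage 2: retrain on ALL images (train + valid) for exactly n_epochs,
# since no held-out split is left to signal overfitting.
```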

@minghuiwsw
Author

Thanks for your timely reply! I got it.
Sorry to bother you again, but I have two more questions.
1. I saw that one difference between pwcnet.py and pwcnet_irr.py is that pwcnet_irr.py adds two rescale_flow operations. Why do we need to rescale the flow before and after the flow estimation module?
2. How did you test the model on a test dataset like SintelTestClean? I followed the validation procedure, i.e. I only changed the dataset name in the .sh file, but it did not work; it seems there is no target flow to compare against for the test dataset. Maybe I need to save the inference results and submit them to the Sintel official website? Can you show me the detailed process for this?
Waiting for your reply.
Best

@minghuiwsw
Author

I did some experiments. When I delete the two rescale_flow operations, the training still converges. But if I additionally reduce the search range from 4 to 2, the training does not converge: specifically, the training EPE is nearly unchanged after 40 epochs (the progress right now, still ongoing). Is the rescale operation related to the search range?
The rescale operation I mentioned is this:
flow = rescale_flow(flow, self._div_flow, width_im, height_im, to_local=True)
Best
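For context on what a call like rescale_flow(flow, self._div_flow, width_im, height_im, to_local=True) might be doing, here is a rough, self-consistent sketch. This is an assumption for illustration, not the repo's actual implementation: it supposes the network works in coordinates normalized by image size and scaled by div_flow (0.05 in PWC-Net), and that to_local=False is the exact inverse of to_local=True:

```python
def rescale_flow_sketch(u, v, div_flow, width_im, height_im, to_local):
    """Illustrative only: convert flow between global pixel units and
    a hypothetical 'local' space normalized by image size and div_flow."""
    if to_local:
        # pixels -> normalized, div_flow-scaled units
        return u * div_flow / width_im, v * div_flow / height_im
    # normalized units -> pixels (exact inverse of the branch above)
    return u * width_im / div_flow, v * height_im / div_flow
```

Under this assumption a round trip recovers the input flow, and the point of working in the local space would be to keep the correlation layer's fixed search range meaningful at every pyramid level.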
