Error Pre-training CVPPP #4

Open
Khoa-NT opened this issue Dec 5, 2019 · 22 comments

Comments

@Khoa-NT

Khoa-NT commented Dec 5, 2019

Hi @arnike ,

You forgot to change the CFG in /runs/cvppp_preproc_save.sh
CFG=001_preproc

After moving the aug_data, I ran ./runs/cvppp_pretrain.sh and got some errors:
log_git.txt
The first problem is that the dataloader was given the wrong data path. I don't understand the rest.

I still can't find a way to fix it.
Can you check it?

@Khoa-NT
Author

Khoa-NT commented Dec 19, 2019

Hi @arnike
Do you have any suggestions?
Thank you

@arnike
Collaborator

arnike commented Dec 20, 2019

Hi @shaolinkhoa,
./runs/cvppp_pretrain.sh runs a sequence of commands. The first one fails, and the rest of the commands cannot find the snapshot that the first should have created. So, please, check that you specify the data path correctly. The error says that you're loading
/home/khoa/acis_master/code/data/cvppp/A1_AUG/train/plant102_rgb.png, and it's not there.
Best,
Nikita
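
One way to narrow down a wrong data path is to find how much of it actually exists on disk. The following is a minimal sketch, not part of the repository: the function name `deepest_existing_dir` is hypothetical, and it only assumes a POSIX shell with `dirname` available.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of the repository): print the deepest
# directory of a path that exists, to show where a data path goes wrong.
deepest_existing_dir() {
    candidate="$1"
    # Strip one trailing component at a time until an existing directory is left.
    # dirname always ends at "/" or ".", both of which exist, so the loop stops.
    while [ ! -d "$candidate" ]; do
        candidate=$(dirname "$candidate")
    done
    printf '%s\n' "$candidate"
}
```

Calling it on the file named in the error, for example `deepest_existing_dir /home/khoa/acis_master/code/data/cvppp/A1_AUG/train/plant102_rgb.png`, prints the last directory that is really there; everything after that point is missing or misspelled.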

@Khoa-NT
Author

Khoa-NT commented Dec 25, 2019

Hi @arnike
Thank you for your reply and Merry Christmas 🎄
Can you check the code in cvppp.lua ?
When I run the code, I got:

iPath: plant065_rgb.png
mPath: plant065_label.png

There are no files named plant065_rgb.png or plant065_label.png in my
self.dir : /acis_master/code/data/cvppp/A1_AUG/train
My AUG image files are named like this: 038398_rgb.png

I think there is a problem with self.imageInfo.imagePath and self.imageInfo.maskPath,
because plant065_rgb.png is in the A1_RAW dataset.

If there is no problem with self.imageInfo.imagePath and self.imageInfo.maskPath, does the raw image plant065_rgb.png also need to be in /acis_master/code/data/cvppp/A1_AUG/train ?

Can you check the names of the images in your AUG dataset?
Or tell me how the naming format differs?

Update: I copied all the raw images into self.dir : /acis_master/code/data/cvppp/A1_AUG/train but I still get the error:
log_git_2.txt

@Khoa-NT
Author

Khoa-NT commented Jan 3, 2020

hi @arnike
Happy new year.
Would you mind checking again?

@Khoa-NT
Author

Khoa-NT commented Jan 15, 2020

@arnike
I'm sorry for bothering you, but can you check it again?
And can I have your email so that we can discuss it?

@anhtuanhsgs

anhtuanhsgs commented Jan 15, 2020

I got the same issue. Does anyone have a solution for this?

@arnike
Collaborator

arnike commented Jan 15, 2020

Hi guys,
apologies for the delay, a bit overwhelmed here... Have you tried removing the cache files in gen/*.t7? When you run the code, the dataloader checks for the cache files to load the image list and loads the files from that list; otherwise it will scan the directory and create new cache files. Since you've generated new files, the cache needs to be updated. Unfortunately, this currently works only if you manually delete the cache data.
Best,
Nikita
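
The manual cleanup described above can be sketched as a small shell helper. This is a minimal sketch under assumptions: the function name `clear_t7_cache` is hypothetical and not part of the repository, and it assumes every `*.t7` file directly inside the cache directory is a file list that is safe to regenerate.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): delete cached file lists
# so that the dataloader rescans the data directory on the next run.
# Assumption: the cache files are the *.t7 files directly inside the directory.
clear_t7_cache() {
    cache_dir="${1:-gen}"
    if [ ! -d "$cache_dir" ]; then
        echo "no cache directory at $cache_dir" >&2
        return 1
    fi
    # -maxdepth 1 limits the deletion to the cache directory itself.
    find "$cache_dir" -maxdepth 1 -type f -name '*.t7' -exec rm -f {} +
}
```

Running `clear_t7_cache gen` from the directory that contains `gen` before re-running `./runs/cvppp_pretrain.sh` should force the file lists to be rebuilt. Files with other extensions are left alone.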

@Khoa-NT
Author

Khoa-NT commented Jan 20, 2020

Hi @arnike ,
I deleted all files in gen but it still has the problem when loading the dataset:
log_git.txt

@arnike
Collaborator

arnike commented Jan 21, 2020

Hi @shaolinkhoa,
seems like when you generate gen/cvppp_*.t7, the provided path is off, and the script doesn't find any images.

  1. When you generate these file lists (the first run after you delete them), you should see " | found [NUMBER] image-label pairs" in the log output, where NUMBER is likely 0 in your case. To see which directory gets searched, you can print the dir variable here.
  2. If cvppp-gen.lua finds your new images but you still get an error loading them, please make sure your full path paths.concat(self.dir, iPath) is specified correctly here.

Same for #5.
Best,
Nikita
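
To cross-check the "found [NUMBER] image-label pairs" line independently of the Lua code, the pairs can be counted from the shell. This is only a sketch and not the logic of cvppp-gen.lua: the function name `count_image_label_pairs` is hypothetical, and it assumes that a label sits next to its image and shares its name with `_rgb.png` replaced by `_label.png`, as in the plant065_rgb.png / plant065_label.png pair quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical diagnostic (not the logic of cvppp-gen.lua): count *_rgb.png
# files that have a matching *_label.png in the same directory.
# Assumptions: lower-case suffixes, and file names without newlines.
count_image_label_pairs() {
    data_dir="$1"
    find -L "$data_dir" -maxdepth 5 -name '*_rgb.png' | while IFS= read -r img; do
        # Derive the expected label name from the image name.
        label="${img%_rgb.png}_label.png"
        if [ -f "$label" ]; then
            printf '%s\n' "$img"
        fi
    done | wc -l | tr -d ' '
}
```

If `count_image_label_pairs` prints 0 for the training directory, the directory or the naming pattern is the likely culprit rather than the loader.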

@Khoa-NT
Author

Khoa-NT commented Jan 21, 2020

Hi @arnike ,
Thank you for replying.

It seems the error comes from this command:
find -L /khoa/acis/code/data/cvppp/A1_AUG/train/* -maxdepth 5 -iname "*_rgb.png"
-bash: /usr/bin/find: Argument list too long
There are a lot of files in A1_AUG/train/ and A1_AUG/val/, so find fails before it returns anything.

How did you run find in A1_AUG/train/ and A1_AUG/val/ ?

@arnike
Collaborator

arnike commented Jan 21, 2020

Does this command list your files: find -L /khoa/acis/code/data/cvppp/A1_AUG/train/ -maxdepth 5 -iname "*_rgb.png"? If so, could you, please, try removing * in the line here?
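
The reason dropping the `*` helps: the shell expands `train/*` into one argument per file before `find` even starts, and with tens of thousands of files that exceeds the system's argument-length limit. Passing the directory itself hands `find` a single argument and lets it do the traversal. A minimal sketch, with the hypothetical function name `list_rgb_images`:

```shell
#!/bin/sh
# Hypothetical helper: list *_rgb.png files without a shell glob.
list_rgb_images() {
    data_dir="$1"
    # "$data_dir" is one argument no matter how many files the directory holds,
    # so the "Argument list too long" limit is never reached.
    find -L "$data_dir" -maxdepth 5 -iname '*_rgb.png'
}
```

For a quick count, `list_rgb_images /path/to/A1_AUG/train | wc -l` can be compared with the number of images you expect; the path here is a placeholder.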

@Khoa-NT
Author

Khoa-NT commented Jan 22, 2020

Hi @arnike
I'm sorry, but I got another error: "The server at localhost:6039 does not appear to be up"
log_6039.txt

Is it related to Crayon? Do I have to install it?

@Khoa-NT
Author

Khoa-NT commented Jan 22, 2020

Hi @arnike ,
Can you check it for me?

@arnike
Collaborator

arnike commented Jan 22, 2020

Hi @shaolinkhoa
yes, you need the Crayon server running. Please see this README to set it up.
Best,
Nikita

@Khoa-NT
Author

Khoa-NT commented Jan 23, 2020

Hi @arnike
I can't access the given link.
Is it ok to follow the guide from here?

What I did:

  1. I pulled the docker image from here.

  2. I started Docker with this command: docker run -d -p 8888:8888 -p 8889:8889 --name crayon alband/crayon

  3. I went to <acis>/code and ran: nohup tensorboard --logdir tensorboard --port 6038 > tensorboard/tb.log 2>&1 &

I tried both tensorboard==1.13 and tensorboard==2.1 but I still get this error in tensorboard/tb.log:
AttributeError: module 'tensorflow.python.estimator.api.estimator' has no attribute 'SessionRunHook'

What is the version of your tensorboard?
What should I do with the docker container from step 2 ?

@arnike
Collaborator

arnike commented Jan 23, 2020

What is the version of your tensorboard?

Tensorboard 1.10 worked for me.

What should I do with the docker container from step 2 ?

Docker should stay running.

@Khoa-NT
Author

Khoa-NT commented Jan 23, 2020

Hi @arnike ,

To run nohup python crayon/server/server.py --port 6039 --logdir tensorboard --tb-port 6038 > tensorboard/crayon.log 2>&1 &
we have to replace this with from urllib.request import urlopen, and replace urllib2.urlopen with urlopen.

After that, I ran ./runs/cvppp_pretrain.sh again but got this error:
/home/khoa/torch/install/bin/lua: /home/khoa/torch/install/share/lua/5.2/crayon.lua:91: Something went wrong. Server sent: Experiment name should be a non-empty string or unicode instead of '<class 'str'>'.
stack traceback:
        [C]: in function 'error'
        /home/khoa/torch/install/share/lua/5.2/crayon.lua:91: in function 'remove_experiment'
        main.lua:240: in function 'createExperiment'
        main.lua:253: in main chunk
        [C]: in function 'dofile'
        ...khoa/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: in ?
log_str.txt

In the log file, opt.seq_length = seq_length = 1 but I can't find the value of opt.config_id or config_id

Can you check it for me?
thank you

@Khoa-NT
Author

Khoa-NT commented Feb 1, 2020

hi @arnike
Do you have any solutions?

@arnike
Collaborator

arnike commented Feb 2, 2020

Hi @shaolinkhoa,
it seems that removing the existing record in crayon doesn't work properly. It first tried to create a new record here and couldn't, so it tried to remove it here, which it couldn't do either. Does it work for your setup when you execute it manually (see "Managing experiments" here)? If so, before the run, please make sure there are no experiments with the same name in crayon (e.g. cc:get_experiment_names()).

@Khoa-NT
Author

Khoa-NT commented Feb 5, 2020

hi @arnike
I would like to ask how to run the "Managing experiments" commands. Where do I run them?
I still don't understand how crayon works.

@Khoa-NT
Author

Khoa-NT commented Feb 10, 2020

hi @arnike ,
Would you mind giving me more clues?

@arnike
Collaborator

arnike commented Feb 18, 2020

Hi @shaolinkhoa,

Where I run those commands?

you run the commands in the torch shell: just execute th in the directory where Crayon saves the log files.
