Python client fails to start when receiving info or warning message from imported python modules #22
Hi, a few guesses and suggestions:

- On the terminal log, check whether the py client joins in time. If it fails to join (you will see the log stating that), that means the python process needs more time to load. Check this issue and comment to see how to increase the wait time.
- Are the py dependencies installed inside a virtual env?
- Does the tensorflow code run by itself in pure py (that is, just running the script directly)? Just to test that tf is working from inside.

Then, on starting aiva and telling it "hello-io py", it will return "[[ 12.]]". If this works too, good.

Overall, let me know at which point they fail.
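One quick sanity check along these lines: from inside the interpreter that will be spawned (the virtualenv one, if any), confirm the py client's dependencies are actually importable. This is a generic sketch, not aiva code; the module names (`socketIO_client`, `tensorflow`) are guesses based on this thread.

```python
import importlib.util
import sys

# Check, from inside the interpreter aiva will spawn, whether the modules
# the py client needs can be imported. Module names are assumptions.
results = {
    name: importlib.util.find_spec(name) is not None
    for name in ("socketIO_client", "tensorflow")
}
print(sys.executable)  # which interpreter is answering
print(results)         # e.g. {'socketIO_client': True, 'tensorflow': False}
```

If a module shows up as missing here, the failure happens before any socketIO timeout is even relevant.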
Hi! Thank you for the quick reply. From reading your post, it looks like the culprit is the wait time (the py client doesn't join). I will try this weekend and report back (I'm away from my computer at this time :'( ). On a side note, I didn't encounter any trouble building the KB graph with spaCy while chatting with the bot (and it's an awesome piece of work you have accomplished!).
Good to see that works for u :)
Sorry for getting back this late: it turns out it wasn't a timeout issue (although that's a very good parameter to keep in mind, thanks for sharing it). I'm really sorry about the false alarm; I am closing the issue, apologies for opening it in the first place.
No worries :)
:/ Sorry to go back and forth, but it looks like there is still an issue actually: on the other hand the GPU … (note that here I had the timeout set to …). I am afraid we are back to warning/info messages from python being treated as errors :S
Hmm, that's quite curious. It seems in the latter case the python process just crashes and quits, because there's no …, and later the socketIO server just shuts it down as expected, since it knows the py client is missing. So, we can focus on that. Perhaps it's the python socketIO client misbehaving with the GPU TF. One way to see is to actually run the py client by itself. Let me know if running that causes it to explode.
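When running the client by hand, it also helps to separate "the client crashed" from "the server never accepted it" by first confirming something is listening on the IO port at all. A minimal sketch; port 6466 is an assumption here, substitute whatever `IOPORT` is set to.

```python
import socket

def io_server_up(host="localhost", port=6466, timeout=1.0):
    """Return True if something accepts TCP connections on the IO port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused / timed out: nothing is listening there.
        return False

print(io_server_up())
```

If this prints `False`, the python client never had a server to join, and the timeout log is a symptom rather than the cause.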
It is getting stranger by the day: I ran another experiment, this time not importing tensorflow but … AIVA has no problem starting, and doesn't consider it an error. I'm puzzled :/ I will try an environment with a previous tensorflow version to see if it makes any difference; I doubt it, but better to double-check.
Hmm, strange indeed. I have a few ideas: …
On Nov 1, 2016, 10:40 AM -0400, clavicule [email protected], wrote:
You are probably onto something with this idea. I modified the … as follows:

```js
lang = _.replace(lang, 'python3', PYPATH)
global.log.info(`AYA Starting socketIO client for ${lang} at ${process.env.IOPORT}`) // you'll see below why I put AYA as a marker
```

and I got this output: … Note that the … but then we see this one: … meaning it's starting a new python process, using the default system env (which doesn't have the modules installed, hence the websocket import error). So the question is: where is this new python instruction coming from? It must be coming from somewhere, right?
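One way to chase down where that second python comes from is to compare the interpreter a freshly spawned shell would resolve against the one currently running. This is a generic diagnostic sketch, not aiva code; if the two paths differ, a child spawned via the shell escapes the virtualenv and lands on the system python.

```python
import shutil
import sys

# The interpreter a new shell would pick for `python3` (first hit on PATH)
spawned = shutil.which("python3")
# The interpreter this process is actually running under
current = sys.executable
print("a spawned shell would run:", spawned)
print("this process runs under:  ", current)
# If these differ, the spawned client uses the default system env,
# which explains import errors for modules installed in the virtualenv.
```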
More test results: I get the same … Next test: an earlier version of tensorflow :)
Last test: ==> same issue … I am not sure where to go from there, do you have any other ideas? Do you also encounter the problem on your end? I assumed from the beginning that it could not be a problem with the machine (since I can train and run inference with a tensorflow CNN on the GPU), but maybe it is? I can still run the CPU version of tensorflow, which is largely enough to test interactions between tensorflow and AIVA, but I assume that eventually it could be a problem for others too if they try to run GPU-intensive tensorflow scripts, right? I guess an alternative would be to run the tensorflow script as a service on another machine (e.g. AWS) and make the call from AIVA over an https request; would that be a good-practice approach in your opinion?
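The remote-service idea is a common pattern: keep the heavy TF process out of the bot's process tree and call it over HTTP. A toy sketch of the shape of such a call, using only the standard library; the endpoint, payload, and scoring logic are made up, and a real setup would put the TF model behind the handler.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy stand-in for a remote TF inference service: POST features, get a score.
class Infer(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        score = sum(body["features"])  # placeholder for model.predict(...)
        out = json.dumps({"score": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

    def log_message(self, *args):  # keep the demo's stderr quiet
        pass

server = HTTPServer(("localhost", 0), Infer)  # ephemeral port
threading.Thread(target=server.serve_forever, daemon=True).start()

# What the bot-side call would look like:
req = urllib.request.Request(
    f"http://localhost:{server.server_port}",
    data=json.dumps({"features": [1, 2, 3]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
print(result)  # {'score': 6}
server.shutdown()
```

This sidesteps the spawn/stderr problem entirely, since the bot only sees an HTTP response, never the TF process's log output.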
Yeah, I haven't run it with such a heavy tensorflow process yet. I'm afraid that once CUDA gets involved there's a lot more going on; for example, the environment variables don't get passed into the spawned bash process from node, so the CUDA/GPU TF backend might go crazy as it misses some things. Ohh, I got an idea that will likely work well: …

Give it a try!
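If the suspicion is that CUDA-related variables get lost on the node -> shell -> python hop, printing them from inside the spawned client makes that visible. The variable names below are the usual CUDA suspects, not something aiva defines.

```python
import os

# Print the environment a spawned child actually sees. If LD_LIBRARY_PATH
# is unset here, the GPU TF backend cannot locate libcublas/libcudnn and
# may die on `import tensorflow` before the socketIO client ever joins.
for var in ("PATH", "LD_LIBRARY_PATH", "CUDA_HOME"):
    print(var, "=", os.environ.get(var, "<unset>"))
```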
Yay! It works! Thanks!
Awesome, glad to know. I'll close this, but will find a new way around the nodejs spawn shell process so this doesn't happen again. Thanks for discovering the issue!

Thanks a lot for working through it!
I'm running into something similar, but the proposed solution didn't work for me. When I follow the install and setup instructions exactly and run …

When I flip the python client to …

So the duplicate python client line is gone, but the dependency error is still present. As mentioned in #45, …
Hi @kengz!

(Hopefully I am not mistaken.) It looks like `client.py` will fail to start and return an error as soon as any message is received, regardless of the type of message (warning, info, not necessarily errors). Here is how I got to this conclusion:

- I took `dnn_titanic_train.py` (actually all the AI scripts too) from AIVA v3 and brought them over to my local clone of AIVA v4
- I installed `tensorflow` with GPU support

==> When starting the node app (`npm start --debug`), I get the following:

```
ERROR I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
```

This is actually not an error message: when starting a python console and running `import tensorflow`, several messages tell me the CUDA libs loaded successfully. I can also run the MNIST training example from TF, etc.

So, to double-check it wasn't caused by the GPU-TF version, I made another virtual env, this time with the CPU `tensorflow` version. Then I got warning messages from `tensorflow`:

```
ERROR <...>/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 <...>, DeprecationWarning)
```

which I took care of by upgrading the `dnn_titanic_train.py` code so that it works on `tensorflow v0.11RC1` (also the `DNNClassifier` class has changed, etc.). Then I got another `tensorflow` warning I couldn't get past:

```
ERROR WARNING:tensorflow:float64 is not supported by many models, consider casting to float32
```

the fix for which should be released in the official `tensorflow v0.11` (link to issue). In theory, to make it work, I would have to go back to `tensorflow v0.9` or earlier (provided I don't encounter other warning messages). Also, we can forget about the GPU `tensorflow`.

Bottom line: these messages are not error messages, but they are interpreted as such. Do you confirm this problem?

Thanks for your insight!

(btw, I'm on the `cgkb` branch but it shouldn't matter I think)
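The "these are not errors" diagnosis matches how python logging and TF behave: warnings and info messages go to stderr while the process still exits cleanly, so any wrapper that treats stderr output as fatal will misfire. A minimal reproduction (generic python, not aiva's actual spawn code):

```python
import subprocess
import sys

# Child that logs a warning (to stderr, as tensorflow and the `logging`
# module do by default) yet completes successfully.
child = (
    "import logging; "
    "logging.warning('float64 is not supported by many models'); "
    "print('result ok')"
)

proc = subprocess.run(
    [sys.executable, "-c", child],
    capture_output=True,
    text=True,
)

print("exit code:", proc.returncode)   # 0: the child did not fail
print("stderr:", proc.stderr.strip())  # non-empty despite success
print("stdout:", proc.stdout.strip())
```

The robust signal of failure is the exit code (or the process dying), not whether anything appeared on stderr.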