Shape [-1] has negative dimensions #1

Open

bzamecnik opened this issue Aug 10, 2017 · 3 comments

Comments

@bzamecnik (Collaborator)

Running on 2 GPUs (GTX 1070):

CUDA_VISIBLE_DEVICES=0,1 python data_parallel_mnist_cnn.py
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
2017-08-10 14:55:47.483599: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 14:55:47.483631: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 14:55:48.831409: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1,-1] has negative dimensions
2017-08-10 14:55:48.831460: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1,-1] has negative dimensions
	 [[Node: replica_1_1/model_1_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
2017-08-10 14:55:48.849021: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1,-1] has negative dimensions
2017-08-10 14:55:48.849064: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1,-1] has negative dimensions
	 [[Node: replica_0_1/model_1_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
2017-08-10 14:55:48.865190: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1] has negative dimensions
2017-08-10 14:55:48.865233: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1] has negative dimensions
	 [[Node: replica_0_1/model_1_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
  File "/Users/bzamecnik/anaconda/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/Users/bzamecnik/anaconda/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/Users/bzamecnik/anaconda/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Users/bzamecnik/anaconda/lib/python3.4/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape [-1] has negative dimensions
	 [[Node: replica_0_1/model_1_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
@bzamecnik (Collaborator, Author) commented Aug 10, 2017

It appears for the placeholders model_1_sample_weights and model_1_target. It seems that all other calls to K.placeholder have the shape fully specified, while for these two only ndim is given, which results in shape (None,) or (None, None) respectively.
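For illustration, a minimal sketch (assuming Keras 2.0.x with the TensorFlow backend; the variable names are made up) of how the two ways of calling K.placeholder differ:

from keras import backend as K

# Placeholder with a fully specified shape (only the batch dimension is dynamic):
x = K.placeholder(shape=(None, 784))
print(K.int_shape(x))                  # (None, 784)

# Placeholders created with only ndim, as for targets and sample weights:
t = K.placeholder(ndim=2)
w = K.placeholder(ndim=1)
print(K.int_shape(t), K.int_shape(w))  # (None, None) (None,)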

@bzamecnik (Collaborator, Author)

A hypothesis is that it might be caused by a mismatch between the sizes of predictions and targets within each replica (tower). Currently we provide inputs and targets of the full mini-batch size, but extract slices and compute tower predictions of sub-batch size. Since we compute the loss within each tower (unlike the baseline make_parallel() solution), the sizes of predictions and targets might differ.

We would have to slice the targets/sample weights as well (a rough sketch follows). Another solution would be to perform the sub-batch slicing in Keras and feed the slices to each tower separately.
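A rough sketch of the first option (the helper name get_sub_batch and its arguments are hypothetical, not taken from this repo), slicing a batch-major tensor so that targets/sample weights match the size of each tower's predictions:

import tensorflow as tf

def get_sub_batch(tensor, tower_index, tower_count):
    # Slice along the first (batch) dimension; all other dimensions are kept.
    batch_size = tf.shape(tensor)[0]
    sub_batch_size = batch_size // tower_count
    start = tower_index * sub_batch_size
    return tensor[start:start + sub_batch_size]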

@bzamecnik (Collaborator, Author) commented Aug 11, 2017

It seems that some placeholders are not assigned a value via feed_dict in session.run().

Placeholders:

>>> [op for op in g.get_operations() if op.type == 'Placeholder']
[<tf.Operation 'input_1' type=Placeholder>,
 <tf.Operation 'dropout_1/keras_learning_phase' type=Placeholder>,
 <tf.Operation 'replica_0_1/model_1_sample_weights' type=Placeholder>,
 <tf.Operation 'replica_0_1/model_1_target' type=Placeholder>,
 <tf.Operation 'replica_1_1/model_1_sample_weights' type=Placeholder>,
 <tf.Operation 'replica_1_1/model_1_target' type=Placeholder>,
 <tf.Operation 'concatenate_1_sample_weights' type=Placeholder>,
 <tf.Operation 'concatenate_1_target' type=Placeholder>]

The above error is raised when a placeholder with dynamic dimensions (marked as ? or None) is not assigned a value. The error from incompatible shapes looks different (see a small experiment).
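A minimal sketch of such an experiment (assuming TF ~1.2, where an unfed placeholder is reported this way; newer versions say "You must feed a value for placeholder tensor ..." instead):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None], name='x')
y = 2.0 * x

with tf.Session() as sess:
    # Feeding a value works fine:
    print(sess.run(y, feed_dict={x: [1.0, 2.0]}))   # [2. 4.]
    # Leaving the placeholder unfed fails with
    # "Shape [-1] has negative dimensions"; feeding a value of an
    # incompatible shape produces a different, clearer error.
    sess.run(y)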

In Model.compile(), placeholders for sample_weights and targets are created. Since we call compile() for the replicas and also for the wrapping model, we create several sets of these placeholders. However, during training we call fit() only on the wrapper model and thus do not feed values to the placeholders in the replica models.
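A hedged sketch of that pattern (assuming Keras 2.0.x with the TF backend; the toy models below are not the ones from this repo): each compile() call adds its own set of target/sample-weight placeholders to the graph, but fit() on one model only feeds that model's set.

import tensorflow as tf
from keras.layers import Dense, Input
from keras.models import Model

def make_model(name):
    inp = Input(shape=(10,), name=name + '_input')
    return Model(inp, Dense(1)(inp), name=name)

replica = make_model('replica_0')
replica.compile(optimizer='sgd', loss='mse')   # one set of target/sample_weight placeholders

wrapper = make_model('wrapper')
wrapper.compile(optimizer='sgd', loss='mse')   # another, independent set

g = tf.get_default_graph()
print([op.name for op in g.get_operations() if op.type == 'Placeholder'])
# fit() on `wrapper` feeds only the wrapper's targets/sample weights; any
# replica placeholders that end up in the executed subgraph remain unfed.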
