MKLDNN Integration Patch to improve issue #2986 (call for CPU performance) #3438
Conversation
@@ -1,6 +1,6 @@
 [submodule "mshadow"]
 	path = mshadow
-	url = https://github.com/dmlc/mshadow.git
+	url = https://github.com/xmchen1987/mshadow.git
Revert this and make a PR to mshadow.
Remove it.
@@ -0,0 +1,138 @@
#-------------------------------------------------------------------------------
Should this be removed?
Okay. I will create a separate repo for these build scripts.
I didn't mean a separate repo. There should be an option in config.mk called USE_MKLDNN (and MKLDNN_ROOT), and all the other flags should be added in the Makefile when USE_MKLDNN=1.
OK, thanks. It's done.
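For reference, a minimal sketch of the wiring being suggested (the install path and the link line are placeholders, not the PR's actual flags; only USE_MKLDNN, MKLDNN_ROOT, and MXNET_USE_MKLDNN come from this thread):

```makefile
# config.mk -- default to 0 so builds work without MKLDNN installed
USE_MKLDNN = 0
# root of the MKLDNN installation (placeholder path)
MKLDNN_ROOT = /opt/intel/mkl

# Makefile -- everything MKLDNN-specific hangs off the one switch
ifeq ($(USE_MKLDNN), 1)
  CFLAGS += -DMXNET_USE_MKLDNN=1 -I$(MKLDNN_ROOT)/include
  LDFLAGS += -L$(MKLDNN_ROOT)/lib -lmkl_rt
endif
```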
@@ -0,0 +1,1153 @@
#ifndef _MKL_DNN_CPPWRAPPER_H
Put this file, as well as the other MKL-related files, in src/operator/mkl/.
In the future we will move both the cuDNN and MKL layers into plugins/.
Thanks, done. I have put all the files under operator/mkldnn.
#include "mkl_dnn.h" | ||
#include "mkl_version.h" | ||
|
||
#define TEMPLATE_PREFIX template <typename Dtype> inline |
Macros contaminate the global namespace.
Consider removing them and expanding in place, or renaming them to MXNET_MKLDNN_*.
Thanks. Expanding in place is done.
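For reference, a minimal sketch of the two options (the function declaration is a hypothetical example, not code from the PR):

```cpp
// Before: a generic macro name that can collide with other headers.
#define TEMPLATE_PREFIX template <typename Dtype> inline
TEMPLATE_PREFIX Dtype* GetBufferPtr(Dtype* base);  // hypothetical declaration

// Option 1: expand in place -- no macro left in the global namespace.
template <typename Dtype> inline Dtype* GetBufferPtr(Dtype* base);

// Option 2: keep the macro but prefix it so a collision is unlikely.
#define MXNET_MKLDNN_TEMPLATE_PREFIX template <typename Dtype> inline
```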
@@ -20,18 +20,21 @@
 # choice of compiler
 #--------------------
Revert the changes in this file.
Reverted and added the MKLDNN option. Done.
@@ -199,8 +208,13 @@ class PoolingProp : public OperatorProperty {
       CHECK(param_.kernel[0] <= dshape[2] + 2 * param_.pad[0]
             && param_.kernel[1] <= dshape[3] + 2 * param_.pad[1])
           << "kernel size exceed input";
-      oshape[2] = 1 + (dshape[2] + 2 * param_.pad[0] - param_.kernel[0]) / param_.stride[0];
-      oshape[3] = 1 + (dshape[3] + 2 * param_.pad[1] - param_.kernel[1]) / param_.stride[1];
+      if (param_.pooling_convention == pool_enum::kValid) {
Are these changes from another PR? Do a rebase?
Yes, it's from #3392. That PR has been rebased into these commits.
OK, so that PR is abandoned, right?
Yes. Thanks.
          this->param_.kernel[0]) / this->param_.stride[0]));
    if ((full_model_output_width == output_width) &&
        (full_model_output_height == output_height)) {
      support_mkldnn_ = true;
You can determine this in CreateOperatorEx.
Instead of inheriting from pooling, just return the plain op when the case isn't supported by MKLDNN.
Thanks a lot. I will determine it in CreateOperatorEx.
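A rough sketch of that dispatch, assuming hypothetical SupportsMKLDNNPooling and MKLDNNPoolingOp names (only CreateOperatorEx itself is MXNet's actual hook):

```cpp
// Sketch: choose the implementation once at creation time, so Forward()
// never has to branch on support_mkldnn_.
Operator* PoolingProp::CreateOperatorEx(Context ctx,
                                        std::vector<TShape>* in_shape,
                                        std::vector<int>* in_type) const {
#if MXNET_USE_MKLDNN == 1
  // Hypothetical helper: performs the round-up/round-down output-size check.
  if (ctx.dev_type == Context::kCPU &&
      SupportsMKLDNNPooling(param_, (*in_shape)[0])) {
    return new MKLDNNPoolingOp<float>(param_);  // hypothetical class
  }
#endif
  return CreateOperator(ctx);  // fall back to the plain PoolingOp path
}
```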
}  // namespace pool_enum

struct PoolingParam : public dmlc::Parameter<PoolingParam> {
  TShape kernel;
  TShape stride;
  TShape pad;
  int pool_type;
  int pooling_convention;
Please rebase.
Okay
@@ -1481,13 +1481,13 @@ def test_roipooling():
     test_maximum_minimum_scalar()
     test_abs()
     test_round_ceil_floor()
-    test_deconvolution()
+    #test_deconvolution()
Revert these changes.
Thanks, reverted.
@@ -0,0 +1,42 @@
/*******************************************************************************
* Copyright 1999-2016 Intel Corporation All Rights Reserved.
What's this? Can we remove it? Inserting this into Apache software doesn't feel right.
Removed. It was a test file, of not much use.
Thanks a lot for the contribution. It's greatly appreciated.
Also, please add author names to all files.
namespace mxnet {
#if MXNET_USE_MKLDNN == 1
template <typename DType>
struct MKLMemoryDescriptorBase {
Does this hold actual input/output buffers or just descriptors? What is the memory used for?
If it's actual data, consider using MXNet's tempspace so that it can be shared to save memory.
This is probably a hard fix; we can do it in the next release.
This holds actual buffers.
MKLDNN stores layer outputs with a different layout and memory size (I guess stricter memory alignment), which cannot be saved directly into an NDArray/TBlob.
We have also made a first attempt at fixing this; after this PR, I hope we can discuss it. Thanks.
After NNVM you can try to do this in passes, i.e. insert layout-swap layers before and after all MKLDNN layers.
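For context, a condensed sketch of what such a descriptor manages, using the templated MKL DNN wrappers this PR introduces (the member and method names here are illustrative, not the PR's exact fields):

```cpp
template <typename DType>
struct MKLMemoryDescriptorBase {
  dnnLayout_t layout_usr;      // plain layout of the NDArray/TBlob data
  dnnLayout_t layout_int;      // MKL's internal (blocked/aligned) layout
  void* internal_ptr;          // buffer owned by the descriptor, in layout_int
  dnnPrimitive_t to_internal;  // conversion primitive: user -> internal

  // Copy user data into the internal buffer before an MKL layer runs;
  // a matching conversion copies results back out afterwards.
  void ConvertToInternal(DType* usr_ptr) {
    CHECK_EQ(dnnConversionExecute<DType>(to_internal, usr_ptr, internal_ptr),
             E_SUCCESS);
  }
};
```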
      CHECK_EQ(dnnExecute<DType>(poolingFwd_, pooling_res_), E_SUCCESS);
      fwd_out_data_->get_output_ptr(data.dptr_);
    } else {
      PoolingOp<cpu, Reducer, DType>::Forward(ctx, in_data, req, out_data,
Can you determine support_mkldnn_ at operator creation time? If so, please directly create a PoolingOp instead of forwarding inside mkldnn_pooling.
Yes. Thanks for your advice. Done.
          this->param_.kernel[0]) / this->param_.stride[0]));
    if ((full_model_output_width == output_width) &&
        (full_model_output_height == output_height)) {
      support_mkldnn_ = true;
This is basically equivalent to if (param.pooling_convention == kFull) in CreateOperatorEx, right?
No.
In our experiments, MKLDNN only supports round-up pooling.
But sometimes round-up pooling and round-down pooling have the same output size.
That piece of code determines whether the output size is the same as with round-up pooling.
I know it's confusing, so I have updated the pooling code; hope it is better now.
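Concretely: with input 7, kernel 3, pad 0, stride 2, round-down and round-up both give an output of 3, so MKLDNN applies; with stride 3 they give 2 vs. 3, so the code must fall back. A sketch of that check (hypothetical helper name):

```cpp
#include <cmath>

// Sketch: MKLDNN only supports the round-up (kFull) convention, so it is
// usable exactly when round-up and round-down agree on the output size.
inline bool SameAsRoundUp(int in, int kernel, int pad, int stride) {
  int down = 1 + (in + 2 * pad - kernel) / stride;  // kValid: round down
  int up = 1 + static_cast<int>(
      std::ceil(static_cast<float>(in + 2 * pad - kernel) / stride));  // kFull
  return down == up;
}
// SameAsRoundUp(7, 3, 0, 2) -> true  (both give 3): MKLDNN can be used
// SameAsRoundUp(7, 3, 0, 3) -> false (2 vs. 3): fall back to PoolingOp
```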
Mostly looks good to me now. Please address these issues and we can merge:
@@ -51,6 +51,12 @@ USE_CUDNN = 0
 # whether use cuda runtime compiling for writing kernels in native language (i.e. Python)
 USE_NVRTC = 0

+# whether use MKLDNN library
+USE_MKLDNN = 1
I suggest using 0 as the default; otherwise I guess people cannot compile it without installing MKLDNN.
Thanks. Set USE_MKLDNN to 0.
Thanks.
Are you sure the mshadow PR has been proposed? I can't find it.
Thanks @piiswrong
The build is still failing. Please make sure it compiles on a machine without MKLDNN when USE_MKLDNN=0.
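The usual pattern for that (a sketch; the PR's actual guards may be arranged differently) is to keep every MKLDNN include and symbol behind the compile-time flag:

```cpp
// Nothing MKLDNN-specific may leak outside this guard, or a machine
// without the library cannot build with USE_MKLDNN=0.
#if MXNET_USE_MKLDNN == 1
#include "mkl_dnn.h"
#include "mkl_version.h"
#endif  // MXNET_USE_MKLDNN
```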
Force-pushed from 42102e4 to d7ad65b.
@piiswrong And the following is what the CI reports.
The cpp tests segfault randomly on Travis; this can't be reproduced locally, so there is no fix yet. Try rerunning it.
Thanks. After rerunning, the issue disappeared. Could you help review it again?
This looks promising. Any updates?
Hi @piiswrong,
MXNet's CPU performance is not as good as its GPU performance; please see #2986. @xmchen1987
This PR integrates MKLDNN into MXNet, covering all the major operations (convolution, pooling, LRN, batch norm, and fully connected).
With this patch we achieve 750+ fps for AlexNet scoring and 280+ fps for GoogLeNet-v1 scoring on a two-socket Broadwell server, and the backward pass also benefits greatly.
Please review the patches and give us your feedback.
Thanks.