This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MKLdnn Integration Patch to improve issue #2986(call for cpu performance) #3438

Closed
wants to merge 9 commits

Conversation


@Darwin2011 commented Oct 4, 2016

Hi, @piiswrong

MXNet's CPU performance is not as good as its GPU performance; please see #2986. @xmchen1987

This PR integrates MKLDNN into MXNet, covering all the major operations (conv, pool, lrn, bn, and fc).
With this patch we achieve 750+ fps for AlexNet scoring and 280+ fps for GoogLeNet-v1 scoring on a two-socket Broadwell server, and the backward pass also benefits greatly.

Please review the patches and give us feedback.
Thanks.

@@ -1,6 +1,6 @@
 [submodule "mshadow"]
 	path = mshadow
-	url = https://github.com/dmlc/mshadow.git
+	url = https://github.com/xmchen1987/mshadow.git
Contributor

Revert this and make a PR to mshadow.

Author @Darwin2011 Oct 4, 2016

Removed it.

@@ -0,0 +1,138 @@
#-------------------------------------------------------------------------------
Contributor

should this be removed?

Author

Okay, I will create a separate repo for these build scripts.

Contributor @piiswrong Oct 4, 2016

I didn't mean a separate repo. There should be an option in config.mk called USE_MKLDNN (and MKLDNN_ROOT), and all the other flags should be added in the Makefile when USE_MKLDNN=1.
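
A minimal sketch of what that could look like (the MKLDNN_ROOT default, the flag names, and the linked library are assumptions for illustration, not the PR's actual build changes):

```makefile
# config.mk: whether to use the MKLDNN library, and where it is installed
USE_MKLDNN = 0
MKLDNN_ROOT = /usr/local

# Makefile: enable the MKLDNN code paths only when USE_MKLDNN = 1
ifeq ($(USE_MKLDNN), 1)
	CFLAGS += -DMXNET_USE_MKLDNN=1 -I$(MKLDNN_ROOT)/include
	LDFLAGS += -L$(MKLDNN_ROOT)/lib -lmkl_rt
endif
```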

Author @Darwin2011 Oct 4, 2016

ok, thanks. It's done.

@@ -0,0 +1,1153 @@
#ifndef _MKL_DNN_CPPWRAPPER_H
Contributor @piiswrong Oct 4, 2016

Put this file, as well as the other MKL-related files, in src/operator/mkl/. In the future we will move both the cudnn and mkl layers into plugins/.

Author @Darwin2011 Oct 4, 2016

Thanks, done. I have put all the files into the operator mkldnn directory.

#include "mkl_dnn.h"
#include "mkl_version.h"

#define TEMPLATE_PREFIX template <typename Dtype> inline
Contributor

These macros contaminate the global namespace.
Consider removing them and expanding in place, or renaming them to MXNET_MKLDNN_*.

Author @Darwin2011 Oct 4, 2016

Thanks. Expanding in place is done.
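
For illustration, a sketch of that expansion (dnnLayoutCreate is used as a representative wrapper; the exact declarations in the header may differ):

```cpp
// Before: a global #define hides the template header of every wrapper.
#define TEMPLATE_PREFIX template <typename Dtype> inline
TEMPLATE_PREFIX dnnError_t dnnLayoutCreate(
    dnnLayout_t* pLayout, size_t dimension,
    const size_t size[], const size_t strides[]);

// After: the macro is expanded in place, so no name leaks into the
// global namespace through a #define.
template <typename Dtype> inline
dnnError_t dnnLayoutCreate(
    dnnLayout_t* pLayout, size_t dimension,
    const size_t size[], const size_t strides[]);
```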

@@ -20,18 +20,21 @@
# choice of compiler
#--------------------


Contributor

Revert the changes in this file.

Author

Reverted and added the mkldnn option. Done.

@@ -199,8 +208,13 @@ class PoolingProp : public OperatorProperty {
CHECK(param_.kernel[0] <= dshape[2] + 2 * param_.pad[0]
&& param_.kernel[1] <= dshape[3] + 2 * param_.pad[1])
<< "kernel size exceed input";
oshape[2] = 1 + (dshape[2] + 2 * param_.pad[0] - param_.kernel[0]) / param_.stride[0];
oshape[3] = 1 + (dshape[3] + 2 * param_.pad[1] - param_.kernel[1]) / param_.stride[1];
if (param_.pooling_convention == pool_enum::kValid) {
Contributor

Are these changes from another PR? Do a rebase?

Author

Yes, it's from #3392.
That PR has been rebased into these commits.

Contributor

Ok so that PR is abandoned right?

Author

Yes. Thanks.

this->param_.kernel[0]) / this->param_.stride[0]));
if ((full_model_output_width == output_width) &&
(full_model_output_height == output_height)) {
support_mkldnn_ = true;
Contributor

You can determine this in CreateOperatorEx.
Instead of inheriting Pooling, just return a plain PoolingOp there if the case isn't supported by mkldnn.

Author @Darwin2011 Oct 4, 2016

Thanks a lot. I will determine it in CreateOperatorEx.

} // namespace pool_enum

struct PoolingParam : public dmlc::Parameter<PoolingParam> {
TShape kernel;
TShape stride;
TShape pad;
int pool_type;
int pooling_convention;
Contributor

please rebase.

Author

Okay

@@ -1481,13 +1481,13 @@ def test_roipooling():
 test_maximum_minimum_scalar()
 test_abs()
 test_round_ceil_floor()
-test_deconvolution()
+#test_deconvolution()
Contributor

revert these changes

Author

Thanks, reverted.

@@ -0,0 +1,42 @@
/*******************************************************************************
* Copyright 1999-2016 Intel Corporation All Rights Reserved.
Contributor

What's this? Can we remove it? Inserting this into Apache-licensed software doesn't feel right.

Author @Darwin2011 Oct 4, 2016

Removed. It was just a test file, not much use.

@piiswrong commented Oct 4, 2016

Thanks a lot for the contribution. It's greatly appreciated.
Made a few comments. Two general points:

  1. Move everything MKL-related into src/operator/mkl/. In the future we want to move it into plugins/.
  2. Separate the MKL installation from mxnet (you can leave a helper script in the root, but it should install MKL to a proper location).

@piiswrong

Also add author names to all files:
+/*!
+ * Copyright (c) 2015 by Contributors
+ * \file mkldnn_pooling-inl.h
+ * \brief
+ * \author Chen, Xiaoming
+*/

namespace mxnet {
#if MXNET_USE_MKLDNN == 1
template <typename DType>
struct MKLMemoryDescriptorBase {
Contributor @piiswrong Oct 4, 2016

Does this hold actual input/output buffers or just descriptors? What's the memory used for?
If it's actual data, consider using mxnet's tempspace so that it can be shared to save memory.

This is probably a hard fix; we can do it in the next release.

Author

This holds actual buffers.
MKLDNN stores layer outputs with a different layout and memory size (I guess stricter memory alignment), which cannot be saved directly into an ndarray/tblob.
We have also made a first attempt at fixing this; after this PR, I hope we can discuss it. Thanks.
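
For context, a minimal sketch of what such a layout conversion looks like with the MKL DNN primitives API from mkl_dnn.h (the 32x64x28x28 shape and the plain-NCHW target layout are assumptions for illustration):

```cpp
#include <mkl_dnn.h>

// Convert a buffer in an MKL-DNN internal layout back into a plain,
// contiguous NCHW buffer that a tblob could hold.
void ConvertToUserLayout(dnnLayout_t internal, void* internal_data,
                         float* user_data) {
  // MKL DNN lists dimensions innermost-first: W, H, C, N.
  size_t sizes[4]   = {28, 28, 64, 32};
  size_t strides[4] = {1, 28, 28 * 28, 28 * 28 * 64};
  dnnLayout_t user = nullptr;
  dnnLayoutCreate_F32(&user, 4, sizes, strides);      // plain NCHW layout
  if (!dnnLayoutCompare_F32(internal, user)) {        // layouts differ
    dnnPrimitive_t cv = nullptr;
    dnnConversionCreate_F32(&cv, internal, user);
    dnnConversionExecute_F32(cv, internal_data, user_data);
    dnnDelete_F32(cv);
  }  // (if the layouts already match, a plain memcpy would do)
  dnnLayoutDelete_F32(user);
}
```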

Contributor

After nnvm, you can try to do this in graph passes, i.e. insert a layout-swap layer before and after every mkldnn layer.

CHECK_EQ(dnnExecute<DType>(poolingFwd_, pooling_res_), E_SUCCESS);
fwd_out_data_->get_output_ptr(data.dptr_);
} else {
PoolingOp<cpu, Reducer, DType>::Forward(ctx, in_data, req, out_data,
Contributor

Can you determine support_mkldnn_ at operator creation time? If so please directly create a PoolingOp instead of forwarding inside mkldnn_pooling

Author

Yes. Thanks for your advice. Done.
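
As a sketch of the resulting dispatch (MKLDNNPoolingOp and SupportMKLDNNPooling are placeholder names, and the Reducer is hardcoded for brevity; the PR's actual code may differ):

```cpp
// Decide once, at operator creation time, instead of checking
// support_mkldnn_ inside every Forward call and falling back there.
Operator* PoolingProp::CreateOperatorEx(Context ctx,
                                        std::vector<TShape>* in_shape,
                                        std::vector<int>* in_type) const {
#if MXNET_USE_MKLDNN == 1
  if (ctx.dev_mask() == cpu::kDevMask &&
      SupportMKLDNNPooling(param_, (*in_shape)[0])) {
    return new MKLDNNPoolingOp<float>(param_);  // MKLDNN fast path
  }
#endif
  // Fallback: the plain CPU pooling op (real code would pick the
  // Reducer and DType from param_ and in_type).
  return new PoolingOp<cpu, mshadow::red::maximum, float>(param_);
}
```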

this->param_.kernel[0]) / this->param_.stride[0]));
if ((full_model_output_width == output_width) &&
(full_model_output_height == output_height)) {
support_mkldnn_ = true;
Contributor

This is basically equivalent to if (param.pooling_convention == kFull) in CreateOperatorEx, right?

Author

No.
In our experiments, MKLDNN only supports round-up pooling.
But sometimes round-up and round-down pooling have the same output size, and that piece of code determines whether the output size here matches round-up pooling.
I know it's confusing, so I have updated the pooling code; hope it is better now.
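
In other words, roughly this check (a sketch, assuming kValid rounds down and kFull rounds up, as in the pooling shape-inference code above):

```cpp
#include <cmath>

// MKLDNN pooling always produces the round-up (kFull) output size, so the
// MKLDNN path is safe whenever both conventions agree for this shape.
inline bool RoundingConventionsAgree(int in, int pad, int kernel, int stride) {
  int floor_out = 1 + (in + 2 * pad - kernel) / stride;  // kValid: round down
  int ceil_out = 1 + static_cast<int>(std::ceil(
      static_cast<double>(in + 2 * pad - kernel) / stride));  // kFull: round up
  return floor_out == ceil_out;
}
```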


@piiswrong

Mostly looks good to me now. Please address these issues and we can merge:

  1. Submit a PR to dmlc/mshadow: https://github.com/dmlc/mshadow/pulls
  2. Update this PR after 1) is merged.
  3. Fix Pooling so it doesn't inherit PoolingOp; return a PoolingOp in CreateOperatorEx if mkldnn doesn't support the case.
  4. Fix test errors and lint.

@@ -51,6 +51,12 @@ USE_CUDNN = 0
 # whether use cuda runtime compiling for writing kernels in native language (i.e. Python)
 USE_NVRTC = 0
 
+# whether use MKLDNN library
+USE_MKLDNN = 1
Contributor @mli Oct 4, 2016

Suggest using 0 by default; otherwise I guess people cannot compile it without installing mkldnn.

Author

Thanks. Set USE_MKLDNN to 0.

@Darwin2011 changed the title from "MKLdnn Integration Support Patch for mxnet" to "MKLdnn Integration Support Patch to improve issue #2986" on Oct 5, 2016
@Darwin2011 changed the title from "MKLdnn Integration Support Patch to improve issue #2986" to "MKLdnn Integration Support Patch to improve issue #2986(call for cpu performance)" on Oct 5, 2016
@Darwin2011 changed the title from "MKLdnn Integration Support Patch to improve issue #2986(call for cpu performance)" to "MKLdnn Integration Patch to improve issue #2986(call for cpu performance)" on Oct 5, 2016
@Darwin2011 commented Oct 5, 2016

@piiswrong

  • Fixing the pooling layer is done.
  • A PR has been proposed to mshadow: add parallelism in base map plan (dmlc/mshadow#167). Although turning openmp on can boost some performance, it seems there's no way to avoid contention between multiple ops.
  • I will fix lint and tests soon.
  • I will prepare scripts and a readme.

Thanks

@piiswrong

Are you sure the mshadow PR has been proposed? I can't find it.
I mean the PR with the changes related to this patch. As you can see here, this PR is not compiling because it cannot find your commit to mshadow: https://travis-ci.org/dmlc/mxnet/jobs/165290833

@Darwin2011 commented Oct 6, 2016

Thanks @piiswrong.
We have submitted the mshadow patch as a PR, but it has not been accepted yet.
So I have pointed the mshadow submodule at the current mshadow head commit to make it work for now.

@piiswrong

The build is still failing. Please make sure it compiles with USE_MKLDNN=0 on a machine without mkldnn.

@Darwin2011 force-pushed the mkldnn branch 4 times, most recently from 42102e4 to d7ad65b on October 7, 2016 at 16:39
@Darwin2011 commented Oct 8, 2016

@piiswrong
I have made the first commit pass the CI tests, but for the README commit, CI fails at the cpp tests.
I cannot reproduce this issue on my local machines.
I am taking a look at it. Could you also give me some advice?

The following is what the CI reports:
[09:05:51] src/engine/./threaded_engine.h:296: ExecuteOprFn
tests/travis/run_test.sh: line 49: 8942 Segmentation fault (core dumped) ./$test
Thanks

@piiswrong

The cpp tests segfault randomly on Travis. It can't be reproduced locally, so there's no fix yet. Try rerunning it.

@Darwin2011 commented Oct 9, 2016

Thanks. After rerunning, the issue disappeared. Could you help review it again?

@futurely mentioned this pull request on Dec 13, 2016
@howard0su

This looks promising. Any updates?

@piiswrong closed this on Jan 21, 2017