This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MKLdnn Integration Patch to improve issue #2986(call for cpu performance) #3438

Closed
wants to merge 9 commits

Conversation


@Darwin2011 commented Oct 4, 2016

Hi, @piiswrong

MXNet's CPU performance is not as good as its GPU performance; please see #2986. @xmchen1987

This PR integrates MKLDNN into MXNet, covering all the major operations (conv, pool, lrn, bn, and fc).
With this patch we achieve 750+ fps for AlexNet scoring and 280+ fps for GoogLeNet-v1 scoring on a two-socket Broadwell server, and the backward pass also benefits greatly.

Please review the patches and give us feedback.
Thanks.

@@ -1,6 +1,6 @@
 [submodule "mshadow"]
 	path = mshadow
-	url = https://github.com/dmlc/mshadow.git
+	url = https://github.com/xmchen1987/mshadow.git
Contributor

Revert this and make a PR to mshadow.

Author @Darwin2011 Oct 4, 2016

Removed it.

@@ -0,0 +1,138 @@
#-------------------------------------------------------------------------------
Contributor

should this be removed?

Author

Okay, I will create a separate repo for these build scripts.

Contributor @piiswrong Oct 4, 2016

I didn't mean a separate repo. There should be an option in config.mk called USE_MKLDNN (and MKLDNN_ROOT), and all the other flags should be added in the Makefile when USE_MKLDNN=1.
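
A minimal sketch of what that could look like (the MKLDNN_ROOT default, the flag names, and the linked library are assumptions for illustration, not the PR's actual build changes):

```makefile
# config.mk: whether to use the MKLDNN library, and where it is installed
USE_MKLDNN = 0
MKLDNN_ROOT = /usr/local

# Makefile: enable the MKLDNN code paths only when USE_MKLDNN = 1
ifeq ($(USE_MKLDNN), 1)
	CFLAGS += -DMXNET_USE_MKLDNN=1 -I$(MKLDNN_ROOT)/include
	LDFLAGS += -L$(MKLDNN_ROOT)/lib -lmkl_rt
endif
```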

Author @Darwin2011 Oct 4, 2016

ok, thanks. It's done.

@@ -0,0 +1,1153 @@
#ifndef _MKL_DNN_CPPWRAPPER_H
Contributor @piiswrong Oct 4, 2016

Put this file, as well as the other MKL-related files, in src/operator/mkl/. In the future we will move both the cudnn and mkl layers into plugins/.

Author @Darwin2011 Oct 4, 2016

Thanks, done. I have put all the files into the operator mkldnn directory.

#include "mkl_dnn.h"
#include "mkl_version.h"

#define TEMPLATE_PREFIX template <typename Dtype> inline
Contributor

These macros contaminate the global namespace.
Consider removing them and expanding in place, or renaming them to MXNET_MKLDNN_*.

Author @Darwin2011 Oct 4, 2016

Thanks. Expanding in place is done.
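
For illustration, a sketch of that expansion (dnnLayoutCreate is used as a representative wrapper; the exact declarations in the header may differ):

```cpp
// Before: a global #define hides the template header of every wrapper.
#define TEMPLATE_PREFIX template <typename Dtype> inline
TEMPLATE_PREFIX dnnError_t dnnLayoutCreate(
    dnnLayout_t* pLayout, size_t dimension,
    const size_t size[], const size_t strides[]);

// After: the macro is expanded in place, so no name leaks into the
// global namespace through a #define.
template <typename Dtype> inline
dnnError_t dnnLayoutCreate(
    dnnLayout_t* pLayout, size_t dimension,
    const size_t size[], const size_t strides[]);
```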

@@ -20,18 +20,21 @@
# choice of compiler
#--------------------


Contributor

Revert the changes in this file.

Author

Reverted and added the mkldnn option. Done.

@@ -199,8 +208,13 @@ class PoolingProp : public OperatorProperty {
CHECK(param_.kernel[0] <= dshape[2] + 2 * param_.pad[0]
&& param_.kernel[1] <= dshape[3] + 2 * param_.pad[1])
<< "kernel size exceed input";
oshape[2] = 1 + (dshape[2] + 2 * param_.pad[0] - param_.kernel[0]) / param_.stride[0];
oshape[3] = 1 + (dshape[3] + 2 * param_.pad[1] - param_.kernel[1]) / param_.stride[1];
if (param_.pooling_convention == pool_enum::kValid) {
Contributor

Are these changes from another PR? Do a rebase?

Author

Yes, it's from #3392.
That PR has been rebased into these commits.

Contributor

Ok so that PR is abandoned right?

Author

Yes. Thanks.

this->param_.kernel[0]) / this->param_.stride[0]));
if ((full_model_output_width == output_width) &&
(full_model_output_height == output_height)) {
support_mkldnn_ = true;
Contributor

You can determine this in CreateOperatorEx.
Instead of inheriting Pooling, just return a plain PoolingOp there if the case isn't supported by mkldnn.

Author @Darwin2011 Oct 4, 2016

Thanks a lot. I will determine it in CreateOperatorEx.

} // namespace pool_enum

struct PoolingParam : public dmlc::Parameter<PoolingParam> {
TShape kernel;
TShape stride;
TShape pad;
int pool_type;
int pooling_convention;
Contributor

please rebase.

Author

Okay

@@ -1481,13 +1481,13 @@ def test_roipooling():
 test_maximum_minimum_scalar()
 test_abs()
 test_round_ceil_floor()
-test_deconvolution()
+#test_deconvolution()
Contributor

revert these changes

Author

Thanks, reverted.

@@ -0,0 +1,42 @@
/*******************************************************************************
* Copyright 1999-2016 Intel Corporation All Rights Reserved.
Contributor

What's this? Can we remove it? Inserting this into Apache-licensed software doesn't feel right.

Author @Darwin2011 Oct 4, 2016

Removed. It was just a test file, not much use.

@piiswrong commented Oct 4, 2016

Thanks a lot for the contribution. It's greatly appreciated.
Made a few comments. Two general points:

  1. Move everything MKL-related into src/operator/mkl/. In the future we want to move it into plugins/.
  2. Separate the MKL installation from mxnet (you can leave a helper script in the root, but it should install MKL to a proper location).

@piiswrong

Also add author names to all files:
+/*!
+ * Copyright (c) 2015 by Contributors
+ * \file mkldnn_pooling-inl.h
+ * \brief
+ * \author Chen, Xiaoming
+*/

namespace mxnet {
#if MXNET_USE_MKLDNN == 1
template <typename DType>
struct MKLMemoryDescriptorBase {
Contributor @piiswrong Oct 4, 2016

Does this hold actual input/output buffers or just descriptors? What's the memory used for?
If it's actual data, consider using mxnet's tempspace so that it can be shared to save memory.

This is probably a hard fix; we can do it in the next release.

Author

This holds actual buffers.
MKLDNN stores layer outputs with a different layout and memory size (I guess stricter memory alignment), which cannot be saved directly into an ndarray/tblob.
We have also made a first attempt at fixing this; after this PR, I hope we can discuss it. Thanks.
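
For context, a minimal sketch of what such a layout conversion looks like with the MKL DNN primitives API from mkl_dnn.h (the 32x64x28x28 shape and the plain-NCHW target layout are assumptions for illustration):

```cpp
#include <mkl_dnn.h>

// Convert a buffer in an MKL-DNN internal layout back into a plain,
// contiguous NCHW buffer that a tblob could hold.
void ConvertToUserLayout(dnnLayout_t internal, void* internal_data,
                         float* user_data) {
  // MKL DNN lists dimensions innermost-first: W, H, C, N.
  size_t sizes[4]   = {28, 28, 64, 32};
  size_t strides[4] = {1, 28, 28 * 28, 28 * 28 * 64};
  dnnLayout_t user = nullptr;
  dnnLayoutCreate_F32(&user, 4, sizes, strides);      // plain NCHW layout
  if (!dnnLayoutCompare_F32(internal, user)) {        // layouts differ
    dnnPrimitive_t cv = nullptr;
    dnnConversionCreate_F32(&cv, internal, user);
    dnnConversionExecute_F32(cv, internal_data, user_data);
    dnnDelete_F32(cv);
  }  // (if the layouts already match, a plain memcpy would do)
  dnnLayoutDelete_F32(user);
}
```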

Contributor

After nnvm, you can try to do this in graph passes, i.e. insert a layout-swap layer before and after every mkldnn layer.

CHECK_EQ(dnnExecute<DType>(poolingFwd_, pooling_res_), E_SUCCESS);
fwd_out_data_->get_output_ptr(data.dptr_);
} else {
PoolingOp<cpu, Reducer, DType>::Forward(ctx, in_data, req, out_data,
Contributor

Can you determine support_mkldnn_ at operator creation time? If so please directly create a PoolingOp instead of forwarding inside mkldnn_pooling

Author

Yes. Thanks for your advice. Done.
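
As a sketch of the resulting dispatch (MKLDNNPoolingOp and SupportMKLDNNPooling are placeholder names, and the Reducer is hardcoded for brevity; the PR's actual code may differ):

```cpp
// Decide once, at operator creation time, instead of checking
// support_mkldnn_ inside every Forward call and falling back there.
Operator* PoolingProp::CreateOperatorEx(Context ctx,
                                        std::vector<TShape>* in_shape,
                                        std::vector<int>* in_type) const {
#if MXNET_USE_MKLDNN == 1
  if (ctx.dev_mask() == cpu::kDevMask &&
      SupportMKLDNNPooling(param_, (*in_shape)[0])) {
    return new MKLDNNPoolingOp<float>(param_);  // MKLDNN fast path
  }
#endif
  // Fallback: the plain CPU pooling op (real code would pick the
  // Reducer and DType from param_ and in_type).
  return new PoolingOp<cpu, mshadow::red::maximum, float>(param_);
}
```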

this->param_.kernel[0]) / this->param_.stride[0]));
if ((full_model_output_width == output_width) &&
(full_model_output_height == output_height)) {
support_mkldnn_ = true;
Contributor

This is basically equivalent to if (param.pooling_convention == kFull) in CreateOperatorEx, right?

Author

No.
In our experiments, MKLDNN only supports round-up pooling.
But sometimes round-up and round-down pooling have the same output size, and that piece of code determines whether the output size here matches round-up pooling.
I know it's confusing, so I have updated the pooling code; hope it is better now.
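
In other words, roughly this check (a sketch, assuming kValid rounds down and kFull rounds up, as in the pooling shape-inference code above):

```cpp
#include <cmath>

// MKLDNN pooling always produces the round-up (kFull) output size, so the
// MKLDNN path is safe whenever both conventions agree for this shape.
inline bool RoundingConventionsAgree(int in, int pad, int kernel, int stride) {
  int floor_out = 1 + (in + 2 * pad - kernel) / stride;  // kValid: round down
  int ceil_out = 1 + static_cast<int>(std::ceil(
      static_cast<double>(in + 2 * pad - kernel) / stride));  // kFull: round up
  return floor_out == ceil_out;
}
```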


@piiswrong

Mostly looks good to me now. Please address these issues and we can merge:

  1. Submit a PR to dmlc/mshadow: https://github.com/dmlc/mshadow/pulls
  2. Update this PR after 1) is merged.
  3. Fix Pooling so it doesn't inherit PoolingOp; return a PoolingOp in CreateOperatorEx if mkldnn doesn't support the case.
  4. Fix test errors and lint.

@@ -51,6 +51,12 @@ USE_CUDNN = 0
 # whether use cuda runtime compiling for writing kernels in native language (i.e. Python)
 USE_NVRTC = 0
 
+# whether use MKLDNN library
+USE_MKLDNN = 1
Contributor @mli Oct 4, 2016

Suggest using 0 by default; otherwise I guess people cannot compile it without installing mkldnn.

Author

Thanks. Set USE_MKLDNN to 0.

@Darwin2011 changed the title from "MKLdnn Integration Support Patch for mxnet" to "MKLdnn Integration Support Patch to improve issue #2986" on Oct 5, 2016
@Darwin2011 changed the title from "MKLdnn Integration Support Patch to improve issue #2986" to "MKLdnn Integration Support Patch to improve issue #2986(call for cpu performance)" on Oct 5, 2016
@Darwin2011 changed the title from "MKLdnn Integration Support Patch to improve issue #2986(call for cpu performance)" to "MKLdnn Integration Patch to improve issue #2986(call for cpu performance)" on Oct 5, 2016
@Darwin2011 commented Oct 5, 2016

@piiswrong

  • Fixing the pooling layer is done.
  • A PR has been proposed to mshadow: add parallelism in base map plan (dmlc/mshadow#167). Although turning openmp on can boost some performance, it seems there's no way to avoid contention between multiple ops.
  • I will fix lint and tests soon.
  • I will prepare scripts and a readme.

Thanks

@piiswrong

Are you sure the mshadow PR has been proposed? I can't find it.
I mean the PR with the changes related to this patch. As you can see here, this PR is not compiling because it cannot find your commit to mshadow: https://travis-ci.org/dmlc/mxnet/jobs/165290833

@Darwin2011 commented Oct 6, 2016

Thanks @piiswrong.
We have submitted the mshadow patch as a PR, but it has not been accepted yet.
So I have pointed the mshadow submodule at the current mshadow head commit to make it work for now.

@piiswrong

The build is still failing. Please make sure it compiles with USE_MKLDNN=0 on a machine without mkldnn.

@Darwin2011 force-pushed the mkldnn branch 4 times, most recently from 42102e4 to d7ad65b on October 7, 2016 at 16:39
@Darwin2011 commented Oct 8, 2016

@piiswrong
I have made the first commit pass the CI tests, but for the README commit, CI fails at the cpp tests.
I cannot reproduce this issue on my local machines.
I am taking a look at it. Could you also give me some advice?

The following is what the CI reports:
[09:05:51] src/engine/./threaded_engine.h:296: ExecuteOprFn
tests/travis/run_test.sh: line 49: 8942 Segmentation fault (core dumped) ./$test
Thanks

@piiswrong

The cpp tests segfault randomly on Travis. It can't be reproduced locally, so there's no fix yet. Try rerunning it.

@Darwin2011 commented Oct 9, 2016

Thanks. After rerunning, the issue disappeared. Could you help review it again?

@futurely mentioned this pull request on Dec 13, 2016
@howard0su

This looks promising. Any updates?

@piiswrong closed this on Jan 21, 2017