-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CPU] Enable memory reuse for nested graphs #27521
base: master
Are you sure you want to change the base?
[CPU] Enable memory reuse for nested graphs #27521
Conversation
da1d1a0
to
18563de
Compare
4734e61
to
e140018
Compare
@maxnick Ready for review. Could you please take a look? |
e140018
to
d5bda07
Compare
Added a fix for Convolution + Sum fallback graph. |
src/plugins/intel_cpu/src/edge.h
Outdated
@@ -82,6 +82,7 @@ class Edge { | |||
} | |||
|
|||
std::string name() const; | |||
const MemoryDesc& getDesc() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is private on purpose. As long as it's private, the node implementation may retrieve a tensor descriptor from edge only via the Memory object itself. But moving this method to public scope now introduces two ways of getting a memory descriptor, using getDesc
and using the memory object. Of course these two approaches provide different memory descriptors, but making getDesc
increases the code complexity by providing more degrees of freedom. Is it strictly necessary to extract this method to the public scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added asserts, so this method can be called only before memory is allocated.
This is actually more robust way, than just using friend
functionality and relying on the fact the graph is not going to use it in a wrong way.
Also, there is no way to prevent a developer from accessing all these data structures via chain of another existing public methods (getParent()->getSelectedPrimitiveDescriptor()->etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more robust only from the scope change standpoint. But it doesn't help the user of the interface to use the object in a particular way as it gives more degrees of freedom, which results in more possibilities of making a mistake.
It looks like this method has been moved to the public scope to be accessed from the auxiliary static functions, which are established to keep graph code more clear. Even though the downsides of changing the scope of this method outweigh the benefits, it's clear that it would be excessive to solve the problem of accessing the memory descriptor via special routines that would basically do the same thing as this method. So we have two solutions here
- Convert the static routines to private Graph methods, so that
getDesc
call being private still remains accessible as Graph is still a "friend" of Edge class - Rename the method, so that it becomes clear for the user of the interface when to use it. "getDesc" -> "getCompileTimeDesc"/"getInitializationDesc" or something like that, just to help people to use the method properly and avoid mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me these all just two different workarounds, neither better nor worse.
The only 'real' fix for this problem is to make selectedPrimitiveDescriptor related data structures temporary (which they actually are).
Collecting those descriptors, processing and applying them should be just an intermediate stage in a graph pipeline, like many other stages.
So, it should not be possible to access them simply because you don't have them at the inference stage at all.
And they should not be named "getCompileTimeDesc", because one will be able to access them only during CompileModel stage.
And still, currently nothing prevents the developer to access all these data structures using other public methods.
Nothing prevents Graph to access them at a wrong stage of the pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's help the interface users avoid unnecessary mistakes and give the method more specific name like "getOriginalDesc" or "getInitialDesc", or any other variate that will help to understand the purpose of this call.
src/plugins/intel_cpu/src/graph.h
Outdated
/** | ||
* Obsolete way of creating graph | ||
* To enable layout propagation and global memory reuse | ||
* two-stage creation should be used instead: | ||
* - Init() | ||
* - Allocate() | ||
*/ | ||
template<typename NET> | ||
void CreateGraph(NET &model, const GraphContext::CPtr context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we use
OPENVINO_DEPRECATED("Obsolete way of creating graph")
macro - Why can't we simply remove these methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just not 100% sure that we do not need some wrapper method for Init() + Activate() calls, because, again, I am not sure whether we will be able to utilize this two-step initialization in scope of CompiledModel / InferRequest, before we actually try to do so.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I can see, after the changes made in this PR, there remain only two places where CreateGraph
is called: executor_factory.hpp
and graph_emitter.hpp
, both of which are just place holders for future functionality, so may be easily changed. Therefore, it looks like CreateGraph(NET &model, const GraphContext::CPtr context)
can be just removed. However it may require a few changes to the unit tests. Actually may be done in a separate PR, I'm just afraid we simply forget about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me rephrase it. I am not sure we don't want to keep a simplified version of the graph creation. When one (i.e. unit tests) does not want to call two functions explicitly, because one does not care about layout propagation and memory reuse.
From API point of view, it also seems fine to have two versions:
- Just create me a graph (CreateGraph()) using some default memory configuration
- Prepare this, apply that, setup something and then create me a more proper / optimized version of graph (Init() -> Activate())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are not sure, then just leave this method as it is. If we are sure that this is absolute, then remove it. As simple as that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the comments for CreateGraph method
virtual bool canBeSkipped() const { | ||
return getSelectedPrimitiveDescriptor()->hasZeroInputDims(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is rather vague. Where can it be skipped from? Apparently we need to change name to make it more clear.
What about isNop
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this naming discussion to the end of the queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is not the right time?
src/plugins/intel_cpu/tests/functional/cmake/target_per_test.cmake
Outdated
Show resolved
Hide resolved
408e748
to
11a5115
Compare
src/plugins/intel_cpu/src/graph.cpp
Outdated
// @todo can we add some meaningful assert here? | ||
for (auto &edge : cluster) { | ||
if (edge->getParent()->isConstant() && edge->getStatus() == Edge::Status::NeedAllocation) { | ||
allocateConstantEdge(edge); | ||
} | ||
} | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my understanding, only the base edge may have status NeedAllocation
, and it's already know as the result of the search above. Do we really need to iterate once again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it true for the constants as well? I was trying to replicate the logic of the original loop, where possible, to not break anything.
Will try to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxnick Can we apply it for the String logic as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can. The whole idea of the cluster is that it's a set of edges with the same memory block. And this memory block is located inside a specific memory object of a specific edge. So there can not be two edges with status NeedAllocation
otherwise it's not clear what memory block should be shared across edges in the cluster.
const auto& subgraphInputMemory = subgraphInputNode->getDstMemoryAtPort(0); | ||
auto mem = std::make_shared<Memory>(getEngine(), subgraphInputMemory->getDescPtr(), subgraphInputMemory->getMemoryBlock()); | ||
subgraphMemoryPtrs.push_back(mem); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we then simply store subgraphInputMemory
in the subgraphMemoryPtrs
in order to update it's memory descriptor during inference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this is one of the ideas behind the workaround - to not use original memory.
Am I missing something?
Or you mean it is not applicable anymore based on this PR changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is, on master mem
is used inside the subgraph: we create it first using the the memory block from the parent graph, store it in subgraphMemoryPtrs
and then we put it to the subgraph via the Activate
call. But in the PR we bind the edges, not the memory objects, so that they share the common memory block, but have different memory objects automatically (we don't need to create new memory objects for the subgraph input edges and put it to the subgraph via Activate
). So the mem
is simply stored in subgraphMemoryPtrs
and share the same memory block, but it's not the same memory object from the subgraph. However, we still have to transfer the shape, as the internal subgraph input memory object is different from what is used in the parent graph. Someone needs to transfer the shape. Thus. Instead of creating the separate mem
objects, we need to store the input memory objects of the internal subgraph to access them efficiently and update the shapes.
7f3fadf
to
9e170ae
Compare
This PR will be closed in a week because of 2 weeks of no activity. |
This PR will be closed in a week because of 2 weeks of no activity. |
9e2648a
to
efb84d1
Compare
Sync node indexes must be registered to global allocation context in order.
af3a926
to
705eabf
Compare
Details:
Tickets: