
[CPU] Enable memory reuse for nested graphs #27521

Open · EgorDuplensky wants to merge 2 commits into master from enable_memory_reuse_for_nested_graphs

Conversation

EgorDuplensky
Contributor

@EgorDuplensky EgorDuplensky commented Nov 12, 2024

Details:

  • All nested graphs are now part of the global memory reuse logic
  • The core memory reuse logic is untouched (only slightly refactored)
  • Instead of solving memory reuse separately for every graph / subgraph, all the edges and global execution indices are first collected from the "virtually flattened" graph, and then memory reuse is solved once for the whole model.
  • All the nodes with nested graphs are updated, including:
    1. LoRa
    2. Composite
    3. If
    4. TensorIterator
    5. Convolution + Sum fallback subgraph
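A minimal sketch of the flattened-reuse idea above (hypothetical, simplified types, not the actual plugin code): every graph, nested or not, registers its edges into one shared allocation context, and the reuse problem is solved once over global execution indices.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch, not the actual OpenVINO code: edges from the parent
// graph and from all nested subgraphs are registered into one shared
// AllocationContext, then memory reuse is solved once for the whole model.
struct Edge {
    int firstUse, lastUse;  // lifetime in global execution indices
    long size;              // bytes required
    long offset = -1;       // assigned by the solver
};

struct AllocationContext {
    std::vector<Edge*> edges;  // collected from the "virtually flattened" graph

    void registerEdge(Edge* e) { edges.push_back(e); }

    // Greedy solver: an edge reuses an existing slot whenever the slot's
    // previous occupant's lifetime has already ended.
    long solve() {
        struct Slot { long offset; long size; int busyUntil; };
        std::vector<Slot> slots;
        long total = 0;
        std::sort(edges.begin(), edges.end(),
                  [](const Edge* a, const Edge* b) { return a->firstUse < b->firstUse; });
        for (Edge* e : edges) {
            bool placed = false;
            for (Slot& s : slots) {
                if (s.busyUntil < e->firstUse && s.size >= e->size) {
                    e->offset = s.offset;      // reuse memory that is already free
                    s.busyUntil = e->lastUse;
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                slots.push_back({total, e->size, e->lastUse});
                e->offset = total;
                total += e->size;              // grow the shared arena
            }
        }
        return total;  // peak memory of the whole flattened model
    }
};
```

Because a nested graph's edges enter the same context as the parent's, a subgraph tensor can reuse memory freed by a parent-graph tensor, which a per-graph solver could not do.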

Tickets:

  • ticket-id

@EgorDuplensky EgorDuplensky requested review from a team as code owners November 12, 2024 10:53
@github-actions github-actions bot added labels: category: inference (OpenVINO Runtime library - Inference), category: CPU (OpenVINO CPU plugin), category: build (OpenVINO cmake script / infra) Nov 12, 2024
@EgorDuplensky EgorDuplensky changed the title Enable memory reuse for nested graps Enable memory reuse for nested graphs Nov 12, 2024
@EgorDuplensky EgorDuplensky changed the title Enable memory reuse for nested graphs [CPU] Enable memory reuse for nested graphs Nov 12, 2024
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch from da1d1a0 to 18563de Compare November 12, 2024 15:27
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch 2 times, most recently from 4734e61 to e140018 Compare November 13, 2024 13:32
@github-actions github-actions bot removed the category: inference (OpenVINO Runtime library - Inference) label Nov 13, 2024
@EgorDuplensky
Contributor Author

@maxnick Ready for review. Could you please take a look?

@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch from e140018 to d5bda07 Compare November 13, 2024 13:51
@EgorDuplensky
Contributor Author

Added a fix for the Convolution + Sum fallback graph.
Now such a graph is also part of memory reuse.

src/plugins/intel_cpu/src/compiled_model.cpp (outdated, resolved)
@@ -82,6 +82,7 @@ class Edge {
}

std::string name() const;
const MemoryDesc& getDesc() const;
Contributor

This method is private on purpose. As long as it's private, a node implementation may retrieve a tensor descriptor from an edge only via the Memory object itself. But moving this method to the public scope introduces two ways of getting a memory descriptor: using getDesc and using the memory object. Of course, these two approaches provide different memory descriptors, but making getDesc public increases code complexity by providing more degrees of freedom. Is it strictly necessary to move this method to the public scope?

Contributor Author

@EgorDuplensky commented Nov 25, 2024

I have added asserts, so this method can only be called before memory is allocated.
This is actually a more robust approach than just using the friend mechanism and relying on the graph not using it in a wrong way.
Also, there is no way to prevent a developer from accessing all these data structures via a chain of other existing public methods (getParent()->getSelectedPrimitiveDescriptor()->etc.).
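A rough illustration of such a guard (hypothetical names, and an exception in place of a debug assert so the behavior is observable):

```cpp
#include <stdexcept>

// Hypothetical sketch: a descriptor accessor that is only legal to call
// before the edge's memory has been allocated. Once allocation happens,
// descriptors must be taken from the Memory object instead.
struct MemoryDesc { long size; };

class Edge {
public:
    void setDesc(MemoryDesc d) { desc = d; }
    void allocate() { allocated = true; }

    const MemoryDesc& getDesc() const {
        // Guard: after allocation this accessor is off limits.
        if (allocated)
            throw std::logic_error("getDesc() called after memory allocation");
        return desc;
    }

private:
    MemoryDesc desc{0};
    bool allocated = false;
};
```

The guard narrows the window in which the public accessor is usable, which is the robustness argument made above.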

Contributor


This is more robust only from the scope-change standpoint. But it doesn't help the user of the interface to use the object in a particular way, as it gives more degrees of freedom, which results in more opportunities for mistakes.
It looks like this method was moved to the public scope to be accessed from the auxiliary static functions, which were introduced to keep the graph code clearer. Even though the downsides of changing the scope of this method outweigh the benefits, it's clear that it would be excessive to solve the problem of accessing the memory descriptor via special routines that would basically do the same thing as this method. So we have two solutions here:

  1. Convert the static routines to private Graph methods, so that the getDesc call, remaining private, is still accessible, as Graph is still a "friend" of the Edge class
  2. Rename the method so it becomes clear to the user of the interface when to use it: "getDesc" -> "getCompileTimeDesc" / "getInitializationDesc" or something like that, just to help people use the method properly and avoid mistakes.

Contributor Author

@EgorDuplensky commented Dec 4, 2024


For me, these are just two different workarounds, neither better nor worse.
The only 'real' fix for this problem is to make the selectedPrimitiveDescriptor-related data structures temporary (which they actually are).
Collecting those descriptors, processing them and applying them should just be an intermediate stage in the graph pipeline, like many other stages.
So it should not be possible to access them at the inference stage, simply because they no longer exist by then.
And they should not be named "getCompileTimeDesc", because one would only be able to access them during the CompileModel stage anyway.
And still, currently nothing prevents a developer from accessing all these data structures via other public methods.
Nothing prevents the Graph from accessing them at the wrong stage of the pipeline.

Contributor


Then let's help the interface users avoid unnecessary mistakes and give the method a more specific name, like "getOriginalDesc" or "getInitialDesc", or any other variant that helps make the purpose of this call clear.

src/plugins/intel_cpu/src/edge.h (outdated, resolved)
Comment on lines 69 to 76
/**
* Obsolete way of creating graph
* To enable layout propagation and global memory reuse
* two-stage creation should be used instead:
* - Init()
* - Allocate()
*/
template<typename NET>
void CreateGraph(NET &model, const GraphContext::CPtr context);
Contributor


  1. Can we use the OPENVINO_DEPRECATED("Obsolete way of creating graph") macro?
  2. Why can't we simply remove these methods?

Contributor Author


I am just not 100% sure that we don't need some wrapper method for the Init() + Activate() calls because, again, I am not sure whether we will be able to use this two-step initialization in the scope of CompiledModel / InferRequest before we actually try to do so.
What do you think?

Contributor


As far as I can see, after the changes made in this PR there remain only two places where CreateGraph is called: executor_factory.hpp and graph_emitter.hpp, both of which are just placeholders for future functionality, so they may easily be changed. Therefore, it looks like CreateGraph(NET &model, const GraphContext::CPtr context) can simply be removed. However, it may require a few changes to the unit tests. It may actually be done in a separate PR; I'm just afraid we'll simply forget about it.

Contributor Author

@EgorDuplensky commented Dec 4, 2024


Let me rephrase. I am not sure we don't want to keep a simplified version of graph creation for cases where one (i.e. unit tests) does not want to call two functions explicitly, because one does not care about layout propagation and memory reuse.
From an API point of view, it also seems fine to have two versions:

  1. Just create me a graph (CreateGraph()) using some default memory configuration
  2. Prepare this, apply that, set up something, and then create me a more proper / optimized version of the graph (Init() -> Activate())
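The two flows above could coexist roughly like this (a sketch with hypothetical, simplified signatures; the real Graph API differs):

```cpp
#include <cassert>

// Hypothetical sketch of the two creation flows discussed in this thread:
//  1. CreateGraph(model)           - one call, default memory configuration
//  2. Init(model) then Activate()  - two stages, allowing the caller to hook
//     in layout propagation / global memory reuse between the two calls
struct Model {};

class Graph {
public:
    void Init(const Model&) { stage = Stage::Initialized; }

    void Activate() {
        assert(stage == Stage::Initialized && "Init() must be called first");
        stage = Stage::Ready;
    }

    // Convenience wrapper for callers (e.g. unit tests) that do not care
    // about the intermediate stage.
    void CreateGraph(const Model& m) {
        Init(m);
        Activate();
    }

    bool isReady() const { return stage == Stage::Ready; }

private:
    enum class Stage { NotReady, Initialized, Ready } stage = Stage::NotReady;
};
```

The wrapper keeps the simple one-call API while making the two-stage path the primary one.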

Contributor


If we are not sure, then just leave this method as it is. If we are sure that it is obsolete, then remove it. As simple as that.

Contributor Author


I have updated the comments for the CreateGraph method.

src/plugins/intel_cpu/src/graph.h (outdated, resolved)
Comment on lines +300 to +332
virtual bool canBeSkipped() const {
return getSelectedPrimitiveDescriptor()->hasZeroInputDims();
}
Contributor


The name is rather vague. Where can it be skipped from? Apparently we need to change the name to make it clearer.
What about isNop?

Contributor Author


Let's move this naming discussion to the end of the queue.

Contributor


Is now not the right time?

src/plugins/intel_cpu/src/nodes/composite.cpp (outdated, resolved)
src/plugins/intel_cpu/src/nodes/conv.cpp (resolved)
src/plugins/intel_cpu/src/nodes/lora.cpp (outdated, resolved)
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch 2 times, most recently from 408e748 to 11a5115 Compare November 25, 2024 15:28
src/plugins/intel_cpu/src/graph.cpp (outdated, resolved)
Comment on lines 758 to 757
// @todo can we add some meaningful assert here?
for (auto &edge : cluster) {
if (edge->getParent()->isConstant() && edge->getStatus() == Edge::Status::NeedAllocation) {
allocateConstantEdge(edge);
}
}
return false;
Contributor


To my understanding, only the base edge may have the NeedAllocation status, and it is already known as the result of the search above. Do we really need to iterate once again?

Contributor Author


Is it true for constants as well? I was trying to replicate the logic of the original loop where possible, so as not to break anything.
I will try to remove it.

Contributor Author


@maxnick Can we apply it for the String logic as well?

Contributor


Yes, we can. The whole idea of a cluster is that it's a set of edges sharing the same memory block. And this memory block is located inside a specific memory object of a specific edge. So there cannot be two edges with the NeedAllocation status; otherwise it would not be clear which memory block should be shared across the edges in the cluster.
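The invariant described here, exactly one NeedAllocation edge per cluster, can be sketched as (hypothetical, simplified types, not the actual Edge class):

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: a cluster is a set of edges sharing one memory
// block; exactly one edge (the base edge) owns the allocation, the rest
// reuse its memory block.
enum class Status { NeedAllocation, SharedBuffer };

struct Edge { Status status; };

// Returns the single base edge while checking the cluster invariant.
Edge* findBaseEdge(const std::vector<Edge*>& cluster) {
    Edge* base = nullptr;
    for (Edge* e : cluster) {
        if (e->status == Status::NeedAllocation) {
            assert(base == nullptr && "only one NeedAllocation edge per cluster");
            base = e;
        }
    }
    return base;
}
```

With this invariant, a second pass over the cluster looking for NeedAllocation edges is redundant once the base edge has been found.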

src/plugins/intel_cpu/src/graph.cpp (outdated, resolved)
Comment on lines 118 to 123
const auto& subgraphInputMemory = subgraphInputNode->getDstMemoryAtPort(0);
auto mem = std::make_shared<Memory>(getEngine(), subgraphInputMemory->getDescPtr(), subgraphInputMemory->getMemoryBlock());
subgraphMemoryPtrs.push_back(mem);
Contributor


Can't we then simply store subgraphInputMemory in subgraphMemoryPtrs in order to update its memory descriptor during inference?

Contributor Author


I thought that was one of the ideas behind the workaround: not to use the original memory.
Am I missing something?
Or do you mean it is no longer applicable given this PR's changes?

Contributor


The thing is, on master, mem is used inside the subgraph: we create it first using the memory block from the parent graph, store it in subgraphMemoryPtrs, and then pass it to the subgraph via the Activate call. But in this PR we bind the edges, not the memory objects, so they share a common memory block but automatically have different memory objects (we don't need to create new memory objects for the subgraph input edges and pass them to the subgraph via Activate). So mem is simply stored in subgraphMemoryPtrs and shares the same memory block, but it is not the same memory object used inside the subgraph. However, we still have to transfer the shape, as the internal subgraph input memory object is different from what is used in the parent graph; someone needs to transfer the shape. Thus, instead of creating separate mem objects, we need to store the input memory objects of the internal subgraph, so we can access them efficiently and update their shapes.
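The arrangement described above, two Memory objects sharing one memory block while each carries its own shape, can be sketched as (hypothetical, simplified types, not the actual Memory class):

```cpp
#include <memory>
#include <vector>

// Hypothetical sketch: the parent-graph and subgraph input memories are
// distinct Memory objects sharing one MemoryBlock. Because the objects
// differ, the parent's new shape must be transferred to the subgraph
// memory explicitly on each inference.
struct MemoryBlock { std::vector<char> data; };

struct Memory {
    std::shared_ptr<MemoryBlock> block;  // shared storage
    std::vector<long> shape;             // per-object shape

    void redefineShape(const std::vector<long>& s) {
        shape = s;
        std::size_t bytes = 1;
        for (long d : s) bytes *= static_cast<std::size_t>(d);
        if (block->data.size() < bytes)
            block->data.resize(bytes);   // grow the shared storage if needed
    }
};
```

Storing the subgraph's own input Memory objects (instead of creating separate ones) is what makes this explicit shape transfer cheap to perform at infer time.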

@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch 2 times, most recently from 7f3fadf to 9e170ae Compare December 10, 2024 17:56
Contributor

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Dec 26, 2024
@mg-intel mg-intel removed the Stale label Jan 2, 2025
Contributor

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Jan 17, 2025
@mg-intel mg-intel removed the Stale label Jan 17, 2025
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch 9 times, most recently from 9e2648a to efb84d1 Compare January 21, 2025 19:24
Sync node indexes must be registered to global allocation context
in order.
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch from af3a926 to 705eabf Compare January 25, 2025 22:43
Labels
category: build (OpenVINO cmake script / infra), category: CPU (OpenVINO CPU plugin)
3 participants