Luocheng/vpux/prealloc mem kmb #1

Open: wants to merge 4 commits into base releases/vpux/2021/3

luo-cheng2021 (Owner) commented Feb 26, 2021

Details:

  • Add a feature to preallocate image memory on the remote device
    When -use_remote_mem is passed on the command line, the infer request allocates remote memory using HddlUnite. All inferences then use that remote memory, so there is no need to copy the image memory from IA to the remote device. The setup steps:
    -- load the image from the specified folder
    -- allocate remote memory and copy the image into it
    -- set the remote memory handle
    -- run the infer request in the benchmark loop
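
The four steps above can be sketched roughly as follows. This is only an illustration: `FakeRemoteMemory`, `preallocate`, `FakeInferRequest` and `runBenchmark` are hypothetical names, and a plain `std::vector` stands in for device memory because the actual HddlUnite and remote blob calls are not shown in this conversation. The point it tries to make is that the host-to-device copy happens once, before the benchmark loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for device memory; a real build would use HddlUnite.
struct FakeRemoteMemory {
    std::vector<std::uint8_t> bytes;  // pretend this buffer lives on the remote device
    std::size_t uploads = 0;          // number of host-to-device copies performed
};

// Steps 1-2: take an already loaded image and copy it to "remote" memory once.
inline FakeRemoteMemory preallocate(const std::vector<std::uint8_t>& image) {
    FakeRemoteMemory mem;
    mem.bytes = image;  // the single upload
    mem.uploads = 1;
    return mem;
}

// Hypothetical infer request that only keeps a handle to the remote buffer.
struct FakeInferRequest {
    const FakeRemoteMemory* input = nullptr;

    // Step 3: remember the handle; no data is copied here.
    void setRemoteInput(const FakeRemoteMemory& mem) { input = &mem; }

    // Step 4: "infer" reads the preallocated buffer (here: a simple byte sum).
    std::uint32_t infer() const {
        std::uint32_t sum = 0;
        for (std::uint8_t b : input->bytes) sum += b;
        return sum;
    }
};

// Runs the benchmark loop and returns how many uploads were needed in total.
inline std::size_t runBenchmark(const std::vector<std::uint8_t>& image, int iterations) {
    FakeRemoteMemory mem = preallocate(image);
    FakeInferRequest request;
    request.setRemoteInput(mem);
    for (int i = 0; i < iterations; ++i) {
        (void)request.infer();  // reuses the same remote buffer every iteration
    }
    return mem.uploads;
}
```

In this sketch `runBenchmark(image, 100)` reports a single upload regardless of the iteration count, which is the behavior the feature is after.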

add_definitions(-DUSE_PREALLOC_MEM)
set(HDDL2_DEP "HddlUnite::HddlUnite")
else()
message(WARNING "hddl2_params.hpp could not find. Preallocate in KMB feature is disabled.")

Please remove device-specific strings such as "KMB".

Owner Author

Done.

@@ -85,6 +85,7 @@ Options:
-t Optional. Time, in seconds, to execute topology.
-progress Optional. Show progress bar (can affect performance measurement). Default values is "false".
-shape Optional. Set shape for input. For example, "input1[1,3,224,224],input2[1,4]" or "[1,3,224,224]" in case of one input size.
-use_prealloc_mem Optional. Prealloc remote memory in xBay to execute infer request.

Would it be better to use "-use_remote_mem"?

Owner Author

Done.

@@ -97,6 +97,11 @@ static const char load_config_message[] = "Optional. Path to XML/YAML/JSON file
static const char dump_config_message[] = "Optional. Path to XML/YAML/JSON file to dump IE parameters, which were set by application.";
#endif

#ifdef USE_PREALLOC_MEM
// @brief message for preallocing memory option
static const char use_prealloc_mem_message[] = "Optional. Prealloc remote memory in xBay to execute infer request.";

static const char use_prealloc_mem_message[] = "Optional. Prealloc remote memory in device to execute infer request.";

Owner Author

Done.

size_t width;
size_t height;
remoteIE.GetWxH(width, height);
const size_t nv12Size = width * height * 3 / 2 * batchSize;

We also need to support pure NN inference without preprocessing, so the input can be planar RGB that feeds into the NN directly.
An NV12 buffer goes through PP first and then to the NN.

Let's first support pure inference without PP.

Owner Author

Pure inference without PP is done.
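
For reference, the buffer sizes discussed in this thread can be compared with a small sketch. The NV12 formula mirrors the `width * height * 3 / 2 * batchSize` expression in the diff; the planar RGB formula assumes 8-bit channels and three full-resolution planes. The function names are hypothetical and not part of the PR.

```cpp
#include <cassert>
#include <cstddef>

// NV12: a full-resolution Y plane plus a half-size interleaved UV plane,
// one byte per sample (assumes even width and height).
inline std::size_t nv12Bytes(std::size_t width, std::size_t height, std::size_t batch) {
    return width * height * 3 / 2 * batch;
}

// Planar RGB, assuming 8-bit channels: three full-resolution planes.
inline std::size_t planarRgbBytes(std::size_t width, std::size_t height, std::size_t batch) {
    return width * height * 3 * batch;
}
```

Under these assumptions a planar RGB input needs twice the bytes of an NV12 frame of the same resolution, so the preallocated remote buffer has to be sized for whichever format is fed to the network.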

@@ -324,3 +326,89 @@ void fillBlobs(const std::vector<std::string>& inputFiles,
}
}
}

#ifdef USE_PREALLOC_MEM
void fillRemoteBlobs(RemoteHelper& remoteIE,

FillRemoteBlobsNV12

Owner Author

fillRemoteBlobs was removed; its functionality was merged into fillBlobs.

auto minputHolder = minput->rmap();
auto inputBlobData = minputHolder.as<uint8_t*>();

BGR2NV12(inputBlobData, width, height, batchSize, data.get());

Will it do CSC + resize, or only CSC?

Owner Author

Preprocess removed.

using namespace InferenceEngine;

#define REMOTE_IMAGE_WIDTH 1920
#define REMOTE_IMAGE_HEIGHT 1080

If PP is needed, we may have to specify the input resolution. Could that parameter be put into the benchmark input config file?

Owner Author

Preprocessing removed.

THROW_IE_EXCEPTION << "Could not open file: " << graphPath;
}
std::istream graphBlob(&blobFile);
return ie.ImportNetwork(graphBlob, _contextPtr);

In the future we also need to support compiling IR online, i.e. calling LoadNetwork().

Owner Author

LoadNetwork supported.

@@ -205,4 +206,27 @@ void load_config(const std::string& filename,
}
}
}

void BGR2NV12(uint8_t* src, size_t width, size_t height, size_t imageNum, uint8_t* dst) {

Why convert RGB to NV12?

Owner Author

CSC removed (no longer needed).

@luo-cheng2021
Owner Author

@riverlijunjie Preprocessing has been removed. Please help review, thanks.

ExecutableNetwork exeNetwork;

#ifdef USE_REMOTE_MEM
RemoteHelper remoteHelper;

Can we get the device_name="VPUX"?

If yes, we can put the init code block into "if(device == "VPUX")" like "CPU" or "GPU" below.

Owner Author

Done. Moved the initialization into the 'VPUX' branch.
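
A rough sketch of the gating discussed in this thread, assuming the device string has already been split into a list of device names (as the reviewer's `device == "VPUX"` comparison suggests). `shouldInitRemoteContext` is a hypothetical helper name, not a function from benchmark_app.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: the remote context is only set up when the user asked
// for remote memory AND "VPUX" is among the requested devices.
inline bool shouldInitRemoteContext(const std::vector<std::string>& devices,
                                    bool useRemoteMem) {
    if (!useRemoteMem) {
        return false;
    }
    return std::find(devices.begin(), devices.end(), std::string("VPUX")) != devices.end();
}
```

Keeping the check in one place means the CPU and GPU branches never touch the remote helper, which matches the intent of moving the initialization into the device-specific branch.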


using namespace InferenceEngine;

class RemoteHelper::Impl {

Would it be better to name it RemoteContextHelper?

Owner Author

Done.

@mashoujiang

As I understand it, benchmark_app is an image workload case, right?
If so, why use a workload context?
With a workload context, the process will only be scheduled to a single device. Consider a test case like this: on SRB, when we run the benchmark test, it will only be scheduled to one device.
Correct me if I have misunderstood.

@riverlijunjie

Video workload is used for the E2E pipeline, but the benchmark doesn't provide a test case for it, especially for KPI. So we need some official KPI data for the video workload.

@mangguo321

Do we need to add CPU_THROUGHPUT_STREAMS and CPU_THREADS_NUM configuration in video case?

@luo-cheng2021
Owner Author

luo-cheng2021 commented Mar 12, 2021

We can use the command line 'benchmark_app -load_config config.yml ...' to set 'VPUX_THROUGHPUT_STREAMS', 'VPUX_INFERENCE_SHAVES', etc. The config.yml looks like this:

%YAML:1.0

VPUX: { VPUX_THROUGHPUT_STREAMS:"3", VPUX_INFERENCE_SHAVES:"16"}
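
As an illustration of what such a file amounts to once loaded, here is a sketch that assumes the configuration ends up in a per-device map of string key/value pairs. Only the beginning of the `load_config` signature is visible in this conversation, so the `DeviceConfig` alias and both helper functions are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumed in-memory shape: device name -> (config key -> value as string).
using DeviceConfig = std::map<std::string, std::map<std::string, std::string>>;

// Builds the equivalent of the config.yml shown above by hand.
inline DeviceConfig exampleVpuxConfig() {
    DeviceConfig config;
    config["VPUX"]["VPUX_THROUGHPUT_STREAMS"] = "3";
    config["VPUX"]["VPUX_INFERENCE_SHAVES"] = "16";
    return config;
}

// Returns the configured value, or `fallback` if the device or key is absent.
inline std::string lookup(const DeviceConfig& config, const std::string& device,
                          const std::string& key, const std::string& fallback) {
    auto dev = config.find(device);
    if (dev == config.end()) {
        return fallback;
    }
    auto entry = dev->second.find(key);
    return entry == dev->second.end() ? fallback : entry->second;
}
```

Values stay strings in this sketch because the file quotes them ("3", "16"); any conversion to numbers would be left to the consumer of the config.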

@luo-cheng2021 luo-cheng2021 force-pushed the luocheng/vpux/prealloc_mem_kmb branch from 9528f90 to d7d1ef9 Compare April 15, 2021 07:04
@luo-cheng2021 luo-cheng2021 changed the base branch from releases/vpux/2021/2 to releases/vpux/2021/3 April 15, 2021 07:08
luo-cheng2021 pushed a commit that referenced this pull request Apr 19, 2021
ceciliapeng2011 added a commit that referenced this pull request Apr 22, 2021
Bym/pdpd frontend/op add relu & softmax
luo-cheng2021 pushed a commit that referenced this pull request Jul 1, 2021
* Moved cmake/templates to <root>

* Removed ngraph versioning, reused IE one

* Merged coverage

* Removed duplicated ngraph cmake options

* Moved dependencies to <root>/cmake

* Removed installing of VERSION

* Start #1

* cpack

* Added component type

* Added installation of tests targets

* Added ngraph tests target install

* Fixed runtime dependencies location

* Disable GNA unit tests

* Revert "Disable GNA unit tests"

This reverts commit da53986.

* Installed only core component

* Replaced ENABLE_DEV_PKG_INSTALL with EXCLUDE_FROM_ALL

* Removed extra cmake options
luo-cheng2021 pushed a commit that referenced this pull request Jul 19, 2021
luo-cheng2021 added a commit that referenced this pull request Aug 10, 2021
* [FrontEnd]enable pdpd ops conversion part3

* Add adaptive pool2d op conversion (#1)

* param support tensor (#2)

* add missing sync_batch_norm

* Update pow.cpp

* deal empty axis (#5)

* deal empty axis

* apply review comments

* fix code style

* fix code style

* change shape to i32

* fix code in shape

* fix code style

* fix paddle code style

* remove redundant ops

* fix maxAdativePool

* fix expand_v2

* remove redundant code

Co-authored-by: Mang Guo
Co-authored-by: Luo Cheng
luo-cheng2021 pushed a commit that referenced this pull request Oct 28, 2022
* remove reader tests #1

* remove reader tests #2

* remove reader tests #3

* remove reader tests #4

* Add clone_with_new_inputs to visitor tests

* fixes
luo-cheng2021 pushed a commit that referenced this pull request Mar 14, 2023
luo-cheng2021 pushed a commit that referenced this pull request Mar 16, 2023
luo-cheng2021 pushed a commit that referenced this pull request Jun 6, 2023
…f POT (openvinotoolkit#17398)

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update home.rst

* Update ptq_introduction.md

* Update Introduction.md

* Update Introduction.md

* Update Introduction.md

* Update ptq_introduction.md

* Update ptq_introduction.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update quantization_w_accuracy_control.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update model_optimization_guide.md

* Update ptq_introduction.md

* Update quantization_w_accuracy_control.md

* Update model_optimization_guide.md

* Update quantization_w_accuracy_control.md

* Update model_optimization_guide.md

* Update quantization_w_accuracy_control.md

* Update model_optimization_guide.md

* Update Introduction.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update quantization_w_accuracy_control.md

* Update ptq_introduction.md

* Update Introduction.md

* Update model_optimization_guide.md

* Update basic_quantization_flow.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update quantization_w_accuracy_control.md

* Update Introduction.md

* Update FrequentlyAskedQuestions.md

* Update model_optimization_guide.md

* Update Introduction.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update model_optimization_guide.md

* Update ptq_introduction.md

* Update ptq_introduction.md

* added code snippet (#1)

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update quantization_w_accuracy_control.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update ptq_introduction.md

* Update model_optimization_guide.md

* Update basic_quantization_flow.md

* Update ptq_introduction.md

* Update quantization_w_accuracy_control.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update basic_quantization_flow.md

* Update ptq_introduction.md

* Update ptq_introduction.md

* Delete ptq_introduction.md

* Update FrequentlyAskedQuestions.md

* Update Introduction.md

* Update quantization_w_accuracy_control.md

* Update introduction.md

* Update basic_quantization_flow.md code blocks

* Update quantization_w_accuracy_control.md code snippets

* Update docs/optimization_guide/nncf/ptq/code/ptq_torch.py

Co-authored-by: Alexander Suslov

* Update model_optimization_guide.md

* Optimization docs proofreading  (#2)

* images updated

* delete reminder

* review

* text review

* change images to original ones

* Update filter_pruning.md code blocks

* Update basic_quantization_flow.md

* Update quantization_w_accuracy_control.md

* Update images (#3)

* images updated

* delete reminder

* review

* text review

* change images to original ones

* Update filter_pruning.md code blocks

* update images

* resolve conflicts

* resolve conflicts

* change images to original ones

* resolve conflicts

* update images

* fix conflicts

* Update model_optimization_guide.md

* Update docs/optimization_guide/nncf/ptq/code/ptq_tensorflow.py

Co-authored-by: Alexander Suslov

* Update docs/optimization_guide/nncf/ptq/code/ptq_torch.py

Co-authored-by: Alexander Suslov

* Update docs/optimization_guide/nncf/ptq/code/ptq_onnx.py

Co-authored-by: Alexander Suslov

* Update docs/optimization_guide/nncf/ptq/code/ptq_aa_openvino.py

Co-authored-by: Alexander Suslov

* Update docs/optimization_guide/nncf/ptq/code/ptq_openvino.py

Co-authored-by: Alexander Suslov

* table format fix

* Update headers

* Update qat.md code blocks

---------

Co-authored-by: Alexander Suslov
Co-authored-by: Tatiana Savina
luo-cheng2021 pushed a commit that referenced this pull request Jan 10, 2024
* Remove `set_preprocess.cpp`

* Remove `preprocessing.hpp`

* Remove `locale.hpp` - ported to `CanCompileModelWithCustomLocale`

* Port `version.cpp` and remove legacy

* Revert shared `version.hpp`
luo-cheng2021 pushed a commit that referenced this pull request Jan 10, 2024
* Delete `ngraph/visibility.hpp`

* Delete `ngraph/log.hpp`

* Delete `ngraph/file_util.hpp`

* Delete `ngraph/type.hpp`

* Delete `ngraph/dimension.hpp`

* Delete `ngraph/coordinate.hpp`

* ClangFormat

* Fix build

* Fix pyngraph

* Remove comment

* Fix build

github-actions bot commented Sep 6, 2024

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Sep 6, 2024
luo-cheng2021 pushed a commit that referenced this pull request Dec 27, 2024
### Details:
 - *item1*
	
CID 1529754: (#1 of 1): COPY_INSTEAD_OF_MOVE (COPY_INSTEAD_OF_MOVE)
1. copy_constructor_call: node_info_table is passed-by-value as
parameter to parse_freq_info_linux when it could be moved instead.
     	Use std::move(node_info_table) instead of node_info_table.
221                                  node_info_table,
222                                  _processors,
223                                  _numa_nodes,
224                                  _sockets,
225                                  _cores,
226                                  _proc_type_table,
227                                  _cpu_mapping_table);

### Tickets:
 - *ticket-id*
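
The COPY_INSTEAD_OF_MOVE finding quoted in the commit message above can be illustrated with a small sketch. `parseTable` is a hypothetical function that takes its argument by value, like the parameter named in the report; it is not the real `parse_freq_info_linux`, and `std::vector<int>` is used in place of the actual table type.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical callee taking the table by value, as in the Coverity report.
inline std::vector<int> parseTable(std::vector<int> table) {
    for (int& v : table) {
        v *= 2;  // some in-place processing
    }
    return table;  // a by-value parameter is moved on return
}

// Passing an lvalue copies the whole buffer into the parameter.
inline std::vector<int> callWithCopy(const std::vector<int>& node_info_table) {
    return parseTable(node_info_table);
}

// If the caller no longer needs the table, std::move hands the buffer over.
inline std::vector<int> callWithMove(std::vector<int> node_info_table) {
    return parseTable(std::move(node_info_table));
}
```

The move variant is only appropriate when the source table is not read again after the call, which is the situation the report describes.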
4 participants