Recycle serialization buffers on transmission #342

Open
wants to merge 6 commits into
base: rolling

fuzzypixelz
Contributor

Adds a LIFO buffer pool in the context to reuse buffers allocated on serialization. The aim is not (only) to avoid the overhead of dynamic allocation but rather to enhance the cache locality of serialization buffers.
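
As a rough illustration of why a LIFO pool helps cache locality, here is a minimal sketch. It is not the code from this pull request; `LifoBufferPool`, `acquire`, `release`, and `idle_count` are hypothetical names. The most recently released buffer is handed out first, so the memory most likely to still be in cache is the memory that gets reused.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch of a LIFO buffer pool; names are illustrative only.
class LifoBufferPool
{
public:
  using Buffer = std::vector<std::uint8_t>;

  // Hands out the most recently released buffer (LIFO), resized to `size`.
  // resize() reuses the existing allocation whenever its capacity suffices.
  Buffer acquire(std::size_t size)
  {
    Buffer buffer;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!free_.empty()) {
        buffer = std::move(free_.back());
        free_.pop_back();
      }
    }
    buffer.resize(size);
    return buffer;
  }

  // Returns a buffer to the pool without deallocating its storage.
  void release(Buffer buffer)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    free_.push_back(std::move(buffer));
  }

  // Number of buffers currently waiting to be reused.
  std::size_t idle_count() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return free_.size();
  }

private:
  mutable std::mutex mutex_;
  std::vector<Buffer> free_;  // back() is the most recently released buffer
};
```

A FIFO queue would instead hand out the buffer that has been idle the longest, which is the one least likely to still be resident in cache.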

@fuzzypixelz
Contributor Author

fuzzypixelz commented Dec 16, 2024

The aim of this pull request is to fix high latency for relatively large topics (e.g. 1 MB tagus) in the single-process iRobot benchmark with --ipc off. Results were obtained using the Mont Blanc topology on an Ubuntu 24.04.1 LTS machine with an Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz.

dd82e84

| node | topic | size_b | mean_us | min_us | max_us |
| --- | --- | --- | --- | --- | --- |
| ponce | tagus | 1000000 | 3016 | 2512 | 5705 |
| geneva | tagus | 1000000 | 3513 | 2997 | 6457 |
| mandalay | tagus | 1000000 | 2678 | 2214 | 5233 |

401016c

| node | topic | size_b | mean_us | min_us | max_us |
| --- | --- | --- | --- | --- | --- |
| ponce | tagus | 1000000 | 1947 | 1476 | 2788 |
| geneva | tagus | 1000000 | 2200 | 1667 | 3063 |
| mandalay | tagus | 1000000 | 1603 | 1225 | 1997 |

401016c + eed223a

| node | topic | size_b | mean_us | min_us | max_us |
| --- | --- | --- | --- | --- | --- |
| ponce | tagus | 1000000 | 393 | 257 | 1301 |
| geneva | tagus | 1000000 | 657 | 438 | 1701 |
| mandalay | tagus | 1000000 | 108 | 68 | 650 |

@fuzzypixelz fuzzypixelz changed the title Recycle serialization buffers on transmission. Recycle serialization buffers on transmission Dec 16, 2024
@clalancette
Collaborator

All right, now that we've merged in #327, we can consider this one. Please rebase this onto the latest, then we can do a full review of it. Until then, I'll mark it as a draft.

@clalancette clalancette marked this pull request as draft December 17, 2024 21:41
@YuanYuYuan YuanYuYuan mentioned this pull request Dec 18, 2024
@fuzzypixelz fuzzypixelz force-pushed the buffer-pool branch 2 times, most recently from 068cf50 to 21006d0 Compare December 19, 2024 11:32
Contributor

@ahcorde ahcorde left a comment

There are also many changes unrelated to the goal of this PR.

rmw_zenoh_cpp/src/detail/buffer_pool.hpp (outdated)
@fuzzypixelz fuzzypixelz force-pushed the buffer-pool branch 2 times, most recently from 7ca544b to bb6fd88 Compare December 19, 2024 14:46
@fuzzypixelz
Contributor Author

> There are also many changes unrelated to the goal of this PR

There was a formatting error from my IDE. I've restored the files and manually re-applied the patches.

@fuzzypixelz fuzzypixelz force-pushed the buffer-pool branch 5 times, most recently from 8dd9bf5 to bcc36a1 Compare December 20, 2024 16:00
Collaborator

@clalancette clalancette left a comment

Besides the comments inline, do you have any updated performance numbers here?

rmw_zenoh_cpp/src/detail/buffer_pool.cpp (outdated)
rmw_zenoh_cpp/src/detail/buffer_pool.cpp
///=============================================================================
BufferPool::~BufferPool()
{
  rcutils_allocator_t allocator = rcutils_get_default_allocator();
Collaborator

This is going to require some additional plumbing, but I think we should respect the allocator passed in via the options during rmw_init. To get to that, we'll have to change the constructor of rmw_context_impl_s to pass that into the BufferPool constructor, and then we can store the pointer in this class and use it as necessary.
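
The plumbing described above could look roughly like the following sketch. It deliberately avoids the real rcutils headers: `Allocator` is a simplified, hypothetical stand-in for `rcutils_allocator_t`, and the `BufferPool` shown here is illustrative rather than the class in this pull request. The point is only that the pool stores the allocator it was constructed with and uses that same allocator when freeing.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical, simplified stand-in for rcutils_allocator_t (assumption).
struct Allocator
{
  void * (*allocate)(std::size_t size, void * state);
  void (*deallocate)(void * pointer, void * state);
  void * state;
};

// Equivalent of a "default" allocator: plain malloc/free, no state.
inline Allocator default_allocator()
{
  return Allocator{
    [](std::size_t size, void *) -> void * {return std::malloc(size);},
    [](void * pointer, void *) {std::free(pointer);},
    nullptr};
}

// Illustrative pool that remembers the allocator passed at construction.
class BufferPool
{
public:
  explicit BufferPool(Allocator allocator)
  : allocator_(allocator) {}

  BufferPool(const BufferPool &) = delete;
  BufferPool & operator=(const BufferPool &) = delete;

  // Frees every buffer with the same allocator that created it.
  ~BufferPool()
  {
    for (void * buffer : buffers_) {
      allocator_.deallocate(buffer, allocator_.state);
    }
  }

  // Allocates a buffer owned by the pool; returns nullptr on failure.
  void * allocate(std::size_t size)
  {
    void * buffer = allocator_.allocate(size, allocator_.state);
    if (buffer != nullptr) {
      buffers_.push_back(buffer);
    }
    return buffer;
  }

private:
  Allocator allocator_;
  std::vector<void *> buffers_;
};
```

In this arrangement the context would construct the pool with whatever allocator it received at initialization, falling back to something like `default_allocator()` only when none was supplied.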

Contributor Author

I don't think we should be using the allocator passed in rmw_init_options_s for all allocation in the RMW.

  1. According to the documentation, rmw_init_options_s::allocator is "[The] allocator used during internal allocation of init options, if needed." So by the time rmw_init is called, we have no business allocating through it.
  2. Other RMW implementations don't use rmw_init_options_s::allocator either. Instead they use rcutils_get_default_allocator and rmw_allocate.
  3. The only place where this allocator seems useful is in rmw_init_options_fini.

We also allocate many std::vector and std::string without the default allocator anyway, which seems wrong.

I can make a subsequent pull request addressing this issue.

Collaborator

The thing with the allocators is that always using rcutils_get_default_allocator is useless. The default allocator is just malloc, realloc, free, etc. So you may as well use those.

Memory allocation is a tricky subject here. The original goal of the rcutils_allocator was to make it so that consumers of the RMW API could add in an allocator that is not the default (like a pool allocator). Early on in rmw_zenoh development, we were very careful to use the passed-in allocator everywhere. As we've switched to C++ in more places, that has somewhat fallen by the wayside.

@Yadunund What's your thinking here? Should we give up on the rcutils_allocator and just use C++ builtins everywhere? Or should we move in the direction of trying to honor rcutils_allocator as much as we can?

@fuzzypixelz
Contributor Author

> Besides the comments inline, do you have any updated performance numbers here?

The following results were obtained using the Mont Blanc topology (with a 1 MB tagus topic size) on an Ubuntu 24.04.1 LTS machine with a 12th Gen Intel(R) Core(TM) i5-1240P (a different machine from the one used for the earlier results).

57a6b4b

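| node | topic | size_b | mean_us | sd_us | min_us | max_us |
| --- | --- | --- | --- | --- | --- | --- |
| ponce | tagus | 1000000 | 839 | 255 | 235 | 1227 |
| geneva | tagus | 1000000 | 1053 | 285 | 327 | 1490 |
| mandalay | tagus | 1000000 | 643 | 252 | 121 | 1011 |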

8ba5278

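| node | topic | size_b | mean_us | sd_us | min_us | max_us |
| --- | --- | --- | --- | --- | --- | --- |
| ponce | tagus | 1000000 | 545 | 192 | 173 | 1091 |
| geneva | tagus | 1000000 | 702 | 193 | 253 | 1326 |
| mandalay | tagus | 1000000 | 342 | 189 | 86 | 771 |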

cebb972

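| node | topic | size_b | mean_us | sd_us | min_us | max_us |
| --- | --- | --- | --- | --- | --- | --- |
| ponce | tagus | 1000000 | 368 | 162 | 87 | 879 |
| geneva | tagus | 1000000 | 522 | 162 | 127 | 1071 |
| mandalay | tagus | 1000000 | 188 | 159 | 38 | 646 |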

cebb972 + 8ba5278

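| node | topic | size_b | mean_us | sd_us | min_us | max_us |
| --- | --- | --- | --- | --- | --- | --- |
| ponce | tagus | 1000000 | 224 | 31 | 60 | 313 |
| geneva | tagus | 1000000 | 379 | 52 | 99 | 528 |
| mandalay | tagus | 1000000 | 58 | 10 | 20 | 87 |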

@fuzzypixelz fuzzypixelz marked this pull request as ready for review January 3, 2025 15:20
@Yadunund Yadunund self-assigned this Jan 3, 2025
fuzzypixelz and others added 5 commits January 17, 2025 09:58
Adds a bounded LIFO buffer pool in the context to reuse buffers
allocated on serialization. The aim is not (only) to avoid the
overhead of dynamic allocation but rather to enhance the cache
locality of serialization buffers.
Co-authored-by: Chris Lalancette <[email protected]>
Signed-off-by: Mahmoud Mazouz <[email protected]>
Comment on lines +442 to +445
The RMW recycles serialization buffers on transmission using a buffer pool with bounded memory
usage. These buffers are returned to the pool — without being deallocated — once they cross the
network boundary in host-to-host communication, or after transmission in inter-process
communication, or upon being consumed by subscriptions in intra-process communication, etc.
Collaborator

Suggested change
- The RMW recycles serialization buffers on transmission using a buffer pool with bounded memory
- usage. These buffers are returned to the pool — without being deallocated — once they cross the
- network boundary in host-to-host communication, or after transmission in inter-process
- communication, or upon being consumed by subscriptions in intra-process communication, etc.
+ The RMW recycles serialization buffers on transmission using a buffer pool with bounded memory
+ usage.
+ These buffers are returned to the pool - without being deallocated - once they cross the
+ network boundary in host-to-host communication, or after transmission in inter-process
+ communication, or upon being consumed by subscriptions in intra-process communication, etc.


When the total size of the allocated buffers within the pool exceeds
`RMW_ZENOH_BUFFER_POOL_MAX_SIZE_BYTES`, serialization buffers are allocated using the system
allocator and moved to Zenoh; no recyling is performed in this case to prevent the buffer pool from
Collaborator

Suggested change
- allocator and moved to Zenoh; no recyling is performed in this case to prevent the buffer pool from
+ allocator and moved to Zenoh; no recycling is performed in this case to prevent the buffer pool from
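
The size limit described in the README excerpt above can be sketched as follows. This is an illustrative model under stated assumptions, not the implementation in this pull request: `BoundedBufferPool`, `try_acquire`, and `allocated_bytes` are hypothetical names, and the accounting policy (counting every byte the pool has allocated, whether idle or in use) is one possible choice. When the budget would be exceeded, `try_acquire` returns `std::nullopt`, and the caller is expected to fall back to the system allocator without recycling.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sketch of a size-bounded pool; names are illustrative only.
class BoundedBufferPool
{
public:
  struct Buffer
  {
    std::unique_ptr<std::uint8_t[]> data;
    std::size_t capacity = 0;
  };

  explicit BoundedBufferPool(std::size_t max_size_bytes)
  : max_size_bytes_(max_size_bytes) {}

  // Returns a pooled buffer of at least `size` bytes, or std::nullopt when
  // serving the request would push the pool past its byte budget.
  std::optional<Buffer> try_acquire(std::size_t size)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!free_.empty()) {
      Buffer buffer = std::move(free_.back());
      free_.pop_back();
      if (buffer.capacity >= size) {
        return std::optional<Buffer>{std::move(buffer)};
      }
      // Too small to reuse: drop it and return its bytes to the budget.
      allocated_bytes_ -= buffer.capacity;
    }
    if (size > max_size_bytes_ - allocated_bytes_) {
      return std::nullopt;  // caller falls back to the system allocator
    }
    allocated_bytes_ += size;
    return std::optional<Buffer>{Buffer{std::make_unique<std::uint8_t[]>(size), size}};
  }

  // Returns a buffer obtained from try_acquire; its bytes stay accounted for.
  void release(Buffer buffer)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    free_.push_back(std::move(buffer));
  }

  // Total bytes currently owned by the pool, idle or in use.
  std::size_t allocated_bytes() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return allocated_bytes_;
  }

private:
  mutable std::mutex mutex_;
  std::size_t max_size_bytes_;
  std::size_t allocated_bytes_ = 0;
  std::vector<Buffer> free_;
};
```

Only buffers that came from `try_acquire` should be passed to `release`; fallback allocations are freed normally so that they never count against the budget.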

Comment on lines +452 to +453
The default value of `RMW_ZENOH_BUFFER_POOL_MAX_SIZE_BYTES` is roughly proportionate to the cache
size of a typical modern CPU.
Collaborator

Suggested change
- The default value of `RMW_ZENOH_BUFFER_POOL_MAX_SIZE_BYTES` is roughly proportionate to the cache
- size of a typical modern CPU.
+ The default value of `RMW_ZENOH_BUFFER_POOL_MAX_SIZE_BYTES` is 16MB; this value was chosen since it is roughly the size of the cache in a modern CPU.

const char * env_value;
const char * error_str = rcutils_get_env("RMW_ZENOH_BUFFER_POOL_MAX_SIZE_BYTES", &env_value);
if (error_str != nullptr) {
  RMW_ZENOH_LOG_ERROR_NAMED(
Collaborator

This is somewhat pedantic, but I think this should be a WARN since we are continuing on anyway.
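
A sketch of the "warn and continue with the default" behaviour discussed here, using only the C++ standard library. It substitutes `std::getenv` for `rcutils_get_env` and `std::cerr` for the RMW logging macros so that it stays self-contained. `parse_max_size` and `buffer_pool_max_size_from_env` are hypothetical helpers, and interpreting the 16MB default as 16 * 1024 * 1024 bytes is an assumption.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <limits>

// Assumption: "16MB" is taken to mean 16 * 1024 * 1024 bytes.
constexpr std::size_t kDefaultMaxSizeBytes = 16u * 1024u * 1024u;

// Parses a non-negative decimal byte count. On any malformed or
// out-of-range input it logs a warning and returns `fallback`.
inline std::size_t parse_max_size(const char * value, std::size_t fallback)
{
  if (value == nullptr || *value == '\0') {
    return fallback;  // variable unset or empty: silently use the default
  }
  // Accept digits only; strtoull would otherwise tolerate signs and spaces.
  for (const char * c = value; *c != '\0'; ++c) {
    if (*c < '0' || *c > '9') {
      std::cerr << "[WARN] invalid buffer pool size '" << value <<
        "', using default\n";
      return fallback;
    }
  }
  errno = 0;
  const unsigned long long parsed = std::strtoull(value, nullptr, 10);
  if (errno == ERANGE || parsed > std::numeric_limits<std::size_t>::max()) {
    std::cerr << "[WARN] buffer pool size '" << value <<
      "' is out of range, using default\n";
    return fallback;
  }
  return static_cast<std::size_t>(parsed);
}

// Reads the limit from the environment, falling back to the default.
inline std::size_t buffer_pool_max_size_from_env()
{
  return parse_max_size(
    std::getenv("RMW_ZENOH_BUFFER_POOL_MAX_SIZE_BYTES"), kDefaultMaxSizeBytes);
}
```

Because every failure path still returns a usable value, a warning-level message matches what the code actually does.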

