-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Atomic refactor for lock-free ringbuffer [Foxy] #302
Atomic refactor for lock-free ringbuffer [Foxy] #302
Conversation
… locks, port unique_lock creation into constructor and only perform lock/unlock operations in callbacks. * Both blocking read and write callbacks continue to rely on condition variables for data availability. * The write_overwrite callback now relies only on the atomic operations without locks, and does not push the read index if overwriting. * The read_timeout callback will simply not perform the callback instead of returning a nullptr. * Implemented true nonblocking read_nonblock and write_nonblock callbacks, which rely only on the atomic operations and return immediately either performing or not performing the callback.
…lementation. * Adjusted current tests for the lock-free behavior. * Added tests for the non-blocking implementation. * Implemented new tests for scenarios, where one thread always runs slower than the other.
…rDriver as well. * Utilizing the PacketMsg data structure to have the pointed-to buffer data transferred immediately once received and use the stable copy for further processing, instead of a reference to potentially mutable underlying data.
UpdateThis took a bit longer than I initially wanted. There were some digressions, regressions and fixes, but now I believe it should be even more stable than the community version. When experimenting with the edge cases (multiple subscribers, buffer overfills and recoveries back to normal), I often encountered cases when older packets were being sent to the
This worked fine and does drop the old packets so batching can recover, however, the root cause of the issue was a bug in the community driver's ringbuffer. The community version basically uses an overwriting implementation. But if the buffer was already fully overwritten the reader would be allowed to overtake the writer and read out older packets, giving the batcher older This driver (and this PR refactor) would not allow that when using the overwrite method, sice the reader is not allowed to access the index held by the writer and would, therefore, not move past it. I had a small setback here, because my initial handling of the
So currently, I am not using any changes in the SDK code, since all of my problems were solved in this PR by better ringbuffer handling. But, I'm not sure if the wider history checks (as shown above) wouldn't be something that's desired to also have in the SDK. In my testing getting older frames in the batcher practically never happens if the ringbuffer reading is done correctly, so the loop would add additional processing under normal operation. But if it does happen it was usually more than just one previous frame, so checking just for Regarding the reading/writing rules from #302 (comment), one might ask: why allow the write operations at any time and limit the reading operations? Wouldn't it be safer to limit the unsafe write operations and allow reading at any time?
Additional changesCommit 455aff2: I made the ValidationAll of these validations were using the
I currently have all the debug printing separate in a devel branch. I'm not sure if it would be useful to have something similar in the driver. The read/write counting and buffer status reports could be done at the driver level. However, the packet id counting would make more sense directly from the SDK in Final remarksThis refactor remedies the issue with the reader holding a lock on the buffer for the whole processing duration and instead allows the writer to continue filling the buffer, so we don't end up with missing lidar chunks under normal operation. However, when the processing time of the reader does get too high (for ex. with more subscribers), the buffer will eventually overfill and this will still cause less than ideal states. Simply increasing the buffer size is probably not a viable solution since the rate of filling is non-zero and the buffer would overfill eventually either way. So the second half of the problem - preventing the buffer from getting overfilled - still needs to be solved.
I've also done some quick CPU load tests. I am only measuring the total CPU usage with the There does appear to be more CPU usage with the new ringbuffer. I But I think it does make sense, since with this implementation, the threads should be spending more time doing work and less time waiting. Finally, I performed a direct refactor of the |
ad556dd
to
455aff2
Compare
I see that this thing has been quiet for some time now. Just checking in, @Samahu are there any further remarks or changes to be made? |
Hi @Imaniac230, sorry I was busy with other tasks. I am taking a look today. |
@Imaniac230 I am still evaluating your contribution, given the complexity of your solution I need a little more time before committing. Please note that I have been working on a solution that addresses the problem differently, you can view the prototype for ROS1 here, note it is still a work in progress in that I didn't get to fully replace the existing ThreadSafeRingBuffer yet. I will compare the performance of the two solution and let you know of my decision. Thanks for your patience. |
Sure, I'll be looking forward to your thoughts. |
@Samahu, I am assuming that this is no longer relevant with the recent merges. Or is there still something useful that could be taken from this PR? I hope it's ok if I write some of my thoughts here. Also, sorry if I'm writing this too late, I got preoccupied with some other stuff. When testing #320 on my
My solution here is also similar:
I also tested both solutions with With #321:
My solution here:
I'm not sure if the issue is now considered finished or if there is still some followup work. I was initially thinking that both of these solutions could be combined. But when the lidar scan processing is just offloaded to a separate thread with a separate Maybe it could be done so that the lower |
@Imaniac230 I would say this PR no longer relevant since the package has progressed. We do have some follow up work to improve our packet buffering and scan batching planned. I merged my MR as is even though I had one TODO item since many of our users wanted a working solution. I went with my approach since it was simpler and uses less locks and I had to shift my focus to other work. As I noted in the description of my PR anytime if subscribers take more time to consume a LidarScan than the time is needed to generate one .. there is no workaround other starting to throttle or drop scans. Regarding ROS2 Humble, I don't have a good assessment of the behavior under this distro with multiple subscribers. Also the performance seem to differ based on the underlying DDS used. Since this solution is no longer relevant I am going to close this PR, you can wait for another update (no ETA yet) to the ouster-ros regarding this issue or submit an improvement to the current mainline. |
Related Issues & PRs
Summary of Changes
A proposed refactor of the
ThreadSafeRingBuffer
usingstd::atomic
variables for lock-free access.Changes:
read
andwrite
methods continue to rely on condition variables for bufferfull
andempty
estimation. The locks are only held for the duration ofwait
and oncewait
is finished, the operations are performed atomically.write_overwrite
method is now non-blocking, meaning it will atomically overwrite the last item imemdiately. This will not push theread_idx
forward if an overwrite occurs.read_timeout
method continues to rely on the condition variable for bufferempty
estimation. Lock is only held for the duration ofwait_for
. If thewait_for
is successful, it will perform the read atomically. If thewait_for
timeouts, it will simply not perform the callback (instead of returning anullptr
) and unlock the lock.write_nonblock
andread_nonblock
methods, which return immediately and only perform theread
orwrite
operation if theempty
andfull
conditions are met. These conditions are checked atomically.write_idx
and might be performing a write. If the index was incremented afterwards, it's current/actual value would be basically meaningless.I've taken some liberty with the following changes, which may not be desired but make more sense in my limited understanding:
nullptr
in any case. If the corresponding conditions for a nonblocking or timeouted operation are not met we simply don't perform the callback operation and don't modify the buffer active size. I find it safer to just ignore the callback instead of returning an invalid pointer, which then must be handled safely by the caller.Validation
OS-0-128, sn: 122312000594, firmware rev: v3.0.1
):write
/read
write_overwrite
/read_timeout
write_nonblock
/read_nonblock
write
/read
write_overwrite
/read_timeout
write_nonblock
/read_nonblock