Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add arguments for sendmmsg and recvmmsg #2027

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

Molter73
Copy link
Contributor

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap

/area libpman

/area libsinsp

/area tests

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Add argument processing for the sendmmsg and recvmmsg syscalls.

These are quite tricky, they behave the same way sendmsg and recvmsg do, but allowing for multiple messages to be sent/received in a single syscall. This breaks some invariants on how Falco processes events, for instance, a sendmmsg call could send messages to multiple destinations in connectionless UDP, which we would need multiple socket tuples to represent in userspace. To work around this I proposed issuing multiple events from the kernel. This has lead me to do 2 implementations:

  • The kernel module and legacy eBPF probe only process the first message in the syscall. This is due to both verifier issues on the eBPF probe and the impossibility to issue multiple event headers in a single syscall.
  • The modern BPF probe processes all events by using bpf_loop if available, or does a best effort with a regular loop otherwise.

The implementation is far from perfect and I'm not super happy with it, but it's the best I've managed to come up with so far. Any suggestions for improvement are welcome.

Which issue(s) this PR fixes:

Fixes #1636

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Add arguments for sendmmsg and recvmmsg syscalls.

@Molter73
Copy link
Contributor Author

/cc @Andreagit97 @FedeDP

@poiana
Copy link
Contributor

poiana commented Aug 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Molter73

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

Please double check driver/SCHEMA_VERSION file. See versioning.

/hold

@FedeDP
Copy link
Contributor

FedeDP commented Aug 28, 2024

Thanks Mauro!
Putting this in the TBD milestone because we won't have enough time to review/improve it in the next ~10days, and then we will have to tag libs and drivers for the next Falco release. But we will make sure to come at it as soon as possible.
/milestone TBD

@poiana poiana added this to the TBD milestone Aug 28, 2024
@Andreagit97
Copy link
Member

Hei, thank you! Before looking at this PR I was thinking of a tail call approach, so you start with a first bpf program that sends the first message and then you tail call until you have no more messages to send (the limit is 32 if I recall well). This approach should work well in the modern ebpf (have you already considered it?) but I see the issue that it could cause with the old drivers... we have no easy ways to create multiple headers and send multiple events inside the same filler, all the architecture is built on the assumption of sending just one event per filler... I understand that creating all these helpers just for two syscalls could be an overkill. Send just the first message/tuple could be a hypothesis for the old drivers, even if I don't like it so much. At least in the kernel module, I would like to send all the messages/tuples in some way. I will try to think if we can do something different at least for the kernel module, but I'm not sure about the outcome...

@Molter73
Copy link
Contributor Author

Thanks Mauro! Putting this in the TBD milestone because we won't have enough time to review/improve it in the next ~10days, and then we will have to tag libs and drivers for the next Falco release. But we will make sure to come at it as soon as possible. /milestone TBD

No worries, I have a bunch of CI failures to go through before this can go in anyways and I'm also not super happy about the implementation so we can discuss as long as it takes.

I was thinking of a tail call approach, so you start with a first bpf program that sends the first message and then you tail call until you have no more messages to send (the limit is 32 if I recall well). This approach should work well in the modern ebpf (have you already considered it?)

That was my first thought too, I discarded it because:

  • The cap is 33 tail calls, so it would be around 32 after the initial tail call into the filler. This would not allow us to capture all events anyways in the modern bpf.
  • I couldn't find an easy way to keep track of what number of event we are actually on, that's probably a skill issue on my side, probably using a map or some part of the context would be enough.

but I see the issue that it could cause with the old drivers... we have no easy ways to create multiple headers and send multiple events inside the same filler, all the architecture is built on the assumption of sending just one event per filler... I understand that creating all these helpers just for two syscalls could be an overkill. Send just the first message/tuple could be a hypothesis for the old drivers, even if I don't like it so much. At least in the kernel module, I would like to send all the messages/tuples in some way. I will try to think if we can do something different at least for the kernel module, but I'm not sure about the outcome...

Yeah, I'm not super happy about this implementation either, but as you mention, it would take a pretty big effort to be able to send more than one message. I'm willing to do the effort, just need some pointers on where to start.

Also, I tried adding kernel tests that grab more than one event and it looked like the call to evt_test->assert_event_presence(); was not moving the event forward and instead was validating the previous one, or rather, it could tell there was another event in the buffer, but the internal values were not updated, any ideas why this could happen? Or is the framework meant to work with just one event? Anyways, I'll keep looking into it because I think it's a valuable test to have for these syscalls.

@Molter73
Copy link
Contributor Author

Molter73 commented Aug 28, 2024

Looks like the scap tests are failing because a recvmmsg event doesn't have arguments in the file and it's not able to be parsed correctly, not sure how to fix that, do I need to recreate the scap file altogether?

@Andreagit97
Copy link
Member

That was my first thought too, I discarded it because:

  • The cap is 33 tail calls, so it would be around 32 after the initial tail call into the filler. This would not allow us to capture all events anyways in the modern bpf.

Yep probably there is no reason why we should choose the tail call approach in the end, the loop seems to do its job 🤔

  • I couldn't find an easy way to keep track of what number of event we are actually on, that's probably a skill issue on my side, probably using a map or some part of the context would be enough.

Probably it would be enough to check the offset inside the auxmap struct, more in detail the payload_pos to check how much data we have already pushed, or if we need other information, probably yes we need to use or some scratch space in the same map or add an additional map. But again probably is not necessary to go this way

Yeah, I'm not super happy about this implementation either, but as you mention, it would take a pretty big effort to be able to send more than one message. I'm willing to do the effort, just need some pointers on where to start.

I'm not sure it is worth reworking all the architecture for just these 2 new events... if we find a cool way to handle it with add only code ok, but otherwise, I have some doubts...

Also, I tried adding kernel tests that grab more than one event and it looked like the call to evt_test->assert_event_presence(); was not moving the event forward and instead was validating the previous one, or rather, it could tell there was another event in the buffer, but the internal values were not updated, any ideas why this could happen? Or is the framework meant to work with just one event? Anyways, I'll keep looking into it because I think it's a valuable test to have for these syscalls.

Uhm the framework should be able to retrieve more than one event, see here an example

evt_test->assert_event_presence(CURRENT_PID, PPME_SYSCALL_CLOSE_E);
. If the events are in order we should be able to retrieve them...

To be honest, the best thing to do in the framework would be to scrape all the events in the buffers, save them in a cpp vector, and then search the events we need in the vector. This would provide us with great debug capabilities since we can print all the events we have seen from a specific thread for example and we could easily understand why our event is not there, maybe just a wrong event type

@Andreagit97
Copy link
Member

Usually, there are no issues when we only add parameters to an event in the event table. this is a particular case because the previous number of parameters was 0... we should try to understand in which method we are facing this exception

[ RUN      ] scap_file_kexec_arm64.tail_lineage
Trying to open the right engine!
unknown file: Failure
C++ exception with description "vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)" thrown in the test body.
Trying to open the right engine!
[  FAILED  ] scap_file_kexec_arm64.tail_lineage (28 ms)

@Molter73
Copy link
Contributor Author

I'm not sure it is worth reworking all the architecture for just these 2 new events... if we find a cool way to handle it with add only code ok, but otherwise, I have some doubts...

Alright, I refrain from doing anything else in this regard until we have some time to think about it.

Uhm the framework should be able to retrieve more than one event, see here an example

That's weird, I'll give it a few more tries when I get a minute.

Usually, there are no issues when we only add parameters to an event in the event table. this is a particular case because the previous number of parameters was 0... we should try to understand in which method we are facing this exception

I'll try to get a stack trace, I can't remember the exact point it failed at off the top of my head.

@Andreagit97
Copy link
Member

Maybe I add an idea for the kernel module. In TRACEPOINT_PROBE(syscall_exit_probe, struct pt_regs *regs, long ret) we can add the following code:

        //----------------------------------------------------------------------------- New code
	if(event_pair->exit_event_type == PPME_SOCKET_SENDMMSG_X || 
		event_pair->exit_event_type == PPME_SOCKET_RECVMMSG_X)
	{
		for(...)
		{
			// we need to add a custom logic inside `record_event_all_consumers` for these syscalls to understand which tuple
			// and message we need to send.
			record_event_all_consumers(event_pair->exit_event_type, event_pair->flags, &event_data, KMOD_PROG_SYS_EXIT);
		}
                return;
	}
        //-----------------------------------------------------------------------------
	if (event_pair->flags & UF_USED)
		record_event_all_consumers(event_pair->exit_event_type, event_pair->flags, &event_data, KMOD_PROG_SYS_EXIT);
	else
		record_event_all_consumers(PPME_GENERIC_X, UF_ALWAYS_DROP, &event_data, KMOD_PROG_SYS_EXIT);

we could call the record_event_all_consumers in a for loop and each time we could send a different SENDMMSG event. We probably need to enrich the event_data struct with some information on how many messages we have already sent and what is the next message to send.

For what concern the legacy probe I had no great ideas, probably it's ok to send just the first message and not the others

@FedeDP
Copy link
Contributor

FedeDP commented Sep 2, 2024

Neat idea andre!
Re old bpf probe, i agree, don't spend too much time on it for now; perhaps we can fix it in the future if needed. Also, having a working solution that let's say kills (through the verifier) like 30% of our support matrix is not gonna work IMHO; if we find a solution, we should fine on that does not break too many verifiers; i think that is the hardest part.

@Molter73
Copy link
Contributor Author

Molter73 commented Sep 2, 2024

Alright then, I'll try to get the implementation for kmod done when I get a minute. I also still need to go through the last errors in CI.

@Molter73
Copy link
Contributor Author

Molter73 commented Sep 3, 2024

So here is the stacktrace for the scap file unit test that is failing:

#0  0x00007ffff5ea8664 in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff5e4fc4e in raise () from /lib64/libc.so.6
#2  0x00007ffff5e37902 in abort () from /lib64/libc.so.6
#3  0x00007ffff60a5da9 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6
#4  0x00007ffff60b7c4c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#5  0x00007ffff60a5951 in std::terminate() () from /lib64/libstdc++.so.6
#6  0x00007ffff60b7ed8 in __cxa_throw () from /lib64/libstdc++.so.6
#7  0x00007ffff60a88df in std::__throw_out_of_range_fmt(char const*, ...) () from /lib64/libstdc++.so.6
#8  0x0000000000e753c9 in std::vector<sinsp_evt_param, std::allocator<sinsp_evt_param> >::_M_range_check (
    this=this@entry=0x7ffff4317a00, __n=__n@entry=0) at /usr/include/c++/14/bits/stl_vector.h:1160
#9  0x0000000000e6bde2 in std::vector<sinsp_evt_param, std::allocator<sinsp_evt_param> >::at (this=0x7ffff4317a00, __n=0)
    at /usr/include/c++/14/bits/stl_vector.h:1180
#10 sinsp_evt::get_param (this=this@entry=0x7ffff43179b8, id=id@entry=0)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/event.cpp:198
#11 0x0000000001174707 in sinsp_parser::reset (this=this@entry=0x515000009180, evt=evt@entry=0x7ffff43179b8)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/parsers.cpp:718
#12 0x000000000117ee53 in sinsp_parser::process_event (this=0x515000009180, evt=0x7ffff43179b8)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/parsers.cpp:90
#13 0x0000000001072167 in sinsp::next (this=0x7ffff4317840, puevt=0x7ffff3c77720)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/sinsp.cpp:1322
#14 0x0000000000c56032 in scap_file_test_helpers::capture_search_evt_by_type_and_tid (
    inspector=inspector@entry=0x7ffff4317840, type=type@entry=293, tid=tid@entry=107370)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/test/helpers/scap_file_helpers.cpp:45
#15 0x0000000000c5357d in scap_file_kexec_x86_tail_lineage_Test::TestBody (this=<optimized out>)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/test/scap_files/kexec_x86/kexec_x86.cpp:39
#16 0x00000000016c0064 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (
    object=object@entry=0x5020000082f0, method=&virtual testing::Test::TestBody(),
    location=location@entry=0x16cdb91 "the test body")
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:2657
#17 0x00000000016b409a in testing::Test::Run (this=this@entry=0x5020000082f0)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:2674
#18 0x00000000016b4260 in testing::TestInfo::Run (this=0x5120000304c0)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:2853
#19 0x00000000016b43d2 in testing::TestSuite::Run (this=0x512000030640)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:3012
#20 0x00000000016b7f5e in testing::internal::UnitTestImpl::RunAllTests (this=0x516000000680)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:5870
#21 0x00000000016c042f in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
    object=0x516000000680,
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x16b7c0e <testing::internal::UnitTestImpl::RunAllTests()>, location=location@entry=0x17c52d0 "auxiliary test code (environments or event listeners)")
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:2657
#22 0x00000000016b40e4 in testing::UnitTest::Run (this=0x2162540 <testing::UnitTest::GetInstance()::instance>)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:5444
#23 0x000000000118d5ec in RUN_ALL_TESTS ()
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/include/gtest/gtest.h:2293
#24 0x000000000118d5d5 in main (argc=<optimized out>, argv=0x7fffffffe2e8)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest_main.cc:51

The interesting part is in these 2 lines though:

#10 sinsp_evt::get_param (this=this@entry=0x7ffff43179b8, id=id@entry=0)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/event.cpp:198
#11 0x0000000001174707 in sinsp_parser::reset (this=this@entry=0x515000009180, evt=evt@entry=0x7ffff43179b8)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/parsers.cpp:718

Looks like we are trying to get the file descriptor here and, because it is not set in the scap file, an out of range error is thrown here.

Best I can think of is catching the exception for recvmmsg and sendmmsg in parsers.cpp, and re-throw for other syscalls, or check before accessing for those syscalls, which kinda defeats the purpose of at, WDYT?

Copy link

codecov bot commented Sep 3, 2024

Codecov Report

Attention: Patch coverage is 77.77778% with 4 lines in your changes missing coverage. Please review.

Project coverage is 75.04%. Comparing base (55ff79f) to head (6510e2f).
Report is 90 commits behind head on master.

Files with missing lines Patch % Lines
userspace/libsinsp/parsers.cpp 77.77% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2027      +/-   ##
==========================================
+ Coverage   74.82%   75.04%   +0.22%     
==========================================
  Files         254      255       +1     
  Lines       33510    33602      +92     
  Branches     5746     5744       -2     
==========================================
+ Hits        25073    25217     +144     
+ Misses       8437     8385      -52     
Flag Coverage Δ
libsinsp 75.04% <77.77%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Molter73
Copy link
Contributor Author

it's possible to have tail calls + a call to a static function not inline

This is possible, the only catch is the allowed stack size is reduced to 256 bytes I think, but it will still not work on 5.8 because of the ld_imm64 instruction is not implemented, preventing the call from happening. I can try to test it, but implementing this same logic with tail calls sounds like a nightmare to me.

@leogr
Copy link
Member

leogr commented Nov 19, 2024

@Molter73 Do you think we are in time to put this for 0.20 (due by Dec, or mid-Jan if we skip Dec)? 🤔

@Molter73
Copy link
Contributor Author

@Molter73 Do you think we are in time to put this for 0.20 (due by Dec, or mid-Jan if we skip Dec)? 🤔

We should, I just haven't had enough time to address all comments, I'll try to make room and push this over the finish line in the next couple of weeks.

@leogr
Copy link
Member

leogr commented Nov 19, 2024

@Molter73 Do you think we are in time to put this for 0.20 (due by Dec, or mid-Jan if we skip Dec)? 🤔

We should, I just haven't had enough time to address all comments, I'll try to make room and push this over the finish line in the next couple of weeks.

Great, let's say tentatively for
/milestone 0.20

@poiana
Copy link
Contributor

poiana commented Nov 19, 2024

@leogr: The provided milestone is not valid for this repository. Milestones in this repository: [0.18.2, 0.19.0, 0.20.0, TBD, next-driver]

Use /milestone clear to clear the milestone.

In response to this:

@Molter73 Do you think we are in time to put this for 0.20 (due by Dec, or mid-Jan if we skip Dec)? 🤔

We should, I just haven't had enough time to address all comments, I'll try to make room and push this over the finish line in the next couple of weeks.

Great, let's say tentatively for
/milestone 0.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@FedeDP
Copy link
Contributor

FedeDP commented Nov 19, 2024

/milestone 0.20.0

@poiana poiana modified the milestones: TBD, 0.20.0 Nov 19, 2024
@Molter73
Copy link
Contributor Author

So, "small" recap of the state of the PR, now that I've finished rebasing and addressing most comments.

  • Demultiplexing the syscall into multiple events
    • The modern_bpf driver is able to handle multiple messages coming from these syscalls.
    • Legacy probe will likely not get this due to verifier limitations on older kernels. Instead it only handles the first message in a syscall.
    • A suggestion to add this to the kernel module has been proposed by @Andreagit97, but I have not had time to implement it and it's unlikely I'll be able to do so (will still try though).
  • 5.8 kernels have a verifier issue stating the bpf_ld_imm64 is an unrecognized instruction
    • I managed to pin point this error to using the bpf_loop helper, seems this prevents the compiler from inlining the callbacks used with it and in turn cause the error on older kernels.
    • We've tried doing a wrapper callback with an inlined function, but that causes another verifier issue on 6.11.
    • Another proposed solution is to use tail calls on kernels that don't implement bpf_loop, but I fear this won't fix the 5.8 verifier issue (the callback will still not be inlined here) and it requires additional time to implement, which I currently don't have.
  • The driver schema needs adjusting, but I need help understanding if this change should do a major or minor bump.
  • All comments but 2 have been addressed, so I've marked them all as resolved, feel free to re-open if that is not the case on any of them.

@Andreagit97, @FedeDP, with all this in mind, please let me know how you guys want to proceed, which of the items above would you consider blockers for getting this PR merged and which we could delegate to follow ups that could be handled by either myself in the future or someone with more time to address them.

@Andreagit97
Copy link
Member

Hi Mauro! thank you for your hard work! IMO before merging this PR we should have:

  • The modern ebpf working on all supported kernels (it would be dangerous to release it with 5.8 broken since we don't even know if this is the unique version affected)
  • the kmod can send more than one event like the modern ebpf
  • [minor] adapt the dyn_snaplen to work also with sendmmsg
    WDYT @FedeDP ?

@FedeDP
Copy link
Contributor

FedeDP commented Nov 25, 2024

Great job Mauro! And thanks Andre for the continuous reviews!

Legacy probe will likely not get this due to verifier limitations on older kernels. Instead it only handles the first message in a syscall.

+1, this is a good decision IMHO.

A suggestion to add this to the kernel module has been proposed by @Andreagit97, but I have not had time to implement it and it's unlikely I'll be able to do so (will still try though).

Agree with Andre, this should be in before merging the PR. Do you have more time to work on this in the next few days? Otherwise we can try to jump in and help if we can.

5.8 kernels have a verifier issue stating the bpf_ld_imm64 is an unrecognized instruction

We need to somehow fix this; in the worst scenario, we could just send the first event only (like we do for bpf driver) on modern_ebpf 5.8, but i would avoid such a workaround.

The driver schema needs adjusting, but I need help understanding if this change should do a major or minor bump.

Since we just added 2 new events, this is a minor schema bump (no breaking change)!

[minor] adapt the dyn_snaplen to work also with sendmmsg

Yep, we should do that.

Mauro feel free to ask us for support, we'll try our best to help you! And again, thank you very much for tackling this one, it has been such a PITA to implement i guess :D

Due to limitations with the verifier, it won't be possible to iterate
over all messages, so the implementation is best effort and only the
first message is actually processed.

Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
The current implementation is not complete, only the first message is
processed. In order to allow for multiple messages to be processed the
kmod needs to allow for multiple headers to be added to the ringbuffer
from the filler.

Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
The added fields were added in newer kernels and can be used to check
for access of some newer helpers.

Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
@Molter73
Copy link
Contributor Author

Alright, I had some time today and I managed to implement multiple message handling on the kmod and dynamic snaplen for the new syscalls (at least to the best of my abilities). I'll try to tackle the 5.8 verifier error next, but last time I tried it looked like there was no easy way around it, I might need someone else to take this one up, I'll let you guys know where I'm standing in a day or two.

Mauro feel free to ask us for support, we'll try our best to help you! And again, thank you very much for tackling this one, it has been such a PITA to implement i guess :D

It's been a bit painful, but it has also been a learning experience, so it's all good 😄

A new argument had to be added to the apply_dynamic_snaplen function, I
opted for using an auxiliar struct and pass a single pointer to it to
the function. I think this is a bit cleaner, since removing or adding
other arguments can be done by simply adding it to the struct, keeping
the function signature unchanged.

Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
@Molter73
Copy link
Contributor Author

Molter73 commented Dec 5, 2024

Sorry for the delay on this, I haven't had much time to look into it, we are reaching the deadline for 0.20 and I don't think I'll be able to fix it. Here is where I'm standing now with the verifier issue on 5.8, the error message seems to be coming from this line in the kernel: https://github.com/torvalds/linux/blob/bcf876870b95592b52519ed4aafcf9d95999bc9c/kernel/bpf/verifier.c#L9074-L9084

That error seems to mean the compiler is adding a load operation on something that is not a map (at least to my limited understanding). This error persists on the 5.8 kernel, even if I use separate inlined and not inlined function like Andrea proposed and that introduces another error on newer kernels. I also tried using a macro and callback but that was awful and didn't fix the issue either. The only way I've managed to make the verifier error go away is to delete the bpf_loop calls, which probably means the compiler is then inlining the calls, but that of course defeats the whole purpose of the change.

One thing I did to work around this issue in our fork is to make loading of tracepoints limited to interesting ones, that way we can exclude the offending syscall at runtime, but I'm not sure that approach is 100% aligned with what we are trying to achieve here.

@Andreagit97, @FedeDP, any other ideas I could try? Would either of you had time and the will to look into it?

@Andreagit97
Copy link
Member

Andreagit97 commented Jan 3, 2025

After a little bit of investigation, we probably understood what the issue was.

I crafted a little repro:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "Dual BSD/GPL";

static long loop_fn(uint32_t index, void *ctx) {
  bpf_printk("handle_exit\n");
  return 0;
}

SEC("tp/raw_syscalls/sys_enter")
int test(void *ctx) {
  if (bpf_core_enum_value_exists(enum bpf_func_id, BPF_FUNC_loop) == 1) {
    bpf_printk("loop\n");
    bpf_loop(12, loop_fn, NULL, 0);
  } else {
    bpf_printk("skip loop\n");
  }
  return 0;
}

And this fails to load on all kernels < 5.13.

Starting from kernel 5.8, the failure is unrecognized bpf_ld_imm64 insn this is due to the fact that function callbacks are not supported in this kernel, and when we loop over instructions the verifier doesn't know what is loop_fn. It supposes it should be a pointer to a map, that's why we face this error.

In kernel 5.10 we actually face a different error because the check on bpf_ld_imm64 insn is moved after some functions, so now we face an error here https://github.com/torvalds/linux/blob/830b3c68c1fb1e9176028d02ef86f3cf76aa2476/kernel/bpf/verifier.c#L11908. The error persists on the latest patch version 5.10.232

In kernel 5.13 the support for PTR_TO_FUNC was finally added and so our code works.

I will report this error to the mailing list since it seems a legit use case but considering that all kernels < 5.13 (5.10 excluded) are EOL I don't think they will ever receive a fix. So if we want to support bpf_loop on kernels like 5.8 we probably need to find another way...

I see at least 2 possible ways:

  1. Don't use bpf_loop. We can use a simple loop capped to 16/32 iteration.
  2. Introduce a mechanism to load ebpf program according to the feature supported. One idea could be to load an initial ebpf program that probes all the available features we need and populates some globals. Then according to the values obtained, we load the right ebpf program, the one with the simple loop or the one with the bpf_loop. So something similar to what Mauro suggested, a dynamic loading according to what we need.

In the long term, the second solution is more valuable because it could help with similar issues but probably to merge this PR approach 1 is enough

@FedeDP
Copy link
Contributor

FedeDP commented Jan 3, 2025

In the long term, the second solution is more valuable because it could help with similar issues but probably to merge this PR approach 1 is enough

I totally agree with this one; we have been thinking about the second solution for ages now, but it's too complex to implement and maintain atm.

@Andreagit97
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add args for sendmmsg events
5 participants