-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add arguments for sendmmsg and recvmmsg #2027
base: master
Are you sure you want to change the base?
Conversation
/cc @Andreagit97 @FedeDP |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Molter73 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Please double check driver/SCHEMA_VERSION file. See versioning. /hold |
Thanks Mauro! |
Hei, thank you! Before looking at this PR I was thinking of a tail call approach, so you start with a first bpf program that sends the first message and then you tail call until you have no more messages to send (the limit is 32 if I recall well). This approach should work well in the modern ebpf (have you already considered it?) but I see the issue that it could cause with the old drivers... we have no easy ways to create multiple headers and send multiple events inside the same filler, all the architecture is built on the assumption of sending just one event per filler... I understand that creating all these helpers just for two syscalls could be an overkill. Send just the first message/tuple could be a hypothesis for the old drivers, even if I don't like it so much. At least in the kernel module, I would like to send all the messages/tuples in some way. I will try to think if we can do something different at least for the kernel module, but I'm not sure about the outcome... |
No worries, I have a bunch of CI failures to go through before this can go in anyways and I'm also not super happy about the implementation so we can discuss as long as it takes.
That was my first thought too, I discarded it because:
Yeah, I'm not super happy about this implementation either, but as you mention, it would take a pretty big effort to be able to send more than one message. I'm willing to do the effort, just need some pointers on where to start. Also, I tried adding kernel tests that grab more than one event and it looked like the call to |
c9cd8b1
to
0616dfb
Compare
Looks like the scap tests are failing because a recvmmsg event doesn't have arguments in the file and it's not able to be parsed correctly, not sure how to fix that, do I need to recreate the scap file altogether? |
Yep probably there is no reason why we should choose the tail call approach in the end, the loop seems to do its job 🤔
Probably it would be enough to check the offset inside the
I'm not sure it is worth reworking all the architecture for just these 2 new events... if we find a cool way to handle it with add only code ok, but otherwise, I have some doubts...
Uhm the framework should be able to retrieve more than one event, see here an example
To be honest, the best thing to do in the framework would be to scrape all the events in the buffers, save them in a cpp vector, and then search the events we need in the vector. This would provide us with great debug capabilities since we can print all the events we have seen from a specific thread for example and we could easily understand why our event is not there, maybe just a wrong event type |
Usually, there are no issues when we only add parameters to an event in the event table. this is a particular case because the previous number of parameters was 0... we should try to understand in which method we are facing this exception
|
Alright, I refrain from doing anything else in this regard until we have some time to think about it.
That's weird, I'll give it a few more tries when I get a minute.
I'll try to get a stack trace, I can't remember the exact point it failed at off the top of my head. |
Maybe I add an idea for the kernel module. In //----------------------------------------------------------------------------- New code
if(event_pair->exit_event_type == PPME_SOCKET_SENDMMSG_X ||
event_pair->exit_event_type == PPME_SOCKET_RECVMMSG_X)
{
for(...)
{
// we need to add a custom logic inside `record_event_all_consumers` for these syscalls to understand which tuple
// and message we need to send.
record_event_all_consumers(event_pair->exit_event_type, event_pair->flags, &event_data, KMOD_PROG_SYS_EXIT);
}
return;
}
//-----------------------------------------------------------------------------
if (event_pair->flags & UF_USED)
record_event_all_consumers(event_pair->exit_event_type, event_pair->flags, &event_data, KMOD_PROG_SYS_EXIT);
else
record_event_all_consumers(PPME_GENERIC_X, UF_ALWAYS_DROP, &event_data, KMOD_PROG_SYS_EXIT); we could call the For what concern the legacy probe I had no great ideas, probably it's ok to send just the first message and not the others |
Neat idea andre! |
Alright then, I'll try to get the implementation for kmod done when I get a minute. I also still need to go through the last errors in CI. |
So here is the stacktrace for the scap file unit test that is failing:
The interesting part is in these 2 lines though:
Looks like we are trying to get the file descriptor here and, because it is not set in the scap file, an out of range error is thrown here. Best I can think of is catching the exception for recvmmsg and sendmmsg in parsers.cpp, and re-throw for other syscalls, or check before accessing for those syscalls, which kinda defeats the purpose of |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2027 +/- ##
==========================================
+ Coverage 74.82% 75.04% +0.22%
==========================================
Files 254 255 +1
Lines 33510 33602 +92
Branches 5746 5744 -2
==========================================
+ Hits 25073 25217 +144
+ Misses 8437 8385 -52
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
This is possible, the only catch is the allowed stack size is reduced to 256 bytes I think, but it will still not work on 5.8 because of the ld_imm64 instruction is not implemented, preventing the call from happening. I can try to test it, but implementing this same logic with tail calls sounds like a nightmare to me. |
Great, let's say tentatively for |
@leogr: The provided milestone is not valid for this repository. Milestones in this repository: [ Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/milestone 0.20.0 |
60f9fd6
to
7861ecf
Compare
So, "small" recap of the state of the PR, now that I've finished rebasing and addressing most comments.
@Andreagit97, @FedeDP, with all this in mind, please let me know how you guys want to proceed, which of the items above would you consider blockers for getting this PR merged and which we could delegate to follow ups that could be handled by either myself in the future or someone with more time to address them. |
Hi Mauro! thank you for your hard work! IMO before merging this PR we should have:
|
Great job Mauro! And thanks Andre for the continuous reviews!
+1, this is a good decision IMHO.
Agree with Andre, this should be in before merging the PR. Do you have more time to work on this in the next few days? Otherwise we can try to jump in and help if we can.
We need to somehow fix this; in the worst scenario, we could just send the first event only (like we do for bpf driver) on modern_ebpf 5.8, but i would avoid such a workaround.
Since we just added 2 new events, this is a minor schema bump (no breaking change)!
Yep, we should do that. Mauro feel free to ask us for support, we'll try our best to help you! And again, thank you very much for tackling this one, it has been such a PITA to implement i guess :D |
Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
Due to limitations with the verifier, it won't be possible to iterate over all messages, so the implementation is best effort and only the first message is actually processed. Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
7861ecf
to
302802f
Compare
The current implementation is not complete, only the first message is processed. In order to allow for multiple messages to be processed the kmod needs to allow for multiple headers to be added to the ringbuffer from the filler. Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
The added fields were added in newer kernels and can be used to check for access of some newer helpers. Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
4af5bbc
to
b94a5e3
Compare
Alright, I had some time today and I managed to implement multiple message handling on the kmod and dynamic snaplen for the new syscalls (at least to the best of my abilities). I'll try to tackle the 5.8 verifier error next, but last time I tried it looked like there was no easy way around it, I might need someone else to take this one up, I'll let you guys know where I'm standing in a day or two.
It's been a bit painful, but it has also been a learning experience, so it's all good 😄 |
20fb0c5
to
171e35b
Compare
A new argument had to be added to the apply_dynamic_snaplen function, I opted for using an auxiliar struct and pass a single pointer to it to the function. I think this is a bit cleaner, since removing or adding other arguments can be done by simply adding it to the struct, keeping the function signature unchanged. Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
171e35b
to
6510e2f
Compare
Sorry for the delay on this, I haven't had much time to look into it, we are reaching the deadline for 0.20 and I don't think I'll be able to fix it. Here is where I'm standing now with the verifier issue on 5.8, the error message seems to be coming from this line in the kernel: https://github.com/torvalds/linux/blob/bcf876870b95592b52519ed4aafcf9d95999bc9c/kernel/bpf/verifier.c#L9074-L9084 That error seems to mean the compiler is adding a load operation on something that is not a map (at least to my limited understanding). This error persists on the 5.8 kernel, even if I use separate inlined and not inlined function like Andrea proposed and that introduces another error on newer kernels. I also tried using a macro and callback but that was awful and didn't fix the issue either. The only way I've managed to make the verifier error go away is to delete the One thing I did to work around this issue in our fork is to make loading of tracepoints limited to interesting ones, that way we can exclude the offending syscall at runtime, but I'm not sure that approach is 100% aligned with what we are trying to achieve here. @Andreagit97, @FedeDP, any other ideas I could try? Would either of you had time and the will to look into it? |
After a little bit of investigation, we probably understood what the issue was. I crafted a little repro: #include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
char LICENSE[] SEC("license") = "Dual BSD/GPL";
static long loop_fn(uint32_t index, void *ctx) {
bpf_printk("handle_exit\n");
return 0;
}
SEC("tp/raw_syscalls/sys_enter")
int test(void *ctx) {
if (bpf_core_enum_value_exists(enum bpf_func_id, BPF_FUNC_loop) == 1) {
bpf_printk("loop\n");
bpf_loop(12, loop_fn, NULL, 0);
} else {
bpf_printk("skip loop\n");
}
return 0;
} And this fails to load on all kernels < 5.13. Starting from kernel 5.8, the failure is In kernel 5.10 we actually face a different error because the check on In kernel 5.13 the support for PTR_TO_FUNC was finally added and so our code works. I will report this error to the mailing list since it seems a legit use case but considering that all kernels < 5.13 (5.10 excluded) are EOL I don't think they will ever receive a fix. So if we want to support I see at least 2 possible ways:
In the long term, the second solution is more valuable because it could help with similar issues but probably to merge this PR approach 1 is enough |
I totally agree with this one; we have been thinking about the second solution for ages now, but it's too complex to implement and maintain atm. |
What type of PR is this?
/kind feature
Any specific area of the project related to this PR?
/area driver-kmod
/area driver-bpf
/area driver-modern-bpf
/area tests
Does this PR require a change in the driver versions?
What this PR does / why we need it:
Add argument processing for the sendmmsg and recvmmsg syscalls.
These are quite tricky, they behave the same way sendmsg and recvmsg do, but allowing for multiple messages to be sent/received in a single syscall. This breaks some invariants on how Falco processes events, for instance, a sendmmsg call could send messages to multiple destinations in connectionless UDP, which we would need multiple socket tuples to represent in userspace. To work around this I proposed issuing multiple events from the kernel. This has lead me to do 2 implementations:
bpf_loop
if available, or does a best effort with a regular loop otherwise.The implementation is far from perfect and I'm not super happy with it, but it's the best I've managed to come up with so far. Any suggestions for improvement are welcome.
Which issue(s) this PR fixes:
Fixes #1636
Special notes for your reviewer:
Does this PR introduce a user-facing change?: