Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trace exec upper_layer: fix BTF type size relocation #2400

Merged
merged 1 commit into from
Jan 23, 2024

Conversation

alban
Copy link
Member

@alban alban commented Jan 22, 2024

bpf_core_type_size() has to be used explicitly to ensure the correct BTF relocation will be used.

Using sizeof or other implicit methods don't work.

Fixes: 8c433c4 ("image-based trace exec: add upper_layer field")
Fixes: a1d98b1 ("built-in trace exec: add upper_layer field")

Fixes: #2353

How to use

Check the upper layer field in the trace exec gadget on different kernels

Testing done

Deploy this DaemonSet on AKS with 2 node pools:

  • Mariner Linux
  • Ubuntu
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: upperlayer
  labels:
    k8s-app: upperlayer
spec:
  selector:
    matchLabels:
      name: upperlayer
  template:
    metadata:
      labels:
        name: upperlayer
        role: demo
    spec:
      containers:
      - name: upperlayer
        image: ubuntu
        command: [ "sh", "-c", "cp /usr/bin/uname / ; while sleep 1 ; do /host/usr/bin/uname -a ; uname -a ; /uname -a ; done" ]
        volumeMounts:
        - name: host
          mountPath: /host
      volumes:
      - name: host
        hostPath:
          path: /
sudo -E ./ig image build -t myrepo.azurecr.io/trace-exec:debug ./gadgets/trace_exec
sudo -E ./ig image push myrepo.azurecr.io/trace-exec:debug
./kubectl-gadget run myrepo.azurecr.io/trace-exec:debug -o columns=k8s.node,comm,ret,args,upper_layer

K8S.NODE                            COMM   ARGS                 UPPER_LAYER
aks-userpool-39817355-vmss000000    uname  /host/usr/bin/uname  false
aks-userpool-39817355-vmss000000    uname  /usr/bin/uname       false
aks-userpool-39817355-vmss000000    uname  /uname               true
aks-userpool-39817355-vmss000000    sleep  /usr/bin/sleep       false
aks-azurepool-26141556-vmss000000   uname  /host/usr/bin/uname  false
aks-azurepool-26141556-vmss000000   uname  /usr/bin/uname       false
aks-azurepool-26141556-vmss000000   uname  /uname               true
aks-azurepool-26141556-vmss000000   sleep  /usr/bin/sleep       false

@alban
Copy link
Member Author

alban commented Jan 22, 2024

I also tested that this PR fixes the built-in gadget too:

$ ./kubectl-gadget trace exec -o columns=k8s.node,k8s.container,comm,ret,args,upperlayer
INFO[0000] Experimental features enabled                
K8S.NODE                            K8S.CONTAINER  COMM  RET ARGS                    UPPERLAYER
aks-userpool-39817355-vmss000000    upperlayer     uname 0   /host/usr/bin/uname -a  false
aks-userpool-39817355-vmss000000    upperlayer     uname 0   /usr/bin/uname -a       false
aks-userpool-39817355-vmss000000    upperlayer     uname 0   /uname -a               true
aks-userpool-39817355-vmss000000    upperlayer     sleep 0   /usr/bin/sleep 1        false

aks-azurepool-26141556-vmss000000   upperlayer     uname 0   /host/usr/bin/uname -a  false
aks-azurepool-26141556-vmss000000   upperlayer     uname 0   /usr/bin/uname -a       false
aks-azurepool-26141556-vmss000000   upperlayer     uname 0   /uname -a               true
aks-azurepool-26141556-vmss000000   upperlayer     sleep 0   /usr/bin/sleep 1        false

struct _ovl_inode {
struct inode vfs_inode;
struct dentry *__upperdentry;
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it can be fixed by adding __attribute__((preserve_access_index)); to this structure? (I think it'll force clang to emit a relocation when the container_of macro needs to calculate offsetof(type,member))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why is this structure so different from the upstream one? https://github.com/torvalds/linux/blob/v5.0/fs/overlayfs/ovl_entry.h#L91-L105

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I now understand, we have the inode, and we suppose the dentry is the next member on the structure, so that's why we don't care about any other members.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it can be fixed by adding attribute((preserve_access_index)); to this structure? (I think it'll force clang to emit a relocation when the container_of macro needs to calculate offsetof(type,member))

cilium/ebpf emits a bad relocation when I add this attribute because it does not find struct _ovl_inode in the kernel BTF.

const badRelo = 0xbad2310 

I think I now understand, we have the inode, and we suppose the dentry is the next member on the structure, so that's why we don't care about any other members.

Correct. In that way, no need to define types for other fields.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cilium/ebpf emits a bad relocation when I add this attribute because it does not find struct _ovl_inode in the kernel BTF.

That's unfortunate.

Copy link

@lmb lmb Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use "flavour names" like struct ovl_inode___gadget. Note the three _. Anything after the underscore is ignored by the library when trying to find a target. https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html#dealing-with-kernel-version-and-configuration-differences

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmb Thanks for the link, I didn't know about struct flavors.

The following seems to work. Tested without cilium/ebpf#1300, so struct ovl_inode is unknown to BTF and only the "else" branch is tested.

Is it the approach you recommend?

#include <vmlinux.h>

struct ovl_inode___gadget {
       struct inode vfs_inode;
       struct dentry *__upperdentry;
} __attribute__((preserve_access_index));
...

	if (bpf_core_type_exists(struct ovl_inode___gadget)) {
		struct ovl_inode___gadget *oi = container_of(
			inode, struct ovl_inode___gadget, vfs_inode);
		bpf_probe_read_kernel(&upperdentry, sizeof upperdentry,
				      &oi->__upperdentry);
	} else {
		bpf_probe_read_kernel(&upperdentry, sizeof upperdentry,
				      ((void *)inode) +
					      bpf_core_type_size(struct inode));
	}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something like checking for the existence of the field would be more reliable: https://github.com/libbpf/libbpf/blob/f81eef23b33c0dbf923e863a72ce51ea4d32e291/src/bpf_core_read.h#L151-L164

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave it a try: #2406

About bpf_core_field_exists: I'll give it a try. If the struct does not exist, I guess bpf_core_field_exists would just return false?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is correct.

Copy link
Member

@eiffel-fl eiffel-fl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi.

It seems to work but I am concerned about how it would behave if upstream changes.
Moreover, can you push it to citest/? This way, we ensure everything is correct with regard to the CI. If everything is green on the CI side, I will approve it.

Best regards.

struct inode vfs_inode;
struct dentry *__upperdentry;
};

static __always_inline bool upper_layer()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename it as has_upper_layer() since it returns a bool.

// struct ovl_inode defined in fs/overlayfs/ovl_entry.h
// Unfortunately, not exported to vmlinux.h
// and not available in /sys/kernel/btf/vmlinux
// See https://github.com/cilium/ebpf/pull/1300
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we bring momentum to this PR and then use it once merged?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be good. But even if the cilium/ebpf PR is merged, we still have work to do to make use of it in Inspektor Gadget:

  • How an image-based gadget can specify which kernel module to load, so that the run command uses the correct KernelTypes:
ebpf.CollectionOptions{
    MapReplacements: mapReplacements,
    Programs: ebpf.ProgramOptions{
        KernelTypes: /* BTF from the kernel and the modules */,
    },
}

Comment on lines +173 to +174
bpf_core_type_size(struct inode));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one day the order of the fields in struct ovl_inode changes upstream, this will not work anymore, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

Sysdig has the same issue, it makes the same assumption (see extract__exe_upper_layer).

To fix that, we will need:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

Sysdig has the same issue, it makes the same assumption (see extract__exe_upper_layer).

We should definitely take inspiration from strong behavior and not weak assumption like this one.
Can you open a follow-up PR to document this limitation?
Also, is this field shown by default? It should be hidden by default.

To fix that, we will need:

* [Add CO-RE support for kernel modules cilium/ebpf#1300](https://github.com/cilium/ebpf/pull/1300)

I will review and test this PR, can you please do so?
Even though this is not enough for this PR, this is something which would definitely be helpful for Inspektor Gadget, so we should definitely participate.

* Use it in the built-in gadget
  
  * By loading /sys/kernel/btf/overlay when we have both `CONFIG_DEBUG_INFO_BTF=y` and `CONFIG_DEBUG_INFO_BTF_MODULES=y`
  * With BTFGen when we have neither
    
    * [repository: BTFHUB should contain BTF files for all kernel modules aquasecurity/btfhub#70](https://github.com/aquasecurity/btfhub/issues/70)
  * Some other solutions when we have `CONFIG_DEBUG_INFO_BTF=y` but not `CONFIG_DEBUG_INFO_BTF_MODULES=y`?

* Use it in image-based gadgets
  
  * Add a way for the gadget to specify which modules to load

This is not the goal of Inspektor Gadget to load modules.
But we can indeed, use the YAML to read which modules BTF under /sys/kerne/btf the gadget needs.

  * Add BTFGen support for image-based gadgets [image-based gadgets: Support BTFGen #2045](https://github.com/inspektor-gadget/inspektor-gadget/issues/2045)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely take inspiration from strong behavior and not weak assumption like this one.
Can you open a follow-up PR to document this limitation?

Ok, I will.

Also, is this field shown by default? It should be hidden by default.

It's hidden by default.

I will review and test this PR, can you please do so?

I'll try as time allows.

This is not the goal of Inspektor Gadget to load modules.
But we can indeed, use the YAML to read which modules BTF under /sys/kerne/btf the gadget needs.

Yes, I meant loading the BTF file of a kernel module.

bpf_core_type_size() has to be used explicitly to ensure the correct BTF
relocation will be used.

Using sizeof or other implicit methods don't work.

Fixes: 8c433c4 ("image-based trace exec: add upper_layer field")
Fixes: a1d98b1 ("built-in trace exec: add upper_layer field")

Signed-off-by: Alban Crequy <[email protected]>
@alban alban force-pushed the alban_upperlayer_fix branch from a355c26 to 7c8dd8f Compare January 23, 2024 09:59
@alban
Copy link
Member Author

alban commented Jan 23, 2024

@alban
Copy link
Member Author

alban commented Jan 23, 2024

The CI is green, on citest/alban_upperlayer_fix too.

Copy link
Member

@mauriciovasquezbernal mauriciovasquezbernal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It's still "hacky" but the final solution requires a lot of work as mentioned by Alban.

struct _ovl_inode {
struct inode vfs_inode;
struct dentry *__upperdentry;
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cilium/ebpf emits a bad relocation when I add this attribute because it does not find struct _ovl_inode in the kernel BTF.

That's unfortunate.

@alban alban merged commit e5d49ae into main Jan 23, 2024
109 checks passed
@alban alban deleted the alban_upperlayer_fix branch January 23, 2024 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants