diff --git a/src/bcc-documents/kernel-versions.md b/src/bcc-documents/kernel-versions.md index 052a3d60..797f1a3a 100644 --- a/src/bcc-documents/kernel-versions.md +++ b/src/bcc-documents/kernel-versions.md @@ -562,7 +562,8 @@ RPC_FUNC_inode_storage_delete() | 5.10 | | [8ea636848aca](https://github.com/to `BPF_FUNC_xdp_adjust_meta()` | 4.15 | | [`de8f3a83b0a0`](https://github.com/torvalds/linux/commit/de8f3a83b0a0fddb2cf56e7a718127e9619ea3da) `BPF_FUNC_xdp_adjust_tail()` | 4.18 | | [`b32cc5b9a346`](https://github.com/torvalds/linux/commit/b32cc5b9a346319c171e3ad905e0cddda032b5eb) `BPF_FUNC_xdp_get_buff_len()` | 5.18 | | [`0165cc817075`](https://github.com/torvalds/linux/commit/0165cc817075cf701e4289838f1d925ff1911b3e) -`BPF_FUNC_xdp_load_bytes()` | 5.18 | | [`3f364222d032`](https://github.com/torvalds/linux/commit/3f364222d032eea6b245780e845ad213dab28cdd)"`BPF_FUNC_xdp_store_bytes()` | 5.18 | | [`3f364222d032`](https://github.com/torvalds/linux/commit/3f364222d032eea6b245780e845ad213dab28cdd) +`BPF_FUNC_xdp_load_bytes()` | 5.18 | | [`3f364222d032`](https://github.com/torvalds/linux/commit/3f364222d032eea6b245780e845ad213dab28cdd) +`BPF_FUNC_xdp_store_bytes()` | 5.18 | | [`3f364222d032`](https://github.com/torvalds/linux/commit/3f364222d032eea6b245780e845ad213dab28cdd) `BPF_FUNC_xdp_output()` | 5.6 | GPL | [`d831ee84bfc9`](https://github.com/torvalds/linux/commit/d831ee84bfc9173eecf30dbbc2553ae81b996c60) `BPF_FUNC_override_return()` | 4.16 | GPL | [`9802d86585db`](https://github.com/torvalds/linux/commit/9802d86585db91655c7d1929a4f6bbe0952ea88e) `BPF_FUNC_sock_ops_cb_flags_set()` | 4.16 | | [`b13d88072172`]() diff --git a/src/bcc-documents/kernel-versions_en.md b/src/bcc-documents/kernel-versions_en.md index 9a8ab11b..d2691c63 100644 --- a/src/bcc-documents/kernel-versions_en.md +++ b/src/bcc-documents/kernel-versions_en.md @@ -197,7 +197,9 @@ mmap() support for array maps | 5.5 | [`fc9702273e2e`](https://github.com/torval An approximate list of drivers or 
components supporting XDP programs for your kernel can be retrieved with: -```sh".git grep -l XDP_SETUP_PROG drivers/ + +```sh +git grep -l XDP_SETUP_PROG drivers/ ``` Feature / Driver | Kernel version | Commit @@ -218,6 +220,7 @@ Cavium `thunderx` driver | 4.12 | [`05c773f52b96`](https://github.com/torvalds/l Generic XDP | 4.12 | [`b5cdae3291f7`](https://github.com/torvalds/linux/commit/b5cdae3291f7be7a34e75affe4c0ec1f7f328b64)".# Helpers The list of helpers supported in your kernel can be found in file."[`include/uapi/linux/bpf.h`](https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h): + ```sh git grep ' FN(' include/uapi/linux/bpf.h ``` @@ -279,8 +282,7 @@ Helper | Kernel version | License | Commit | `BPF_FUNC_get_listener_sock()` | 5.1 | | [`dbafd7ddd623`](https://github.com/torvalds/linux/commit/dbafd7ddd62369b2f3926ab847cbf8fc40e800b7) `BPF_FUNC_get_local_storage()` | 4.19 | | [`cd3394317653`](https://github.com/torvalds/linux/commit/cd3394317653837e2eb5c5d0904a8996102af9fc) `BPF_FUNC_get_netns_cookie()` | 5.7 | | [`f318903c0bf4`](https://github.com/torvalds/linux/commit/f318903c0bf42448b4c884732df2bbb0ef7a2284)". 
- -Return only the translated content, not including the original text.`BPF_FUNC_get_ns_current_pid_tgid()` | 5.7 | | [`b4490c5c4e02`](https://github.com/torvalds/linux/commit/b4490c5c4e023f09b7d27c9a9d3e7ad7d09ea6bf) +`BPF_FUNC_get_ns_current_pid_tgid()` | 5.7 | | [`b4490c5c4e02`](https://github.com/torvalds/linux/commit/b4490c5c4e023f09b7d27c9a9d3e7ad7d09ea6bf) `BPF_FUNC_get_numa_node_id()` | 4.10 | | [`2d0e30c30f84`](https://github.com/torvalds/linux/commit/2d0e30c30f84d08dc16f0f2af41f1b8a85f0755e) `BPF_FUNC_get_prandom_u32()` | 4.1 | | [`03e69b508b6f`](https://github.com/torvalds/linux/commit/03e69b508b6f7c51743055c9f61d1dfeadf4b635) `BPF_FUNC_get_route_realm()` | 4.4 | | [`c46646d0484f`](https://github.com/torvalds/linux/commit/c46646d0484f5d08e2bede9b45034ba5b8b489cc) @@ -361,8 +363,7 @@ Return only the translated content, not including the original text.`BPF_FUNC_ge `BPF_FUNC_setsockopt()` | 4.13 | | [`8c4b4c7e9ff0`](https://github.com/torvalds/linux/commit/8c4b4c7e9ff0447995750d9329949fa082520269) `BPF_FUNC_sk_ancestor_cgroup_id()` | 5.7 | | [`f307fa2cb4c9`](https://github.com/torvalds/linux/commit/f307fa2cb4c935f7f1ff0aeb880c7b44fb9a642b) `BPF_FUNC_sk_assign()` | 5.6 | | [`cf7fbe660f2d`](https://github.com/torvalds/linux/commit/cf7fbe660f2dbd738ab58aea8e9b0ca6ad232449)". 
- -format: Return only the translated content, not including the original text.`BPF_FUNC_sk_cgroup_id()` | 5.7 | | [`f307fa2cb4c9`](https://github.com/torvalds/linux/commit/f307fa2cb4c935f7f1ff0aeb880c7b44fb9a642b) +`BPF_FUNC_sk_cgroup_id()` | 5.7 | | [`f307fa2cb4c9`](https://github.com/torvalds/linux/commit/f307fa2cb4c935f7f1ff0aeb880c7b44fb9a642b) `BPF_FUNC_sk_fullsock()` | 5.1 | | [`46f8bc92758c`](https://github.com/torvalds/linux/commit/46f8bc92758c6259bcf945e9216098661c1587cd) `BPF_FUNC_sk_lookup_tcp()` | 4.20 | | [`6acc9b432e67`](https://github.com/torvalds/linux/commit/6acc9b432e6714d72d7d77ec7c27f6f8358d0c71) `BPF_FUNC_sk_lookup_udp()` | 4.20 | | [`6acc9b432e67`](https://github.com/torvalds/linux/commit/6acc9b432e6714d72d7d77ec7c27f6f8358d0c71) @@ -375,7 +376,8 @@ format: Return only the translated content, not including the original text.`BPF `BPF_FUNC_skb_adjust_room()` | 4.13 | | [`2be7e212d541`](https://github.com/torvalds/linux/commit/2be7e212d5419a400d051c84ca9fdd083e5aacac) `BPF_FUNC_skb_ancestor_cgroup_id()` | 4.19 | | [`7723628101aa`](https://github.com/torvalds/linux/commit/7723628101aaeb1d723786747529b4ea65c5b5c5) `BPF_FUNC_skb_change_head()` | 4.10 | | [`3a0af8fd61f9`](https://github.com/torvalds/linux/commit/3a0af8fd61f90920f6fa04e4f1e9a6a73c1b4fd2) -`BPF_FUNC_skb_change_proto()` | 4.8 | | [`6578171a7ff0`](https://github.com/torvalds/linux/commit/6578171a7ff0c31dc73258f93da7407510abf085)`BPF_FUNC_skb_change_tail()` | 4.9 | | [`5293efe62df8`](https://github.com/torvalds/linux/commit/5293efe62df81908f2e90c9820c7edcc8e61f5e9) +`BPF_FUNC_skb_change_proto()` | 4.8 | | [`6578171a7ff0`](https://github.com/torvalds/linux/commit/6578171a7ff0c31dc73258f93da7407510abf085) +`BPF_FUNC_skb_change_tail()` | 4.9 | | [`5293efe62df8`](https://github.com/torvalds/linux/commit/5293efe62df81908f2e90c9820c7edcc8e61f5e9) `BPF_FUNC_skb_change_type()` | 4.8 | | [`d2485c4242a8`](https://github.com/torvalds/linux/commit/d2485c4242a826fdf493fd3a27b8b792965b9b9e) 
`BPF_FUNC_skb_cgroup_classid()` | 5.10 | | [`b426ce83baa7`](https://github.com/torvalds/linux/commit/b426ce83baa7dff947fb354118d3133f2953aac8) `BPF_FUNC_skb_cgroup_id()` | 4.18 | | [`cb20b08ead40`](https://github.com/torvalds/linux/commit/cb20b08ead401fd17627a36f035c0bf5bfee5567) @@ -388,7 +390,8 @@ format: Return only the translated content, not including the original text.`BPF `BPF_FUNC_skb_output()` | 5.5 | | [`a7658e1a4164`](https://github.com/torvalds/linux/commit/a7658e1a4164ce2b9eb4a11aadbba38586e93bd6) `BPF_FUNC_skb_pull_data()` | 4.9 | | [`36bbef52c7eb`](https://github.com/torvalds/linux/commit/36bbef52c7eb646ed6247055a2acd3851e317857) `BPF_FUNC_skb_set_tstamp()` | 5.18 | | [`9bb984f28d5b`](https://github.com/torvalds/linux/commit/9bb984f28d5bcb917d35d930fcfb89f90f9449fd) -`BPF_FUNC_skb_set_tunnel_key()` | 4.3 | | [`d3aa45ce6b94`](https://github.com/torvalds/linux/commit/d3aa45ce6b94c65b83971257317867db13e5f492)`BPF_FUNC_skb_set_tunnel_opt()` | 4.6 | | [`14ca0751c96f`](https://github.com/torvalds/linux/commit/14ca0751c96f8d3d0f52e8ed3b3236f8b34d3460) +`BPF_FUNC_skb_set_tunnel_key()` | 4.3 | | [`d3aa45ce6b94`](https://github.com/torvalds/linux/commit/d3aa45ce6b94c65b83971257317867db13e5f492) +`BPF_FUNC_skb_set_tunnel_opt()` | 4.6 | | [`14ca0751c96f`](https://github.com/torvalds/linux/commit/14ca0751c96f8d3d0f52e8ed3b3236f8b34d3460) `BPF_FUNC_skb_store_bytes()` | 4.1 | | [`91bc4822c3d6`](https://github.com/torvalds/linux/commit/91bc4822c3d61b9bb7ef66d3b77948a4f9177954) `BPF_FUNC_skb_under_cgroup()` | 4.8 | | [`4a482f34afcc`](https://github.com/torvalds/linux/commit/4a482f34afcc162d8456f449b137ec2a95be60d8) `BPF_FUNC_skb_vlan_pop()` | 4.3 | | [`4e10df9a60d9`](https://github.com/torvalds/linux/commit/4e10df9a60d96ced321dd2af71da558c6b750078) @@ -401,7 +404,8 @@ format: Return only the translated content, not including the original text.`BPF `BPF_FUNC_skc_to_tcp6_sock()` | 5.9 | | 
[`af7ec1383361`](https://github.com/torvalds/linux/commit/af7ec13833619e17f03aa73a785a2f871da6d66b) `BPF_FUNC_skc_to_udp6_sock()` | 5.9 | | [`0d4fad3e57df`](https://github.com/torvalds/linux/commit/0d4fad3e57df2bf61e8ffc8d12a34b1caf9b8835) `BPF_FUNC_skc_to_unix_sock()` | 5.16 | | [`9eeb3aa33ae0`](https://github.com/torvalds/linux/commit/9eeb3aa33ae005526f672b394c1791578463513f) -`BPF_FUNC_snprintf()` | 5.13 | | [`7b15523a989b`](https://github.com/torvalds/linux/commit/7b15523a989b63927c2bb08e9b5b0bbc10b58bef)`BPF_FUNC_snprintf_btf()` | 5.10 | | [`c4d0bfb45068`](https://github.com/torvalds/linux/commit/c4d0bfb45068d853a478b9067a95969b1886a30f) +`BPF_FUNC_snprintf()` | 5.13 | | [`7b15523a989b`](https://github.com/torvalds/linux/commit/7b15523a989b63927c2bb08e9b5b0bbc10b58bef) +`BPF_FUNC_snprintf_btf()` | 5.10 | | [`c4d0bfb45068`](https://github.com/torvalds/linux/commit/c4d0bfb45068d853a478b9067a95969b1886a30f) `BPF_FUNC_sock_from_file()` | 5.11 | | [`4f19cab76136`](https://github.com/torvalds/linux/commit/4f19cab76136e800a3f04d8c9aa4d8e770e3d3d8) `BPF_FUNC_sock_hash_update()` | 4.18 | | [`81110384441a`](https://github.com/torvalds/linux/commit/81110384441a59cff47430f20f049e69b98c17f4) `BPF_FUNC_sock_map_update()` | 4.14 | | [`174a79ff9515`](https://github.com/torvalds/linux/commit/174a79ff9515f400b9a6115643dafd62a635b7e6) @@ -414,7 +418,8 @@ format: Return only the translated content, not including the original text.`BPF `BPF_FUNC_sys_bpf()` | 5.14 | | [`79a7f8bdb159`](https://github.com/torvalds/linux/commit/79a7f8bdb159d9914b58740f3d31d602a6e4aca8) `BPF_FUNC_sys_close()` | 5.14 | | [`3abea089246f`](https://github.com/torvalds/linux/commit/3abea089246f76c1517b054ddb5946f3f1dbd2c0) `BPF_FUNC_sysctl_get_current_value()` | 5.2 | | [`1d11b3016cec`](https://github.com/torvalds/linux/commit/1d11b3016cec4ed9770b98e82a61708c8f4926e7) -`BPF_FUNC_sysctl_get_name()` | 5.2 | | 
[`808649fb787d`](https://github.com/torvalds/linux/commit/808649fb787d918a48a360a668ee4ee9023f0c11)`BPF_FUNC_sysctl_get_new_value()` | 5.2 | | [`4e63acdff864`](https://github.com/torvalds/linux/commit/4e63acdff864654cee0ac5aaeda3913798ee78f6) +`BPF_FUNC_sysctl_get_name()` | 5.2 | | [`808649fb787d`](https://github.com/torvalds/linux/commit/808649fb787d918a48a360a668ee4ee9023f0c11) +`BPF_FUNC_sysctl_get_new_value()` | 5.2 | | [`4e63acdff864`](https://github.com/torvalds/linux/commit/4e63acdff864654cee0ac5aaeda3913798ee78f6) `BPF_FUNC_sysctl_set_new_value()` | 5.2 | | [`4e63acdff864`](https://github.com/torvalds/linux/commit/4e63acdff864654cee0ac5aaeda3913798ee78f6) `BPF_FUNC_tail_call()` | 4.2 | | [`04fd61ab36ec`](https://github.com/torvalds/linux/commit/04fd61ab36ec065e194ab5e74ae34a5240d992bb) `BPF_FUNC_task_pt_regs()` | 5.15 | GPL | [`dd6e10fbd9f`](https://github.com/torvalds/linux/commit/dd6e10fbd9fb86a571d925602c8a24bb4d09a2a7) @@ -426,7 +431,8 @@ format: Return only the translated content, not including the original text.`BPF `BPF_FUNC_tcp_raw_check_syncookie_ipv6()` | 6.0 | | [`33bf9885040c`](https://github.com/torvalds/linux/commit/33bf9885040c399cf6a95bd33216644126728e14) `BPF_FUNC_tcp_raw_gen_syncookie_ipv4()` | 6.0 | | [`33bf9885040c`](https://github.com/torvalds/linux/commit/33bf9885040c399cf6a95bd33216644126728e14) `BPF_FUNC_tcp_raw_gen_syncookie_ipv6()` | 6.0 | | [`33bf9885040c`](https://github.com/torvalds/linux/commit/33bf9885040c399cf6a95bd33216644126728e14) -`BPF_FUNC_tcp_send_ack()` | 5.5 | | [`206057fe020a`](https://github.com/torvalds/linux/commit/206057fe020ac5c037d5e2dd6562a9bd216ec765)".`BPF_FUNC_tcp_sock()` | 5.1 | | [`655a51e536c0`](https://github.com/torvalds/linux/commit/655a51e536c09d15ffa3603b1b6fce2b45b85a1f) +`BPF_FUNC_tcp_send_ack()` | 5.5 | | [`206057fe020a`](https://github.com/torvalds/linux/commit/206057fe020ac5c037d5e2dd6562a9bd216ec765)". 
+`BPF_FUNC_tcp_sock()` | 5.1 | | [`655a51e536c0`](https://github.com/torvalds/linux/commit/655a51e536c09d15ffa3603b1b6fce2b45b85a1f) `BPF_FUNC_this_cpu_ptr()` | 5.10 | | [`63d9b80dcf2c`](https://github.com/torvalds/linux/commit/63d9b80dcf2c67bc5ade61cbbaa09d7af21f43f1) | `BPF_FUNC_timer_init()` | 5.15 | | [`b00628b1c7d5`](https://github.com/torvalds/linux/commit/b00628b1c7d595ae5b544e059c27b1f5828314b4) `BPF_FUNC_timer_set_callback()` | 5.15 | | [`b00628b1c7d5`](https://github.com/torvalds/linux/commit/b00628b1c7d595ae5b544e059c27b1f5828314b4) @@ -439,8 +445,8 @@ format: Return only the translated content, not including the original text.`BPF `BPF_FUNC_xdp_adjust_meta()` | 4.15 | | [`de8f3a83b0a0`](https://github.com/torvalds/linux/commit/de8f3a83b0a0fddb2cf56e7a718127e9619ea3da) `BPF_FUNC_xdp_adjust_tail()` | 4.18 | | [`b32cc5b9a346`](https://github.com/torvalds/linux/commit/b32cc5b9a346319c171e3ad905e0cddda032b5eb) `BPF_FUNC_xdp_get_buff_len()` | 5.18 | | [`0165cc817075`](https://github.com/torvalds/linux/commit/0165cc817075cf701e4289838f1d925ff1911b3e) -`BPF_FUNC_xdp_load_bytes()` | 5.18 | | [`3f364222d032`](https://github.com/torvalds/linux/commit/3f364222d032eea6b245780e845ad213dab28cdd)". 
-format: 返回仅翻译的内容, 不包括原文."`BPF_FUNC_xdp_store_bytes()` | 5.18 | | [`3f364222d032`](https://github.com/torvalds/linux/commit/3f364222d032eea6b245780e845ad213dab28cdd) +`BPF_FUNC_xdp_load_bytes()` | 5.18 | | [`3f364222d032`](https://github.com/torvalds/linux/commit/3f364222d032eea6b245780e845ad213dab28cdd) +`BPF_FUNC_xdp_store_bytes()` | 5.18 | | [`3f364222d032`](https://github.com/torvalds/linux/commit/3f364222d032eea6b245780e845ad213dab28cdd) `BPF_FUNC_xdp_output()` | 5.6 | GPL | [`d831ee84bfc9`](https://github.com/torvalds/linux/commit/d831ee84bfc9173eecf30dbbc2553ae81b996c60) `BPF_FUNC_override_return()` | 4.16 | GPL | [`9802d86585db`](https://github.com/torvalds/linux/commit/9802d86585db91655c7d1929a4f6bbe0952ea88e) `BPF_FUNC_sock_ops_cb_flags_set()` | 4.16 | | [`b13d88072172`](https://github.com/torvalds/linux/commit/b13d880721729384757f235166068c315326f4a1) diff --git a/src/bpftrace-tutorial/README.md b/src/bpftrace-tutorial/README.md new file mode 100644 index 00000000..7d6a9013 --- /dev/null +++ b/src/bpftrace-tutorial/README.md @@ -0,0 +1,322 @@ +# bpftrace一行教程 + +该教程通过12个简单小节帮助你了解bpftrace的使用。每一小节都是一行的命令,你可以尝试运行并立刻看到运行效果。该教程系列用来介绍bpftrace的概念。关于bpftrace的完整参考,见[bpftrace手册](https://github.com/iovisor/bpftrace/blob/master/man/adoc/bpftrace.adoc)。 + +该教程贡献者是Brendan Gregg, Netflix (2018), 基于他的FreeBSD DTrace教程系列[DTrace Tutorial](https://wiki.freebsd.org/DTrace/Tutorial)。 + +# 1. 列出所有探针 + +``` +bpftrace -l 'tracepoint:syscalls:sys_enter_*' +``` + +"bpftrace -l" 列出所有探针,并且可以添加搜索项。 + +- 探针是用于捕获事件数据的检测点。 +- 搜索词支持通配符,如`*`和`?`。 +- "bpftrace -l" 也可以通过管道传递给grep,进行完整的正则表达式搜索。 + +# 2. Hello World + +``` +# bpftrace -e 'BEGIN { printf("hello world\n"); }' +Attaching 1 probe... +hello world +^C +``` + +打印欢迎消息。运行后, 按Ctrl-C结束。 + +- `BEGIN`是一个特殊的探针,在程序开始时触发探针执行(类似awk的BEGIN)。你可以使用它设置变量和打印消息头。 +- 探针可以关联动作,把动作放到{}中。这个例子中,探针被触发时会调用printf()。 + +# 3. 文件打开 + +``` +# bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args.filename)); }' +Attaching 1 probe... 
+snmp-pass /proc/cpuinfo +snmp-pass /proc/stat +snmpd /proc/net/dev +snmpd /proc/net/if_inet6 +^C +``` + +这里我们在文件打开的时候打印进程名和文件名。 + +- 该命令以`tracepoint:syscalls:sys_enter_openat`开始: 这是tracepoint探针类型(内核静态跟踪),当进入`openat()`系统调用时执行该探针。相比kprobes探针(内核动态跟踪,在第6节介绍),我们更加喜欢用tracepoints探针,因为tracepoints有稳定的应用程序编程接口。注意:现代linux系统(glibc >= 2.26),`open`总是调用`openat`系统调用。 +- `comm`是内建变量,代表当前进程的名字。其它类似的变量还有pid和tid,分别表示进程标识和线程标识。 +- `args`是一个包含所有tracepoint参数的结构。这个结构是由bpftrace根据tracepoint信息自动生成的。这个结构的成员可以通过命令`bpftrace -vl tracepoint:syscalls:sys_enter_openat`找到。 +- `args.filename`用来获取args的成员变量`filename`的值。 +- `str()`用来把字符串指针转换成字符串。 + +# 4. 进程级系统调用计数 + +``` +bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }' +Attaching 1 probe... +^C + +@[bpftrace]: 6 +@[systemd]: 24 +@[snmp-pass]: 96 +@[sshd]: 125 +``` + +按Ctrl-C后打印进程的系统调用计数。 + +- @: 表示一种特殊的变量类型,称为map,可以以不同的方式来存储和描述数据。你可以在@后添加可选的变量名(如@num),用来增加可读性或者区分不同的map。 +- []: 可选的中括号允许设置map的关键字,比较像关联数组。 +- count(): 这是一个map函数 - 记录被调用次数。因为调用次数根据comm保存在map里,输出结果是进程执行系统调用的次数统计。 + +Maps会在bpftrace结束(如按Ctrl-C)时自动打印出来。 + +# 5. read()返回值分布统计 + +``` +# bpftrace -e 'tracepoint:syscalls:sys_exit_read /pid == 18644/ { @bytes = hist(args.ret); }' +Attaching 1 probe... +^C + +@bytes: +[0, 1] 12 |@@@@@@@@@@@@@@@@@@@@ | +[2, 4) 18 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | +[4, 8) 0 | | +[8, 16) 0 | | +[16, 32) 0 | | +[32, 64) 30 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| +[64, 128) 19 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | +[128, 256) 1 |@ +``` + +这里统计进程号为18644的进程执行内核函数sys_read()的返回值,并打印出直方图。 +- /.../: 这里设置一个过滤条件(条件判断),满足该过滤条件时才执行{}里面的动作。在这个例子中意思是只追踪进程号为18644的进程。过滤条件表达式也支持布尔运算,如("&&", "||")。 +- ret: 表示函数的返回值。对于sys_read(),它可能是-1(错误)或者成功读取的字节数。 +- @: 类似于上节的map,但是这里没有key,即[]。该map的名称"bytes"会出现在输出中。 +- hist(): 一个map函数,用来描述直方图的参数。输出行以2次方的间隔开始,如`[128, 256)`表示值大于等于128且小于256。后面跟着位于该区间的参数个数统计,最后是ascii码表示的直方图。该图可以用来研究它的模式分布。 +- 其它的map函数还有lhist(线性直方图),count(),sum(),avg(),min()和max()。 + +# 6. 
内核动态跟踪read()返回的字节数 + +``` +# bpftrace -e 'kretprobe:vfs_read { @bytes = lhist(retval, 0, 2000, 200); }' +Attaching 1 probe... +^C + +@bytes: +(...,0] 0 | | +[0, 200) 66 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| +[200, 400) 2 |@ | +[400, 600) 3 |@@ | +[600, 800) 0 | | +[800, 1000) 5 |@@@ | +[1000, 1200) 0 | | +[1200, 1400) 0 | | +[1400, 1600) 0 | | +[1600, 1800) 0 | | +[1800, 2000) 0 | | +[2000,...) 39 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | +``` + +使用内核动态跟踪技术显示read()返回字节数的直方图。 + +- `kretprobe:vfs_read`: 这是kretprobe类型(动态跟踪内核函数返回值)的探针,跟踪`vfs_read`内核函数。此外还有kprobe类型的探针(在下一节介绍)用于跟踪内核函数的调用。它们是功能强大的探针类型,让我们可以跟踪成千上万的内核函数。然而它们是"不稳定"的探针类型:由于它们可以跟踪任意内核函数,对于不同的内核版本,kprobe和kretprobe不一定能够正常工作。因为内核函数名,参数,返回值和作用等可能会变化。此外,由于它们用来跟踪底层内核的,你需要浏览内核源代码,理解这些探针的参数和返回值的意义。 +- lhist(): 线性直方图函数:参数分别是value,最小值,最大值,步进值。第一个参数(`retval`)表示系统调用sys_read()返回值:即成功读取的字节数。 + +# 7. read()调用的时间 + +``` +# bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; } kretprobe:vfs_read /@start[tid]/ { @ns[comm] = hist(nsecs - @start[tid]); delete(@start[tid]); }' +Attaching 2 probes... + +[...] +@ns[snmp-pass]: +[0, 1] 0 | | +[2, 4) 0 | | +[4, 8) 0 | | +[8, 16) 0 | | +[16, 32) 0 | | +[32, 64) 0 | | +[64, 128) 0 | | +[128, 256) 0 | | +[256, 512) 27 |@@@@@@@@@ | +[512, 1k) 125 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | +[1k, 2k) 22 |@@@@@@@ | +[2k, 4k) 1 | | +[4k, 8k) 10 |@@@ | +[8k, 16k) 1 | | +[16k, 32k) 3 |@ | +[32k, 64k) 144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| +[64k, 128k) 7 |@@ | +[128k, 256k) 28 |@@@@@@@@@@ | +[256k, 512k) 2 | | +[512k, 1M) 3 |@ | +[1M, 2M) 1 | | +``` + +根据进程名,以直方图的形式显示read()调用花费的时间,时间单位为纳秒。 + +- @start[tid]: 使用线程ID作为key。某一时刻,可能有许许多多的read调用正在进行,我们希望为每个调用记录一个起始时间戳。这要如何做到呢?我们可以为每个read调用建立一个唯一的标识符,并用它作为key进行统计。由于内核线程一次只能执行一个系统调用,我们可以使用线程ID作为上述标识符。 +- nsecs: 自系统启动到现在的纳秒数。这是一个高精度时间戳,可以用来对事件计时。 +- /@start[tid]/: 该过滤条件检查起始时间戳是否被记录。程序可能在某次read调用中途被启动,如果没有这个过滤条件,这个调用的时间会被统计为now-zero,而不是now-start。 +- delete(@start[tid]): 释放变量。 + +# 8. 
统计进程级别的事件 + +``` +# bpftrace -e 'tracepoint:sched:sched* { @[probe] = count(); } interval:s:5 { exit(); }' +Attaching 25 probes... +@[tracepoint:sched:sched_wakeup_new]: 1 +@[tracepoint:sched:sched_process_fork]: 1 +@[tracepoint:sched:sched_process_exec]: 1 +@[tracepoint:sched:sched_process_exit]: 1 +@[tracepoint:sched:sched_process_free]: 2 +@[tracepoint:sched:sched_process_wait]: 7 +@[tracepoint:sched:sched_wake_idle_without_ipi]: 53 +@[tracepoint:sched:sched_stat_runtime]: 212 +@[tracepoint:sched:sched_wakeup]: 253 +@[tracepoint:sched:sched_waking]: 253 +@[tracepoint:sched:sched_switch]: 510 +``` + +这里统计5秒内进程级的事件并打印。 + +- sched: `sched`探针可以探测调度器的高级事件和进程事件如fork, exec和上下文切换。 +- probe: 探针的完整名称。 +- interval:s:5: 这是一个每5秒触发一次的探针(仅在一个CPU上触发),它用来创建脚本级别的间隔或超时时间。 +- exit(): 退出bpftrace。 + +# 9. 分析内核实时函数栈 + +``` +# bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' +Attaching 1 probe... +^C + +[...] +@[ +filemap_map_pages+181 +__handle_mm_fault+2905 +handle_mm_fault+250 +__do_page_fault+599 +async_page_fault+69 +]: 12 +[...] +@[ +cpuidle_enter_state+164 +do_idle+390 +cpu_startup_entry+111 +start_secondary+423 +secondary_startup_64+165 +]: 22122 +``` + +以99赫兹的频率分析内核调用栈并打印次数统计。 + +- profile:hz:99: 这里所有cpu都以99赫兹的频率采样分析内核栈。为什么是99而不是100或者1000?我们想要抓取足够详细的内核执行时内核栈信息,但是频率太大影响性能。100赫兹足够了,但是我们不想用正好100赫兹,这样采样频率可能与其他定时事件步调一致,所以99赫兹是一个理想的选择。 +- kstack: 返回内核调用栈。这里作为map的关键字,可以跟踪次数。这些输出信息可以使用火焰图可视化。此外`ustack`用来分析用户级堆栈。 + +# 10. 调度器跟踪 + +``` +# bpftrace -e 'tracepoint:sched:sched_switch { @[kstack] = count(); }' +^C +[...]
+ +@[ +__schedule+697 +__schedule+697 +schedule+50 +schedule_timeout+365 +xfsaild+274 +kthread+248 +ret_from_fork+53 +]: 73 +@[ +__schedule+697 +__schedule+697 +schedule_idle+40 +do_idle+356 +cpu_startup_entry+111 +start_secondary+423 +secondary_startup_64+165 +]: 305 +``` + +这里统计进程上下文切换次数。以上输出被截断,只输出了最后两个结果。 + +- sched: 跟踪调度类别的调度器事件:sched_switch, sched_wakeup, sched_migrate_task等。 +- sched_switch: 当线程释放cpu资源,当前不运行时触发。这里可能的阻塞事件:如等待I/O,定时器,分页/交换,锁等。 +- kstack: 内核堆栈跟踪,打印调用栈。 +- sched_switch在线程切换的时候触发,打印的调用栈是被切换出cpu的那个线程。像你使用其他探针一样,注意这里的上下文,例如comm, pid, kstack等等,并不一定反映了探针的目标的状态。 + +# 11. 块级I/O跟踪 + +``` +# bpftrace -e 'tracepoint:block:block_rq_issue { @ = hist(args.bytes); }' +Attaching 1 probe... +^C + +@: +[0, 1] 1 |@@ | +[2, 4) 0 | | +[4, 8) 0 | | +[8, 16) 0 | | +[16, 32) 0 | | +[32, 64) 0 | | +[64, 128) 0 | | +[128, 256) 0 | | +[256, 512) 0 | | +[512, 1K) 0 | | +[1K, 2K) 0 | | +[2K, 4K) 0 | | +[4K, 8K) 24 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| +[8K, 16K) 2 |@@@@ | +[16K, 32K) 6 |@@@@@@@@@@@@@ | +[32K, 64K) 5 |@@@@@@@@@@ | +[64K, 128K) 0 | | +[128K, 256K) 1 |@@ | + +``` + +以上是块I/O请求字节数的直方图。 + +- tracepoint:block: 块类别的跟踪点跟踪块级I/O事件。 +- block_rq_issue: 当I/O提交到块设备时触发。 +- args.bytes: 跟踪点block_rq_issue的参数成员bytes,表示提交I/O请求的字节数。 + +该探针的上下文是非常重要的: 它在I/O请求被提交给块设备时触发。这通常发生在进程上下文,此时通过内核的comm可以得到进程名;也可能发生在内核上下文,(如readahead),此时不能显示预期的进程号和进程名信息。 + +# 12. 内核结构跟踪 + +``` +# cat path.bt +#ifndef BPFTRACE_HAVE_BTF +#include +#include +#endif + +kprobe:vfs_open +{ + printf("open path: %s\n", str(((struct path *)arg0)->dentry->d_name.name)); +} + +# bpftrace path.bt +Attaching 1 probe... +open path: dev +open path: if_inet6 +open path: retrans_time_ms +[...] 
+``` + + +这里使用内核动态跟踪技术跟踪vfs_open()函数,该函数的(struct path *)作为第一个参数。 + +- kprobe: 如前面所述,这是内核动态跟踪kprobe探针类型,跟踪内核函数的调用(kretprobe探针类型跟踪内核函数返回值)。 +- `arg0` 是一个内建变量,表示探针的第一个参数,其含义由探针类型决定。对于`kprobe`类型探针,它表示函数的第一个参数。其它参数使用arg1,...,argN访问。 +- `((struct path *)arg0)->dentry->d_name.name`: 这里`arg0`作为`struct path *`并引用dentry。 +- #include: 在没有BTF (BPF Type Format) 的情况下,包含必要的path和dentry类型声明的头文件。 + +bpftrace对内核结构跟踪的支持和bcc是一样的,允许使用内核头文件。这意味着大多数结构是可用的,但是并不是所有的,有时需要手动增加某些结构的声明。例如这个例子,见[dcsnoop tool](https://github.com/iovisor/bpftrace/blob/master/docs/../tools/dcsnoop.bt),包含struct nameidata的声明。倘若内核有提供BTF数据,则所有结构都可用。 + +现在,你已经理解了bpftrace的大部分功能,你可以开始使用和编写强大的一行命令。查阅[参考手册](https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md)了解更多的功能。 \ No newline at end of file diff --git a/src/bpftrace-tutorial/README_en.md b/src/bpftrace-tutorial/README_en.md new file mode 100644 index 00000000..94596966 --- /dev/null +++ b/src/bpftrace-tutorial/README_en.md @@ -0,0 +1,326 @@ +# The bpftrace One-Liner Tutorial + +This teaches you bpftrace for Linux in 12 easy lessons, where each lesson is a one-liner you can try running. This series of one-liners introduces concepts which are summarized as bullet points. For a full reference to bpftrace, see the [Man page](https://github.com/iovisor/bpftrace/blob/master/docs/../man/adoc/bpftrace.adoc). + +Contributed by Brendan Gregg, Netflix (2018), based on his FreeBSD [DTrace Tutorial](https://wiki.freebsd.org/DTrace/Tutorial). + +# Lesson 1. Listing Probes + +``` +bpftrace -l 'tracepoint:syscalls:sys_enter_*' +``` + +"bpftrace -l" lists all probes, and a search term can be added. + +- A probe is an instrumentation point for capturing event data. +- The supplied search term supports wildcards/globs (`*` and `?`). +- "bpftrace -l" can also be piped to grep(1) for full regular expression searching. + +# Lesson 2. Hello World + +``` +# bpftrace -e 'BEGIN { printf("hello world\n"); }' +Attaching 1 probe...
+hello world +^C +``` + +This prints a welcome message. Run it, then hit Ctrl-C to end. + +- The word `BEGIN` is a special probe that fires at the start of the program (like awk's BEGIN). You can use it to set variables and print headers. +- An action can be associated with probes, in { }. This example calls printf() when the probe fires. + +# Lesson 3. File Opens + +``` +# bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args.filename)); }' +Attaching 1 probe... +snmp-pass /proc/cpuinfo +snmp-pass /proc/stat +snmpd /proc/net/dev +snmpd /proc/net/if_inet6 +^C +``` + +This traces file opens as they happen, and we're printing the process name and pathname. + +- It begins with the probe `tracepoint:syscalls:sys_enter_openat`: this is the tracepoint probe type (kernel static tracing), and is instrumenting when the `openat()` syscall begins (is entered). Tracepoints are preferred over kprobes (kernel dynamic tracing, introduced in lesson 6), since tracepoints have a stable API. Note: In modern Linux systems (glibc >= 2.26) the `open` wrapper always calls the `openat` syscall. +- `comm` is a builtin variable that has the current process's name. Other similar builtins include pid and tid. +- `args` is a struct containing all the tracepoint arguments. This +struct is automatically generated by bpftrace based on tracepoint information. The +members of this struct can be found with: `bpftrace -vl tracepoint:syscalls:sys_enter_openat`. +- `args.filename` accesses the `args` struct and gets the value of the + `filename` member. +- `str()` turns a pointer into the string it points to. + +# Lesson 4. Syscall Counts By Process + +``` +bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }' +Attaching 1 probe... +^C + +@[bpftrace]: 6 +@[systemd]: 24 +@[snmp-pass]: 96 +@[sshd]: 125 +``` + +This summarizes syscalls by process name, printing a report on Ctrl-C.
+ +- @: This denotes a special variable type called a map, which can store and summarize data in different ways. You can add an optional variable name after the @, eg "@num", either to improve readability, or to differentiate between more than one map. +- []: The optional brackets allow a key to be set for the map, much like an associative array. +- count(): This is a map function – the way it is populated. count() counts the number of times it is called. Since this is saved by comm, the result is a frequency count of system calls by process name. + +Maps are automatically printed when bpftrace ends (eg, via Ctrl-C). + +# Lesson 5. Distribution of read() Bytes + +``` +# bpftrace -e 'tracepoint:syscalls:sys_exit_read /pid == 18644/ { @bytes = hist(args.ret); }' +Attaching 1 probe... +^C + +@bytes: +[0, 1] 12 |@@@@@@@@@@@@@@@@@@@@ | +[2, 4) 18 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | +[4, 8) 0 | | +[8, 16) 0 | | +[16, 32) 0 | | +[32, 64) 30 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| +[64, 128) 19 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | +[128, 256) 1 |@ +``` + +This summarizes the return value of the sys_read() kernel function for PID 18644, printing it as a histogram. + +- /.../: This is a filter (aka predicate), which acts as a filter for the action. The action is only executed if the filtered expression is true, in this case, only for the process ID 18644. Boolean operators are supported ("&&", "||"). +- ret: This is the return value of the function. For sys_read(), this is either -1 (error) or the number of bytes successfully read. +- @: This is a map similar to the previous lesson, but without any keys ([]) this time, and the name "bytes" which decorates the output. +- hist(): This is a map function which summarizes the argument as a power-of-2 histogram. The output shows rows that begin with interval notation, where, for example `[128, 256)` means that the value is: 128 <= value < 256. 
The next number is the count of occurrences, and then an ASCII histogram is printed to visualize that count. The histogram can be used to study multi-modal distributions. +- Other map functions include lhist() (linear hist), count(), sum(), avg(), min(), and max(). + +# Lesson 6. Kernel Dynamic Tracing of read() Bytes + +``` +# bpftrace -e 'kretprobe:vfs_read { @bytes = lhist(retval, 0, 2000, 200); }' +Attaching 1 probe... +^C + +@bytes: +(...,0] 0 | | +[0, 200) 66 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| +[200, 400) 2 |@ | +[400, 600) 3 |@@ | +[600, 800) 0 | | +[800, 1000) 5 |@@@ | +[1000, 1200) 0 | | +[1200, 1400) 0 | | +[1400, 1600) 0 | | +[1600, 1800) 0 | | +[1800, 2000) 0 | | +[2000,...) 39 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | +``` + +Summarize read() bytes as a linear histogram, traced using kernel dynamic tracing. + +- It begins with the probe `kretprobe:vfs_read`: this is the kretprobe probe type (kernel dynamic tracing of function returns) instrumenting the `vfs_read()` kernel function. There is also the kprobe probe type (shown in the next lesson), to instrument when functions begin execution (are entered). These are powerful probe types, letting you trace tens of thousands of different kernel functions. However, these are "unstable" probe types: since they can trace any kernel function, there is no guarantee that your kprobe/kretprobe will work between kernel versions, as the function names, arguments, return values, and roles may change. Also, since it is tracing the raw kernel, you'll need to browse the kernel source to understand what these probes, arguments, and return values mean. +- lhist(): this is a linear histogram, where the arguments are: value, min, max, step. The first argument (`retval`) is the return value of vfs_read(): the number of bytes read. + +# Lesson 7.
Timing read()s + +``` +# bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; } kretprobe:vfs_read /@start[tid]/ { @ns[comm] = hist(nsecs - @start[tid]); delete(@start[tid]); }' +Attaching 2 probes... + +[...] +@ns[snmp-pass]: +[0, 1] 0 | | +[2, 4) 0 | | +[4, 8) 0 | | +[8, 16) 0 | | +[16, 32) 0 | | +[32, 64) 0 | | +[64, 128) 0 | | +[128, 256) 0 | | +[256, 512) 27 |@@@@@@@@@ | +[512, 1k) 125 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | +[1k, 2k) 22 |@@@@@@@ | +[2k, 4k) 1 | | +[4k, 8k) 10 |@@@ | +[8k, 16k) 1 | | +[16k, 32k) 3 |@ | +[32k, 64k) 144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| +[64k, 128k) 7 |@@ | +[128k, 256k) 28 |@@@@@@@@@@ | +[256k, 512k) 2 | | +[512k, 1M) 3 |@ | +[1M, 2M) 1 | | +``` + +Summarize the time spent in read(), in nanoseconds, as a histogram, by process name. + +- @start[tid]: This uses the thread ID as a key. There may be many reads in-flight, and we want to store a start timestamp for each. How? We could construct a unique identifier for each read, and use that as the key. But because kernel threads can only be executing one syscall at a time, we can use the thread ID as the unique identifier, as each thread cannot be executing more than one. +- nsecs: Nanoseconds since boot. This is a high resolution timestamp counter that can be used to time events. +- /@start[tid]/: This filter checks that the start time was seen and recorded. Without this filter, this program may be launched during a read and only catch the end, resulting in a time calculation of now - zero, instead of now - start. + +- delete(@start[tid]): this frees the variable. + +# Lesson 8. Count Process-Level Events + +``` +# bpftrace -e 'tracepoint:sched:sched* { @[probe] = count(); } interval:s:5 { exit(); }' +Attaching 25 probes...
+@[tracepoint:sched:sched_wakeup_new]: 1
+@[tracepoint:sched:sched_process_fork]: 1
+@[tracepoint:sched:sched_process_exec]: 1
+@[tracepoint:sched:sched_process_exit]: 1
+@[tracepoint:sched:sched_process_free]: 2
+@[tracepoint:sched:sched_process_wait]: 7
+@[tracepoint:sched:sched_wake_idle_without_ipi]: 53
+@[tracepoint:sched:sched_stat_runtime]: 212
+@[tracepoint:sched:sched_wakeup]: 253
+@[tracepoint:sched:sched_waking]: 253
+@[tracepoint:sched:sched_switch]: 510
+```
+
+Count process-level events for five seconds, printing a summary.
+
+- sched: The `sched` probe category has high-level scheduler and process events, such as fork, exec, and context switch.
+- probe: The full name of the probe.
+- interval:s:5: This is a probe that fires once every 5 seconds, on one CPU only. It is used for creating script-level intervals or timeouts.
+- exit(): This exits bpftrace.
+
+# Lesson 9. Profile On-CPU Kernel Stacks
+
+```
+# bpftrace -e 'profile:hz:99 { @[kstack] = count(); }'
+Attaching 1 probe...
+^C
+
+[...]
+@[
+    filemap_map_pages+181
+    __handle_mm_fault+2905
+    handle_mm_fault+250
+    __do_page_fault+599
+    async_page_fault+69
+]: 12
+[...]
+@[
+    cpuidle_enter_state+164
+    do_idle+390
+    cpu_startup_entry+111
+    start_secondary+423
+    secondary_startup_64+165
+]: 22122
+```
+
+Profile kernel stacks at 99 Hertz, printing a frequency count.
+
+- profile:hz:99: This fires on all CPUs at 99 Hertz. Why 99 and not 100 or 1000? We want a rate frequent enough to capture both the big and small picture of execution, but not so frequent that it perturbs performance. 100 Hertz is enough. But we don't want exactly 100, as sampling may then occur in lockstep with other timed activities, hence 99.
+- kstack: Returns the kernel stack trace. This is used as a key for the map, so that it can be frequency counted. The output is well suited to visualization as a flame graph. There is also `ustack` for the user-level stack trace.
+
+# Lesson 10. Scheduler Tracing
+
+```
+# bpftrace -e 'tracepoint:sched:sched_switch { @[kstack] = count(); }'
+^C
+[...]
+
+@[
+    __schedule+697
+    __schedule+697
+    schedule+50
+    schedule_timeout+365
+    xfsaild+274
+    kthread+248
+    ret_from_fork+53
+]: 73
+@[
+    __schedule+697
+    __schedule+697
+    schedule_idle+40
+    do_idle+356
+    cpu_startup_entry+111
+    start_secondary+423
+    secondary_startup_64+165
+]: 305
+```
+
+This counts stack traces that led to context switching (off-CPU) events. The above output has been truncated to show the last two only.
+
+- sched: The sched category has tracepoints for different kernel CPU scheduler events: sched_switch, sched_wakeup, sched_migrate_task, etc.
+- sched_switch: This probe fires when a thread leaves the CPU. This will be a blocking event: e.g., waiting on I/O, a timer, paging/swapping, or a lock.
+- kstack: A kernel stack trace.
+- sched_switch fires in thread context, so the stack refers to the thread that is leaving. As you use other probe types, pay attention to context, as comm, pid, kstack, etc., may not refer to the target of the probe.
+
+# Lesson 11. Block I/O Tracing
+
+```
+# bpftrace -e 'tracepoint:block:block_rq_issue { @ = hist(args.bytes); }'
+Attaching 1 probe...
+^C
+
+@:
+[0, 1]                 1 |@@                                                  |
+[2, 4)                 0 |                                                    |
+[4, 8)                 0 |                                                    |
+[8, 16)                0 |                                                    |
+[16, 32)               0 |                                                    |
+[32, 64)               0 |                                                    |
+[64, 128)              0 |                                                    |
+[128, 256)             0 |                                                    |
+[256, 512)             0 |                                                    |
+[512, 1K)              0 |                                                    |
+[1K, 2K)               0 |                                                    |
+[2K, 4K)               0 |                                                    |
+[4K, 8K)              24 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+[8K, 16K)              2 |@@@@                                                |
+[16K, 32K)             6 |@@@@@@@@@@@@@                                       |
+[32K, 64K)             5 |@@@@@@@@@@                                          |
+[64K, 128K)            0 |                                                    |
+[128K, 256K)           1 |@@                                                  |
+
+```
+
+Block I/O requests by size in bytes, as a histogram.
+
+- tracepoint:block: The block category of tracepoints traces various block I/O (storage) events.
+- block_rq_issue: This fires when an I/O is issued to the device.
+- args.bytes: This is a member of the block_rq_issue tracepoint arguments that gives the I/O size in bytes.
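+
+To find out which members `args` exposes for a tracepoint, one option is bpftrace's verbose listing mode. The two commands below are a minimal sketch, assuming a bpftrace build that supports `-lv` and a kernel that provides this tracepoint. The fields listed vary by kernel version, so no output is reproduced here. The second command is an illustrative variant, not part of the original lesson: it sums the issued bytes per process name, and the map name `@bytes` is arbitrary.
+
+```
+# bpftrace -lv 'tracepoint:block:block_rq_issue'
+# bpftrace -e 'tracepoint:block:block_rq_issue { @bytes[comm] = sum(args.bytes); }'
+```
+
+Any field reported by the listing can be read through `args` in the same way as `bytes`. Older bpftrace releases may expect the `args->bytes` spelling instead of `args.bytes`.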
+
+The context of this probe is important: this fires when the I/O is issued to the device. This often happens in process context, where builtins like comm will show you the process name, but it can also happen from kernel context (e.g., readahead), in which case pid and comm will not show the application you expect.
+
+# Lesson 12. Kernel Struct Tracing
+
+```
+# cat path.bt
+#ifndef BPFTRACE_HAVE_BTF
+#include <linux/path.h>
+#include <linux/dcache.h>
+#endif
+
+kprobe:vfs_open
+{
+  printf("open path: %s\n", str(((struct path *)arg0)->dentry->d_name.name));
+}
+
+# bpftrace path.bt
+Attaching 1 probe...
+open path: dev
+open path: if_inet6
+open path: retrans_time_ms
+[...]
+```
+
+This uses kernel dynamic tracing of the vfs_open() function, which has a (struct path *) as the first argument.
+
+- kprobe: As mentioned earlier, this is the kernel dynamic tracing probe type, which traces the entry of kernel functions (use kretprobe to trace their returns).
+- `arg0` is a builtin variable containing the first probe argument, the meaning of which is defined by the probe type. For `kprobe`, it is the first argument to the function. Other arguments can be accessed as arg1, ..., argN.
+- `((struct path *)arg0)->dentry->d_name.name`: This casts `arg0` as `struct path *`, then dereferences dentry, etc.
+- #include: These are necessary to include struct definitions for path and dentry on systems where the kernel was built without BTF (BPF Type Format) data.
+
+The kernel struct support is the same as bcc, making use of kernel headers. This means that many structs are available, but not everything, and sometimes it might be necessary to manually include a struct. For an example of this, see the [dcsnoop tool](https://github.com/iovisor/bpftrace/blob/master/docs/../tools/dcsnoop.bt), which includes a portion of struct nameidata manually as it wasn't in the available headers. If the kernel has BTF data, all kernel structs are always available.
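+
+The same struct dereference can be combined with other builtins. The script below is a hypothetical variant of path.bt (the file name `path_by_proc.bt` and the column widths are assumptions made for illustration) that prints the process name and PID next to each opened path, keeping the header guard for kernels without BTF:
+
+```
+# cat path_by_proc.bt
+#ifndef BPFTRACE_HAVE_BTF
+#include <linux/path.h>
+#include <linux/dcache.h>
+#endif
+
+// Illustrative variant of path.bt: show who opened each path.
+kprobe:vfs_open
+{
+  // comm and pid refer to the thread that called vfs_open()
+  printf("%-16s %-6d %s\n", comm, pid,
+      str(((struct path *)arg0)->dentry->d_name.name));
+}
+```
+
+Because a kprobe on vfs_open() fires in the context of the calling thread, comm and pid here normally identify the process doing the open.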
+
+At this point you understand much of bpftrace, and can begin to use and write powerful one-liners. See the [Reference Guide](https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md) for more capabilities.
\ No newline at end of file