-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce coupling between control flow graph and instructions #808
Conversation
Signed-off-by: Elazar Gershuni <[email protected]>
WalkthroughThe pull request introduces a series of structural changes across multiple files, primarily focusing on the management and representation of control-flow graphs (CFGs) through the new Changes
Possibly related PRs
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🔇 Additional comments (3)src/main/linux_verifier.cpp (3)
Zero initialization of kernel-facing structures is security-critical. Verify that all subsequent field assignments fully initialize the used union members. ✅ Verification successfulZero initialization followed by complete field assignments verified The code properly initializes all required fields after zero initialization:
🏁 Scripts executedThe following scripts were executed for the analysis: Script: #!/bin/bash
# Search for bpf_attr structure definition and usage
ast-grep --pattern 'union bpf_attr {
$$$
}'
# Find all attribute field assignments
rg "attr\." --type cpp -A 1
Length of output: 1973
The buffer size varies between 10 and 1000000 bytes based on verbosity. Ensure the smaller size is sufficient for non-verbose mode. ✅ Verification successfulBuffer sizes are appropriate for their use cases The buffer sizes are correctly set:
The buffer is only used when print_failures is enabled, making the 10-byte allocation in non-verbose mode a memory optimization rather than a limitation. 🏁 Scripts executedThe following scripts were executed for the analysis: Script: #!/bin/bash
# Search for buffer usage patterns to verify minimum size requirements
rg -A 5 "buf\.(data|size)" --type cpp
Length of output: 843 Script: #!/bin/bash
# Search for buffer size initialization and usage
ast-grep --pattern 'buf = std::vector<char>($_)'
# Find logging related code to understand buffer usage
rg -B 5 -A 5 "log_buf|log_size" --type cpp
Length of output: 1624 Script: #!/bin/bash
# Find buffer size initialization and verbosity options
rg -B 10 -A 2 "std::vector<char>" --type cpp
# Look for error message patterns to estimate minimum buffer size
rg "Failed to verify program|LOG:" --type cpp
Length of output: 3003 Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Outside diff range comments (1)
src/crab_verifier.cpp (1)
Line range hint
71-75
: Consider caching assertions for performanceMultiple iterations over
cfg.assertions_at(label)
could be optimized by storing the result in a local variable.- for (const Assertion& assertion : cfg.assertions_at(label)) { + const auto& assertions = cfg.assertions_at(label); + for (const Assertion& assertion : assertions) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
📒 Files selected for processing (24)
src/asm_cfg.cpp
(13 hunks)src/asm_files.cpp
(2 hunks)src/asm_ostream.cpp
(5 hunks)src/asm_parse.cpp
(1 hunks)src/asm_syntax.hpp
(1 hunks)src/asm_unmarshal.cpp
(2 hunks)src/assertions.cpp
(3 hunks)src/config.hpp
(1 hunks)src/crab/cfg.hpp
(2 hunks)src/crab/fwd_analyzer.cpp
(3 hunks)src/crab/split_dbm.cpp
(2 hunks)src/crab/thresholds.cpp
(2 hunks)src/crab/thresholds.hpp
(1 hunks)src/crab/var_factory.cpp
(1 hunks)src/crab/wto.cpp
(5 hunks)src/crab/wto.hpp
(1 hunks)src/crab_utils/graph_ops.hpp
(1 hunks)src/crab_verifier.cpp
(3 hunks)src/crab_verifier.hpp
(3 hunks)src/main/check.cpp
(2 hunks)src/string_constraints.hpp
(1 hunks)src/test/ebpf_yaml.cpp
(1 hunks)src/test/test_verify.cpp
(1 hunks)src/test/test_wto.cpp
(1 hunks)
🧰 Additional context used
📓 Learnings (1)
src/crab_verifier.cpp (2)
Learnt from: elazarg
PR: vbpf/ebpf-verifier#692
File: src/crab/ebpf_domain.cpp:1366-1367
Timestamp: 2024-11-12T09:50:23.369Z
Learning: The project is compiled with C++20.
Learnt from: elazarg
PR: vbpf/ebpf-verifier#692
File: src/crab/ebpf_domain.cpp:641-642
Timestamp: 2024-11-12T09:50:16.890Z
Learning: The project uses the C++20 standard and supports C++20 features.
🪛 cppcheck (2.10-2)
src/asm_ostream.cpp
[error] 134-134: Syntax Error
(internalAstError)
src/asm_cfg.cpp
[error] 134-134: Syntax Error
(internalAstError)
🔇 Additional comments (37)
src/crab/fwd_analyzer.cpp (4)
39-41
: Access assertions via _cfg.assertions_at(label)
The change correctly updates the method to retrieve assertions for the given label, enhancing consistency in how assertions are accessed throughout the codebase.
44-44
: Access instruction via _cfg.instruction_at(label)
Updating to use _cfg.instruction_at(label)
standardizes instruction retrieval, promoting better maintainability.
Line range hint 146-151
: Confirm the use of _cfg.parents_of(head)
in cycle handling
The transition to _cfg.parents_of(head)
in the loop handling logic should maintain the correct aggregation of invariants from parent nodes. Verify that this change does not alter the intended behavior within cycle analysis.
54-56
: Verify the correctness of _cfg.parents_of(node)
Ensure that _cfg.parents_of(node)
returns the correct set of predecessor nodes, accurately replicating the functionality previously provided by _cfg.prev_nodes(node)
.
Run the following script to check for any remaining usages of _cfg.prev_nodes
in the codebase:
✅ Verification successful
parents_of
is correctly implemented and used consistently
The codebase shows parents_of
is properly implemented in cfg.hpp
to return parent nodes through the parents
member. It's used consistently across multiple components including the forward analyzer, threshold analysis, and CFG builder. No instances of the old prev_nodes
method exist, indicating a complete migration.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Identify any remaining usages of `_cfg.prev_nodes`.
# Test: Search for `_cfg.prev_nodes` usage. Expected: No occurrences.
rg '_cfg\.prev_nodes'
Length of output: 23
Script:
#!/bin/bash
# Search for parents_of method definition and usage
ast-grep --pattern 'parents_of($_) { $$$ }'
# Search for any references to parents_of
rg 'parents_of' -A 2
# Search for prev_nodes method definition to see if it was renamed
ast-grep --pattern 'prev_nodes($_) { $$$ }'
Length of output: 1419
src/test/ebpf_yaml.cpp (1)
20-20
: Use explicit namespace aliasing for cfg_t
Introducing using crab::cfg_t;
improves code clarity by explicitly specifying the cfg_t
type from the crab
namespace. Ensure that this does not introduce any name conflicts in the current scope.
src/asm_cfg.cpp (2)
22-77
: Review implementation of cfg_builder_t
The introduction of cfg_builder_t
encapsulates CFG construction and modification logic, enhancing code organization. Verify that all methods within cfg_builder_t
correctly maintain CFG integrity, especially with label uniqueness and edge management.
Line range hint 129-210
: Ensure correct inlining of function macros in add_cfg_nodes
The add_cfg_nodes
function recursively inlines function macros. Confirm that edge cases, such as recursive calls and maximum call stack depth, are properly handled to prevent infinite recursion or stack overflows.
Run the following script to search for potential recursive calls exceeding MAX_CALL_STACK_FRAMES
:
✅ Verification successful
Function macro inlining safeguards are properly implemented
The code has robust safeguards against edge cases:
- Recursion check: Throws
InvalidControlFlow
if same stack frame prefix is detected - Stack depth limit: Hard limit of 8 frames defined by
MAX_CALL_STACK_FRAMES
inebpf_base.h
- Test coverage:
test-data/calllocal.yaml
verifies illegal recursion detection
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Detect function macros that may exceed the maximum call stack frames.
# Test: Search for `CallLocal` instructions and check their nesting levels. Expect: No instances exceeding the limit.
ast-grep --pattern $'CallLocal {
target: $_,
stack_frame_prefix: $_
}'
Length of output: 75
Script:
#!/bin/bash
# Search for CallLocal struct/class definition and its usage
rg -U "struct CallLocal|class CallLocal" -A 10
# Search for MAX_CALL_STACK_FRAMES definition and usage
rg "MAX_CALL_STACK_FRAMES"
# Search for recursive call prevention checks
rg "illegal recursion"
Length of output: 1332
src/asm_files.cpp (1)
208-208
: Function signature update is appropriate
The parameters for find_subprogram
are correctly ordered, enhancing clarity.
src/string_constraints.hpp (1)
19-19
: Minor constructor formatting change
The constructor's formatting change does not impact functionality.
src/config.hpp (1)
5-10
: Added prepare_cfg_options
struct
Introduces configuration options for CFG preparation with sensible default values.
src/crab/thresholds.hpp (1)
61-61
: Ensure all get_thresholds
calls are updated
The method signature of get_thresholds
now accepts const label_t&
. Verify that all usages are updated accordingly.
Run the following script to find and review all method calls:
✅ Verification successful
Let me verify the types of the arguments being passed in the calls.
All get_thresholds
calls are correctly updated
All calls pass a label_t
reference as the first argument:
vertex
inoperator()
head
fromcycle->head()
pre
fromparents_of(head)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find usages of `get_thresholds` to ensure they pass `label_t`
rg 'get_thresholds\(' -A 1
Length of output: 607
Script:
#!/bin/bash
# Get more context around the calls to understand the argument types
rg 'get_thresholds\(' -B 2 -A 2
Length of output: 1454
src/test/test_wto.cpp (5)
3-4
: Updated include directive
Including <catch2/catch_all.hpp>
is necessary for the test framework.
8-8
: Simplified type usage with using
directive
Using crab::label_t
improves code readability.
13-21
: Refactored test case using cfg_from_adjacency_list
Replaced manual CFG construction with crab::cfg_from_adjacency_list
for clarity.
31-36
: Refactored second test case for readability
Utilized crab::cfg_from_adjacency_list
to simplify CFG setup in the test.
46-50
: Streamlined third test case with adjacency list
Enhanced test maintainability by adopting crab::cfg_from_adjacency_list
.
src/crab/thresholds.cpp (1)
71-79
: LGTM: Proper usage of cfg methods
The changes correctly use m_cfg.parents_of(head)
to access predecessors, maintaining better encapsulation of the CFG internals.
src/crab_verifier.hpp (2)
31-32
: LGTM: Improved variable naming
Variable renames from generic warnings
to specific reach_warning
and warning_vec
improve code clarity.
Also applies to: 41-42
67-70
: LGTM: Consistent namespace usage
Methods properly qualified with crab::
namespace, improving code clarity and preventing potential name conflicts.
Also applies to: 73-75
src/crab_verifier.cpp (1)
92-96
: LGTM: Improved unreachable code reporting
Better error messages for unreachable code by including the specific assume condition that caused it.
src/crab/wto.hpp (1)
117-117
: LGTM: Explicit namespace usage improves clarity
The change from cfg_t
to crab::cfg_t
helps prevent potential naming conflicts.
src/crab/var_factory.cpp (1)
151-151
: LGTM: Simplified string concatenation
The direct concatenation of 's' and '[' improves readability without changing functionality.
src/asm_syntax.hpp (1)
240-241
: Verify callers handle the inverted boolean logic
The semantic change from is_explicit
to is_implicit
requires verification of all usage sites.
✅ Verification successful
Boolean logic change is safe
The codebase uses is_implicit
consistently:
src/asm_parse.cpp
: Setsis_implicit = false
for test casessrc/asm_cfg.cpp
: Setsis_implicit = true
for control flow assumptionssrc/assertions.cpp
: Correctly checks!is_implicit
to identify explicit assertions
The semantic change from is_explicit
to is_implicit
is properly handled across all usage sites with correct boolean logic.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for usages of the is_explicit/is_implicit field
rg -A 2 "is_(explicit|implicit)"
Length of output: 886
src/crab/wto.cpp (3)
7-7
: LGTM: Using crab namespace for cfg_t
The change aligns with the PR objective of reducing coupling between control flow graph and instructions.
62-62
: Fix: Removed redundant semicolon after member initializer list
113-116
: Improvement: Using std::ranges::reverse_view for reverse iteration
Modern C++ approach that improves readability and maintainability.
Also applies to: 161-165
src/main/check.cpp (1)
242-244
: LGTM: Consistent usage of crab::cfg_t
The namespace change is consistently applied across all code paths (zoneCrab, cfg, and stats domains).
Also applies to: 280-280
src/assertions.cpp (3)
3-3
: Added standard assertion header
305-306
: LGTM: Simplified get_assertions function
The function has been streamlined while maintaining its core functionality.
180-182
: Verify the inverted condition logic
The condition has been changed from is_explicit
to !is_implicit
. Ensure this change maintains the intended behavior.
✅ Verification successful
The change from is_explicit
to !is_implicit
is correct
The codebase consistently uses is_implicit
as the field name in the Assume
struct, with true
being the default value. The change maintains the intended behavior:
is_implicit=false
inasm_parse.cpp
corresponds to!is_implicit
check inassertions.cpp
is_implicit=true
is used for automatically generated assumptions inasm_cfg.cpp
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any other occurrences of is_explicit/is_implicit to verify consistent usage
# Search for is_explicit usage
echo "Searching for is_explicit usage:"
rg "is_explicit" -A 2
# Search for is_implicit usage
echo "Searching for is_implicit usage:"
rg "is_implicit" -A 2
Length of output: 1045
src/asm_parse.cpp (1)
215-215
: Verify the impact of changing the Assume instruction flag.
The change from is_explicit
to is_implicit
might affect how assumptions are processed in the control flow graph.
✅ Verification successful
The change from is_explicit
to is_implicit
is consistent with the codebase
The codebase shows is_implicit
is used consistently:
- Default value is
true
inasm_syntax.hpp
- Used in control flow handling in
asm_cfg.cpp
for implicit assumptions - Checked in
assertions.cpp
for assumption processing - Current change to
false
inasm_parse.cpp
correctly marks explicit assumptions
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for any other uses of is_implicit/is_explicit in assumptions
rg -A 2 "is_(explicit|implicit)"
Length of output: 886
src/asm_unmarshal.cpp (2)
420-421
: Parameter reordering improves readability.
Moving pc
to the end of the parameter list follows the pattern of placing less frequently modified parameters last.
9-9
: Verify the necessity of debug.hpp inclusion.
The new include suggests addition of debugging capabilities.
✅ Verification successful
The debug.hpp inclusion is necessary
The file uses the CRAB_ERROR macro defined in debug.hpp for error handling in multiple places:
- Handling unexpected opcodes
- Handling unexpected widths
- Handling invalid instruction classes
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if debug.hpp is actually used in this file
rg --no-heading "CRAB_DEBUG|crab::debug"
Length of output: 42
Script:
#!/bin/bash
# Check if debug.hpp is used in the specific file
rg --no-heading "CRAB_DEBUG|crab::debug" src/asm_unmarshal.cpp
# Check the content of the file to see what headers are actually used
cat src/asm_unmarshal.cpp
# Look for any debug-related macros or functions that might be using this header
rg -A 3 "debug\.hpp" src/asm_unmarshal.cpp
Length of output: 35725
Script:
#!/bin/bash
# Search for CRAB_ERROR usage which is likely defined in debug.hpp
rg "CRAB_ERROR" src/asm_unmarshal.cpp
# Look for any other debug-related macros or functions that might be using this header
ast-grep --pattern 'CRAB_ERROR($$$)'
Length of output: 5943
src/test/test_verify.cpp (1)
6-9
: Clean dependency management.
The explicit include and using directive improve code clarity by making the cfg_t type dependency visible at the top of the file.
src/crab_utils/graph_ops.hpp (1)
743-744
: LGTM!
The [[maybe_unused]]
attribute is correctly applied to suppress compiler warnings for the loop variable that is only used for iteration count.
src/crab/split_dbm.cpp (2)
788-788
: LGTM!
The logging statement for contradiction case is consistent with the error handling flow and matches the format of other logging statements.
1224-1225
: LGTM!
The [[maybe_unused]]
attribute is correctly applied to suppress compiler warnings for the overflow
variable that is only used in an assert statement.
Signed-off-by: Elazar Gershuni <[email protected]>
Control-flow related code can be largely separated from other parts of the semantics. This is a step in this direction.
The internal struct (previously value_t, now adjacent_t) no longer holds an instruction, and instead holds a separate mapping from labels to instructions. It is also encapsulated inside cfg_t, so client code only talks about labels and child/parent labels, not nodes.
cfg_t now also hides the chimera GuardedInstruction struct behind to different access methods, and is now only known to the cfg_t itself and to the builder. The assertions can potentially be generated on the fly (which might make sense for Callx or other opaque assertions), and GuardedInstruction can be removed.
Only the new cfg_builder_t can create or alter a control flow graph. Any other client code can trivially assume the cfg_t is immutable (as far as control flow goes; instructions are mutable for now). cfg_builder resides inside asm_cfg.cpp, so its effective interface is very small: either create from an instruction sequence, or from an adjacency list (for wto tests).
prepare_cfg_options is moved inside config.hpp, and config.hpp no longer carry any other inclusion.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation
Refactor
Tests