Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add shared basic block library #18497

Merged
merged 13 commits into from
Jan 24, 2025

Conversation

paldepind
Copy link
Contributor

@paldepind paldepind commented Jan 15, 2025

Adds a shared basic block library in the controlflow pack and modifies the basic block implementation of Swift, Ruby, C#, and Rust to use it.

A few notes:

  • There's two ways to use the basic block library. Either through the new codeql.controlflow.BasicBlock or through the existing codeql.controlflow.Cfg. The former is suitable for languages that don't use the control graph library and the latter for those that do. In this PR all languages use the latter, but I've tried instantiating codeql.controlflow.BasicBlock for Go and that seems to work fine as well.

  • The BasicBlock class for C# now has both the current getASuccessorByType/1 method and a new getASuccessor/1 method that it inherits from the basic block library and which is the name used in Ruby, Rust, and Swift. We could consider deprecating getASuccessorByType/1 in order to not have two methods doing the same and to increase consistency between language libraries.

  • For the BasicBlock subclasses in Cfg.qll I've changed the current names a bit, such that they are all consistently of the form ${name}BasicBlock. For instance JoinBlock is instead JoinBasicBlock. For the existing language-level basic block libraries I've kept the current names for backwards compatibility, so only Rust use the new names.

@github-actions github-actions bot added C# Ruby Rust Pull requests that update Rust code Swift labels Jan 15, 2025
@paldepind paldepind force-pushed the shared-basic-block-library branch from 1c9e7cc to 5bfeff6 Compare January 15, 2025 14:26
@paldepind paldepind force-pushed the shared-basic-block-library branch 5 times, most recently from 7112872 to b313a48 Compare January 16, 2025 15:03
@paldepind paldepind force-pushed the shared-basic-block-library branch from b313a48 to 8b20b0d Compare January 16, 2025 15:37
@paldepind paldepind marked this pull request as ready for review January 16, 2025 15:45
@paldepind paldepind requested review from a team as code owners January 16, 2025 15:45
@hvitved
Copy link
Contributor

hvitved commented Jan 17, 2025

  • We could consider deprecating getASuccessorByType/1 in order to not have two methods doing the same and to increase consistency between language libraries.

I think we should do that.

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work, great to see even more shared code 🎉

shared/controlflow/codeql/controlflow/BasicBlock.qll Outdated Show resolved Hide resolved
shared/controlflow/codeql/controlflow/BasicBlock.qll Outdated Show resolved Hide resolved
shared/controlflow/codeql/controlflow/BasicBlock.qll Outdated Show resolved Hide resolved
shared/controlflow/codeql/controlflow/BasicBlock.qll Outdated Show resolved Hide resolved
shared/controlflow/codeql/controlflow/Cfg.qll Outdated Show resolved Hide resolved
ruby/ql/lib/codeql/ruby/controlflow/BasicBlocks.qll Outdated Show resolved Hide resolved
ruby/ql/lib/codeql/ruby/controlflow/BasicBlocks.qll Outdated Show resolved Hide resolved
@paldepind
Copy link
Contributor Author

Thanks for the review and comments. I've addressed all but one.

@paldepind
Copy link
Contributor Author

paldepind commented Jan 17, 2025

  • We could consider deprecating getASuccessorByType/1 in order to not have two methods doing the same and to increase consistency between language libraries.

I think we should do that.

I'll do that in a follow up PR 👍

@paldepind paldepind requested a review from a team as a code owner January 17, 2025 12:32
@github-actions github-actions bot added the Actions Analysis of GitHub Actions label Jan 17, 2025
Comment on lines 1737 to 1742
predicate immediatelyControls(BasicBlock succ, SuccessorType s) {
succ = this.getASuccessor(s) and
forall(BasicBlock pred | pred = succ.getAPredecessor() and pred != this |
succ.dominates(pred)
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Java, this predicate is called dominatingEdge and it includes bbIDominates(this, succ) as an additional conjunct. I think that ought to be semantically identical, but it might offer a better join-order since immediate dominance joined with the successor relation yields a much stronger context for the forall.
Also, in this general library, it's probably worth it to expand the qldoc a bit on the relationship between the concept of controls and dominance. In short: controls equals edge dominance. See also the qldoc for dominatingEdge in Java.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the concept of dominating edge aka immediatelyControls might very well be applicable beyond ConditionBasicBlocks, so it should possibly be available as a top-level predicate between basic blocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, this concept can be put into BasicBlocks.qll as it's independent of .isCondition().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that ought to be semantically identical

That seems true to me.

it might offer a better join-order since immediate dominance joined with the successor relation yields a much stronger context for the forall.

How can I validate that? I tried looking at the evaluator logs for the predicate with and without that conjunct and didn't see any noticeable difference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in this general library, it's probably worth it to expand the qldoc a bit on the relationship between the concept of controls and dominance.

I've written some stuff now for the immediatelyControls predicate. Let me know if there's more I should add.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I validate that? I tried looking at the evaluator logs for the predicate with and without that conjunct and didn't see any noticeable difference

Look at the RA with tuple counts and compare the two versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, including the extra conjunct did indeed produce fewer tuples during the calculation 👍

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swift changes LGTM. I'd like to see DCA runs at some point before this is merged.

@paldepind paldepind force-pushed the shared-basic-block-library branch from 468707e to 62a459d Compare January 20, 2025 12:33
Comment on lines 169 to 183
* The above implies that this block immediately dominates `succ`. But
* "controls" is a stronger notion than "dominates". It is not the case
* that any immediate successor that is immediately dominated by this block
* is also immediately controlled by this block. To see why, consider this
* example corresponding to an `if`-statement without an `else` block:
* ```
* ... --> cond --[true]--> ... --> if stmt
* \ /
* ----[false]-----------
* ```
* The basic block for `cond` immediately dominates the immediately
* succeeding basic block for the `if` statement. But the `if` statement
* is not immediately controlled by the `cond` basic block and the `false`
* edge since it is also possible to reach the `if` statement via a path
* through the `true` edge.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this hard to read and also missing some main points (also, the placement of if stmt in that graph is non-sensical to me). How about this:

Suggested change
* The above implies that this block immediately dominates `succ`. But
* "controls" is a stronger notion than "dominates". It is not the case
* that any immediate successor that is immediately dominated by this block
* is also immediately controlled by this block. To see why, consider this
* example corresponding to an `if`-statement without an `else` block:
* ```
* ... --> cond --[true]--> ... --> if stmt
* \ /
* ----[false]-----------
* ```
* The basic block for `cond` immediately dominates the immediately
* succeeding basic block for the `if` statement. But the `if` statement
* is not immediately controlled by the `cond` basic block and the `false`
* edge since it is also possible to reach the `if` statement via a path
* through the `true` edge.
* The concept of an edge `E` controlling a node `N` in a graph can also be
* described as _edge dominance_ in the sense that if `E` was split in two
* with an added node in the middle then "controlled by `E`" would be
* equivalent to dominance by that added node.
* Note that controls/edge-dominance is stronger than node dominance as
* it implies node dominance (by either endpoint), but the converse is not
* true, hence the need for this concept.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that this explains the notion of "control" by introducing a synonym for it, "edge dominance" and explaining the synonym instead. Then afterwards we have to write "controls/edge-dominance" because there is now two names for the same thing.

Instead, I've now tried to spell out the analogy/relation between dominance and control but without introducing another term helps (it's not standard nor used elsewhere in the library).

  • the placement of if stmt in that graph is non-sensical to me

Sorry, it should have been if expression, not statement. I've fixed the example and moved it inside the body as internal documentation.

Also, as the "immediately" in immediatelyControls is very different from the "immediately" in immediatelyDominates, I've moved the controls vs. dominates explanation to the controls predicate which corresponds more nicely with the dominates/strictlyDominates predicates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it should have been if expression, not statement.

That's still doesn't make sense to me - if anything, it's even more confusing. I don't know what an "if expression" is, if anything, I would guess it to be the ternary operator ? : known from e.g. Java/C++/C# etc. But that cannot exist "without an else branch". I expect the false-successor of a condition in an if statement without an else branch to be whatever statement that follows the entire if statement chunk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expression oriented languages (without if statements) often have an if expression where the else branch is optional. For instance, Rust, Ruby, and OCaml. In Rust if cond { ... } would give rise to the graph in the example.

But let's change it to an if statement as in your suggestion. I've done that 👍

@geoffw0
Copy link
Contributor

geoffw0 commented Jan 21, 2025

Swift DCA results look good but there's a 16.2% analysis time regression - which sounds like a lot, but we get a lot of wobble for metrics like this on the Swift DCA job (e.g. a 17.8% slowdown wobble a month ago). I did another run for another data point and got 13.4% slowdown (and a random failure), which is a little better but far from reassuring. The C# run is showing no such pattern.

Can you say anything about the expected performance impact of this work? (I'm suspicious my concern amounts to nothing)

@paldepind
Copy link
Contributor Author

@geoffw0

Can you say anything about the expected performance impact of this work? (I'm suspicious my concern amounts to nothing)

The expectation is that performance should be the same or improved. Most of the code is identical to the previous code for Swift and in the cases where it is not it should be better performing.

@hvitved
Copy link
Contributor

hvitved commented Jan 23, 2025

I have started DCA runs for the relevant languages.

hvitved
hvitved previously approved these changes Jan 23, 2025
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DCA looks good 🎉

Comment on lines 185 to 195
// `if` expression without an `else` block:
// ```
// ... --> cond --[true]--> ... --> if expr
// \ /
// ----[false]-----------
// ```
// The basic block for `cond` immediately dominates the directly
// succeeding basic block for the `if` expression. But the `if`
// expression is not immediately controlled by the `cond` basic block and
// the `false` edge since it is also possible to reach the `if`
// expression via the `true` edge.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just replace if expr with a name like e.g. A in the ascii graph to identify the subsequent basic block?

Suggested change
// `if` expression without an `else` block:
// ```
// ... --> cond --[true]--> ... --> if expr
// \ /
// ----[false]-----------
// ```
// The basic block for `cond` immediately dominates the directly
// succeeding basic block for the `if` expression. But the `if`
// expression is not immediately controlled by the `cond` basic block and
// the `false` edge since it is also possible to reach the `if`
// expression via the `true` edge.
// `if` statement without an `else` block:
// ```
// ... --> cond --[true]--> ... --> A
// \ /
// ----[false]-----------
// ```
// The basic block for `cond` immediately dominates the basic block `A`
// that follows the `if` statement. But `A`
// is not controlled by the `cond` basic block and
// the `false` edge since it is also possible to reach `A`
// via the `true` edge.

@paldepind
Copy link
Contributor Author

Thanks for running the remaining DCAs @hvitved.

@geoffw0
Copy link
Contributor

geoffw0 commented Jan 23, 2025

Most of the code is identical to the previous code for Swift and in the cases where it is not it should be better performing.

I'm satisfied with this + lack of wobble for languages where we get better data. 👍

@paldepind paldepind merged commit a6cd53e into github:main Jan 24, 2025
54 of 55 checks passed
@paldepind paldepind deleted the shared-basic-block-library branch January 24, 2025 09:49
@paldepind
Copy link
Contributor Author

Thanks for reviewing. Really nice that we also got some performance improvements in, to the benefit of the languages that now use the library 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Actions Analysis of GitHub Actions C# documentation Ruby Rust Pull requests that update Rust code Swift
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants