-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[analyzer] Widen loops: Augment assignment note when values invalidated at the start of a loop #122398
base: main
Are you sure you want to change the base?
[analyzer] Widen loops: Augment assignment note when values invalidated at the start of a loop #122398
Conversation
…ed at the start of a loop This patch augments the assignment note to inform the user when a value is invalidated and reassigned a conjured symbol as part of loop widening.
@llvm/pr-subscribers-clang-static-analyzer-1 @llvm/pr-subscribers-clang Author: Douglas (dgg5503) ChangesThis PR attempts to improve the value assignment diagnostic specifically for the case where The motivation behind this diagnostic is the fact that the current diagnostic is a bit confusing at first glance. For example: class A {
public:
void m_fn1();
};
void fn1() {
A a;
A *b = &a;
for (;;) {
(void)!b;
b->m_fn1();
}
}
The message In such cases, I'd like to make the diagnostic a bit more verbose. I propose something like:
This indicates to the user that The implementation I present in this PR minimally passes all Clang static analysis LIT tests, however, I am not confident the approach I've taken is idiomatic with respect to the design of Clang's static analysis. I have what I think is a slightly more general solution here where I am open to any and all suggestions as my knowledge in the Clang static analyzer is limited. Full diff: https://github.com/llvm/llvm-project/pull/122398.diff 5 Files Affected:
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index 20c446e33ef9a5..6017b7e0ea8f81 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -290,6 +290,10 @@ class ExprEngine {
/// This tag applies to a node created after removeDead.
static const ProgramPointTag *cleanupNodeTag();
+ /// A tag to track when loop widening occurs.
+ /// This tag applies to a node created after getWidenedLoopState.
+ static const ProgramPointTag *loopWideningNodeTag();
+
/// processCFGElement - Called by CoreEngine. Used to generate new successor
/// nodes by processing the 'effects' of a CFG element.
void processCFGElement(const CFGElement E, ExplodedNode *Pred,
diff --git a/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp b/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
index a9b4dbb39b5bd6..53494b5cf9cca8 100644
--- a/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
+++ b/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
@@ -1353,6 +1353,49 @@ static void showBRDefaultDiagnostics(llvm::raw_svector_ostream &OS,
OS << " to ";
SI.Dest->printPretty(OS);
}
+
+ // If the value was part of a loop widening node and its value was
+ // invalidated (i.e. replaced with a conjured symbol) then let the user know
+ // that it was due to loop widening.
+ if (SI.StoreSite && SI.Dest &&
+ SI.StoreSite->getLocation().getTag() ==
+ ExprEngine::loopWideningNodeTag()) {
+
+ // TODO: Is it always the case that there's only one predecessor?
+ assert(SI.StoreSite->hasSinglePred() &&
+ "more than one predecessor found, this needs to be handled...");
+ if (const ExplodedNode *previous = SI.StoreSite->getFirstPred()) {
+ // Was the associated memory region for this variable changed from
+ // non-Symbolic to Symbolic because of invalidation?
+ // TODO: Better yet, if there was simply a way to know if a given
+ // SVal / MemRegion was invalidated as part of the current state,
+ // then that should be more robust than what's going on here.
+ // Could we somehow make use of "RegionChanges" /
+ // "runCheckersForRegionChanges" and the ExplicitRegions parameter.
+ // Still need to somehow know when a particular Global
+ // Variable is invalidated. I have not found this to be possible with
+ // "RegionChanges" unless I'm missing something...
+ const ProgramState *lastState = previous->getState().get();
+ const SVal &lastSVal = lastState->getSVal(SI.Dest);
+ const SymbolRef valueSymbol = SI.Value.getAsSymbol(true);
+ const SymbolRef lastSymbol = lastSVal.getAsSymbol(true);
+
+ bool isNowSymbolic = false;
+ if (valueSymbol) {
+ if (!lastSymbol) {
+ // Last state did not have a symbol for SI.Value in current state.
+ // Probably (?) invalidated.
+ isNowSymbolic = true;
+ } else {
+ // If the symbol types differ, then there was likely invalidation
+ isNowSymbolic = valueSymbol->getKind() != lastSymbol->getKind();
+ }
+ }
+
+ if (isNowSymbolic)
+ OS << " (invalidation as part of widen-loops made this symbolic here)";
+ }
+ }
}
static bool isTrivialCopyOrMoveCtor(const CXXConstructExpr *CE) {
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index ff8bdcea9a2201..4660edbf3dde78 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -1109,6 +1109,11 @@ const ProgramPointTag *ExprEngine::cleanupNodeTag() {
return &cleanupTag;
}
+const ProgramPointTag *ExprEngine::loopWideningNodeTag() {
+ static SimpleProgramPointTag loopWideningTag(TagProviderName, "Widen Loop");
+ return &loopWideningTag;
+}
+
void ExprEngine::ProcessStmt(const Stmt *currStmt, ExplodedNode *Pred) {
// Reclaim any unnecessary nodes in the ExplodedGraph.
G.reclaimRecentlyAllocatedNodes();
@@ -2546,7 +2551,7 @@ void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
const LocationContext *LCtx = Pred->getLocationContext();
ProgramStateRef WidenedState =
getWidenedLoopState(Pred->getState(), LCtx, BlockCount, Term);
- nodeBuilder.generateNode(WidenedState, Pred);
+ nodeBuilder.generateNode(WidenedState, Pred, loopWideningNodeTag());
return;
}
diff --git a/clang/test/Analysis/loop-widening-notes.cpp b/clang/test/Analysis/loop-widening-notes.cpp
index a3f030dfe98826..ac98d81c23475e 100644
--- a/clang/test/Analysis/loop-widening-notes.cpp
+++ b/clang/test/Analysis/loop-widening-notes.cpp
@@ -9,7 +9,7 @@ int test_for_bug_25609() {
bar();
for (int i = 0; // expected-note {{Loop condition is true. Entering loop body}}
// expected-note@-1 {{Loop condition is false. Execution continues on line 16}}
- ++i, // expected-note {{Value assigned to 'p_a'}}
+ ++i, // expected-note {{Value assigned to 'p_a' (invalidation as part of widen-loops made this symbolic here)}}
i < flag_a;
++i) {}
@@ -23,7 +23,7 @@ int while_analyzer_output() {
flag_b = 100;
int num = 10;
while (flag_b-- > 0) { // expected-note {{Loop condition is true. Entering loop body}}
- // expected-note@-1 {{Value assigned to 'num'}}
+ // expected-note@-1 {{Value assigned to 'num' (invalidation as part of widen-loops made this symbolic here)}}
// expected-note@-2 {{Loop condition is false. Execution continues on line 30}}
num = flag_b;
}
@@ -45,7 +45,7 @@ int do_while_analyzer_output() {
do { // expected-note {{Loop condition is true. Execution continues on line 47}}
// expected-note@-1 {{Loop condition is false. Exiting loop}}
num--;
- } while (flag_c-- > 0); //expected-note {{Value assigned to 'num'}}
+ } while (flag_c-- > 0); //expected-note {{Value assigned to 'num' (invalidation as part of widen-loops made this symbolic here)}}
int local = 0;
if (num == 0) // expected-note {{Assuming 'num' is equal to 0}}
// expected-note@-1 {{Taking true branch}}
@@ -59,7 +59,7 @@ int test_for_loop() {
int num = 10;
for (int i = 0; // expected-note {{Loop condition is true. Entering loop body}}
// expected-note@-1 {{Loop condition is false. Execution continues on line 67}}
- new int(10), // expected-note {{Value assigned to 'num'}}
+ new int(10), // expected-note {{Value assigned to 'num' (invalidation as part of widen-loops made this symbolic here)}}
i < flag_d;
++i) {
++num;
@@ -73,7 +73,7 @@ int test_for_loop() {
int test_for_range_loop() {
int arr[10] = {0};
- for(auto x : arr) { // expected-note {{Assigning value}}
+ for(auto x : arr) { // expected-note {{Assigning value (invalidation as part of widen-loops made this symbolic here)}}
++x;
}
if (arr[0] == 0) // expected-note {{Assuming the condition is true}}
diff --git a/clang/test/Analysis/loop-widening.cpp b/clang/test/Analysis/loop-widening.cpp
index fbcb72dee160ae..0b05267c463fc3 100644
--- a/clang/test/Analysis/loop-widening.cpp
+++ b/clang/test/Analysis/loop-widening.cpp
@@ -16,7 +16,7 @@ void fn1() {
for (;;) { // expected-note{{Loop condition is true. Entering loop body}}
// expected-note@-1{{Loop condition is true. Entering loop body}}
- // expected-note@-2{{Value assigned to 'b'}}
+ // expected-note@-2{{Value assigned to 'b' (invalidation as part of widen-loops made this symbolic here)}}
// no crash during bug report construction
g = !b; // expected-note{{Assuming 'b' is null}}
|
// TODO: Better yet, if there was simply a way to know if a given | ||
// SVal / MemRegion was invalidated as part of the current state, | ||
// then that should be more robust than what's going on here. | ||
// Could we somehow make use of "RegionChanges" / | ||
// "runCheckersForRegionChanges" and the ExplicitRegions parameter. | ||
// Still need to somehow know when a particular Global | ||
// Variable is invalidated. I have not found this to be possible with | ||
// "RegionChanges" unless I'm missing something... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to implement this in main...dgg5503:llvm-project:dgg5503/main/invalidation-diagnostic-b but all MemRegion
s from the ProgramState
and before are tracked. I'm wondering if there's a way to clear InvalidatedMemoryRegionSet
at the start of each ProgramState
, or, if there's a different approach I should be taking...
I agree that it's important to augment the bug reports with information about the source of the symbols. Especially for conjured and derived symbols that are produced widely because of invalidations. Read this post of mine: https://discourse.llvm.org/t/memory-region-invalidation-tracking-improvements/62432 |
Hi @steakhal , I'm happy to hear others share a similar concern! I agree that it would make more sense to generalize this beyond loop widening. However, I'm having a hard time understanding how your proposal would fit in to the augmentation of the assignment message like I do in this PR. The part I'm most confused about is how would Regardless, once I have a better understanding of how your proposal fits in, I'd be happy to give your approach a try with your guidance :). Thank you |
Let me explain. This is nice and all, but there are a couple of problems with it:
What I proposed in the linked RFC was to have SymExpr kind for representing the value of a symbol after some invalidation. It would have everything inside what we might need. The value before and after the invalidation (possibly even the states themselves), the kind of the invalidation, such as loop-widening or anything else. In the visitor, it would simplify your code to: if (const auto *Invalidation = dyn_cast_or_null<InvalidationArtifact>(SI.Value.getAsSymbol())) {
if (isa<LoopWidening>(Invalidation->getCause())) {
OS << " (loop-widening invalidated this symbol here)";
}
} To be transparent, what you do makes sense. Given that this would only affect configs explicitly enabling the experimental loop widening, I'm also okay with some less elegant solutions of course. |
Hi @steakhal . Thank you for the further clarification and example. I now understand how your proposal fits in with what I'm trying to accomplish. I've started to implement the proposal but I have a few questions:
For the time being, I've implemented a rough first-draft approach here: main...dgg5503:llvm-project:dgg5503/main/invalidation-diagnostic-update In my draft, I used your code from main...steakhal:llvm-project:csa-invalidation-artifact as a foundation. I made If you'd prefer, I'd be happy to close this PR and open a new one using the new branch I've linked above. |
Thank you for your time for considering it!
Excellent question! Now, for us this should mean that an This "refinement" may break existing code that currently relied on the fact that invalidation usually placed a Conjured symbol as the new value, but those could be fairly easily identified and should be even easier to fix: just check if the symbol is an InvalidationArtifact, and there you go. So at this point expect a couple of broken tests.
Yes, If something escapes (and needs invalidation), then the best mental model is that a single write operation is simulated, but then what value should be bound there? You guessed it, we conjure a fresh symbol, and bind it. So, for regular primitive objects, such as However, for aggregates (like arrays) and most complicated stuff, we still just conjure a single symbol to the whole object, despite that we usually only operate on a smaller subobject, such as the concrete array element, or a specific field of a struct object. In our new mental model, where we have InvalidationArtifacts, this concept still makes sense. The DerivedSymbol will just wrap an InvalidationArtifact symbol instead of a conjured one. I just wanted to bring awareness to the concept of DerivedSymbols, because back in the day I was really confused by it and it took me a long time to grasp the concept. Now, the final twist to the story is that in the Store we may not materialize conjured symbols, because if you think about it you don't really need to in a lot of cases. I can't exactly recall right now, but as if we had some logic in the past for handling invalidation in the Store that only dropped the bindings of the escaped regions (so that we later don't load an old value), and only rebinds the memory spaces with fresh conjured symbols. I bet you have a lot to unpack now, so let me know if you have any questions.
Thanks for the updates. I'll try to spare some time tomorrow on your code. Thank you for your patience. |
This PR attempts to improve the value assignment diagnostic specifically for the case where
widen-loops
is enabled during Clang static analysis.The motivation behind this diagnostic is the fact that the current diagnostic is a bit confusing at first glance. For example:
The message
loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b'
appearing where it is makes technical sense if you understand whatwiden-loops
does behind the scenes. In short,widen-loops
invalidates nearly all values in the current state before the start of the loop. At least in this example, the variableb
is invalidated at the start of the for loop and, as part of the process of invalidation, is reassigned a conjured symbol henceValue assigned to 'b'
. Without that knowledge, it is not intuitive from the diagnostic why the assignment message appears where it does.In such cases, I'd like to make the diagnostic a bit more verbose. I propose something like:
This indicates to the user that
b
has been invalidated and turned into a symbolic representation because of actions taken bywiden-loops
.The implementation I present in this PR minimally passes all Clang static analysis LIT tests, however, I am not confident the approach I've taken is idiomatic with respect to the design of Clang's static analysis. I have what I think is a slightly more general solution here where
MemRegion
invalidations are tracked for eachProgramState
, but it also has its own pitfalls (see TODO comments in the diff). I'd be curious to know if there's a preference in either approach, or, if a different approach should be taken.I am open to any and all suggestions as my knowledge in the Clang static analyzer is limited.