Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[analyzer] Widen loops: Augment assignment note when values invalidated at the start of a loop #122398

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dgg5503
Copy link
Contributor

@dgg5503 dgg5503 commented Jan 10, 2025

This PR attempts to improve the value assignment diagnostic specifically for the case where widen-loops is enabled during Clang static analysis.

The motivation behind this diagnostic is the fact that the current diagnostic is a bit confusing at first glance. For example:

class A {
public:
  void m_fn1();
};
void fn1() {
  A a;
  A *b = &a;
  for (;;) {
    (void)!b;
    b->m_fn1();
  }
}
> clang -cc1 -internal-isystem C:\...\lib\clang\include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -analyzer-checker=core -analyzer-max-loop 2 -analyzer-config widen-loops=true -analyzer-output=text loop-widening-notes-best.cpp

loop-widening-notes-best.cpp:10:5: warning: Called C++ object pointer is null [core.CallAndMessage]
   10 |     b->m_fn1();
      |     ^
loop-widening-notes-best.cpp:8:3: note: Loop condition is true.  Entering loop body
    8 |   for (;;) {
      |   ^
loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b'
loop-widening-notes-best.cpp:8:3: note: Loop condition is true.  Entering loop body
loop-widening-notes-best.cpp:9:11: note: Assuming 'b' is null
    9 |     (void)!b;
      |           ^~
loop-widening-notes-best.cpp:10:5: note: Called C++ object pointer is null
   10 |     b->m_fn1();
      |     ^
1 warning generated.

The message loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b' appearing where it is makes technical sense if you understand what widen-loops does behind the scenes. In short, widen-loops invalidates nearly all values in the current state before the start of the loop. At least in this example, the variable b is invalidated at the start of the for loop and, as part of the process of invalidation, is reassigned a conjured symbol hence Value assigned to 'b'. Without that knowledge, it is not intuitive from the diagnostic why the assignment message appears where it does.

In such cases, I'd like to make the diagnostic a bit more verbose. I propose something like:

loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b' (invalidation as part of widen-loops made this symbolic here)

This indicates to the user that b has been invalidated and turned into a symbolic representation because of actions taken by widen-loops.

The implementation I present in this PR minimally passes all Clang static analysis LIT tests, however, I am not confident the approach I've taken is idiomatic with respect to the design of Clang's static analysis. I have what I think is a slightly more general solution here where MemRegion invalidations are tracked for each ProgramState, but it also has its own pitfalls (see TODO comments in the diff). I'd be curious to know if there's a preference in either approach, or, if a different approach should be taken.

I am open to any and all suggestions as my knowledge in the Clang static analyzer is limited.

…ed at the start of a loop

This patch augments the assignment note to inform the user when a value
is invalidated and reassigned a conjured symbol as part of loop
widening.
@dgg5503 dgg5503 requested a review from jcsxky January 10, 2025 01:33
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Jan 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 10, 2025

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: Douglas (dgg5503)

Changes

This PR attempts to improve the value assignment diagnostic specifically for the case where widen-loops is enabled during Clang static analysis.

The motivation behind this diagnostic is the fact that the current diagnostic is a bit confusing at first glance. For example:

class A {
public:
  void m_fn1();
};
void fn1() {
  A a;
  A *b = &a;
  for (;;) {
    (void)!b;
    b->m_fn1();
  }
}
> clang -cc1 -internal-isystem C:\...\lib\clang\include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -analyzer-checker=core -analyzer-max-loop 2 -analyzer-config widen-loops=true -analyzer-output=text loop-widening-notes-best.cpp

loop-widening-notes-best.cpp:10:5: warning: Called C++ object pointer is null [core.CallAndMessage]
   10 |     b->m_fn1();
      |     ^
loop-widening-notes-best.cpp:8:3: note: Loop condition is true.  Entering loop body
    8 |   for (;;) {
      |   ^
loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b'
loop-widening-notes-best.cpp:8:3: note: Loop condition is true.  Entering loop body
loop-widening-notes-best.cpp:9:11: note: Assuming 'b' is null
    9 |     (void)!b;
      |           ^~
loop-widening-notes-best.cpp:10:5: note: Called C++ object pointer is null
   10 |     b->m_fn1();
      |     ^
1 warning generated.

The message loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b' appearing where it is makes technical sense if you understand what widen-loops does behind the scenes. In short, widen-loops invalidates nearly all values in the current state before the start of the loop. At least in this example, the variable b is invalidated at the start of the for loop and, as part of the process of invalidation, is reassigned a conjured symbol hence Value assigned to 'b'. Without that knowledge, it is not intuitive from the diagnostic why the assignment message appears where it does.

In such cases, I'd like to make the diagnostic a bit more verbose. I propose something like:

loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b' (invalidation as part of widen-loops made this symbolic here)

This indicates to the user that b has been invalidated and turned into a symbolic representation because of actions taken by widen-loops.

The implementation I present in this PR minimally passes all Clang static analysis LIT tests, however, I am not confident the approach I've taken is idiomatic with respect to the design of Clang's static analysis. I have what I think is a slightly more general solution here where MemRegion invalidations are tracked for each ProgramState, but it also has its own pitfalls (see TODO comments in the diff). I'd be curious to know if there's a preference in either approach, or, if a different approach should be taken.

I am open to any and all suggestions as my knowledge in the Clang static analyzer is limited.


Full diff: https://github.com/llvm/llvm-project/pull/122398.diff

5 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h (+4)
  • (modified) clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp (+43)
  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+6-1)
  • (modified) clang/test/Analysis/loop-widening-notes.cpp (+5-5)
  • (modified) clang/test/Analysis/loop-widening.cpp (+1-1)
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index 20c446e33ef9a5..6017b7e0ea8f81 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -290,6 +290,10 @@ class ExprEngine {
   /// This tag applies to a node created after removeDead.
   static const ProgramPointTag *cleanupNodeTag();
 
+  /// A tag to track when loop widening occurs.
+  /// This tag applies to a node created after getWidenedLoopState.
+  static const ProgramPointTag *loopWideningNodeTag();
+
   /// processCFGElement - Called by CoreEngine. Used to generate new successor
   ///  nodes by processing the 'effects' of a CFG element.
   void processCFGElement(const CFGElement E, ExplodedNode *Pred,
diff --git a/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp b/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
index a9b4dbb39b5bd6..53494b5cf9cca8 100644
--- a/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
+++ b/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
@@ -1353,6 +1353,49 @@ static void showBRDefaultDiagnostics(llvm::raw_svector_ostream &OS,
     OS << " to ";
     SI.Dest->printPretty(OS);
   }
+
+  // If the value was part of a loop widening node and its value was
+  // invalidated (i.e. replaced with a conjured symbol) then let the user know
+  // that it was due to loop widening.
+  if (SI.StoreSite && SI.Dest &&
+      SI.StoreSite->getLocation().getTag() ==
+          ExprEngine::loopWideningNodeTag()) {
+
+    // TODO: Is it always the case that there's only one predecessor?
+    assert(SI.StoreSite->hasSinglePred() &&
+           "more than one predecessor found, this needs to be handled...");
+    if (const ExplodedNode *previous = SI.StoreSite->getFirstPred()) {
+      // Was the associated memory region for this variable changed from
+      // non-Symbolic to Symbolic because of invalidation?
+      // TODO: Better yet, if there was simply a way to know if a given
+      // SVal / MemRegion was invalidated as part of the current state,
+      // then that should be more robust than what's going on here.
+      // Could we somehow make use of "RegionChanges" /
+      // "runCheckersForRegionChanges" and the ExplicitRegions parameter.
+      // Still need to somehow know when a particular Global
+      // Variable is invalidated. I have not found this to be possible with
+      // "RegionChanges" unless I'm missing something...
+      const ProgramState *lastState = previous->getState().get();
+      const SVal &lastSVal = lastState->getSVal(SI.Dest);
+      const SymbolRef valueSymbol = SI.Value.getAsSymbol(true);
+      const SymbolRef lastSymbol = lastSVal.getAsSymbol(true);
+
+      bool isNowSymbolic = false;
+      if (valueSymbol) {
+        if (!lastSymbol) {
+          // Last state did not have a symbol for SI.Value in current state.
+          // Probably (?) invalidated.
+          isNowSymbolic = true;
+        } else {
+          // If the symbol types differ, then there was likely invalidation
+          isNowSymbolic = valueSymbol->getKind() != lastSymbol->getKind();
+        }
+      }
+
+      if (isNowSymbolic)
+        OS << " (invalidation as part of widen-loops made this symbolic here)";
+    }
+  }
 }
 
 static bool isTrivialCopyOrMoveCtor(const CXXConstructExpr *CE) {
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index ff8bdcea9a2201..4660edbf3dde78 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -1109,6 +1109,11 @@ const ProgramPointTag *ExprEngine::cleanupNodeTag() {
   return &cleanupTag;
 }
 
+const ProgramPointTag *ExprEngine::loopWideningNodeTag() {
+  static SimpleProgramPointTag loopWideningTag(TagProviderName, "Widen Loop");
+  return &loopWideningTag;
+}
+
 void ExprEngine::ProcessStmt(const Stmt *currStmt, ExplodedNode *Pred) {
   // Reclaim any unnecessary nodes in the ExplodedGraph.
   G.reclaimRecentlyAllocatedNodes();
@@ -2546,7 +2551,7 @@ void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
     const LocationContext *LCtx = Pred->getLocationContext();
     ProgramStateRef WidenedState =
         getWidenedLoopState(Pred->getState(), LCtx, BlockCount, Term);
-    nodeBuilder.generateNode(WidenedState, Pred);
+    nodeBuilder.generateNode(WidenedState, Pred, loopWideningNodeTag());
     return;
   }
 
diff --git a/clang/test/Analysis/loop-widening-notes.cpp b/clang/test/Analysis/loop-widening-notes.cpp
index a3f030dfe98826..ac98d81c23475e 100644
--- a/clang/test/Analysis/loop-widening-notes.cpp
+++ b/clang/test/Analysis/loop-widening-notes.cpp
@@ -9,7 +9,7 @@ int test_for_bug_25609() {
     bar();
   for (int i = 0;  // expected-note {{Loop condition is true.  Entering loop body}}                    
                    // expected-note@-1 {{Loop condition is false. Execution continues on line 16}}
-       ++i,        // expected-note {{Value assigned to 'p_a'}} 
+       ++i,        // expected-note {{Value assigned to 'p_a' (invalidation as part of widen-loops made this symbolic here)}}
        i < flag_a;
        ++i) {}
                                       
@@ -23,7 +23,7 @@ int while_analyzer_output() {
   flag_b = 100;
   int num = 10;
   while (flag_b-- > 0) { // expected-note {{Loop condition is true.  Entering loop body}} 
-                         // expected-note@-1 {{Value assigned to 'num'}} 
+                         // expected-note@-1 {{Value assigned to 'num' (invalidation as part of widen-loops made this symbolic here)}}
                          // expected-note@-2 {{Loop condition is false. Execution continues on line 30}}
     num = flag_b;
   }
@@ -45,7 +45,7 @@ int do_while_analyzer_output() {
   do {   // expected-note {{Loop condition is true. Execution continues on line 47}} 
          // expected-note@-1 {{Loop condition is false.  Exiting loop}}
     num--;
-  } while (flag_c-- > 0); //expected-note {{Value assigned to 'num'}}
+  } while (flag_c-- > 0); //expected-note {{Value assigned to 'num' (invalidation as part of widen-loops made this symbolic here)}}
   int local = 0;
   if (num == 0)       // expected-note {{Assuming 'num' is equal to 0}} 
                       // expected-note@-1 {{Taking true branch}}
@@ -59,7 +59,7 @@ int test_for_loop() {
   int num = 10;
   for (int i = 0;    // expected-note {{Loop condition is true.  Entering loop body}} 
                      // expected-note@-1 {{Loop condition is false. Execution continues on line 67}}
-       new int(10),  // expected-note {{Value assigned to 'num'}}
+       new int(10),  // expected-note {{Value assigned to 'num' (invalidation as part of widen-loops made this symbolic here)}}
        i < flag_d;
        ++i) {         
     ++num;
@@ -73,7 +73,7 @@ int test_for_loop() {
 
 int test_for_range_loop() {
   int arr[10] = {0};
-  for(auto x : arr) { // expected-note {{Assigning value}} 
+  for(auto x : arr) { // expected-note {{Assigning value (invalidation as part of widen-loops made this symbolic here)}}
     ++x;
   }
   if (arr[0] == 0)   // expected-note {{Assuming the condition is true}} 
diff --git a/clang/test/Analysis/loop-widening.cpp b/clang/test/Analysis/loop-widening.cpp
index fbcb72dee160ae..0b05267c463fc3 100644
--- a/clang/test/Analysis/loop-widening.cpp
+++ b/clang/test/Analysis/loop-widening.cpp
@@ -16,7 +16,7 @@ void fn1() {
 
   for (;;) { // expected-note{{Loop condition is true.  Entering loop body}}
              // expected-note@-1{{Loop condition is true.  Entering loop body}}
-             // expected-note@-2{{Value assigned to 'b'}}
+             // expected-note@-2{{Value assigned to 'b' (invalidation as part of widen-loops made this symbolic here)}}
              // no crash during bug report construction
 
     g = !b;     // expected-note{{Assuming 'b' is null}}

Comment on lines +1370 to +1377
// TODO: Better yet, if there was simply a way to know if a given
// SVal / MemRegion was invalidated as part of the current state,
// then that should be more robust than what's going on here.
// Could we somehow make use of "RegionChanges" /
// "runCheckersForRegionChanges" and the ExplicitRegions parameter.
// Still need to somehow know when a particular Global
// Variable is invalidated. I have not found this to be possible with
// "RegionChanges" unless I'm missing something...
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to implement this in main...dgg5503:llvm-project:dgg5503/main/invalidation-diagnostic-b but all MemRegions from the ProgramState and before are tracked. I'm wondering if there's a way to clear InvalidatedMemoryRegionSet at the start of each ProgramState, or, if there's a different approach I should be taking...

@steakhal
Copy link
Contributor

I agree that it's important to augment the bug reports with information about the source of the symbols. Especially for conjured and derived symbols that are produced widely because of invalidations.
Loop widening is just one source of invalidation, and we could generalize it slightly.

Read this post of mine: https://discourse.llvm.org/t/memory-region-invalidation-tracking-improvements/62432
Unfortunately, I never got to implement it, but I'd be very happy as a reviewer to guide you through.

@dgg5503
Copy link
Contributor Author

dgg5503 commented Jan 10, 2025

I agree that it's important to augment the bug reports with information about the source of the symbols. Especially for conjured and derived symbols that are produced widely because of invalidations. Loop widening is just one source of invalidation, and we could generalize it slightly.

Read this post of mine: https://discourse.llvm.org/t/memory-region-invalidation-tracking-improvements/62432 Unfortunately, I never got to implement it, but I'd be very happy as a reviewer to guide you through.

Hi @steakhal ,

I'm happy to hear others share a similar concern!

I agree that it would make more sense to generalize this beyond loop widening. However, I'm having a hard time understanding how your proposal would fit in to the augmentation of the assignment message like I do in this PR.

The part I'm most confused about is how would SymbolInvalidationArtifact be set/retrieved after invalidation occurs? Would this somehow be tied to symbols created as part of invalidation (like created conjured symbols in InvalidateRegionsWorker::VisitCluster) which could then be retrieved via SI.Value.getAsSymbol() and casting (speaking from the PoV of showBRDiagnostics)? I think I would better understand if you could please provide a small example in the context of diagnostic output. I'm fairly certain I'm lacking understanding of some static analysis design concept that's not making this click in my head...

Regardless, once I have a better understanding of how your proposal fits in, I'd be happy to give your approach a try with your guidance :).

Thank you

@steakhal
Copy link
Contributor

I agree that it would make more sense to generalize this beyond loop widening. However, I'm having a hard time understanding how your proposal would fit in to the augmentation of the assignment message like I do in this PR.

The part I'm most confused about is how would SymbolInvalidationArtifact be set/retrieved after invalidation occurs? Would this somehow be tied to symbols created as part of invalidation (like created conjured symbols in InvalidateRegionsWorker::VisitCluster) which could then be retrieved via SI.Value.getAsSymbol() and casting (speaking from the PoV of showBRDiagnostics)? I think I would better understand if you could please provide a small example in the context of diagnostic output. I'm fairly certain I'm lacking understanding of some static analysis design concept that's not making this click in my head...

Regardless, once I have a better understanding of how your proposal fits in, I'd be happy to give your approach a try with your guidance :).

Let me explain.
So far, in the visitor you want to know if the value of something before and after some invalidation event. In your case you are interested in loop-widening invalidations.
So, once you have all these, you formulate a decision if you should add an extra note in the BRVisitor.

This is nice and all, but there are a couple of problems with it:

  • Tightly couples the specific invalidation event (loop widening) because you had to add a special tag for the node. For other kinds of invalidations we may not have a tag, such as manual invalidations using ProgramState::invalidateRegions. This makes piggibacking on an ProgramPoint tag a less elegant solution.
  • You need to do the digging yourself for the values before and after the invalidation.

What I proposed in the linked RFC was to have SymExpr kind for representing the value of a symbol after some invalidation. It would have everything inside what we might need. The value before and after the invalidation (possibly even the states themselves), the kind of the invalidation, such as loop-widening or anything else.

In the visitor, it would simplify your code to:

if (const auto *Invalidation = dyn_cast_or_null<InvalidationArtifact>(SI.Value.getAsSymbol())) {
  if (isa<LoopWidening>(Invalidation->getCause())) {
    OS << " (loop-widening invalidated this symbol here)";
  }
}

To be transparent, what you do makes sense. Given that this would only affect configs explicitly enabling the experimental loop widening, I'm also okay with some less elegant solutions of course.

@dgg5503
Copy link
Contributor Author

dgg5503 commented Jan 14, 2025

Hi @steakhal .

Thank you for the further clarification and example. I now understand how your proposal fits in with what I'm trying to accomplish.

I've started to implement the proposal but I have a few questions:

  • When regions are invalidated via getWidenedLoopState (see the call to PrevState->invalidateRegions), further down the call stack, in InvalidateRegionsWorker::VisitCluster, regions are bound to a conjured symbol constructed via svalBuilder.conjureSymbolVal. Is InvalidationArtifact supposed to wrap the result of svalBuilder.conjureSymbolVal, or is it supposed to be a replacement for conjured symbols in the context of invalidation?
  • If InvalidationArtifact is a replacement for conjured symbols, is it always the case that invalidation results in conjured symbols? From my limited exploration, that seems like the case but I just want to confirm. If so, would it make more sense to have InvalidationArtifact be an extension of SymbolConjured with the addition of InvalidationCause?

For the time being, I've implemented a rough first-draft approach here: main...dgg5503:llvm-project:dgg5503/main/invalidation-diagnostic-update

In my draft, I used your code from main...steakhal:llvm-project:csa-invalidation-artifact as a foundation. I made SymbolInvalidationArtifact basically a copy/paste of SymbolConjured with the addition of InvalidationCause since it seems like conjured symbols are almost always the result of invalidation. Please note there is still much work to be done (i.e. allocating InvalidationCause properly, implementing unimplemented functions, getting all LIT tests passing, etc). I just want to make sure I'm on the right path before I invest more time. Please let me know what you think.

If you'd prefer, I'd be happy to close this PR and open a new one using the new branch I've linked above.

@steakhal
Copy link
Contributor

Hi @steakhal .

Thank you for the further clarification and example. I now understand how your proposal fits in with what I'm trying to accomplish.

Thank you for your time for considering it!

I've started to implement the proposal but I have a few questions:

  • When regions are invalidated via getWidenedLoopState (see the call to PrevState->invalidateRegions), further down the call stack, in InvalidateRegionsWorker::VisitCluster, regions are bound to a conjured symbol constructed via svalBuilder.conjureSymbolVal. Is InvalidationArtifact supposed to wrap the result of svalBuilder.conjureSymbolVal, or is it supposed to be a replacement for conjured symbols in the context of invalidation?

Excellent question!
If you look at the class hierarchy of SymExprs, then you will see the SymbolData class. Any subclasses of it are equal in the sense that the actual subtype usually doesn't matter.
For instance, as far a checker is concerned, (usually) it shouldn't differentiate and handle differently Conjured symbols and MetadataSymbols, or any other symbols. The important part is that it's a SymbolData.

Now, for us this should mean that an InvalidationArtifact is just another subtype of SymbolData, and where in the past (for the lack of it) we used Conjured symbols for representing the value of a region "after" an invalidation, we would instead create an "InvalidationArtifact". While in the past, one could not conclude why we had a Conjured symbol for something, now we could tell if it's representing the value after some invalidation event.

This "refinement" may break existing code that currently relied on the fact that invalidation usually placed a Conjured symbol as the new value, but those could be fairly easily identified and should be even easier to fix: just check if the symbol is an InvalidationArtifact, and there you go. So at this point expect a couple of broken tests.

  • If InvalidationArtifact is a replacement for conjured symbols, is it always the case that invalidation results in conjured symbols? From my limited exploration, that seems like the case but I just want to confirm. If so, would it make more sense to have InvalidationArtifact be an extension of SymbolConjured with the addition of InvalidationCause?

Yes, InvalidationArtifact is a replacement (or a better word would be "refinement") of conjured symbols iff those were conjured due to some invalidation event. Now, do we always have a conjured symbol for invalidation? Yes and no.
Let's first have a look at what a conjured symbol means. They have a type, let's say int. A conjured symbol of such means an arbitrary value within that domain. Even if we don't know anything concrete about such a symbol, we know that they always compare equal against itself. It's like in mathematics. This is not only true for conjured symbols, but to anything that is a SymbolData. (There is one little caviat to the type of a conjured symbol :D They frequently just left as int because IDK, they didn't pass the right type all the way, so don't be confused if you see int for the type for a conjured symbol that represents the value of IDK, a char for instance; funny right?)

If something escapes (and needs invalidation), then the best mental model is that a single write operation is simulated, but then what value should be bound there? You guessed it, we conjure a fresh symbol, and bind it. So, for regular primitive objects, such as int, we will end up with a conjured symbol of int.

However, for aggregates (like arrays) and most complicated stuff, we still just conjure a single symbol to the whole object, despite that we usually only operate on a smaller subobject, such as the concrete array element, or a specific field of a struct object.
This is where Derived symbols come into play. (an other sign of that some invalidation happened, besides conjured symbols).
Derived symbols represent a "slice" of another symbol (usually a conjured symbol) to represent the value of a small portion of the original symbol.
There are completely legit, unrelated reasons for having a conjured symbol representing a whole aggregate. For instance the by-value return value of some opaque function call (calling a function like struct ZZZ getObj() would return a conjured symbol for the whole struct ZZZ object, thus every time you would access a field of that zzz.field, a Derived symbol would "slice" the field portion of the whole object "zzz" represented by that conjured symbol). So the presence of derived symbols is not an indicator of some invalidation, just like it wasn't for conjured symbols.

In our new mental model, where we have InvalidationArtifacts, this concept still makes sense. The DerivedSymbol will just wrap an InvalidationArtifact symbol instead of a conjured one. I just wanted to bring awareness to the concept of DerivedSymbols, because back in the day I was really confused by it and it took me a long time to grasp the concept.

Now, the final twist to the story is that in the Store we may not materialize conjured symbols, because if you think about it you don't really need to in a lot of cases.
For example, if you load from a memregion, and you know it's a valid load (so there is a living object behind it), what other value could you load than "some" value of that type? This is exactly what a conjured symbol represents.

I can't exactly recall right now, but as if we had some logic in the past for handling invalidation in the Store that only dropped the bindings of the escaped regions (so that we later don't load an old value), and only rebinds the memory spaces with fresh conjured symbols.

I bet you have a lot to unpack now, so let me know if you have any questions.
I'd rather keep this interactive as I believe that is the most effective way to unblock newcomers.
And after all, you do what I always wanted to see getting implemented, and I'm grateful to see that.

For the time being, I've implemented a rough first-draft approach here: main...dgg5503:llvm-project:dgg5503/main/invalidation-diagnostic-update

In my draft, I used your code from main...steakhal:llvm-project:csa-invalidation-artifact as a foundation. I made SymbolInvalidationArtifact basically a copy/paste of SymbolConjured with the addition of InvalidationCause since it seems like conjured symbols are almost always the result of invalidation. Please note there is still much work to be done (i.e. allocating InvalidationCause properly, implementing unimplemented functions, getting all LIT tests passing, etc). I just want to make sure I'm on the right path before I invest more time. Please let me know what you think.

If you'd prefer, I'd be happy to close this PR and open a new one using the new branch I've linked above.

Thanks for the updates. I'll try to spare some time tomorrow on your code.
But please note that last week we had a week long company event, after which I of course got sick, so I still have a hard time to concentrate, but I really wanted to come back to the pending PRs to unblock you and everyone else waiting on me.

Thank you for your patience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants