Several performance optimizations, primarily aimed at reducing garbage object creation in common cases (#258)

This PR bundles together several small optimizations, most aimed at reducing garbage object creation. Collectively, these changes result in a large performance improvement for some of our largest jsonnet inputs.

## Optimizations

These are somewhat coherently split into multiple smaller intermediate commits, which can help in navigating this change. I'll describe each optimization in more detail below.

### Optimizing `Obj` key lookup methods for objects without `super`

The `hasKeys`, `containsKey`, `containsVisibleKey`, `allKeyNames`, and `visibleKeyNames` methods can be optimized in the common case of objects that don't have `super` objects. We already perform an optimization for `static` objects, pre-populating `allKeys`, but for non-static objects we had to run `gatherAllKeys` to populate a `LinkedHashMap` of keys and a boolean indicating visibility. If a non-static object has no `super` then we can just use `value0` to compute the keys: this avoids an additional `LinkedHashMap` allocation and also lets us pre-size the resulting string arrays, avoiding wasteful array copies from resizes or trims.

In `visibleKeyNames`, I chose to pre-size the output builder based on the _total_ key count: this is based on a common-case assumption that most objects don't have hidden keys.

This optimization makes a huge difference in `std.foldl(std.mergePatch, listOfPatches, {})`, where the intermediate merge targets' visible keys are repeatedly recomputed. In these cases, the intermediate objects contain _only_ visible keys, allowing this optimization to maximally avoid unnecessary array allocations.

### Pre-size various hash maps

This builds on an idea from #197: there are multiple places where we construct hash maps that are either over- or under-sized. An over-sized map wastes space (I saw >90% of backing array slots wasted in some heap dumps), and an under-sized map wastes time and space by re-sizing upwards during construction.
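For context, pre-sizing means choosing the initial capacity from the expected entry count and the load factor, so that population never triggers a resize. A minimal sketch of that arithmetic (the helper name is hypothetical, not sjsonnet's actual code):

```scala
// Sketch: pre-sizing a java.util.HashMap so `expected` entries fit
// without any resize. Helper name is illustrative only.
object PreSize {
  def hashMap[K, V](expected: Int): java.util.HashMap[K, V] = {
    // With the default 0.75 load factor, the table resizes once
    // size > capacity * 0.75, so pick capacity = ceil(expected / 0.75).
    val capacity = Math.ceil(expected / 0.75d).toInt
    new java.util.HashMap[K, V](capacity)
  }
}
```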
Here, I've generalized that PR's pre-sizing to apply in more contexts. One notable special case is the `valueCache`: if an object inherits fields then it's not free to determine the map size. As a result, I've only special-sized this for `super`-free objects. This map is a little different from `value0` or `allFields` because its final size is a function of whether or not object field values are actually computed. Given this, I've chosen to start conservatively, avoiding changing the size in cases where it's not an obvious win; I may revisit this in a future followup.

### Change `valueCache` from a Scala map to a Java map

This was originally necessary because the Scala 2.12 version of `mutable.HashMap` does not support capacity / load factor configuration, which got in the way of the pre-sizing work described above. But a nice secondary benefit is that Java maps let me avoid closure / anonfun allocation in `map.getOrElse(k, default)` calls: even if we don't invoke `default`, we still end up doing some allocations for the lambda / closure / thunk. I had noticed this overhead previously in `Obj.value` and this optimization should fix it.

### Remove most Scala sugar in `std.mergePatch`, plus other optimizations

The `recMerge` and `recSingle` methods used by `std.mergePatch` contained big Scala `for` comprehensions and used `Option` for handling nulls. This improves readability but comes at a surprising performance cost: I would have naively assumed that most of those overheads would be optimized away, but empirically this was not the case in my benchmarks. Here, I've rewritten this with Java-style imperative `while` loops and explicit null checks.

### Optimize `std.mergePatch`'s distinct key computation

After fixing other bottlenecks, I noticed that the

```scala
val keys: Array[String] = (l.visibleKeyNames ++ r.visibleKeyNames).distinct
```

step in `std.mergePatch` was very expensive.
Under the hood, this constructs a combined array, allocates an `ArrayBuilder`, and uses an intermediate `HashSet` for detecting already-seen keys. Here, I've added an optimized fast implementation for the case where `r.visibleKeyNames.length < 8`. I think it's much more common for the LHS of a merge to be large and the RHS to be small, in which case we're conceptually better off building a hash set on the RHS and removing RHS elements as they're seen during the LHS traversal. But if the RHS is small enough, the cost of hashing and probing will be higher than a simple linear scan of a small RHS array. Here, `8` is a somewhat arbitrarily chosen threshold based on some local microbenchmarking.

### Special overload of `Val.Obj.mk` to skip an array copy

Pretty self-explanatory: we often have an `Array[(String, Val.Member)]` and we can avoid a copy by defining a `Val.Obj.mk` overload which accepts the array directly.

### Make `PrettyNamed` implicits into constants

This is pretty straightforward, just changing a `def` to a `val`, but it makes a huge difference in reducing ephemeral garbage in some parts of the evaluator.

## Other changes

I also added `Error.fail` calls in a couple of `case _ =>` matches which should never be hit. We weren't actually hitting these, but it felt like a potentially dangerous pitfall to silently ignore those cases.
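The small-RHS fast path described in the distinct-key section above can be sketched roughly as follows. The names and the exact threshold are illustrative, and both inputs are assumed to be individually duplicate-free, as `visibleKeyNames` arrays are:

```scala
import scala.collection.mutable.ArrayBuilder

// Sketch of a small-RHS fast path for (l ++ r).distinct over key arrays.
// Object/method names and the threshold are illustrative, not sjsonnet's code.
object DistinctKeys {
  final val LinearScanThreshold = 8

  def apply(l: Array[String], r: Array[String]): Array[String] = {
    if (r.length < LinearScanThreshold) {
      // Small RHS: a few linear scans beat allocating a HashSet.
      val out = new ArrayBuilder.ofRef[String]
      out.sizeHint(l.length + r.length) // upper bound on the result size
      out ++= l                         // l's keys are already distinct
      var i = 0
      while (i < r.length) {
        if (!l.contains(r(i))) out += r(i) // linear membership test
        i += 1
      }
      out.result()
    } else {
      // General path: same shape as the original implementation.
      (l ++ r).distinct
    }
  }
}
```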
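The `def`-to-`val` change matters because an implicit `def` re-evaluates its body (allocating a fresh object) every time it is summoned, while an implicit `val` is a single shared constant. A minimal illustration with hypothetical names, not the actual `PrettyNamed` code:

```scala
// Hypothetical names: contrast an implicit `def` with an implicit `val`.
class Named(val name: String)

object Implicits {
  implicit def namedDef: Named = new Named("d") // before: new object per summon
  implicit val namedVal: Named = new Named("v") // after: one shared instance
}
```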