In-memory hashsort for commitment keys #13521
base: main
Conversation
@@ -1048,6 +1061,10 @@ func (t *Updates) TouchPlainKey(key string, val []byte, fn func(c *KeyUpdate, va
		}
		t.keys[key] = struct{}{}
	}
case ModeInMemory:
	if _, ok := t.keys[key]; !ok {
		t.keys[key] = struct{}{}
What if we insert the hashed key into the list now and sort in bg?
What is bg?
If you hash the key now you will lose the benefits of parallelization, since each insert happens serially.
The time span from TouchPlainKey until HashSort is called is quite long, and it's better to use it for hashing. That way, by the time we call HashSort, all our keys are already hashed and sorted; this is the behavior with ETL.
bg = background
At least put it into some hashing queue where workers pick it up, and then just use sort.Sort over the finished list of <hashed_key, plain_key> pairs.
keysWithHash := hashKeysConcurrently(keys, t.hasher) // hash plain keys concurrently

// sort keys lexicographically by hash
slices.SortFunc(keysWithHash, func(a, b *keyWithHash) int {
I'd suppose we sort [start, end] after each range is hashed, so the final sort is similar to quicksort's last pass: sorting a partially ordered set.
IMO it's better to hash as soon as we see a key and sort the list in the background after insertion. That way the amortized n log(n) sort time is spent during collection, not during the HashSort call made by Process or some other caller. With etl, the keys are actually sorted in the background before Load is called.
So if the sort is done on insert, we just iterate through an already sorted list.
> i'd suppose sort [start,end] after range is hashed, so this sort will be similar to quicksort last touch - sort partially ordered set.

Can you elaborate on the algorithm? So let's say each worker takes a chunk, hashes the plain keys, and finally sorts the chunk by hashed key. Now we have n sorted chunks. Then I suppose we need to combine these chunks?
"quicksort last touch" and "combine these chunks" - is the same thing.
FYI: etl
does same: it using heap
to merge sorted files:
erigon/erigon-lib/etl/collector.go
Line 298 in 586af97
func mergeSortFiles(logPrefix string, providers []dataProvider, loadFunc simpleLoadFunc, args TransformArgs, buf Buffer) (err error) { |
Well, there is no point in re-implementing ETL then. In this PR I'm just distributing the hashing of the keys and calling the library function slices.SortFunc. The results I got slightly improve over the results with ETL, so I don't consider it necessary to implement this feature (at least not for now).
The point was that you are already splitting the whole plain-key list into chunks. You hash each plain key in a chunk; my addition was to sort the chunk at that moment, so you end up with a list of chunks where each chunk is an ordered list of hashed keys. Then you just have to merge the chunks into one sorted list, which we iterate over in the HashSort call.
Currently both modes, ModeDirect and ModeUpdate, use ETL for sorting plain keys by hash as a preprocessing step before processing commitment updates. This PR introduces a new mode, ModeInMemory, which sorts plain keys by their hash in-memory using slices.SortFunc. Prior to sorting, the keys are hashed concurrently by a number of worker goroutines. With the concurrent hashing, the benchmark tests yield neck-and-neck performance compared to ModeDirect, with ModeInMemory sometimes faster.