Clarify gossip limits #4045
@@ -16,6 +16,9 @@
- [Constants](#constants)
- [Configuration](#configuration)
- [MetaData](#metadata)
- [Maximum message sizes](#maximum-message-sizes)
- [`max_compressed_len`](#max_compressed_len)
- [`max_message_size`](#max_message_size)
- [The gossip domain: gossipsub](#the-gossip-domain-gossipsub)
- [Topics and messages](#topics-and-messages)
- [Global topics](#global-topics)
@@ -28,6 +31,7 @@
- [`beacon_attestation_{subnet_id}`](#beacon_attestation_subnet_id)
- [Attestations and Aggregation](#attestations-and-aggregation)
- [Encodings](#encodings)
- [Gossipsub size limits](#gossipsub-size-limits)
- [The Req/Resp domain](#the-reqresp-domain)
- [Protocol identification](#protocol-identification)
- [Req/Resp interaction](#reqresp-interaction)
@@ -102,6 +106,8 @@
- [Why are we using Snappy for compression?](#why-are-we-using-snappy-for-compression)
- [Can I get access to unencrypted bytes on the wire for debugging purposes?](#can-i-get-access-to-unencrypted-bytes-on-the-wire-for-debugging-purposes)
- [What are SSZ type size bounds?](#what-are-ssz-type-size-bounds)
- [Why is the message size defined in terms of application payload?](#why-is-the-message-size-defined-in-terms-of-application-payload)
- [Why is there a limit on message sizes at all?](#why-is-there-a-limit-on-message-sizes-at-all)
- [libp2p implementations matrix](#libp2p-implementations-matrix)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->
@@ -193,11 +199,10 @@ This section outlines configurations that are used in this spec.

| Name | Value | Description |
|---|---|---|
| `GOSSIP_MAX_SIZE` | `10 * 2**20` (= 10485760, 10 MiB) | The maximum allowed size of uncompressed gossip messages. |
| `MAX_PAYLOAD_SIZE` | `10 * 2**20` (= 10485760, 10 MiB) | The maximum allowed size of uncompressed payload in gossipsub messages / RPC chunks. |
| `MAX_REQUEST_BLOCKS` | `2**10` (= 1024) | Maximum number of blocks in a single request |
| `EPOCHS_PER_SUBNET_SUBSCRIPTION` | `2**8` (= 256) | Number of epochs on a subnet subscription (~27 hours) |
| `MIN_EPOCHS_FOR_BLOCK_REQUESTS` | `MIN_VALIDATOR_WITHDRAWABILITY_DELAY + CHURN_LIMIT_QUOTIENT // 2` (= 33024, ~5 months) | The minimum epoch range over which a node must serve blocks |
| `MAX_CHUNK_SIZE` | `10 * 2**20` (= 10485760, 10 MiB) | The maximum allowed size of uncompressed req/resp chunked responses. |
| `ATTESTATION_PROPAGATION_SLOT_RANGE` | `32` | The maximum number of slots during which an attestation can be propagated. |
| `MAXIMUM_GOSSIP_CLOCK_DISPARITY` | `500` | The maximum **milliseconds** of clock disparity assumed between honest nodes. |
| `MESSAGE_DOMAIN_INVALID_SNAPPY` | `DomainType('0x00000000')` | 4-byte domain for gossip message-id isolation of *invalid* snappy messages |
@@ -229,6 +234,27 @@ Where
is entirely independent of the ENR sequence number,
and will in most cases be out of sync with the ENR sequence number.

### Maximum message sizes

Maximum message sizes are derived from the maximum payload size that the network can carry according to the following functions:

#### `max_compressed_len`

```python
def max_compressed_len(n: uint64) -> uint64:
    # Worst-case compressed length for a given payload of size n when using snappy
    # https://github.com/google/snappy/blob/32ded457c0b1fe78ceb8397632c416568d6714a0/snappy.cc#L218C1-L218C47
    return uint64(32 + n + n / 6)
```

#### `max_message_size`

```python
def max_message_size() -> uint64:
    # Allow 1024 bytes for framing and encoding overhead but at least 1 MiB in case MAX_PAYLOAD_SIZE is small.
    return max(max_compressed_len(MAX_PAYLOAD_SIZE) + 1024, 1024 * 1024)
```
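Plugging in the current `MAX_PAYLOAD_SIZE` of `10 * 2**20` bytes gives concrete values; a quick sketch in plain Python, using `//` to make the spec's truncating `uint64` division explicit:

```python
MAX_PAYLOAD_SIZE = 10 * 2**20  # 10 MiB, per the configuration table

def max_compressed_len(n: int) -> int:
    # Snappy's documented worst-case output size for an n-byte input.
    return 32 + n + n // 6

def max_message_size() -> int:
    # Worst-case compressed payload plus 1024 bytes of framing, floored at 1 MiB.
    return max(max_compressed_len(MAX_PAYLOAD_SIZE) + 1024, 1024 * 1024)

print(max_compressed_len(MAX_PAYLOAD_SIZE))  # 12233418
print(max_message_size())                    # 12234442
```

So an RPC frame carrying one max-sized payload may grow to roughly 11.67 MiB after worst-case compression and framing.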

### The gossip domain: gossipsub

Clients MUST support the [gossipsub v1](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.0.md) libp2p Protocol
@@ -268,13 +294,11 @@ This defines both the type of data being sent on the topic and how the data fiel
- `Encoding` - the encoding strategy describes a specific representation of bytes that will be transmitted over the wire.
  See the [Encodings](#Encodings) section for further details.

Clients MUST reject messages with unknown topic.

*Note*: `ForkDigestValue` is composed of values that are not known until the genesis block/state are available.
Due to this, clients SHOULD NOT subscribe to gossipsub topics until these genesis values are known.

Each gossipsub [message](https://github.com/libp2p/go-libp2p-pubsub/blob/master/pb/rpc.proto#L17-L24) has a maximum size of `GOSSIP_MAX_SIZE`.
Clients MUST reject (fail validation) messages that are over this size limit.
Likewise, clients MUST NOT emit or propagate messages larger than this limit.

The optional `from` (1), `seqno` (3), `signature` (5) and `key` (6) protobuf fields are omitted from the message,
since messages are identified by content, anonymous, and signed where necessary in the application layer.
Starting from Gossipsub v1.1, clients MUST enforce this by applying the `StrictNoSign`
@@ -288,6 +312,8 @@ The `message-id` of a gossipsub message MUST be the following 20 byte value comp
the concatenation of `MESSAGE_DOMAIN_INVALID_SNAPPY` with the raw message data,
i.e. `SHA256(MESSAGE_DOMAIN_INVALID_SNAPPY + message.data)[:20]`.

Where relevant, clients MUST reject messages with `message-id` sizes other than 20 bytes.

*Note*: The above logic handles two exceptional cases:
(1) multiple snappy `data` can decompress to the same value,
and (2) some message `data` can fail to snappy decompress altogether.
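As a sketch, the full message-id rule (both the valid and the invalid snappy cases) might look as follows. The `MESSAGE_DOMAIN_VALID_SNAPPY` value and the `snappy` package (python-snappy) are assumptions not shown in this diff; if decompression is unavailable or fails, the invalid-snappy branch applies:

```python
import hashlib

# Domains per the spec's configuration tables; VALID is assumed from the spec,
# only INVALID appears in this diff.
MESSAGE_DOMAIN_VALID_SNAPPY = bytes.fromhex("01000000")
MESSAGE_DOMAIN_INVALID_SNAPPY = bytes.fromhex("00000000")

def compute_message_id(message_data: bytes) -> bytes:
    try:
        import snappy  # assumed: the python-snappy package
        # Valid snappy: hash the valid-domain concatenated with the decompressed payload.
        payload = MESSAGE_DOMAIN_VALID_SNAPPY + snappy.uncompress(message_data)
    except Exception:
        # Invalid snappy (or snappy unavailable): hash the invalid-domain
        # concatenated with the raw message data.
        payload = MESSAGE_DOMAIN_INVALID_SNAPPY + message_data
    return hashlib.sha256(payload).digest()[:20]  # always 20 bytes
```

Separating the two domains ensures an invalid message can never collide with the id of the valid message it decompresses from.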

@@ -502,6 +528,16 @@ so [basic snappy block compression](https://github.com/google/snappy/blob/master
Implementations MUST use a single encoding for gossip.
Changing an encoding will require coordination between participating implementations.

#### Gossipsub size limits

Size limits are placed both on the [`RPCMsg`](https://github.com/libp2p/specs/blob/b5f7fce29b32d4c7d0efe37b019936a11e5db872/pubsub/README.md#the-rpc) frame as well as the encoded payload in each [`Message`](https://github.com/libp2p/specs/blob/b5f7fce29b32d4c7d0efe37b019936a11e5db872/pubsub/README.md#the-message).

Clients MUST reject, and MUST NOT emit or propagate, messages whose size exceeds the following limits:

* the size of the encoded `RPCMsg`, including control messages, framing, topics etc., must not exceed `max_message_size()`.
* the size of the compressed payload in the `Message.data` field must not exceed `max_compressed_len(MAX_PAYLOAD_SIZE)`.
* the size of the uncompressed payload must not exceed `MAX_PAYLOAD_SIZE` or the [type-specific SSZ bound](#what-are-ssz-type-size-bounds), whichever is lower.
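The three limits above can be sketched as a single check. The function and its parameters (`rpc_frame_len`, `compressed_len`, `uncompressed_len`, `ssz_bound`) are hypothetical names for values an implementation would obtain from its decoder, not spec definitions:

```python
MAX_PAYLOAD_SIZE = 10 * 2**20  # from the configuration table

def max_compressed_len(n: int) -> int:
    return 32 + n + n // 6

def max_message_size() -> int:
    return max(max_compressed_len(MAX_PAYLOAD_SIZE) + 1024, 1024 * 1024)

def within_gossip_limits(rpc_frame_len: int, compressed_len: int,
                         uncompressed_len: int, ssz_bound: int) -> bool:
    # 1) Encoded RPCMsg frame, including control messages, framing and topics.
    if rpc_frame_len > max_message_size():
        return False
    # 2) Compressed payload carried in the Message.data field.
    if compressed_len > max_compressed_len(MAX_PAYLOAD_SIZE):
        return False
    # 3) Uncompressed payload, bounded by the lower of the global and SSZ limits.
    return uncompressed_len <= min(MAX_PAYLOAD_SIZE, ssz_bound)
```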

### The Req/Resp domain

#### Protocol identification

@@ -551,7 +587,7 @@ All other response types (non-Lists) send a single `response_chunk`.
For both `request`s and `response`s, the `encoding-dependent-header` MUST be valid,
and the `encoded-payload` must be valid within the constraints of the `encoding-dependent-header`.
This includes type-specific bounds on payload size for some encoding strategies.
Regardless of these type specific bounds, a global maximum uncompressed byte size of `MAX_CHUNK_SIZE` MUST be applied to all method response chunks.
Regardless of these type specific bounds, a global maximum uncompressed byte size of `MAX_PAYLOAD_SIZE` MUST be applied to all method response chunks.

Clients MUST ensure that lengths are within these bounds; if not, they SHOULD reset the stream immediately.
Clients tracking peer reputation MAY decrement the score of the misbehaving peer under this circumstance.
@@ -665,15 +701,13 @@ When snappy is applied, it can be passed through a buffered Snappy reader to dec

Before reading the payload, the header MUST be validated:
- The unsigned protobuf varint used for the length-prefix MUST NOT be longer than 10 bytes, which is sufficient for any `uint64`.
- The length-prefix is within the expected [size bounds derived from the payload SSZ type](#what-are-ssz-type-size-bounds).
- The length-prefix is within the expected [size bounds derived from the payload SSZ type](#what-are-ssz-type-size-bounds) or `MAX_PAYLOAD_SIZE`, whichever is smaller.

After reading a valid header, the payload MAY be read, while maintaining the size constraints from the header.

A reader SHOULD NOT read more than `max_encoded_len(n)` bytes after reading the SSZ length-prefix `n` from the header.
- For `ssz_snappy` this is: `32 + n + n // 6`.
  This is considered the [worst-case compression result](https://github.com/google/snappy/blob/537f4ad6240e586970fe554614542e9717df7902/snappy.cc#L98) by Snappy.
A reader MUST NOT read more than `max_compressed_len(n)` bytes after reading the SSZ length-prefix `n` from the header.

A reader SHOULD consider the following cases as invalid input:
A reader MUST consider the following cases as invalid input:
- Any remaining bytes, after having read the `n` SSZ bytes. An EOF is expected if more bytes are read than required.
- An early EOF, before fully reading the declared length-prefix worth of SSZ bytes.
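A reader implementing the header checks above might look like this sketch. `validate_length_prefix` is a hypothetical helper; `ssz_min` and `ssz_max` stand for the type-specific SSZ bounds:

```python
MAX_PAYLOAD_SIZE = 10 * 2**20

def max_compressed_len(n: int) -> int:
    # Snappy's worst-case compressed size for an n-byte payload.
    return 32 + n + n // 6

def validate_length_prefix(n: int, ssz_min: int, ssz_max: int) -> int:
    # The length-prefix must respect both the SSZ type bounds and the
    # global MAX_PAYLOAD_SIZE cap, whichever is smaller.
    if not ssz_min <= n <= min(ssz_max, MAX_PAYLOAD_SIZE):
        raise ValueError(f"length-prefix {n} out of bounds")
    # A reader MUST NOT consume more than this many compressed bytes.
    return max_compressed_len(n)
```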
@@ -1430,7 +1464,7 @@ Nevertheless, in the case of `ssz_snappy`, messages are still length-prefixed wi
* Alignment with protocols like gRPC over HTTP/2 that prefix with length
* Sanity checking of message length, and enabling much stricter message length limiting based on SSZ type information,
  to provide even more DOS protection than the global message length already does.
  E.g. a small `Status` message does not nearly require `MAX_CHUNK_SIZE` bytes.
  E.g. a small `Status` message does not nearly require `MAX_PAYLOAD_SIZE` bytes.

[Protobuf varint](https://developers.google.com/protocol-buffers/docs/encoding#varints) is an efficient technique to encode variable-length (unsigned here) ints.
Instead of reserving a fixed-size field of as many bytes as necessary to convey the maximum possible value, this field is elastic in exchange for 1-bit overhead per byte.
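As an illustration of that 1-bit-per-byte overhead, here is a minimal unsigned varint encoder following the standard protobuf scheme (not code from the spec itself):

```python
def encode_uvarint(value: int) -> bytes:
    # 7 payload bits per byte; the high bit flags that more bytes follow.
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

print(encode_uvarint(300).hex())       # "ac02": 2 bytes instead of a fixed 8
print(len(encode_uvarint(2**64 - 1)))  # 10, the maximum length for a uint64
```

The 10-byte maximum for a `uint64` is exactly the bound the header validation above enforces on the length-prefix.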

@@ -1679,6 +1713,22 @@ Other types are static, they have a fixed size: no dynamic-length content is inv
For reference, the type bounds can be computed ahead of time, [as per this example](https://gist.github.com/protolambda/db75c7faa1e94f2464787a480e5d613e).
It is advisable to derive these lengths from the SSZ type definitions in use, to ensure that version changes do not cause out-of-sync type bounds.

#### Why is the message size defined in terms of application payload?

When transmitting messages over gossipsub and / or req/resp, we want to ensure that the same payload sizes are supported no matter the underlying transport, decoupling the consensus layer from libp2p-induced overhead and the particular transmission strategy.

To derive "encoded size limits" from desired application sizes, we take into account snappy compression and framing overhead.

In the case of gossipsub, the protocol supports sending multiple application payloads as well as mixing application data with control messages in each gossipsub frame. The limit is set such that at least one max-sized application-level message together with a small amount (1 KiB) of gossipsub overhead is allowed. Implementations are free to pack multiple smaller application messages into a single gossipsub frame, and / or combine it with control messages as they see fit.

The limit is set on the uncompressed payload size in particular to protect against decompression bombs.
> This is an incomplete sentence. Was there something else you wanted to say here?

> I've decided to remove this sentence since the next section dives into this. Also, I don't believe decompression bombs are practical with snappy, as there is a known, relatively low maximum compression ratio.

> tbh, not sure what I wanted to say, but removing "- although" would be enough I think. From what I remember, decompression bombs are still possible - "max compression" is based on an honest compressor (and used mainly to allocate a compression buffer, in "normal" snappy usage) - at least, that's how I remember the reasoning at the time.

> Got it. Added back the remark without "- although" 👍

#### Why is there a limit on message sizes at all?

The message size limit protects against several forms of DoS and network-based amplification attacks, and provides upper bounds for resource (network, memory) usage in the client based on the protocol requirements to decode, buffer, cache, store and re-transmit messages. These bounds in turn translate into performance and protection tradeoffs, ensuring capacity to handle worst cases during recovery from network instability.

In particular, blocks (which at the time of writing are the only message type without a practical SSZ-derived upper bound on size) cannot be fully verified synchronously as part of gossipsub validity checks. This means that there exist cases where invalid messages signed by a validator may be amplified by the network.

## libp2p implementations matrix

This section will soon contain a matrix showing the maturity/state of the libp2p features required
> As this constant is used to limit gossip RPC message size only, maybe it makes sense to give it a more explicit name?
> Also, maybe we could reserve more space (e.g. 1 MiB instead of 1 KiB) to accommodate all possible gossip control messages which could potentially be piggybacked onto the publish message. That would still maintain a reasonable limit but prevent gossip implementations from hitting corner cases around message size.

> I don't see why, really - just send the control message in a separate frame, which is trivial. Basically, if you want to piggyback control messages, you can add real messages and control messages in a loop until you hit the limit and break off there - this gracefully deals with any kind of packing. You need a bounded strategy like this anyway, and the minimum max size ensures that we don't send too many small messages either.

> Yes, you are right. It would help to avoid edge cases with a single message of max size, but we would hit the same problem when there are e.g. 2 messages (or more) whose cumulative size is around `max_message_size()`.