Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify client codec #1105

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open

Conversation

litt3
Copy link
Contributor

@litt3 litt3 commented Jan 14, 2025

Why are these changes needed?

  • The combination of blob encoding + IFFT/FFT yields a confusing result
  • This PR splits the fundamentally unrelated concepts apart: all payloads must be encoded. After that, we decide based on configuration whether the data must be IFFTed or not

Checks

  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@litt3 litt3 self-assigned this Jan 14, 2025
@litt3 litt3 requested a review from samlaf January 14, 2025 17:29
@litt3 litt3 requested a review from cody-littley January 14, 2025 20:48
@litt3 litt3 marked this pull request as ready for review January 14, 2025 22:01
@litt3 litt3 requested review from ian-shim and bxue-l2 January 14, 2025 22:01
Copy link
Contributor

@samlaf samlaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall big big fan of these changes. Much needed. I personally think we can lean in even higher on the type system and define types for EncodedPayload and Blob. I really don't like looking at a bunch of functions that take a []byte and return another []byte. Wdyt?

)

type BlobEncodingVersion byte

const (
// This minimal blob encoding contains a 32 byte header = [0x00, version byte, uint32 len of data, 0x00, 0x00,...]
// DefaultBlobEncoding entails a 32 byte header = [0x00, version byte, uint32 len of data, 0x00, 0x00,...]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// DefaultBlobEncoding entails a 32 byte header = [0x00, version byte, uint32 len of data, 0x00, 0x00,...]
// DefaultBlobEncoding entails a 32 byte header = [0x00, version byte, big-endian uint32 len of data, 0x00, 0x00,...]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 21 to 22
// The returned bytes may be interpreted as a polynomial in Eval form, where each contained field element of
// length 32 represents the evaluation of the polynomial at an expanded root of unity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They could also be interpreted in coeff form right? Like we had in the optimal disperse route at the top of
image

Also is it really at expanded roots of unity? This part I'm not sure. cc @bxue-l2

Copy link
Contributor Author

@litt3 litt3 Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They could also be interpreted in coeff form right

This is a tricky question, let me explain my reasoning for wording it this way.

From the perspective of the client, blob polynomials may either be dispersed in Eval form, or Coeff form. Regardless of the form the client chooses, however, the disperser chooses to interpret what it receives as a polynomial in Coeff form.

client: "EncodePayload returns a polynomial in Eval form. Depending on my configuration, I may choose to directly disperse the blob in Eval form, or I may convert the polynomial to Coeff form" first.

disperser: "No matter what the client sends me, I will interpret it as Coeff form"


One argument for this set of definitions is that it yields meaningful names for the enum which determines pre-dispersal processing: Eval means don't IFFT, and Coeff means do IFFT. If we were to define EncodePayload, such that what it returns is considered to be Coeff form in the case of "optimal dispersal", that means the polynomial sent from the client to the disperser is always in Coeff form. In this case, we would need to think of alternate names for the enum values.

Essentially, the question boils down to this: in the alternate "optimal" dispersal pathway, at what point in time do we declare that the non-iffted bytes are in Coeff form? Does the client make this declaration, or does the disperser? I chose for the disperser to make this declaration in my descriptions, but I think it's a pretty arbitrary decision. We just need to come to consensus, and stick to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also is it really at expanded roots of unity?

I think so, but confidence is lacking. Waiting for clarification from @bxue-l2

Comment on lines 24 to 26
// The returned bytes may or may not represent a blob. If the system is configured to distribute blobs in Coeff form,
// then the data returned from this function must be IFFTed to produce the final blob. If the system is configured to
// distribute blobs in Eval form, then the data returned from this function is the final blob representation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to be using blob here, then we have to be 100% precise and link to its definition somewhere in the codebase.

Also what does "system is configured mean"? Isn't the system eigenda? And eigenda always interprets blobs as being in coeff form?

Copy link
Contributor Author

@litt3 litt3 Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we have to be 100% precise and link to its definition somewhere in the codebase

Do we have any precise definitions of a blob in the codebase yet?

system is configured mean

I intended this to mean "however the various clients are configured to disperse and retrieve blobs". Do you think it is better if I say:

"If clients are configured to distribute blobs in Coeff form,..."?

)

type BlobEncodingVersion byte

const (
// This minimal blob encoding contains a 32 byte header = [0x00, version byte, uint32 len of data, 0x00, 0x00,...]
// DefaultBlobEncoding entails a 32 byte header = [0x00, version byte, uint32 len of data, 0x00, 0x00,...]
// followed by the encoded data [0x00, 31 bytes of data, 0x00, 31 bytes of data,...]
DefaultBlobEncoding BlobEncodingVersion = 0x0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DefaultBlobEncoding BlobEncodingVersion = 0x0
BlobEncodingVersion0 BlobEncodingVersion = 0x0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We prob shouldnt be using Default given we don't know what the future reserves, and v1 might become the default at some future point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// The returned bytes may or may not represent a blob. If the system is configured to distribute blobs in Coeff form,
// then the data returned from this function must be IFFTed to produce the final blob. If the system is configured to
// distribute blobs in Eval form, then the data returned from this function is the final blob representation.
func EncodePayload(payload []byte) []byte {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice clean function. Can we add an example encoding in the doc above like we did for the header

[0x00, version byte, uint32 len of data, 0x00, 0x00,...] + [encoded payload example ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 15 to 16
// Number of test iterations
iterations := testRandom.Intn(100) + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit weird. Why not just always test the same number of iterations?

Comment on lines 11 to 12
// TestFFT checks that data can be IFFTed and FFTed repeatedly, always getting back the original data
func TestFFT(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TestFFT checks that data can be IFFTed and FFTed repeatedly, always getting back the original data
func TestFFT(t *testing.T) {
// TestFFT checks that data can be IFFTed and FFTed repeatedly, always getting back the original data
// TODO: we should probably be using fuzzing instead of this kind of ad-hoc random search testing
func TestFFT(t *testing.T) {

}

var codec codecs.BlobCodec
var polynomialForm codecs.PolynomialForm
if config.DisablePointVerificationMode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good time to update this weirdly named flag to something more explicit like the polynomial form being passed directly?

Feel like the fact that coeff form has point verification feature is VERY context dependent on understanding the way the eigenda backend is doing things etc. This should prob just be documented in the ENUM and flag, and then users can choose which form they want to use accordingly?

Comment on lines 161 to 169
encodedPayload, err := m.blobToEncodedPayload(data)
if err != nil {
return nil, fmt.Errorf("error decoding blob: %w", err)
return nil, fmt.Errorf("blob to encoded payload: %w", err)
}

return decodedData, nil
decodedPayload, err := codecs.DecodePayload(encodedPayload)
if err != nil {
return nil, fmt.Errorf("error decoding payload: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super clean! We might even want a single function like blobToPayload which does both operations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also might want to use blob instead of data for the thing received by RetrieveBlob?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a blob always power of 2?

Comment on lines +416 to +417
// If the system is configured to distribute blobs in Eval form, the returned bytes exactly match the blob bytes.
// If the system is configured to distribute blobs in Coeff form, the blob is FFTed before being returned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If the system is configured to distribute blobs in Eval form, the returned bytes exactly match the blob bytes.
// If the system is configured to distribute blobs in Coeff form, the blob is FFTed before being returned
// If the EigenDAClient is configured to distribute blobs in Eval form, the returned bytes exactly match the blob bytes.
// If the EigenDAClient is configured to distribute blobs in Coeff form, the blob is FFTed before being returned

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants