From 01362271eac9469cb551cd1dd226119f574aacd9 Mon Sep 17 00:00:00 2001
From: Robin Berjon
Date: Tue, 10 Dec 2024 11:31:44 +0100
Subject: [PATCH] move content to spec

---
 cid.src.html     |  87 +++++++++++++++++++++++++++++++++
 dcbor42.src.html |  22 +++++++++
 index.html       | 123 +++++++---------------------------------------
 todo.md          |  14 ++++++
 4 files changed, 141 insertions(+), 105 deletions(-)
 create mode 100644 cid.src.html
 create mode 100644 dcbor42.src.html
 create mode 100644 todo.md

diff --git a/cid.src.html b/cid.src.html
new file mode 100644
index 0000000..9f87c98
--- /dev/null
+++ b/cid.src.html
@@ -0,0 +1,87 @@
+
+
+
+
+ Content IDs (CIDs)
+
+
+

+ DASL CIDs are a strict subset of IPFS CIDs
+ (but you don't need to understand anything about IPFS to either use or implement them) with the following properties:

+ +

+ Supporting two hashes isn't ideal, but having one hash type that can stream large resources (and do incremental
+ verification mid-stream) is a plus. Because BLAKE3 is still far from being supported by web browsers, it is
+ strongly recommended that CID producers limit themselves to SHA-256 if possible. Implementations intending to
+ run in web contexts are likely to either forgo BLAKE3 verification in-browser, outsource verification to a
+ trusted component, or dynamically load a BLAKE3 library in the browser, which may add latency.
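The following Python sketch shows how a producer might build a DASL CID string for a raw resource using the recommended SHA-256 path. It makes two assumptions. First, the hash size byte carries the digest length, which is 32 (0x20) for SHA-256. Second, the base32 text carries no `=` padding, as with the multibase `b` encoding. `make_raw_sha256_cid` is a hypothetical helper, not a library API.

```python
import base64
import hashlib


def make_raw_sha256_cid(data: bytes) -> str:
    """Hypothetical helper: DASL CID string for raw bytes hashed with SHA-256 (sketch)."""
    # The whole resource is hashed directly; no chunking into a DAG
    digest = hashlib.sha256(data).digest()
    # CID bytes: version 1, raw codec 0x55, SHA-256 hash type 0x12, hash size (32), digest
    cid_bytes = bytes([0x01, 0x55, 0x12, len(digest)]) + digest
    # RFC 4648 base32, lowercased, "=" padding stripped (assumed unpadded, as in multibase)
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    # "b" is the multibase prefix for lowercase base32
    return "b" + encoded
```

Every SHA-256 raw CID built this way starts with `bafkrei`, because the first four bytes are fixed.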

+

+ Use the following steps to parse a CID string: +

+
    +
  1. Accept a string CID.
  2. Remove the first character from CID and store it in prefix.
  3. If prefix is not equal to b, throw an error.
  4. Decode the rest of CID using the base32 algorithm from RFC4648 with a lowercase alphabet and store the result in CID bytes.
  5. Return the result of applying the steps to decode a CID to CID bytes.
+
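A minimal Python sketch of these steps follows. It stops at the CID bytes; the final step hands them to the decode-a-CID algorithm.

The sketch makes these assumptions:
- Multibase `b` text is unpadded.
- Uppercase letters and `=` are treated as errors.
- `ValueError` stands in for "throw an error".

`parse_cid_string` is a hypothetical name.

```python
import base64

# Lowercase RFC 4648 base32 alphabet (no "=" padding character)
_BASE32_LOWER = frozenset("abcdefghijklmnopqrstuvwxyz234567")


def parse_cid_string(cid: str) -> bytes:
    """Hypothetical helper: turn a CID string into CID bytes (sketch)."""
    if not cid:
        raise ValueError("empty CID string")
    # Split off the one-character multibase prefix; only "b" is accepted
    prefix, rest = cid[0], cid[1:]
    if prefix != "b":
        raise ValueError(f"unsupported multibase prefix {prefix!r}")
    # Lowercase alphabet only: uppercase letters and padding are rejected
    if not set(rest) <= _BASE32_LOWER:
        raise ValueError("CID string is not lowercase base32")
    # Python's decoder expects uppercase input padded to a multiple of 8 characters
    padded = rest.upper() + "=" * (-len(rest) % 8)
    try:
        return base64.b32decode(padded)
    except ValueError as err:  # binascii.Error is a subclass of ValueError
        raise ValueError("invalid base32 length") from err
```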

+ Use the following steps to parse a binary CID: +

+
    +
  1. Accept an array of bytes binary CID.
  2. Remove the first byte in binary CID and store it in prefix.
  3. If prefix is not equal to 0 (a null byte, the binary base256 prefix), throw an error.
  4. Store the rest of binary CID in CID bytes.
  5. Return the result of applying the steps to decode a CID to CID bytes.
+
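For the binary form, the work before decoding is a single prefix check. The hypothetical Python helper below strips the null-byte prefix and leaves decoding of the CID bytes to the caller. As before, `ValueError` is assumed to stand in for "throw an error".

```python
def parse_binary_cid(binary_cid: bytes) -> bytes:
    """Hypothetical helper: turn a binary CID into CID bytes (sketch)."""
    if not binary_cid:
        raise ValueError("empty binary CID")
    prefix = binary_cid[0]
    # The only accepted prefix is the null byte (0x00)
    if prefix != 0x00:
        raise ValueError(f"unexpected binary CID prefix 0x{prefix:02x}")
    # Everything after the prefix is the CID bytes, ready for "decode a CID"
    return bytes(binary_cid[1:])
```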

+ Use the following steps to decode a CID: +

+
    +
  1. Accept an array of bytes CID bytes.
  2. Remove the first byte in CID bytes and store it in version.
  3. If version is not equal to 1, throw an error.
  4. Remove the next byte in CID bytes and store it in codec.
  5. If codec is neither 0x55 (raw) nor 0x71 (dCBOR42), throw an error.
  6. Remove the next two bytes in CID bytes and store them in hash type and hash size, respectively.
  7. If hash type is neither 0x12 (SHA-256) nor 0x1e (BLAKE3), throw an error.
  8. If there are fewer than hash size bytes left in CID bytes, throw an error.
  9. Remove the first hash size bytes from CID bytes and store them in digest. Store the rest in remaining bytes.
  10. Return version, codec, hash type, hash size, digest, and remaining bytes.
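A Python sketch of the decoding steps follows. It reads the four header bytes, checks them in the listed order and returns the six listed values as a `NamedTuple`. The names `decode_cid` and `DecodedCID` are hypothetical, and raising `ValueError` stands in for "throw an error".

```python
from typing import NamedTuple

RAW, DCBOR42 = 0x55, 0x71    # accepted codecs
SHA256, BLAKE3 = 0x12, 0x1E  # accepted hash types


class DecodedCID(NamedTuple):
    version: int
    codec: int
    hash_type: int
    hash_size: int
    digest: bytes
    remaining: bytes


def decode_cid(cid_bytes: bytes) -> DecodedCID:
    """Hypothetical helper following the "decode a CID" steps (sketch)."""
    # The steps do not say what to do if the header bytes run out early;
    # this sketch treats that as an error.
    if len(cid_bytes) < 4:
        raise ValueError("CID bytes too short")
    version, codec, hash_type, hash_size = cid_bytes[:4]
    if version != 1:
        raise ValueError(f"unsupported CID version {version}")
    if codec not in (RAW, DCBOR42):
        raise ValueError(f"unsupported codec 0x{codec:02x}")
    if hash_type not in (SHA256, BLAKE3):
        raise ValueError(f"unsupported hash type 0x{hash_type:02x}")
    rest = cid_bytes[4:]
    if len(rest) < hash_size:
        raise ValueError("fewer than hash size bytes left")
    return DecodedCID(version, codec, hash_type, hash_size,
                      digest=bytes(rest[:hash_size]),
                      remaining=bytes(rest[hash_size:]))
```

Because the sketch follows the steps literally, it has two limitations:
- It does not check that hash size is 32 for SHA-256.
- It returns trailing bytes as `remaining` instead of rejecting them.

A caller expecting a standalone CID may want to reject a non-empty `remaining`.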
+
+
diff --git a/dcbor42.src.html b/dcbor42.src.html
new file mode 100644
index 0000000..b60aaa9
--- /dev/null
+++ b/dcbor42.src.html
@@ -0,0 +1,22 @@
+
+
+
+
+ Deterministic CBOR with tag 42 (dCBOR42)
+
+
+

+ dCBOR42 is a form of IPLD that serializes only to deterministic CBOR, by normalizing and reducing some type
+ flexibility. Notably, we support no ADLs (IPLD Advanced Data Layouts).
+ (See the current draft specification for dCBOR,
+ and Carsten Bormann's BCP document
+ on the underspecified determinism of Section 4.2 of the CBOR specification.) For debugging purposes, either
+ one-way conversion to DAG-JSON or CBOR
+ Extended Diagnostic Notation can be used; in either case, the CIDs in such debugging output
+ should be the CIDs of the dCBOR42 content, not of the debugging resources themselves.
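The "42" in dCBOR42 is CBOR tag 42, which IPLD's DAG-CBOR uses to mark CID links. The details here are still forthcoming. Assuming dCBOR42 keeps the DAG-CBOR convention, a link is tag 42 wrapping a byte string that holds a 0x00 prefix followed by the CID bytes. Under that assumption it could be encoded as in this Python sketch.

Limitations of the sketch:
- The function name is hypothetical.
- It only handles byte strings up to 65,535 bytes.

```python
def encode_cid_link(cid_bytes: bytes) -> bytes:
    """Hypothetical helper: CBOR tag-42 link, assuming the DAG-CBOR convention (sketch)."""
    # Assumed payload: a 0x00 prefix, as in a binary CID, then the CID bytes
    payload = b"\x00" + cid_bytes
    n = len(payload)
    # Shortest-form byte-string header (major type 2), as deterministic CBOR requires
    if n < 24:
        header = bytes([0x40 | n])
    elif n <= 0xFF:
        header = bytes([0x58, n])
    elif n <= 0xFFFF:
        header = bytes([0x59]) + n.to_bytes(2, "big")
    else:
        raise ValueError("CID too long for this sketch")
    # 0xD8 0x2A: major type 6 (tag) with a one-byte argument, 42
    return bytes([0xD8, 0x2A]) + header + payload
```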

+

+ Further details forthcoming. +

+
+
diff --git a/index.html b/index.html
index 0f1e25a..5a91161 100644
--- a/index.html
+++ b/index.html
@@ -29,7 +29,7 @@

DASL — Data-Addressed Structures & Links

@@ -123,110 +123,23 @@

Implementations

-

Specification

-

- There are two specifications in DASL: CIDs and dCBOR42. CIDs - (Content IDs) are identifiers used for addressing resources by their contents, as in IPFS; dCBOR42 - (deterministically-serialized CBOR with optional CBOR tag 42 supported) - is a serialization format that is deterministic and aware of CID-linked graphs, i.e. "IPLD". -

-
-

Content IDs (CIDs)

-

- DASL CIDs are a strict subset of IPFS CIDs - (but you don't need to understanding anything about IPFS to either use or implement them) with the following properties: -

-
    -
  • Only modern CIDv1 CIDs are used, not legacy CIDv0.
  • Only the lowercase base32 multibase encoding (the b prefix) is used for human-readable (and subdomain-usable) string encoding.
  • Only the raw binary multicodec (0x55) and dag-cbor multicodec (0x71), with the latter used only for dCBOR42-conformant DAGs.
  • Only SHA-256 (0x12) and BLAKE3 hash functions (0x1e), and the latter only in certain circumstances.
  • Regardless of size, resources should not be "chunked" into a DAG or Merkle tree (as historically done with UnixFS canonicalization in IPFS systems) but rather hashed in their entirety and content-addressed directly.
  • This set of options has the added advantage that all the aforementioned single-byte prefixes require no additional varint processing or byte-fiddling.
-

- Supporting two hashes isn't ideal, but having one hash type that can stream large resources (and do incremental - verification mid-stream) is a plus. Because BLAKE3 is still far from being supported by web browsers, it is - strongly recommended that CID producers limit themselves to SHA-256 if possible. Implementations intending to - run in web contexts are likely to either forego BLAKE3 verification in-browser, outsource verification to a - trusted component, or to have to dynamically load a BLAKE3 library in the browser, which may cause latency. -

-

- Use the following steps to parse a CID string: -

-
    -
  1. Accept a string CID.
  2. Remove the first character from CID and store it in prefix.
  3. If prefix is not equal to b, throw an error.
  4. Decode the rest of CID using the base32 algorithm from RFC4648 with a lowercase alphabet and store the result in CID bytes.
  5. Return the result of applying the steps to decode a CID to CID bytes.
-

- Use the following steps to parse a binary CID: -

-
    -
  1. Accept an array of bytes binary CID.
  2. Remove the first byte in binary CID and store it in prefix.
  3. If prefix is not equal to 0 (a null byte, the binary base256 prefix), throw an error.
  4. Store the rest of binary CID in CID bytes.
  5. Return the result of applying the steps to decode a CID to CID bytes.
-

- Use the following steps to decode a CID: -

-
    -
  1. Accept an array of bytes CID bytes.
  2. Remove the first byte in CID bytes and store it in version.
  3. If version is not equal to 1, throw an error.
  4. Remove the next byte in CID bytes and store it in codec.
  5. If codec is not equal to 0x55 (raw) or 0x71 (dCBOR42), throw an error.
  6. Remove the next two bytes in CID bytes and store them in hash type and hash size, respectively.
  7. If hash type is not equal to 0x12 (SHA-256) or 0x1e (BLAKE3), throw an error.
  8. If there are fewer than hash size bytes left in CID bytes, throw an error.
  9. Remove the first hash size bytes from CID bytes and store them in digest. Store the rest in remaining bytes.
  10. Return version, codec, hash type, hash size, digest, and remaining bytes.
-
-
-

Deterministic CBOR with tag 42 (dCBOR42)

-

- dCBOR42 is a form of IPLD that serializes only to deterministic CBOR, by normalizing and reducing some type - flexibility. Notably, we support no ADLs. - (See the current draft specification for dCBOR, - and Carsten Bormann's BCP document - on the underspecified determinism of Section 4.2 of the CBOR specification). For debugging purposes, either - one-way conversion to DAG-JSON or CBOR - Extended Diagnostic Notation can be used, but either way, note that the CIDs in such debugging outputs - should be the CIDs of the dCBOR42 content, not of other debugging resources. -

-

- Further details forthcoming. -

-
+

Specifications

+
+
+ Content Identifiers (CIDs) +
+
+ CIDs (Content IDs) are identifiers used for addressing resources by their contents, essentially a hash
+ with limited metadata.
+
+
+ Deterministically Serialized CBOR with Tag 42 (dCBOR42)
+
+
+ dCBOR42 is a serialization format that is deterministic (so that the same data will have the same CID)
+ and that features native support for using CIDs as links.
+
+