move content to spec
Robin Berjon committed Dec 10, 2024
1 parent bc950fb commit 0136227
Showing 4 changed files with 141 additions and 105 deletions.
87 changes: 87 additions & 0 deletions cid.src.html
@@ -0,0 +1,87 @@
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Content IDs (CIDs)</title>
</head>
<body>
<p>
DASL CIDs are a strict subset of <a href="https://docs.ipfs.tech/concepts/content-addressing/">IPFS CIDs</a>
(but you don't need to understand anything about IPFS to either use or implement them) with the following properties:
</p>
<ul>
<li>Only modern CIDv1 CIDs are used, not legacy CIDv0.</li>
<li>
Only the lowercase base32 multibase encoding (the <code>b</code> prefix) is used for human-readable
(and subdomain-usable) string encoding.
</li>
<li>
Only the <code>raw</code> binary multicodec (0x55) and the <code>dag-cbor</code> multicodec (0x71) are used, with the
latter used only for dCBOR42-conformant DAGs.
</li>
<li>Only the SHA-256 (0x12) and BLAKE3 (0x1e) hash functions are used, the latter only in certain circumstances.</li>
<li>
Regardless of size, resources <em>should not</em> be "chunked" into a DAG or Merkle tree (as historically done with
UnixFS canonicalization in IPFS systems) but rather hashed in their entirety and content-addressed directly.
</li>
<li>
This set of options has the added advantage that all the aforementioned single-byte prefixes require no
additional varint processing or byte-fiddling.
</li>
</ul>
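<p>
As a rough illustration of the properties above (not part of the specification), the sketch below builds a CID string
for a <code>raw</code> resource hashed in its entirety with SHA-256. The function name <code>make_raw_sha256_cid</code>
is hypothetical, and the byte layout (version, codec, hash type, hash size, digest) is assumed to mirror the steps to
decode a CID.
</p>

```python
import base64
import hashlib

CID_VERSION = 1   # CIDv1
RAW = 0x55        # raw binary multicodec
SHA_256 = 0x12    # SHA-256 hash function code


def make_raw_sha256_cid(data: bytes) -> str:
    """Hash the whole resource (no chunking) and wrap the digest in a CID string."""
    digest = hashlib.sha256(data).digest()
    # Every prefix fits in a single byte, so no varint encoding is needed here.
    cid_bytes = bytes([CID_VERSION, RAW, SHA_256, len(digest)]) + digest
    # Lowercase base32; padding is stripped (an assumption of this sketch).
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


print(make_raw_sha256_cid(b"hello"))
```

<p>
Because the four header bytes are fixed for this combination, every string produced by this sketch starts with
<code>bafkrei</code> and is 59 characters long.
</p>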
<p>
Supporting two hashes isn't ideal, but having one hash type that can stream large resources (and do incremental
verification mid-stream) is a plus. Because BLAKE3 is still far from being supported by web browsers, it is
strongly recommended that CID producers limit themselves to SHA-256 if possible. Implementations intending to
run in web contexts are likely to forgo BLAKE3 verification in-browser, outsource verification to a
trusted component, or dynamically load a BLAKE3 library in the browser, which may add latency.
</p>
<p>
Use the following steps to <dfn id="parse-cid-string">parse a CID string</dfn>:
</p>
<ol>
<li>Accept a string <var>CID</var>.</li>
<li>Remove the first character from <var>CID</var> and store it in <var>prefix</var>.</li>
<li>If <var>prefix</var> is not equal to <code>b</code>, throw an error.</li>
<li>
Decode the rest of <var>CID</var> using <a href="https://datatracker.ietf.org/doc/html/rfc4648#section-6">the
base32 algorithm from RFC4648</a> with a lowercase alphabet and store the result in <var>CID bytes</var>.
</li>
<li>Return the result of applying the steps to <a href="#decode-cid">decode a CID</a> to <var>CID bytes</var>.</li>
</ol>
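<p>
The string-parsing steps above can be sketched roughly as follows. This is an illustration under stated assumptions,
not a reference implementation: the name <code>parse_cid_string</code> is hypothetical, the input is assumed to be
unpadded, and Python's <code>base64.b32decode</code> expects uppercase input with <code>=</code> padding, so the
sketch checks the case itself and then converts. It stops at the <var>CID bytes</var>; the final step (decoding them)
is left to a separate routine.
</p>

```python
import base64


def parse_cid_string(cid: str) -> bytes:
    """Return the CID bytes encoded in a `b`-prefixed lowercase base32 string."""
    if not cid:
        raise ValueError("empty CID string")
    prefix, rest = cid[0], cid[1:]
    if prefix != "b":
        raise ValueError(f"unsupported multibase prefix: {prefix!r}")
    # Assumption of this sketch: uppercase letters and "=" padding are rejected.
    if rest != rest.lower() or "=" in rest:
        raise ValueError("expected unpadded lowercase base32")
    # The stdlib decoder wants uppercase input padded to a multiple of 8 characters.
    padded = rest.upper() + "=" * (-len(rest) % 8)
    # binascii.Error, raised on malformed input, is a subclass of ValueError.
    return base64.b32decode(padded)
```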
<p>
Use the following steps to <dfn id="parse-cid-binary">parse a binary CID</dfn>:
</p>
<ol>
<li>Accept an array of bytes <var>binary CID</var>.</li>
<li>
Remove the first byte in <var>binary CID</var> and store it in <var>prefix</var>.
</li>
<li>If <var>prefix</var> is not equal to <code>0</code> (a null byte, the binary base256 prefix), throw an error.</li>
<li>Store the rest of <var>binary CID</var> in <var>CID bytes</var>.</li>
<li>Return the result of applying the steps to <a href="#decode-cid">decode a CID</a> to <var>CID bytes</var>.</li>
</ol>
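<p>
A minimal sketch of the binary-parsing steps above, with the hypothetical name <code>parse_binary_cid</code>. As with
string parsing, it returns the <var>CID bytes</var> and leaves the final decoding step to a separate routine.
</p>

```python
def parse_binary_cid(binary_cid: bytes) -> bytes:
    """Strip the leading null byte from a binary CID and return the CID bytes."""
    if len(binary_cid) == 0:
        raise ValueError("empty binary CID")
    prefix, cid_bytes = binary_cid[0], binary_cid[1:]
    if prefix != 0x00:
        raise ValueError(f"unsupported binary prefix: {prefix:#04x}")
    return cid_bytes
```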
<p>
Use the following steps to <dfn id="decode-cid">decode a CID</dfn>:
</p>
<ol>
<li>Accept an array of bytes <var>CID bytes</var>.</li>
<li>
Remove the first byte in <var>CID bytes</var> and store it in <var>version</var>.
</li>
<li>If <var>version</var> is not equal to <code>1</code>, throw an error.</li>
<li>
Remove the next byte in <var>CID bytes</var> and store it in <var>codec</var>.
</li>
<li>If <var>codec</var> is not equal to <code>0x55</code> (raw) or <code>0x71</code> (dCBOR42), throw an error.</li>
<li>
Remove the next two bytes in <var>CID bytes</var> and store them in <var>hash type</var> and <var>hash size</var>,
respectively.
</li>
<li>If <var>hash type</var> is not equal to <code>0x12</code> (SHA-256) or <code>0x1e</code> (BLAKE3), throw an error.</li>
<li>If there are fewer than <var>hash size</var> bytes left in <var>CID bytes</var>, throw an error.</li>
<li>Remove the first <var>hash size</var> bytes from <var>CID bytes</var> and store them in <var>digest</var>. Store the rest in <var>remaining bytes</var>.</li>
<li>Return <var>version</var>, <var>codec</var>, <var>hash type</var>, <var>hash size</var>, <var>digest</var>, and <var>remaining bytes</var>.</li>
</ol>
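<p>
The decoding steps above could look roughly like this. It is a sketch, not a normative implementation: the names
<code>DecodedCID</code>, <code>decode_cid</code>, and <code>_take</code> are hypothetical, and truncated input is
reported as an error at whichever step runs out of bytes, which the steps above only spell out for the digest.
</p>

```python
from typing import NamedTuple

ALLOWED_CODECS = {0x55, 0x71}   # raw, dag-cbor (dCBOR42)
ALLOWED_HASHES = {0x12, 0x1E}   # SHA-256, BLAKE3


class DecodedCID(NamedTuple):
    version: int
    codec: int
    hash_type: int
    hash_size: int
    digest: bytes
    remaining: bytes


def _take(data: bytes, count: int) -> tuple[bytes, bytes]:
    """Split off the first `count` bytes, failing if too few remain."""
    if count > len(data):
        raise ValueError("CID is truncated")
    return data[:count], data[count:]


def decode_cid(cid_bytes: bytes) -> DecodedCID:
    head, rest = _take(cid_bytes, 1)
    version = head[0]
    if version != 1:
        raise ValueError(f"unsupported CID version: {version}")

    head, rest = _take(rest, 1)
    codec = head[0]
    if codec not in ALLOWED_CODECS:
        raise ValueError(f"unsupported codec: {codec:#04x}")

    head, rest = _take(rest, 2)
    hash_type, hash_size = head  # iterating over bytes yields two ints
    if hash_type not in ALLOWED_HASHES:
        raise ValueError(f"unsupported hash type: {hash_type:#04x}")

    digest, remaining = _take(rest, hash_size)
    return DecodedCID(version, codec, hash_type, hash_size, digest, remaining)
```

<p>
Note that the sketch follows the steps as written and accepts any <var>hash size</var> that fits in the input; it does
not check that the size matches the usual digest length of the hash function.
</p>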
</body>
</html>
22 changes: 22 additions & 0 deletions dcbor42.src.html
@@ -0,0 +1,22 @@
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Deterministic CBOR with tag 42 (dCBOR42)</title>
</head>
<body>
<p>
dCBOR42 is a form of IPLD that serializes only to deterministic CBOR, by normalizing and reducing some type
flexibility. Notably, we support no ADLs (IPLD Advanced Data Layouts).
(See the <a href="https://datatracker.ietf.org/doc/draft-mcnally-deterministic-cbor/">current draft specification for dCBOR</a>,
and <a href="https://ftp.linux.cz/pub/internet-drafts/draft-bormann-cbor-det-01.html">Carsten Bormann's BCP document
on the underspecified determinism of Section 4.2 of the CBOR specification</a>). For debugging purposes, either
one-way conversion to DAG-JSON or <a href="https://datatracker.ietf.org/doc/draft-ietf-cbor-edn-literals/">CBOR
Extended Diagnostic Notation</a> can be used, but either way, note that the CIDs in such debugging outputs
should be the CIDs of the dCBOR42 content, not of other debugging resources.
</p>
<p>
Further details forthcoming.
</p>
</body>
</html>
123 changes: 18 additions & 105 deletions index.html
@@ -29,7 +29,7 @@ <h1>DASL — Data-Addressed Structures &amp; Links</h1>
<ul>
<li><a href="#what">What</a></li>
<li><a href="#code">Code</a></li>
<li><a href="#spec">Spec</a></li>
<li><a href="#spec">Specs</a></li>
</ul>
</nav>
</header>
@@ -123,110 +123,23 @@ <h2>Implementations</h2>
</ul>
</section>
<section id="spec">
<h2>Specification</h2>
<p>
There are two specifications in DASL: <a href="#cid">CIDs</a> and <a href="#dcbor42">dCBOR42</a>. CIDs
(Content IDs) are identifiers used for addressing resources by their contents, as in IPFS; dCBOR42
(deterministically-serialized CBOR with optional CBOR tag <code>42</code> supported)
is a serialization format that is deterministic and aware of CID-linked graphs, i.e. "IPLD".
</p>
<section id="cid">
<h3>Content IDs (CIDs)</h3>
<p>
DASL CIDs are a strict subset of <a href="https://docs.ipfs.tech/concepts/content-addressing/">IPFS CIDs</a>
(but you don't need to understand anything about IPFS to either use or implement them) with the following properties:
</p>
<ul>
<li>Only modern CIDv1 CIDs are used, not legacy CIDv0.</li>
<li>
Only the lowercase base32 multibase encoding (the <code>b</code> prefix) is used for human-readable
(and subdomain-usable) string encoding.
</li>
<li>
Only the <code>raw</code> binary multicodec (0x55) and the <code>dag-cbor</code> multicodec (0x71) are used, with the
latter used only for dCBOR42-conformant DAGs.
</li>
<li>Only the SHA-256 (0x12) and BLAKE3 (0x1e) hash functions are used, the latter only in certain circumstances.</li>
<li>
Regardless of size, resources <em>should not</em> be "chunked" into a DAG or Merkle tree (as historically done with
UnixFS canonicalization in IPFS systems) but rather hashed in their entirety and content-addressed directly.
</li>
<li>
This set of options has the added advantage that all the aforementioned single-byte prefixes require no
additional varint processing or byte-fiddling.
</li>
</ul>
<p>
Supporting two hashes isn't ideal, but having one hash type that can stream large resources (and do incremental
verification mid-stream) is a plus. Because BLAKE3 is still far from being supported by web browsers, it is
strongly recommended that CID producers limit themselves to SHA-256 if possible. Implementations intending to
run in web contexts are likely to forgo BLAKE3 verification in-browser, outsource verification to a
trusted component, or dynamically load a BLAKE3 library in the browser, which may add latency.
</p>
<p>
Use the following steps to <dfn id="parse-cid-string">parse a CID string</dfn>:
</p>
<ol>
<li>Accept a string <var>CID</var>.</li>
<li>Remove the first character from <var>CID</var> and store it in <var>prefix</var>.</li>
<li>If <var>prefix</var> is not equal to <code>b</code>, throw an error.</li>
<li>
Decode the rest of <var>CID</var> using <a href="https://datatracker.ietf.org/doc/html/rfc4648#section-6">the
base32 algorithm from RFC4648</a> with a lowercase alphabet and store the result in <var>CID bytes</var>.
</li>
<li>Return the result of applying the steps to <a href="#decode-cid">decode a CID</a> to <var>CID bytes</var>.</li>
</ol>
<p>
Use the following steps to <dfn id="parse-cid-binary">parse a binary CID</dfn>:
</p>
<ol>
<li>Accept an array of bytes <var>binary CID</var>.</li>
<li>
Remove the first byte in <var>binary CID</var> and store it in <var>prefix</var>.
</li>
<li>If <var>prefix</var> is not equal to <code>0</code> (a null byte, the binary base256 prefix), throw an error.</li>
<li>Store the rest of <var>binary CID</var> in <var>CID bytes</var>.</li>
<li>Return the result of applying the steps to <a href="#decode-cid">decode a CID</a> to <var>CID bytes</var>.</li>
</ol>
<p>
Use the following steps to <dfn id="decode-cid">decode a CID</dfn>:
</p>
<ol>
<li>Accept an array of bytes <var>CID bytes</var>.</li>
<li>
Remove the first byte in <var>CID bytes</var> and store it in <var>version</var>.
</li>
<li>If <var>version</var> is not equal to <code>1</code>, throw an error.</li>
<li>
Remove the next byte in <var>CID bytes</var> and store it in <var>codec</var>.
</li>
<li>If <var>codec</var> is not equal to <code>0x55</code> (raw) or <code>0x71</code> (dCBOR42), throw an error.</li>
<li>
Remove the next two bytes in <var>CID bytes</var> and store them in <var>hash type</var> and <var>hash size</var>,
respectively.
</li>
<li>If <var>hash type</var> is not equal to <code>0x12</code> (SHA-256) or <code>0x1e</code> (BLAKE3), throw an error.</li>
<li>If there are fewer than <var>hash size</var> bytes left in <var>CID bytes</var>, throw an error.</li>
<li>Remove the first <var>hash size</var> bytes from <var>CID bytes</var> and store them in <var>digest</var>. Store the rest in <var>remaining bytes</var>.</li>
<li>Return <var>version</var>, <var>codec</var>, <var>hash type</var>, <var>hash size</var>, <var>digest</var>, and <var>remaining bytes</var>.</li>
</ol>
</section>
<section id="dcbor42">
<h3>Deterministic CBOR with tag 42 (dCBOR42)</h3>
<p>
dCBOR42 is a form of IPLD that serializes only to deterministic CBOR, by normalizing and reducing some type
flexibility. Notably, we support no ADLs (IPLD Advanced Data Layouts).
(See the <a href="https://datatracker.ietf.org/doc/draft-mcnally-deterministic-cbor/">current draft specification for dCBOR</a>,
and <a href="https://ftp.linux.cz/pub/internet-drafts/draft-bormann-cbor-det-01.html">Carsten Bormann's BCP document
on the underspecified determinism of Section 4.2 of the CBOR specification</a>). For debugging purposes, either
one-way conversion to DAG-JSON or <a href="https://datatracker.ietf.org/doc/draft-ietf-cbor-edn-literals/">CBOR
Extended Diagnostic Notation</a> can be used, but either way, note that the CIDs in such debugging outputs
should be the CIDs of the dCBOR42 content, not of other debugging resources.
</p>
<p>
Further details forthcoming.
</p>
</section>
<h2>Specifications</h2>
<dl>
<dt>
<a href="cid.html">Content Identifiers (CIDs)</a>
</dt>
<dd>
CIDs (Content IDs) are identifiers used for addressing resources by their contents, essentially a hash
with limited metadata.
</dd>
<dt>
<a href="dcbor42.html">Deterministically-serialized CBOR with Tag 42 (dCBOR42)</a>
</dt>
<dd>
dCBOR42 is a serialization format that is deterministic (so that the same data will have the same CID)
and that features native support for using CIDs as links.
</dd>
</dl>
</section>
<footer>
<nav>
14 changes: 14 additions & 0 deletions todo.md
@@ -0,0 +1,14 @@

# TODO

- [ ] spec gen script
- [ ] refs watch (from specs and JSON)
- [ ] specs watch (all .src.html)
- [ ] spec style
- [ ] intro clarity
- [ ] Specs
- [ ] dCBOR42
- [ ] BDASL
- [ ] CAR
- [ ] MASL (Metadata for Arbitrary Structures and Links)
- [ ] RASL
