move content to spec

darobin · Dec 10, 2024 · 0136227
1 parent bc950fb
commit 0136227
Showing 4 changed files with 141 additions and 105 deletions.
diff --git a/cid.src.html b/cid.src.html
@@ -0,0 +1,87 @@
+<html lang="en">
+  <head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Content IDs (CIDs)</title>
+  </head>
+  <body>
+    <p>
+      DASL CIDs are a strict subset of <a href="https://docs.ipfs.tech/concepts/content-addressing/">IPFS CIDs</a>
+      (but you don't need to understand anything about IPFS to either use or implement them) with the following properties:
+    </p>
+    <ul>
+      <li>Only modern CIDv1 CIDs are used, not legacy CIDv0.</li>
+      <li>
+        Only the lowercase base32 multibase encoding (the <code>b</code> prefix) is used for human-readable
+        (and subdomain-usable) string encoding.
+      </li>
+      <li>
+        Only the <code>raw</code> binary multicodec (0x55) and <code>dag-cbor</code> multicodec (0x71), with the
+        latter used only for dCBOR42-conformant DAGs.
+      </li>
+      <li>Only the SHA-256 (0x12) and BLAKE3 (0x1e) hash functions are used, the latter only in certain circumstances.</li>
+      <li>
+        Regardless of size, resources <em>should not</em> be "chunked" into a DAG or Merkle tree (as historically done with
+        UnixFS canonicalization in IPFS systems) but rather hashed in their entirety and content-addressed directly.
+      </li>
+      <li>
+        This set of options has the added advantage that all the aforementioned single-byte prefixes require no
+        additional varint processing or byte-fiddling.
+      </li>
+    </ul>
+    <p>
+      Supporting two hashes isn't ideal, but having one hash type that can stream large resources (and do incremental
+      verification mid-stream) is a plus. Because BLAKE3 is still far from being supported by web browsers, it is
+      strongly recommended that CID producers limit themselves to SHA-256 if possible. Implementations intending to
+      run in web contexts are likely to forgo BLAKE3 verification in-browser, to outsource verification to a
+      trusted component, or to load a BLAKE3 library dynamically in the browser, which may add latency.
+    </p>
+    <p>
+      Use the following steps to <dfn id="parse-cid-string">parse a CID string</dfn>:
+    </p>
+    <ol>
+      <li>Accept a string <var>CID</var>.</li>
+      <li>Remove the first character from <var>CID</var> and store it in <var>prefix</var>.</li>
+      <li>If <var>prefix</var> is not equal to <code>b</code>, throw an error.</li>
+      <li>
+        Decode the rest of <var>CID</var> using <a href="https://datatracker.ietf.org/doc/html/rfc4648#section-6">the
+        base32 algorithm from RFC4648</a> with a lowercase alphabet and store the result in <var>CID bytes</var>.
+      </li>
+      <li>Return the result of applying the steps to <a href="#decode-cid">decode a CID</a> to <var>CID bytes</var>.</li>
+    </ol>
+    <p>
+      Use the following steps to <dfn id="parse-cid-binary">parse a binary CID</dfn>:
+    </p>
+    <ol>
+      <li>Accept an array of bytes <var>binary CID</var>.</li>
+      <li>
+        Remove the first byte in <var>binary CID</var> and store it in <var>prefix</var>.
+      </li>
+      <li>If <var>prefix</var> is not equal to <code>0</code> (a null byte, the binary base256 prefix), throw an error.</li>
+      <li>Store the rest of <var>binary CID</var> in <var>CID bytes</var>.</li>
+      <li>Return the result of applying the steps to <a href="#decode-cid">decode a CID</a> to <var>CID bytes</var>.</li>
+    </ol>
+    <p>
+      Use the following steps to <dfn id="decode-cid">decode a CID</dfn>:
+    </p>
+    <ol>
+      <li>Accept an array of bytes <var>CID bytes</var>.</li>
+      <li>
+        Remove the first byte in <var>CID bytes</var> and store it in <var>version</var>.
+      </li>
+      <li>If <var>version</var> is not equal to <code>1</code>, throw an error.</li>
+      <li>
+        Remove the next byte in <var>CID bytes</var> and store it in <var>codec</var>.
+      </li>
+      <li>If <var>codec</var> is not equal to <code>0x55</code> (raw) or <code>0x71</code> (dCBOR42), throw an error.</li>
+      <li>
+        Remove the next two bytes in <var>CID bytes</var> and store them in <var>hash type</var> and <var>hash size</var>,
+        respectively.
+      </li>
+      <li>If <var>hash type</var> is not equal to <code>0x12</code> (SHA-256) or <code>0x1e</code> (BLAKE3), throw an error.</li>
+      <li>If there are fewer than <var>hash size</var> bytes left in <var>CID bytes</var>, throw an error.</li>
+      <li>Remove the first <var>hash size</var> bytes from <var>CID bytes</var> and store them in <var>digest</var>. Store the rest in <var>remaining bytes</var>.</li>
+      <li>Return <var>version</var>, <var>codec</var>, <var>hash type</var>, <var>hash size</var>, <var>digest</var>, and <var>remaining bytes</var>.</li>
+    </ol>
+  </body>
+</html>
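
Review note on cid.src.html: the three algorithms added above (parse a CID string, parse a binary CID, decode a CID) can be sketched in Python as follows. This is a minimal illustration, not part of the commit. The function names, the dictionary return shape, the constant names, and the explicit length check before reading the four header bytes are assumptions made for the sketch; the spec text only lists the steps.

```python
import base64

# Values taken from the spec text; the constant names are hypothetical.
RAW, DAG_CBOR = 0x55, 0x71
SHA2_256, BLAKE3 = 0x12, 0x1E
BASE32_LOWER = "abcdefghijklmnopqrstuvwxyz234567"


def decode_cid(cid_bytes: bytes) -> dict:
    """Decode the bytes that follow the multibase prefix."""
    # Assumption: a CID with fewer than four header bytes is an error.
    if len(cid_bytes) < 4:
        raise ValueError("CID is too short")
    version, codec, hash_type, hash_size = cid_bytes[:4]
    if version != 1:
        raise ValueError("only CIDv1 is supported")
    if codec not in (RAW, DAG_CBOR):
        raise ValueError("unsupported codec")
    if hash_type not in (SHA2_256, BLAKE3):
        raise ValueError("unsupported hash type")
    rest = cid_bytes[4:]
    if len(rest) < hash_size:
        raise ValueError("digest is shorter than the declared hash size")
    return {
        "version": version,
        "codec": codec,
        "hash_type": hash_type,
        "hash_size": hash_size,
        "digest": rest[:hash_size],
        "remaining_bytes": rest[hash_size:],
    }


def parse_cid_string(cid: str) -> dict:
    """Parse the lowercase base32 string form (the "b" prefix)."""
    if not cid or cid[0] != "b":
        raise ValueError("expected the 'b' multibase prefix")
    body = cid[1:]
    if any(ch not in BASE32_LOWER for ch in body):
        raise ValueError("not lowercase, unpadded base32")
    # base64.b32decode expects the uppercase alphabet and "=" padding,
    # so both are restored here before decoding. A body of impossible
    # length raises binascii.Error, which is a subclass of ValueError.
    padded = body.upper() + "=" * (-len(body) % 8)
    return decode_cid(base64.b32decode(padded))


def parse_binary_cid(binary_cid: bytes) -> dict:
    """Parse the binary form, which starts with a null byte."""
    if not binary_cid or binary_cid[0] != 0:
        raise ValueError("expected the 0x00 binary prefix")
    return decode_cid(binary_cid[1:])
```

Because the version, codec, hash type, and hash size are each a single byte, the header can be unpacked directly without varint handling, which is the advantage the property list points out. One visible consequence is that the string form of any raw SHA-256 CID begins with `bafkrei`.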
diff --git a/dcbor42.src.html b/dcbor42.src.html
@@ -0,0 +1,22 @@
+<html lang="en">
+  <head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Deterministic CBOR with tag 42 (dCBOR42)</title>
+  </head>
+  <body>
+    <p>
+      dCBOR42 is a form of IPLD that serializes only to deterministic CBOR, by normalizing and reducing some type
+      flexibility. Notably, we support no ADLs.
+      (See the <a href="https://datatracker.ietf.org/doc/draft-mcnally-deterministic-cbor/">current draft specification for dCBOR</a>,
+      and <a href="https://ftp.linux.cz/pub/internet-drafts/draft-bormann-cbor-det-01.html">Carsten Bormann's BCP document
+      on the underspecified determinism of Section 4.2 of the CBOR specification</a>). For debugging purposes, either
+      one-way conversion to DAG-JSON or <a href="https://datatracker.ietf.org/doc/draft-ietf-cbor-edn-literals/">CBOR
+      Extended Diagnostic Notation</a> can be used, but either way, note that the CIDs in such debugging outputs
+      should be the CIDs of the dCBOR42 content, not of other debugging resources.
+    </p>
+    <p>
+      Further details forthcoming.
+    </p>
+  </body>
+</html>
diff --git a/index.html b/index.html
@@ -29,7 +29,7 @@ <h1>DASL — Data-Addressed Structures &amp; Links</h1>
           <ul>
             <li><a href="#what">What</a></li>
             <li><a href="#code">Code</a></li>
-            <li><a href="#spec">Spec</a></li>
+            <li><a href="#spec">Specs</a></li>
           </ul>
         </nav>
       </header>
@@ -123,110 +123,23 @@ <h2>Implementations</h2>
         </ul>
       </section>
       <section id="spec">
-        <h2>Specification</h2>
-        <p>
-          There are two specifications in DASL: <a href="#cid">CIDs</a> and <a href="#dcbor42">dCBOR42</a>. CIDs
-          (Content IDs) are identifiers used for addressing resources by their contents, as in IPFS; dCBOR42
-          (deterministically-serialized CBOR with optional CBOR tag <code>42</code> supported)
-          is a serialization format that is deterministic and aware of CID-linked graphs, i.e. "IPLD".
-        </p>
-        <section id="cid">
-          <h3>Content IDs (CIDs)</h3>
-          <p>
-            DASL CIDs are a strict subset of <a href="https://docs.ipfs.tech/concepts/content-addressing/">IPFS CIDs</a>
-            (but you don't need to understand anything about IPFS to either use or implement them) with the following properties:
-          </p>
-          <ul>
-            <li>Only modern CIDv1 CIDs are used, not legacy CIDv0.</li>
-            <li>
-              Only the lowercase base32 multibase encoding (the <code>b</code> prefix) is used for human-readable
-              (and subdomain-usable) string encoding.
-            </li>
-            <li>
-              Only the <code>raw</code> binary multicodec (0x55) and <code>dag-cbor</code> multicodec (0x71), with the
-              latter used only for dCBOR42-conformant DAGs.
-            </li>
-            <li>Only the SHA-256 (0x12) and BLAKE3 (0x1e) hash functions are used, the latter only in certain circumstances.</li>
-            <li>
-              Regardless of size, resources <em>should not</em> be "chunked" into a DAG or Merkle tree (as historically done with
-              UnixFS canonicalization in IPFS systems) but rather hashed in their entirety and content-addressed directly.
-            </li>
-            <li>
-              This set of options has the added advantage that all the aforementioned single-byte prefixes require no
-              additional varint processing or byte-fiddling.
-            </li>
-          </ul>
-          <p>
-            Supporting two hashes isn't ideal, but having one hash type that can stream large resources (and do incremental
-            verification mid-stream) is a plus. Because BLAKE3 is still far from being supported by web browsers, it is
-            strongly recommended that CID producers limit themselves to SHA-256 if possible. Implementations intending to
-            run in web contexts are likely to forgo BLAKE3 verification in-browser, to outsource verification to a
-            trusted component, or to load a BLAKE3 library dynamically in the browser, which may add latency.
-          </p>
-          <p>
-            Use the following steps to <dfn id="parse-cid-string">parse a CID string</dfn>:
-          </p>
-          <ol>
-            <li>Accept a string <var>CID</var>.</li>
-            <li>Remove the first character from <var>CID</var> and store it in <var>prefix</var>.</li>
-            <li>If <var>prefix</var> is not equal to <code>b</code>, throw an error.</li>
-            <li>
-              Decode the rest of <var>CID</var> using <a href="https://datatracker.ietf.org/doc/html/rfc4648#section-6">the
-              base32 algorithm from RFC4648</a> with a lowercase alphabet and store the result in <var>CID bytes</var>.
-            </li>
-            <li>Return the result of applying the steps to <a href="#decode-cid">decode a CID</a> to <var>CID bytes</var>.</li>
-          </ol>
-          <p>
-            Use the following steps to <dfn id="parse-cid-binary">parse a binary CID</dfn>:
-          </p>
-          <ol>
-            <li>Accept an array of bytes <var>binary CID</var>.</li>
-            <li>
-              Remove the first byte in <var>binary CID</var> and store it in <var>prefix</var>.
-            </li>
-            <li>If <var>prefix</var> is not equal to <code>0</code> (a null byte, the binary base256 prefix), throw an error.</li>
-            <li>Store the rest of <var>binary CID</var> in <var>CID bytes</var>.</li>
-            <li>Return the result of applying the steps to <a href="#decode-cid">decode a CID</a> to <var>CID bytes</var>.</li>
-          </ol>
-          <p>
-            Use the following steps to <dfn id="decode-cid">decode a CID</dfn>:
-          </p>
-          <ol>
-            <li>Accept an array of bytes <var>CID bytes</var>.</li>
-            <li>
-              Remove the first byte in <var>CID bytes</var> and store it in <var>version</var>.
-            </li>
-            <li>If <var>version</var> is not equal to <code>1</code>, throw an error.</li>
-            <li>
-              Remove the next byte in <var>CID bytes</var> and store it in <var>codec</var>.
-            </li>
-            <li>If <var>codec</var> is not equal to <code>0x55</code> (raw) or <code>0x71</code> (dCBOR42), throw an error.</li>
-            <li>
-              Remove the next two bytes in <var>CID bytes</var> and store them in <var>hash type</var> and <var>hash size</var>,
-              respectively.
-            </li>
-            <li>If <var>hash type</var> is not equal to <code>0x12</code> (SHA-256) or <code>0x1e</code> (BLAKE3), throw an error.</li>
-            <li>If there are fewer than <var>hash size</var> bytes left in <var>CID bytes</var>, throw an error.</li>
-            <li>Remove the first <var>hash size</var> bytes from <var>CID bytes</var> and store them in <var>digest</var>. Store the rest in <var>remaining bytes</var>.</li>
-            <li>Return <var>version</var>, <var>codec</var>, <var>hash type</var>, <var>hash size</var>, <var>digest</var>, and <var>remaining bytes</var>.</li>
-          </ol>
-        </section>
-        <section id="dcbor42">
-          <h3>Deterministic CBOR with tag 42 (dCBOR42)</h3>
-          <p>
-            dCBOR42 is a form of IPLD that serializes only to deterministic CBOR, by normalizing and reducing some type
-            flexibility. Notably, we support no ADLs.
-            (See the <a href="https://datatracker.ietf.org/doc/draft-mcnally-deterministic-cbor/">current draft specification for dCBOR</a>,
-            and <a href="https://ftp.linux.cz/pub/internet-drafts/draft-bormann-cbor-det-01.html">Carsten Bormann's BCP document
-            on the underspecified determinism of Section 4.2 of the CBOR specification</a>). For debugging purposes, either
-            one-way conversion to DAG-JSON or <a href="https://datatracker.ietf.org/doc/draft-ietf-cbor-edn-literals/">CBOR
-            Extended Diagnostic Notation</a> can be used, but either way, note that the CIDs in such debugging outputs
-            should be the CIDs of the dCBOR42 content, not of other debugging resources.
-          </p>
-          <p>
-            Further details forthcoming.
-          </p>
-        </section>
+        <h2>Specifications</h2>
+        <dl>
+          <dt>
+            <a href="cid.html">Content Identifiers (CIDs)</a>
+          </dt>
+          <dd>
+            CIDs (Content IDs) are identifiers used for addressing resources by their contents, essentially a hash
+            with limited metadata.
+          </dd>
+          <dt>
+            <a href="dcbor42.html">Deterministically-serialized CBOR with Tag 42 (dCBOR42)</a>
+          </dt>
+          <dd>
+            dCBOR42 is a serialization format that is deterministic (so that the same data will have the same CID)
+            and that features native support for using CIDs as links.
+          </dd>
+        </dl>
       </section>
       <footer>
         <nav>

diff --git a/todo.md b/todo.md
@@ -0,0 +1,14 @@
+
+# TODO
+
+- [ ] spec gen script
+  - [ ] refs watch (from specs and JSON)
+  - [ ] specs watch (all .src.html)
+  - [ ] spec style
+- [ ] intro clarity
+- [ ] Specs
+  - [ ] dCBOR42
+  - [ ] BDASL
+  - [ ] CAR
+  - [ ] MASL (Metadata for Arbitrary Structures and Links)
+  - [ ] RASL