-
Notifications
You must be signed in to change notification settings - Fork 1
Compression format
Compression format is based on LZ77 algorithm. Compressed file is bit-based (byte boundaries are meaningless). It's a sequence of nodes:
- Node type, 1 bit
- Depending on node type:
- 1: output next 8 bits
- 0: output length + 2 bytes from window, starting from offset (measured from the start of window)
- 12 bits: offset
- 4 bits: length
So, "simple" node is 9 bits long, "window reference" node is 17 bits long.
Bits are in "big-endian" order (don't confuse with big-endian byte order).
Window is 0x1000 bytes long. As bytes are outputted, they're also written to the window at current write pointer, which is initially at position 1 (the second byte of window, 0 is the first). All writes (and possibly reads, should be checked) of window use wraparound, i.e. 3 bytes starting at 0x0ffe would be at 0x0ffe, 0x0fff, 0x0000.
You should know when to end decoding, it's encoded in header of parent [obfuscated format]. You shouldn't rely on "less than 9 bits left" as EOF.