-
Notifications
You must be signed in to change notification settings - Fork 1
Digram Serialization
To serialize several Digrams we simply add the serialized Digrams to each other.
To set the start of a digram, we simply use the complement of the first edge label index. Hence to seperate two digrams, we can search for the next Integer with the leading bit. This is then the next ELI1 and the start of the next digram, thus setting the boundaries of the internals.
IAS is the Internal Array Size Flag which represents if the Digram has two or one internals.
-
0
for 1 Internal Node -
1
for 2 Internal Nodes
The size flag represents the Number Class of the internals, if the internals are all smaller than the Maximum amount possible using a short, it would be a waste of space to save them as Integers. So the internals will be saved in the lowest number class all of them can be contained in.
-
00
for byte -
01
for short -
10
for integer -
11
for long
An external index has 2 bits, as we exploit the fact that we only have 1 or 2 external indexes The indexes are one of [1,2,3,4] which can be represented as:
- 1:
00
- 2:
01
- 3:
10
- 4:
11
In the case that we only have one external Index, external Index 2 will be set to the same as external 1.
In the following graph the compression ratio of this method in comparison to saving each Digram Occurences naively as a 24 byte object (N1 E1 N2 N3 E2 N4
) is shown:
this_approach/naive
A digram where the first node is an external and the last node, with the edges named :p1
and :p2
and their indexes 1
and 2
has the following occurences:
:n1 :p1 :n2 ; :p2 :n3
, :n4 :p1 :n5 ; :p2 :n6
will be saved as signed values (using | for seperation, spaces are only for representation of bytes wont be saved )
-1|2|-125|2|5
or as bits
11111111 11111111 11111111 11111111|00000000 00000000 00000000 00000010|10000011|00000010|00000101
or as hex
FF FF FF FF 00 00 00 02 83 02 05