Clarify the name tokeniser uncomp_len calculation (PR #803) #803
base: master
Changed PDFs as of bece1f7: CRAMcodecs (diff).
CRAMcodecs.tex
Outdated
The serialised data stream starts with two unsigned little endian
32-bit integers holding the total size of uncompressed name buffer and
the number of read names. This is followed the array elements
themselves. Note the uncompressed size the sum of all name lengths
Suggested change:
- themselves. Note the uncompressed size the sum of all name lengths
+ themselves. Note the uncompressed size is calculated as the sum of all name lengths,
Looks good - thanks!
This includes all visible read name bytes plus 1 termination byte per name (e.g. '\0'). Fixes samtools#802
bece1f7 to 4982e03
Changed PDFs as of 4982e03: CRAMcodecs (diff).
the number of read names. This is followed the array elements
themselves. Note the uncompressed size is calculated as the sum of
> The serialised data stream starts with two unsigned little endian 32-bit integers... This is followed the array elements themselves.
This is unrelated to the length calculation, but note there is also a 1-byte flag between the two integers and the data stream:
| Bytes | Type | Name |
|---|---|---|
| 4 | uint32 | uncomp_length |
| 4 | uint32 | num_reads |
| 1 | uint8 | use_arith |
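
For illustration, the layout in the table could be read along these lines. This is only a sketch in Python, assuming the three fields appear in exactly the order listed and that both integers are little endian as the spec text says; `parse_name_tok_header` is a hypothetical name, not a function from any existing library.

```python
import struct

def parse_name_tok_header(buf: bytes):
    """Hypothetical helper: split the fixed 9-byte header off a name tokeniser stream."""
    # '<' = little endian with no padding, 'I' = uint32, 'B' = uint8: 4 + 4 + 1 = 9 bytes.
    header_size = struct.calcsize("<IIB")
    if len(buf) < header_size:
        raise ValueError("stream too short for the header")
    uncomp_length, num_reads, use_arith = struct.unpack_from("<IIB", buf, 0)
    # Everything after the header is the remaining data stream.
    return uncomp_length, num_reads, use_arith, buf[header_size:]
```

Using an explicit `<` in the format string avoids native alignment, so the flag byte is read immediately after the second integer.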
This includes all visible read name bytes plus 1 termination byte per name (e.g. '\0').
Fixes #802
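
As a quick sanity check of the wording, the calculation described above might be sketched as follows. This assumes names are held as byte strings without their terminator; `name_buffer_length` is a hypothetical helper used only for illustration.

```python
def name_buffer_length(names):
    """Hypothetical helper: uncompressed size of the read name buffer."""
    # Each name contributes its visible bytes plus one termination byte (e.g. b"\0").
    return sum(len(name) + 1 for name in names)

names = [b"read1", b"read22"]
uncomp_len = name_buffer_length(names)  # (5 + 1) + (6 + 1) = 13
```

Under this reading, the value is always the sum of the visible name lengths plus the number of names.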