Clarify the name tokeniser uncomp_len calculation (PR #803) #803

jkbonfield · 2025-01-07T14:45:07Z

This includes all visible read name bytes plus 1 termination byte per name (e.g. '\0').

Fixes #802
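As a rough illustration of the rule described above, the size could be computed as below. This is only a sketch: the function name `name_tok_uncomp_len` and the sample names are hypothetical, and it assumes each name is held as a byte string without its terminator.

```python
def name_tok_uncomp_len(names):
    """Return the uncompressed buffer size for a list of read names.

    Each name contributes its visible bytes plus 1 termination byte
    (e.g. '\0'), as stated in the PR description.
    """
    return sum(len(name) + 1 for name in names)


# Hypothetical example names, stored without their terminators.
names = [b"read1", b"read2/1", b"r3"]
uncomp_len = name_tok_uncomp_len(names)  # (5+1) + (7+1) + (2+1) = 17
```

An empty list of names gives a size of 0 under this reading.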

github-actions · 2025-01-07T14:48:22Z

Changed PDFs as of bece1f7: CRAMcodecs (diff).

cmnbroad · 2025-01-07T15:04:08Z

CRAMcodecs.tex

+The serialised data stream starts with two unsigned little endian
+32-bit integers holding the total size of uncompressed name buffer and
+the number of read names.  This is followed the array elements
+themselves.  Note the uncompressed size the sum of all name lengths


Suggested change

-themselves. Note the uncompressed size the sum of all name lengths
+themselves. Note the uncompressed size is calculated as the sum of all name lengths,

cmnbroad · 2025-01-07T15:04:26Z

Looks good - thanks!

This includes all visible read name bytes plus 1 termination byte per name (e.g. '\0'). Fixes samtools#802

github-actions · 2025-01-07T16:04:22Z

Changed PDFs as of 4982e03: CRAMcodecs (diff).

zaeleus · 2025-01-22T20:14:31Z

CRAMcodecs.tex

+the number of read names.  This is followed the array elements
+themselves.  Note the uncompressed size is calculated as the sum of


The serialised data stream starts with two unsigned little endian 32-bit integers... This is followed the array elements themselves.

This is unrelated to the length calculation, but note there is also a 1 byte flag between the 2 integers and the data stream:

Bytes  Type    Name
4      uint32  uncomp_length
4      uint32  num_reads
1      uint8   use_arith
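A minimal sketch of reading those three fields, assuming they are packed back to back in little-endian order with no padding (9 bytes in total). The function name `parse_name_tok_header` is hypothetical, and how `use_arith` should be interpreted is not covered here.

```python
import struct

# Two little-endian uint32 values followed by one uint8, no padding.
HEADER = struct.Struct("<IIB")


def parse_name_tok_header(buf):
    """Return (uncomp_length, num_reads, use_arith, header_size)."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer too short for name tokeniser header")
    uncomp_length, num_reads, use_arith = HEADER.unpack_from(buf, 0)
    # The remaining data stream starts at offset HEADER.size.
    return uncomp_length, num_reads, use_arith, HEADER.size


# Build a hypothetical header followed by two payload bytes.
example = struct.pack("<IIB", 17, 3, 0) + b"\x00\x00"
fields = parse_name_tok_header(example)
```

Returning the header size alongside the fields lets the caller slice the rest of the stream without hard-coding the offset.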

cmnbroad reviewed Jan 7, 2025


jkbonfield added sam cram and removed sam labels Jan 7, 2025

jkbonfield force-pushed the name-tok-size branch from bece1f7 to 4982e03 Compare January 7, 2025 16:02
