Proposal for very large resolution images

Dirk Farin edited this page Jul 24, 2024 · 22 revisions

The HEIF grid image item is limited to fewer than 256x256 tiles, both because the number of tiles per row and per column is stored in an 8-bit integer and because the number of references in iref is limited to 65535. It also carries significant overhead: each tile image needs its own metadata in iinf, ipma, iref, and iloc, which adds up to more than 3.3 MB for an image of 256x255 tiles. This overhead matters because the metadata has to be loaded completely before decoding of the image can start.
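
The two limits can be checked with a few lines of arithmetic. The following is only a sketch; `grid_fits` is a hypothetical helper name, and the field widths are the ones quoted in the paragraph above.

```python
# Field widths as quoted above: tiles per row/column are stored in 8 bits
# (as "count minus one", so up to 256), the iref reference count in 16 bits.
MAX_TILES_PER_AXIS = 2 ** 8
MAX_IREF_REFERENCES = 2 ** 16 - 1


def grid_fits(columns: int, rows: int) -> bool:
    """Return True if a grid of columns x rows tiles respects both limits."""
    return (
        1 <= columns <= MAX_TILES_PER_AXIS
        and 1 <= rows <= MAX_TILES_PER_AXIS
        and columns * rows <= MAX_IREF_REFERENCES
    )
```

With these limits a 256x255 grid is accepted, while a full 256x256 grid fails because it would need 65536 references.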

To support larger images, I propose introducing a new image item_type, e.g. 'tild', that is conceptually similar to grid but supports larger images with much less overhead, so that the image content can be streamed over the internet with only a small initial setup delay. This proposal also supports tiled images in which some tiles are not covered by image data.

tild image item syntax:

class TiledImage {
  unsigned int(8) version = 0;
  unsigned int(8) flags;

  DimensionFieldLength = (flags & 1) ? 64 : 32;
  unsigned int(DimensionFieldLength) output_width;
  unsigned int(DimensionFieldLength) output_height;

  unsigned int(32) tile_width;
  unsigned int(32) tile_height;

  unsigned int(32) tile_compression_type;

  TileColumns = (output_width + tile_width -1)/tile_width;
  TileRows    = (output_height + tile_height -1)/tile_height;

  OffsetFieldLength = (flags & 2) ? 64 : 32;

  for (int i=0; i<TileColumns*TileRows ; i++) {
    unsigned int(OffsetFieldLength) tile_start_offset[i];
  }

  SequentialOrder = (flags & 4);

  // ... followed by compressed tile data ...
}

Semantics

  • output_width, output_height are the total image size. They do not have to be integer multiples of tile_width, tile_height.
  • tile_width, tile_height is the size of a single tile. All tiles have the same size.
  • tile_compression_type is the four-character code that would have been used as the tile item type in a grid image, e.g. hvc1 for H.265 compression or j2k1 for JPEG 2000.
  • tile_start_offset points to the start of the compressed data of the tile. The position is given relative to the start of the tild data. If a tile is not coded, the tile_start_offset[i] shall be 0. Note that this is not a file offset, but an offset into the tild data that can potentially span several iloc extents.
  • SequentialOrder is a hint to the decoder whether the compressed tile data is stored in sequential order.
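
The tile geometry implied by these semantics can be sketched as follows. The example assumes that tiles are numbered row by row, starting at the top left, as in a grid image. The proposal text does not state the ordering explicitly, so this is an assumption, and all function names are hypothetical.

```python
def tile_grid(output_width, output_height, tile_width, tile_height):
    """Number of tile columns and rows, rounding up at the image border."""
    columns = (output_width + tile_width - 1) // tile_width
    rows = (output_height + tile_height - 1) // tile_height
    return columns, rows


def tile_index_for_pixel(x, y, output_width, tile_width, tile_height):
    """Index of the tile that covers pixel (x, y), assuming row-major order."""
    columns = (output_width + tile_width - 1) // tile_width
    return (y // tile_height) * columns + (x // tile_width)


def tile_rect(index, output_width, output_height, tile_width, tile_height):
    """Visible rectangle (x, y, width, height) of a tile, clipped to the image."""
    columns, _ = tile_grid(output_width, output_height, tile_width, tile_height)
    x0 = (index % columns) * tile_width
    y0 = (index // columns) * tile_height
    width = min(tile_width, output_width - x0)
    height = min(tile_height, output_height - y0)
    return x0, y0, width, height
```

For a 1000x600 image with 256x256 tiles this gives a 4x3 grid, and the tiles in the last column and row are only partly visible because the image size is not a multiple of the tile size.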

Notes

  • The tild item shall have associated properties that are implicitly assigned to each tile. E.g. a tild image with tile_compression_type=hvc1 shall have an associated hvcC box that describes the coded stream of each tile.

  • The ispe property associated with the tild item defines the size of the whole tild image, not the size of a tile.

  • The tild data shall be stored in an mdat box. This makes it possible to read the starting positions of the tiles on demand instead of having to read them all at startup, as would be required if each tile were referenced in an iloc.

  • Even though the compressed tile data logically follows directly after the metadata, the data can still be written interleaved into the file (e.g. intermixed with other tild resolution layers) by employing iloc extents.

  • Compressed data for the tiles can be stored in the file in any order.

  • The compressed tile size is not stored because the length of each tile can be computed from the start positions of the tiles. This is even possible if the tiles are not stored in sequential order. In that case, it is necessary to sort the tile_start_offset[] array. Whether the sorting step can be skipped is indicated by the SequentialOrder flag.
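
The size computation described in the last note could look roughly like this. The sketch makes three assumptions that are not spelled out in the proposal: `total_size` is the total length of the tild item data (used as the end of the last tile), the compressed tiles are packed without gaps, and no two tiles share data. The function name `tile_sizes` is hypothetical.

```python
def tile_sizes(offsets, total_size, sequential_order=False):
    """Return the compressed size of each tile, or None for uncoded tiles.

    offsets: the tile_start_offset[] array; 0 marks a tile that is not coded.
    total_size: assumed total length of the tild item data in bytes.
    """
    # Indices of the tiles that actually have data.
    coded = [i for i, off in enumerate(offsets) if off != 0]

    # If the data is stored in tile order, the sorting step can be skipped.
    if not sequential_order:
        coded.sort(key=lambda i: offsets[i])

    sizes = [None] * len(offsets)
    for pos, i in enumerate(coded):
        # A tile ends where the next stored tile starts, or at the end of data.
        if pos + 1 < len(coded):
            end = offsets[coded[pos + 1]]
        else:
            end = total_size
        sizes[i] = end - offsets[i]
    return sizes
```

For example, with offsets `[100, 0, 40, 60]` and 150 bytes of item data, tile 1 is uncoded and the remaining tiles are stored in the order 2, 3, 0.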

File structure

Simple file with single tild image

[Figure: file1]

File with two interleaved tild images

[Figure: file2]
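
When the data of a tild item is interleaved with other data, a range given relative to the start of the tild data has to be translated into file ranges through the item's iloc extents. The following sketch illustrates that mapping. It assumes the extents are already available as a list of `(file_offset, extent_length)` pairs in item order, and `item_range_to_file_ranges` is a hypothetical name.

```python
def item_range_to_file_ranges(extents, offset, length):
    """Map a byte range inside the item data to (file_offset, length) pieces.

    extents: list of (file_offset, extent_length) pairs in item order.
    offset, length: range relative to the start of the tild data.
    """
    ranges = []
    extent_start = 0  # position of the current extent within the item data
    for file_offset, extent_length in extents:
        extent_end = extent_start + extent_length
        # Intersect the requested range with the current extent.
        lo = max(offset, extent_start)
        hi = min(offset + length, extent_end)
        if lo < hi:
            ranges.append((file_offset + (lo - extent_start), hi - lo))
        extent_start = extent_end

    if sum(n for _, n in ranges) != length:
        raise ValueError("requested range exceeds the item data")
    return ranges
```

A tile whose data crosses an extent boundary is returned as two pieces, which the reader fetches separately and concatenates before decoding.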

tild, grid, and unci coexistence

When building a multi-resolution pymd pyramid, different image types can be used for each layer. For example, it would be possible to use grid images for the lower resolution layers so that these can be read with software that does not understand tild image types. Software support for tild is only needed for the high resolution layers.

[Figure: pyramid]