Proposal for very large resolution images

Dirk Farin edited this page Aug 15, 2024 · 22 revisions

The HEIF grid image item is limited to fewer than 256x256 tiles: the number of tiles per row and per column is stored in an 8-bit integer, and the number of references in iref is limited to 65535. It also carries significant overhead, because every tile is a separate image item with its own entries in iinf, ipma, iref, and iloc, which add up to more than 3.3 MB for an image with 256x255 tiles. This overhead matters because the metadata has to be loaded completely before decoding of the image can start.

In order to support larger images, I propose to introduce a new image item_type, e.g. 'tild', as an alternative to grid.

Design Considerations

These features have been taken into account when designing the tild syntax:

  • support for very large resolutions,
  • much less overhead than grid,
  • enable streaming the image content over the internet with small initial setup delays,
  • support tiled images in which some tiles are not covered with image data,
  • saving tiles in arbitrary order to allow gradually growing files,
  • ability to order the tile storage locations such that locally neighboring tiles are closer together,
  • interleaved storage of multiple tiled images, e.g. for multi-resolution pyramids where storage of the lower resolution layer is interleaved with the higher-resolution layers,
  • ability to build multi-resolution pyramids with a mixture of grid, tild, and unci images in order to stay partially compatible with software that lacks tild support.

tild Image Item Syntax

class TiledImage {
  unsigned int(8) version = 0;
  unsigned int(8) flags;
  unsigned int(8) number_of_extra_dimensions;

  DimensionFieldLength = (flags & 0x20) ? 64 : 32;
  unsigned int(DimensionFieldLength) output_width;
  unsigned int(DimensionFieldLength) output_height;

  for (int i=0; i<number_of_extra_dimensions; i++) {
    unsigned int(DimensionFieldLength) dimension_size[i];
  }

  unsigned int(32) tile_width;
  unsigned int(32) tile_height;

  unsigned int(32) tile_compression_type;

  TileColumns = (output_width + tile_width -1)/tile_width;
  TileRows    = (output_height + tile_height -1)/tile_height;

  OffsetFieldLength = OFFS_LEN[flags & 0x03];
  SizeFieldLength   = SIZE_LEN[(flags>>2) & 0x03];

  for (int i=0; i<TileColumns * TileRows * dimension_size[...] ; i++) {
    unsigned int(OffsetFieldLength) tile_start_offset[i];
    unsigned int(SizeFieldLength) tile_size[i];         // note: not present if SizeFieldLength==0
  }

  SequentialOrder = (flags & 0x10);

  // ... followed by compressed tile data ...
}

OFFS_LEN[] = [ 32, 40, 48, 64 ];
SIZE_LEN[] = [  0, 24, 32, 64 ];
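
The syntax above can be read with a few lines of code. The following is a minimal sketch, not a reference implementation. It assumes big-endian fields (as is usual in ISOBMFF), assumes that the elided loop bound `dimension_size[...]` means the product of all extra dimensions, and reads the whole offset table eagerly for simplicity. The names `TildHeader` and `parse_tild_header` are hypothetical.

```python
import math
from dataclasses import dataclass

OFFS_LEN = [32, 40, 48, 64]  # bits, selected by flags & 0x03
SIZE_LEN = [0, 24, 32, 64]   # bits, selected by (flags >> 2) & 0x03


@dataclass
class TildHeader:
    flags: int
    output_width: int
    output_height: int
    dimension_size: list
    tile_width: int
    tile_height: int
    tile_compression_type: str
    tile_columns: int
    tile_rows: int
    tile_start_offset: list
    tile_size: list          # stays empty when SizeFieldLength == 0
    sequential_order: bool


def parse_tild_header(data: bytes) -> TildHeader:
    pos = 0

    def read_uint(bits: int) -> int:
        # Assumption: fields are big-endian and byte-aligned.
        nonlocal pos
        end = pos + bits // 8
        if end > len(data):
            raise ValueError("truncated tild header")
        value = int.from_bytes(data[pos:end], "big")
        pos = end
        return value

    if read_uint(8) != 0:
        raise ValueError("unsupported tild version")
    flags = read_uint(8)
    n_extra = read_uint(8)

    dim_bits = 64 if flags & 0x20 else 32
    output_width = read_uint(dim_bits)
    output_height = read_uint(dim_bits)
    dimension_size = [read_uint(dim_bits) for _ in range(n_extra)]

    tile_width = read_uint(32)
    tile_height = read_uint(32)
    if tile_width == 0 or tile_height == 0:
        raise ValueError("tile size must be non-zero")
    fourcc = read_uint(32).to_bytes(4, "big").decode("latin-1")

    # Ceiling division: partially covered tiles at the border still count.
    tile_columns = (output_width + tile_width - 1) // tile_width
    tile_rows = (output_height + tile_height - 1) // tile_height
    # Assumption: the tile count multiplies in every extra dimension.
    n_tiles = tile_columns * tile_rows * math.prod(dimension_size)

    offset_bits = OFFS_LEN[flags & 0x03]
    size_bits = SIZE_LEN[(flags >> 2) & 0x03]
    offsets, sizes = [], []
    for _ in range(n_tiles):
        offsets.append(read_uint(offset_bits))
        if size_bits:  # tile_size is absent when its field length is 0
            sizes.append(read_uint(size_bits))

    return TildHeader(flags, output_width, output_height, dimension_size,
                      tile_width, tile_height, fourcc, tile_columns,
                      tile_rows, offsets, sizes, bool(flags & 0x10))
```

A streaming reader would more likely parse only the fixed part of the header and fetch individual offset entries on demand; the eager loop here only keeps the sketch short.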

Semantics

  • output_width, output_height give the total image size. They do not have to be integer multiples of tile_width, tile_height.
  • tile_width, tile_height is the size of a single tile. All tiles have the same size.
  • tile_compression_type is the four-character code that would have been used as the tile item type in a grid image, e.g. hvc1 for H.265 compression or j2k1 for JPEG 2000.
  • tile_start_offset points to the start of the compressed data of the tile. The position is given relative to the start of the tild data. Note that this is not a file offset, but an offset into the tild data that can potentially span several iloc extents. If a tile is not coded, the tile_start_offset[i] shall be 0. If a tile is not coded, but the displayed image should be taken from a lower-resolution layer (in a pymd stack), tile_start_offset[i] shall be 1.
  • tile_size (if present) indicates the number of bytes of the coded tile bitstream.
  • SequentialOrder is a hint to the decoder whether the compressed tile data is stored in sequential order.
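
As an illustration of these semantics, the sketch below computes the tile grid from the image and tile sizes and interprets the two reserved values of tile_start_offset. The names `TileState`, `tile_grid`, and `tile_state` are hypothetical and not part of the proposal.

```python
from enum import Enum


class TileState(Enum):
    NOT_CODED = 0        # tile_start_offset == 0
    USE_LOWER_LAYER = 1  # tile_start_offset == 1
    CODED = 2            # any other value is a real offset


def tile_grid(output_width, output_height, tile_width, tile_height):
    """Return (TileColumns, TileRows) using ceiling division."""
    columns = (output_width + tile_width - 1) // tile_width
    rows = (output_height + tile_height - 1) // tile_height
    return columns, rows


def tile_state(tile_start_offset):
    """Classify one tile_start_offset entry."""
    if tile_start_offset == 0:
        return TileState.NOT_CODED
    if tile_start_offset == 1:
        return TileState.USE_LOWER_LAYER
    return TileState.CODED
```

For example, a 1000x600 image with 256x256 tiles yields a grid of 4 columns and 3 rows, where the last column and last row are only partially covered by the image.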

Notes

  • The tild item shall have associated properties that are implicitly assigned to each tile. E.g. a tild image with tile_compression_type=hvc1 shall have an associated hvcC box that describes the coded stream of each tile.

  • The ispe property associated with the tild item defines the size of the whole tild image, not the size of a tile.

  • While tild allows image sizes to be specified with 64 bits, ispe is currently limited to 32 bits. Since ispe is mandatory, this limits the size fields of the image to 32 bits.

  • The tild data should preferably not be stored in an idat box. This makes it possible to read the starting positions of the tiles on demand instead of having to read them all at startup, as would be required if each tile were referenced in iloc.

  • Even though the compressed tile data logically follows contiguously after the metadata, the data can still be written interleaved into the file (e.g. intermixed with other tild resolution layers) by employing iloc extents.

  • Compressed data for the tiles can be stored in the file in any order.

  • The compressed tile sizes may be omitted. In this case, the decoder should compute the tile sizes from the start positions of the tiles. This is possible even if the tiles are not stored in sequential order, but the tile_start_offset[] array then has to be sorted first. The SequentialOrder flag indicates whether this sorting step can be skipped.

  • There are four different offset pointer sizes, which correspond to these maximum image sizes:

    pointer length    maximum image size
    32 bit            4 GB
    40 bit            1 TB
    48 bit            256 TB
    64 bit            16 EB

  • There are four different tile size field lengths, which correspond to these maximum tile sizes:

    size field length    maximum tile size
    0                    depending on pointer length
    24 bit               16 MB
    32 bit               4 GB
    64 bit               16 EB

  • Skipped tiles are supported through the special offset values 0 and 1. The special value 1 means that the tile does not exist at this resolution level, but the area is covered by a lower-resolution layer. This can be used for maps in which large areas, such as water, contain little detail.

  • 3D volumes and higher-dimensional data are supported through extra dimensions. The tiles are ordered in the same way as C-language multi-dimensional arrays are ordered in memory. 2D images are simply the case where number_of_extra_dimensions=0. (Other tile orientations, e.g. in the x/z plane, could later be defined as an image property, similar to irot.)
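
The note on omitted tile sizes can be sketched as follows. This is only an illustration under stated assumptions: the coded tiles are stored back to back without gaps, the total length of the tild data (hypothetical parameter `data_length`) is known from iloc so that the last tile can be bounded, and the offsets 0 and 1 mark skipped tiles. The function name `infer_tile_sizes` is hypothetical.

```python
def infer_tile_sizes(tile_start_offset, data_length, sequential_order=False):
    """Derive tile sizes from start offsets when tile_size is absent.

    Returns a list parallel to tile_start_offset; skipped tiles get size 0.
    """
    # Keep only coded tiles; offsets 0 and 1 are reserved marker values.
    coded = [(off, i) for i, off in enumerate(tile_start_offset) if off > 1]
    if not sequential_order:
        # Tiles may be stored in any order, so sort by position in the data.
        coded.sort()

    sizes = [0] * len(tile_start_offset)
    for k, (off, i) in enumerate(coded):
        # A tile ends where the next stored tile begins (or at the data end).
        end = coded[k + 1][0] if k + 1 < len(coded) else data_length
        if end < off:
            raise ValueError("offsets are not in increasing storage order")
        sizes[i] = end - off
    return sizes
```

With `sequential_order=True` the sort is skipped, which is the saving that the SequentialOrder hint is meant to enable.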

Example of a 3D volume time series (2 extra dimensions): [figure: multidim]

Tiles are indexed by [x][y][z][t]. Example: tile[0][1][1][0] = 6.
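
A possible way to compute such a linear tile index is sketched below. It assumes that the first coordinate (x) varies fastest, which is what the example value suggests if the figure shows 2 tile columns and 2 tile rows; both points are assumptions, since the figure is not reproduced here. The function name `linear_tile_index` is hypothetical.

```python
def linear_tile_index(coords, counts):
    """Map tile coordinates (x, y, z, t, ...) to a position in the offset table.

    counts holds the number of tiles along each dimension, in the same order:
    (TileColumns, TileRows, dimension_size[0], dimension_size[1], ...).
    """
    if len(coords) != len(counts):
        raise ValueError("coords and counts must have the same length")
    index = 0
    stride = 1
    for c, n in zip(coords, counts):
        if not 0 <= c < n:
            raise IndexError("tile coordinate out of range")
        index += c * stride  # assumption: x has stride 1, then y, z, t
        stride *= n
    return index
```

Under these assumptions, `linear_tile_index((0, 1, 1, 0), (2, 2, 3, 2))` evaluates to 0 + 1*2 + 1*4 + 0 = 6, where the counts 3 and 2 for z and t are arbitrary illustration values.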

File structure

Simple file with single tild image

[figure: file1]

File with two interleaved tild images

[figure: file2]
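
Interleaving relies on iloc extents, so a reader has to translate an offset relative to the start of the tild data into one or more file ranges. The sketch below shows one way to do this. It assumes the extents are given as a list of (file_offset, length) pairs in iloc order; the function name `tild_range_to_file_ranges` is hypothetical.

```python
def tild_range_to_file_ranges(offset, size, extents):
    """Translate a byte range of the tild data into file ranges.

    offset and size are relative to the start of the tild data.
    extents is a list of (file_offset, length) pairs in iloc order.
    Returns a list of (file_offset, length) pieces to read, in order.
    """
    ranges = []
    pos = 0  # tild-relative position of the current extent's first byte
    for file_offset, length in extents:
        start = max(offset, pos)
        end = min(offset + size, pos + length)
        if start < end:
            # The requested range overlaps this extent.
            ranges.append((file_offset + (start - pos), end - start))
        pos += length
    if sum(n for _, n in ranges) != size:
        raise ValueError("range extends beyond the iloc extents")
    return ranges
```

A tile that straddles an extent boundary comes back as several pieces, which the reader concatenates before passing the bitstream to the decoder.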

tild, grid, and unci coexistence

When building a multi-resolution pymd pyramid, a different image type can be used for each layer. For example, grid images could be used for the lower-resolution layers so that these can be read by software that does not understand the tild image type. Software support for tild is then only needed for the high-resolution layers.
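
From the reader's side, this fallback could look like the following sketch. It assumes the layer item types are listed from lowest to highest resolution, which is an assumption of this example and not something the proposal specifies; the function name `best_supported_layer` is hypothetical.

```python
def best_supported_layer(layer_types, supported_types):
    """Return the index of the highest-resolution layer the reader can decode.

    layer_types: item types per layer, ordered from lowest to highest
    resolution (assumed ordering). Returns None if no layer is supported.
    """
    best = None
    for index, item_type in enumerate(layer_types):
        if item_type in supported_types:
            best = index  # later entries have higher resolution
    return best
```

A reader without tild support that is given the layers `["grid", "grid", "tild"]` would select index 1 and display the image at reduced resolution.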

[figure: pyramid]