Proposal for very large resolution images
The HEIF `grid` image item is limited to images with fewer than 256x256 tiles because the number of tiles per row/column is stored in an 8-bit integer and also because the number of references in `iref` is limited to 65535.
Moreover, it has significant overhead because each tile image has its own copy of the metadata in `iinf`, `ipma`, `iref`, and `iloc`, which together sum to more than 3.3 MB for a 256x255 tile image.
This metadata is significant because it has to be loaded completely before decoding the image can start.
To support larger images, this proposal introduces a new image `item_type = 'tili'` as an alternative to `grid`.
- Store 2D images in a tiled memory layout, supporting random access to any chosen tile via byte range access.
- Support 2D tiling in 3D data structures, such as 2D single wavelength tiles in multicomponent 2D images, such as hyperspectral images.
- Support 2D tiling in 4D data structures, where the first two dimensions are occupied by 2D images, multiple color (wavelength) components form the third dimension, and time is the fourth dimension. An example is multi- or hyperspectral video or image sequences.
- Note: before we list others, a literature search on MPEG (OMAF, etc.) and other 3D/4D tiling schemes (Cesium 3D tiles, etc.) is warranted to ensure there isn’t unwanted duplication.
The following requirements are defined for the tiled media content using the `tili` syntax:
- support for arbitrarily large resolutions (over 1M x 1M pixels in a single image)
- much less overhead than `grid`,
- enable streaming the image content over the internet with small initial setup delays,
- support tiled images in which some tiles are blank and not covered with image data,
- saving tiles in arbitrary order to allow gradually growing files,
- interleaved storage of multiple tiled images, e.g. for multi-resolution pyramids where storage of the lower resolution layer is interleaved with the higher-resolution layers,
- ability to build multi-resolution pyramids with a mixture of `grid`, `tili`, and `unci` images to have partial compatibility with software without `tili` support.
An image item of type `tili` is an image stored as independently compressed image tiles.
The compressed data of all tiles is concatenated, and a table of offset pointers to the start of the individual tiles is stored in front of the compressed data. This allows both the image tiles and the pointers to the tiles to be loaded on demand.
A `tili` image item also enables storing volumetric or higher-dimensional data (e.g. hyperspectral images or time series) as a set of 2D image tiles.
- Box type: 'tilC'
- Container: ItemPropertyContainerBox
- Property type: Descriptive item property
- Mandatory (per item): Yes, for an image item of type 'tili'
The `TiledImageConfigurationBox` specifies the tile resolution and the compression codec used to store the image tiles in an image of type `tili`. For N-dimensional (N>2) images, it also specifies the resolution of these extra dimensions.

    aligned(8) class TiledImageConfigurationBox
    extends ItemFullProperty('tilC', version=0, flags) {
        unsigned int(32) tile_width;
        unsigned int(32) tile_height;
        unsigned int(32) tile_compression_type;
        unsigned int(8) number_of_extra_dimensions;
        for (int i=0; i<number_of_extra_dimensions; i++) {
            unsigned int(32) dimension_size[i];
        }
    }

- `tile_width`, `tile_height` is the size of a single tile. All tiles have the same size. Tiles at the right or bottom border may extend beyond the total image size.
- `tile_compression_type` specifies the compression codec used for all the individual tile images. `tile_compression_type` is one of the possible four-character types of ordinary image items (e.g. `hvc1` for H.265 compression or `j2k1` for JPEG 2000).
- `number_of_extra_dimensions` specifies the number of dimensions of the N-dimensional image as `number_of_extra_dimensions = N - 2`. A 2D image has `number_of_extra_dimensions=0`.
- `dimension_size[i]` specifies the size of dimension `i+2` of the N-dimensional image. The size of the first two dimensions is obtained from the mandatory `ispe` item property.
- `OffsetFieldLength = OFFS_LEN[flags & 0x03]` defines the number of bits used to store the offset to the image data of a specific tile. `OFFS_LEN[] = [ 32, 40, 48, 64 ]`
- `SizeFieldLength = SIZE_LEN[(flags>>2) & 0x03]` defines the number of bits used to store the length of the image data of a specific tile. `SIZE_LEN[] = [ 0, 24, 32, 64 ]`
- `(flags & 0x10)` is a hint to a decoder whether the compressed tile image data is stored consecutively in sequential order.
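
The flag layout described above can be unpacked as in the following minimal Python sketch. The function name `decode_tilc_flags` and the dictionary keys are hypothetical and chosen for illustration only; the two lookup tables are taken from the field descriptions.

```python
OFFS_LEN = [32, 40, 48, 64]  # bits per tile_start_offset entry
SIZE_LEN = [0, 24, 32, 64]   # bits per tile_size entry (0 means the field is absent)


def decode_tilc_flags(flags: int) -> dict:
    """Split the 'tilC' flags into the three values described in the text.

    Hypothetical helper, not part of any existing library.
    """
    return {
        "offset_field_length": OFFS_LEN[flags & 0x03],
        "size_field_length": SIZE_LEN[(flags >> 2) & 0x03],
        "sequential_hint": bool(flags & 0x10),
    }
```

For example, `decode_tilc_flags(0x16)` selects 48-bit offsets and 24-bit sizes and sets the sequential-order hint.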
The item data consists of an offset pointer table `TiledImageOffsetTable`, followed by the compressed image data.
The number of tile offsets stored in the table (`NumTiles`) is computed by

    TileColumns = (ispe_width + tile_width - 1) / tile_width;
    TileRows = (ispe_height + tile_height - 1) / tile_height;
    NumTiles = TileColumns * TileRows;
    for (i=0; i<number_of_extra_dimensions; i++) {
        NumTiles = NumTiles * dimension_size[i];
    }

`ispe_width` and `ispe_height` are the total image width and height as specified in the mandatory `ispe` item property.
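
As a worked illustration of the `NumTiles` computation, here is a small Python sketch. The function name `num_tiles` and the example image sizes are assumptions made for this illustration; the arithmetic follows the formula above, with integer (floor) division.

```python
def num_tiles(ispe_width, ispe_height, tile_width, tile_height, dimension_size=()):
    """Number of entries in the offset table (hypothetical helper)."""
    # Round up so that partially covered border tiles are counted.
    tile_columns = (ispe_width + tile_width - 1) // tile_width
    tile_rows = (ispe_height + tile_height - 1) // tile_height
    n = tile_columns * tile_rows
    # Every extra dimension multiplies the number of 2D tile planes.
    for size in dimension_size:
        n *= size
    return n
```

A 1000x600 image with 256x256 tiles gives 4 columns and 3 rows, so `num_tiles(1000, 600, 256, 256)` is 12; with one extra dimension of size 5 it is 60.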

    aligned(8) class TiledImageOffsetTable {
        for (int i=0; i < NumTiles; i++) {
            unsigned int(OffsetFieldLength) tile_start_offset[i];
            unsigned int(SizeFieldLength) tile_size[i]; // note: not present if SizeFieldLength==0
        }
    }
    // ... followed by compressed tile data ...

- `tile_start_offset[i]` points to the start of the compressed data of the tile. The position is given relative to the start of the `TiledImageOffsetTable` data. If a tile is not coded and the corresponding image area is undefined, `tile_start_offset[i]` shall be 0. If a tile is not coded, but the displayed image should be taken from a lower-resolution layer (in a `pymd` stack), `tile_start_offset[i]` shall be 1. (Note: this can be used for maps where large areas contain little detail, like water areas.) Note that this is not a file offset, but an offset into the item's data that can potentially span several `iloc` extents.
- `tile_size[i]` (if present) indicates the number of bytes of the coded tile bitstream.
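
Reading the table could look roughly like the sketch below. It assumes the fields are stored big-endian, as is usual for ISOBMFF structures (the proposal text does not state this explicitly), and that `data` already holds the item data starting at the `TiledImageOffsetTable`. The function name `parse_offset_table` is hypothetical.

```python
def parse_offset_table(data: bytes, num_tiles: int, offset_bits: int, size_bits: int):
    """Return (offsets, sizes); sizes is None when size_bits == 0.

    Assumption: big-endian fields; all field lengths are multiples of 8 bits.
    """
    offset_bytes = offset_bits // 8
    size_bytes = size_bits // 8
    entry_bytes = offset_bytes + size_bytes
    if len(data) < num_tiles * entry_bytes:
        raise ValueError("item data is shorter than the offset table")

    offsets = []
    sizes = [] if size_bytes else None
    pos = 0
    for _ in range(num_tiles):
        offsets.append(int.from_bytes(data[pos:pos + offset_bytes], "big"))
        pos += offset_bytes
        if size_bytes:  # tile_size[i] is only present if SizeFieldLength != 0
            sizes.append(int.from_bytes(data[pos:pos + size_bytes], "big"))
            pos += size_bytes
    return offsets, sizes
```

Entries with an offset of 0 (undefined area) or 1 (take content from a lower-resolution layer) are returned unchanged; the caller has to treat them as markers rather than as data positions.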
The entries in the offset table are ordered in row-major sequence. I.e. for a 2D image, they are indexed as `[y][x]`; a three-dimensional volumetric image (extra dimension `z`) would be indexed as `[z][y][x]`.
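
The row-major ordering can be turned into a linear table index as sketched below. The function name `tile_index` and its parameters are hypothetical; `extra[i]` is taken to be the coordinate along the dimension whose size is `dimension_size[i]`, so the last extra dimension varies slowest.

```python
def tile_index(x, y, extra, tile_columns, tile_rows, dimension_size):
    """Linear offset-table index of tile (x, y) at the extra-dimension coordinates `extra`."""
    if not (0 <= x < tile_columns and 0 <= y < tile_rows):
        raise IndexError("tile position outside the tile grid")
    index = 0
    # Start with the slowest-varying (last) extra dimension and work inwards.
    for coord, size in zip(reversed(extra), reversed(dimension_size)):
        if not 0 <= coord < size:
            raise IndexError("extra-dimension coordinate out of range")
        index = index * size + coord
    index = index * tile_rows + y
    index = index * tile_columns + x
    return index
```

For a 2D image this reduces to `y * tile_columns + x`. Assuming a 2x2 tile grid with two `z` slices and two time steps, `tile_index(0, 1, (1, 1), 2, 2, (2, 2))` evaluates to 14.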
The compressed tile data may be stored in the file in arbitrary order, i.e. the `tile_start_offset[]` values are not necessarily in increasing order.
If the `tile_size[i]` variables are not present, the decoder has to infer them from the `tile_start_offset[]` values. If the tiles are stored in sequential order (`(flags & 0x10) == 0x10`), `tile_size[i]` can be computed as `tile_start_offset[i+1] - tile_start_offset[i]`, with the exception of the last tile, which extends to the end of the data.
If the tiles are not stored in sequential order, the decoder first has to sort the tile start offsets before it can compute each size from the difference to the next tile start. Note that in this case, the decoder cannot read the offset table on demand. Thus, we advise storing the tile sizes in this case if on-demand access to the tile offsets is desired.
Multiple tiles may use the same `tile_start_offset` to reference the same image content. This case has to be handled when computing the tile sizes.
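
One possible way to infer the sizes when no `tile_size[]` fields are stored is sketched below. It is not a normative procedure: the function name `infer_tile_sizes` and the parameter `item_data_length` (total length of the item data, including the offset table) are assumptions, and uncoded tiles (offsets 0 and 1) are reported with size 0.

```python
def infer_tile_sizes(offsets, item_data_length):
    """Derive tile sizes from the start offsets alone (hypothetical helper).

    Works for arbitrary storage order and for tiles that share one offset.
    """
    # Offsets 0 and 1 are markers for uncoded tiles, not data positions.
    # Using a set collapses tiles that reference the same data.
    starts = sorted({o for o in offsets if o > 1})
    # Each coded block ends where the next one starts; the last one ends
    # at the end of the item data.
    ends = starts[1:] + [item_data_length]
    size_of = dict((start, end - start) for start, end in zip(starts, ends))
    return [size_of[o] if o > 1 else 0 for o in offsets]
```

With offsets `[20, 0, 50, 20, 1, 35]` and 60 bytes of item data, the coded blocks start at 20, 35 and 50, so the result is `[15, 0, 10, 15, 0, 15]`. Because this needs the complete table, it illustrates why storing the sizes is advisable when on-demand access is wanted.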
The `tili` item shall have associated properties that are implicitly assigned to each tile. E.g. a `tili` image with `tile_compression_type=hvc1` shall have an associated `hvcC` box that describes the coded stream of each tile.
The `ispe` item property associated with the `tili` item defines the total size of the `tili` image, not the size of a tile. If this total image size is not an integer multiple of the tile size, the image data of the tiles at the right and bottom border is cropped to the total image size.
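
The cropping of border tiles amounts to a simple clamp, sketched here in Python. The function name `visible_tile_size` is hypothetical; `tile_x` and `tile_y` are the tile's column and row indices.

```python
def visible_tile_size(tile_x, tile_y, ispe_width, ispe_height, tile_width, tile_height):
    """Width and height of the part of tile (tile_x, tile_y) that lies inside the image."""
    visible_width = min(tile_width, ispe_width - tile_x * tile_width)
    visible_height = min(tile_height, ispe_height - tile_y * tile_height)
    return visible_width, visible_height
```

For a 1000x600 image with 256x256 tiles, the bottom-right tile (column 3, row 2) contributes only 232x88 pixels, while interior tiles keep their full 256x256 size.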
Decoding of a single tile shall be done equivalently to the following steps:
- create a virtual image item of type `tile_compression_type`
- assign an `ispe` item property of size (`tile_width`, `tile_height`) to the virtual image item
- assign the mandatory item properties for an image item of type `tile_compression_type` from the `tili` item to the virtual image item
- decode the virtual image item
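
The first three steps above can be modelled with plain data structures, as in the following sketch. Everything here is hypothetical scaffolding: `VirtualImageItem`, `make_virtual_tile_item`, the dictionary representation of properties, and the caller-supplied `mandatory_types` list (e.g. `("hvcC",)` for `hvc1`) are illustration only, and the actual decoding step is left to whatever codec library is in use.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualImageItem:
    item_type: str                       # same four-character code as tile_compression_type
    properties: dict = field(default_factory=dict)
    data: bytes = b""                    # coded bitstream of this one tile


def make_virtual_tile_item(tile_compression_type, tile_width, tile_height,
                           tili_properties, mandatory_types, tile_data):
    """Build the virtual item for one tile (steps 1 to 3; decoding is not shown)."""
    item = VirtualImageItem(item_type=tile_compression_type, data=tile_data)
    # The virtual item gets the tile size, not the total image size.
    item.properties["ispe"] = (tile_width, tile_height)
    # Copy the codec's mandatory properties from the 'tili' item.
    for prop_type in mandatory_types:
        item.properties[prop_type] = tili_properties[prop_type]
    return item
```

The resulting object would then be handed to the regular decoding path for items of type `tile_compression_type`.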
- Even though the compressed tile data logically follows continuously after the metadata, we can still write the data interleaved into the file (e.g. intermixed with other `tili` resolution layers) by employing `iloc` extents.
- The four different offset pointer sizes correspond to these maximum `tili` image file sizes:

  | pointer length | maximum compressed image size |
  |----------------|-------------------------------|
  | 32 bit         | 4 GB                          |
  | 40 bit         | 1 TB                          |
  | 48 bit         | 256 TB                        |
  | 64 bit         | 16 EB                         |

- The four different tile size field lengths correspond to these maximum compressed tile sizes:

  | size field length | maximum tile size           |
  |-------------------|-----------------------------|
  | 0                 | depending on pointer length |
  | 24 bit            | 16 MB                       |
  | 32 bit            | 4 GB                        |
  | 64 bit            | 16 EB                       |
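
A writer could pick the shortest sufficient offset field from these limits, for instance as in the sketch below. The function name `smallest_offset_code` is hypothetical, and `item_data_size` is assumed to be the total size of the item data in bytes, including the offset table itself, since offsets are relative to the start of the table.

```python
OFFS_LEN = [32, 40, 48, 64]  # bits, indexed by (flags & 0x03)


def smallest_offset_code(item_data_size: int) -> int:
    """Return the value for (flags & 0x03) that selects the shortest usable offset field."""
    for code, bits in enumerate(OFFS_LEN):
        # Every tile starts before the end of the data, so offsets are < item_data_size.
        if item_data_size <= 1 << bits:
            return code
    raise ValueError("item data too large for a 64-bit offset")
```

For example, 3 GiB of item data still fits 32-bit offsets (code 0), whereas 5 GiB requires 40-bit offsets (code 1).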
Example of a 3D volume time series (2 extra dimensions 'z' and 't'):
Tiles are indexed by `[t][z][y][x]`. The number in each tile denotes the sequence order in the offset table. Example: tile `[1][1][1][0]` is stored as entry 14.
When building a multi-resolution `pymd` pyramid, different image types can be used for each layer.
For example, it would be possible to use `grid` images for the lower-resolution layers so that these can be read by software that does not understand `tili` image types. Software support for `tili` is only needed for the high-resolution layers.