diff --git a/docs/source/tutorials/writing.md b/docs/source/tutorials/writing.md
index 479ba8196..b5998a4e3 100644
--- a/docs/source/tutorials/writing.md
+++ b/docs/source/tutorials/writing.md
@@ -84,6 +84,57 @@ Read the data.
 2 3 6
 ```
 
+In some scenarios, you may want to write your data one chunk at a time rather than sending it all at once, for example when the full data is not available up front or is too large to fit in memory. There are two ways to do this:
+
+The first is to stack the pieces into a single array and save it with the `write_array` method described above. This works when the combined data is small enough to fit in memory.
+
+The second, for when the combined data is too large for memory or when you want to save each array on the fly as it is generated, is to pre-allocate the full array in Tiled and then fill it block by block with the `write_block` method.
+
+```python
+# This approach requires knowing the final array shape in advance.
+
+# Assume you have five 2D arrays (e.g. images), each of shape 32 x 32.
+>>> stacked_array_shape = (5, 32, 32)
+
+# Define a Tiled ArrayStructure based on that shape.
+>>> import numpy
+>>> from tiled.structures.array import ArrayStructure
+
+>>> structure = ArrayStructure.from_array(numpy.zeros(stacked_array_shape, dtype=numpy.int8))  # Use the same dtype as your final data to avoid a mismatch.
+>>> structure
+ArrayStructure(data_type=BuiltinDtype(endianness='not_applicable', kind=, itemsize=1), chunks=((5,), (32,), (32,)), shape=(5, 32, 32), dims=None, resizable=False)
+
+# Redefine the chunks so that each 32 x 32 array is its own block along the first axis.
+# In our example, this becomes ((1, 1, 1, 1, 1), (32,), (32,)).
+>>> structure.chunks = ((1,) * stacked_array_shape[0], (stacked_array_shape[1],), (stacked_array_shape[2],))
+
+# The first axis is now divided into five chunks of size 1.
+>>> structure
+ArrayStructure(data_type=BuiltinDtype(endianness='not_applicable', kind=, itemsize=1), chunks=((1, 1, 1, 1, 1), (32,), (32,)), shape=(5, 32, 32), dims=None, resizable=False)
+
+# Allocate a new array node in Tiled and get a client for it.
+# Note: this call works for Tiled versions <= v0.1.0a114.
+>>> array_client = client.new(structure_family="array", structure=structure, key="stacked_result", metadata={"color": "yellow", "barcode": 13})
+
+# For Tiled versions >= v0.1.0a115, pass the structure inside a DataSource instead.
+>>> from tiled.structures.data_source import DataSource
+>>> data_source = DataSource(structure=structure, structure_family="array")
+>>> array_client = client.new(structure_family="array", data_sources=[data_source], key="stacked_result", metadata={"color": "yellow", "barcode": 13})
+
+>>> array_client
+
+
+# Write a single 32 x 32 array into the block at a given index.
+# Write the first array (block index 0 along the first axis).
+>>> first_array = numpy.random.randint(0, 100, size=(32, 32), dtype=numpy.int8)
+>>> array_client.write_block(first_array, block=(0, 0, 0))
+
+# Write the third array (block index 2 along the first axis).
+>>> third_array = numpy.random.randint(0, 100, size=(32, 32), dtype=numpy.int8)
+>>> array_client.write_block(third_array, block=(2, 0, 0))
+```
+
+
 ## Launch catalog with persistent data
 
 First, we initialize a file which Tiled will use as a database.
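The `block` argument in the `write_block` calls above is a chunk index along each axis, not an element offset. To illustrate the mapping, here is a minimal plain-NumPy sketch. The helper `block_slices` is hypothetical (not part of Tiled), and it assumes the `chunks` tuple follows the dask-style convention of one tuple of chunk lengths per axis, as the `ArrayStructure` output above suggests. It fills a local array the way the `write_block` calls fill the Tiled array.

```python
import numpy


def block_slices(chunks, block):
    """Return the slices covered by `block` in an array with the given `chunks`.

    Hypothetical helper. `chunks` holds one tuple of chunk lengths per axis,
    e.g. ((1, 1, 1, 1, 1), (32,), (32,)); `block` holds one chunk index per axis.
    """
    slices = []
    for axis_chunks, index in zip(chunks, block):
        # A block starts where all preceding chunks on this axis end.
        start = sum(axis_chunks[:index])
        slices.append(slice(start, start + axis_chunks[index]))
    return tuple(slices)


# Same layout as the tutorial: five 32 x 32 arrays, one block each along axis 0.
stacked_array_shape = (5, 32, 32)
chunks = (
    (1,) * stacked_array_shape[0],
    (stacked_array_shape[1],),
    (stacked_array_shape[2],),
)

# Local stand-in for the pre-allocated array (a NumPy array, not a Tiled object).
local = numpy.zeros(stacked_array_shape, dtype=numpy.int8)

rng = numpy.random.default_rng(0)
third_array = rng.integers(0, 100, size=(32, 32), dtype=numpy.int8)

# block=(2, 0, 0) maps to local[2:3, 0:32, 0:32], i.e. the third array.
target = block_slices(chunks, (2, 0, 0))
# The block has shape (1, 32, 32), so add a leading axis to the 2D image.
local[target] = third_array.reshape(local[target].shape)

print(target)  # (slice(2, 3, None), slice(0, 32, None), slice(0, 32, None))
```

With chunks of length 1 on the first axis, block index `i` on that axis coincides with element index `i`, which is why `block=(2, 0, 0)` targets the third image.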