Commit
Add getting started section to docs
Add docs about the datacubes
relativityhd committed Jan 9, 2025
1 parent 0bc3d79 commit 9f96ad4
Showing 16 changed files with 206 additions and 40 deletions.
20 changes: 8 additions & 12 deletions darts-acquisition/src/darts_acquisition/__init__.py
@@ -1,14 +1,10 @@
"""Acquisition of data from various sources for the DARTS dataset."""


def hello(name: str) -> str:
"""Say hello to the user.
Args:
name (str): Name of the user.
Returns:
str: Greating message.
"""
return f"Hello, {name}, from darts-acquisition!"
from darts_acquisition.admin import download_admin_files as download_admin_files
from darts_acquisition.arcticdem.datacube import load_arcticdem as load_arcticdem
from darts_acquisition.arcticdem.vrt import load_arcticdem_from_vrt as load_arcticdem_from_vrt
from darts_acquisition.planet import load_planet_masks as load_planet_masks
from darts_acquisition.planet import load_planet_scene as load_planet_scene
from darts_acquisition.s2 import load_s2_masks as load_s2_masks
from darts_acquisition.s2 import load_s2_scene as load_s2_scene
from darts_acquisition.tcvis import load_tcvis as load_tcvis
@@ -1,7 +1,7 @@
"""ArcticDEM related data loading."""

from darts_acquisition.arcticdem.datacube import create_empty_datacube as create_empty_datacube
from darts_acquisition.arcticdem.datacube import load_arcticdem_tile as load_arcticdem_tile
from darts_acquisition.arcticdem.datacube import load_arcticdem as load_arcticdem
from darts_acquisition.arcticdem.datacube import procedural_download_datacube as procedural_download_datacube
from darts_acquisition.arcticdem.vrt import create_arcticdem_vrt as create_arcticdem_vrt
from darts_acquisition.arcticdem.vrt import load_arcticdem_from_vrt as load_arcticdem_from_vrt
37 changes: 32 additions & 5 deletions darts-acquisition/src/darts_acquisition/arcticdem/datacube.py
@@ -249,18 +249,18 @@ def procedural_download_datacube(storage: zarr.storage.Store, tiles: gpd.GeoData
logger.info(f"Procedural download of {len(new_tiles)} tiles completed in {tick_fend - tick_fstart:.2f} seconds")


def load_arcticdem_tile(
def load_arcticdem(
geobox: GeoBox,
data_dir: Path,
data_dir: Path | str,
resolution: RESOLUTIONS,
buffer: int = 0,
persist: bool = True,
) -> xr.Dataset:
"""Get the corresponding ArcticDEM tile for the given geobox.
"""Load the ArcticDEM for the given geobox, fetch new data from the STAC server if necessary.
Args:
geobox (GeoBox): The geobox for which the tile should be loaded.
data_dir (Path): The directory where the ArcticDEM data is stored.
data_dir (Path | str): The directory where the ArcticDEM data is stored.
resolution (Literal[2, 10, 32]): The resolution of the ArcticDEM data in m.
buffer (int, optional): The buffer around the projected (epsg:3413) geobox in pixels. Defaults to 0.
persist (bool, optional): If the data should be persisted in memory.
@@ -274,9 +274,36 @@ def load_arcticdem_tile(
Warning:
Geobox must be in a meter based CRS.
"""
Usage:
Since the API of `load_arcticdem` is based on a GeoBox, one can load a specific ROI matching an existing Xarray DataArray:
```python
from math import ceil, sqrt

import xarray as xr
import odc.geo.xr
from darts_acquisition import load_arcticdem

# Assume "optical" is an already loaded s2 based dataarray
# and tpi_outer_radius is the TPI outer radius in pixels
arcticdem = load_arcticdem(
    optical.odc.geobox,
    "/path/to/arcticdem-parent-directory",
    resolution=2,
    buffer=ceil(tpi_outer_radius / 2 * sqrt(2)),
)
# Now we can for example match the resolution and extent of the optical data:
arcticdem = arcticdem.odc.reproject(optical.odc.geobox, resampling="cubic")
```
The `buffer` parameter extends the region of interest by a certain number of pixels.
This comes in handy when calculating e.g. the Topographic Position Index (TPI), which requires a buffer around the region of interest to avoid edge effects.
""" # noqa: E501
tick_fstart = time.perf_counter()

data_dir = Path(data_dir) if isinstance(data_dir, str) else data_dir

datacube_fpath = data_dir / f"datacube_{resolution}m_v4.1.zarr"
storage = zarr.storage.FSStore(datacube_fpath)
logger.debug(f"Getting ArcticDEM tile from {datacube_fpath.resolve()}")
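The buffer arithmetic used by the pipelines in this commit can be sketched numerically. The values below are hypothetical (a 100 m TPI outer radius on the 10 m resolution datacube); only the formula itself comes from the pipeline code:

```python
from math import ceil, sqrt

# Hypothetical values: a 100 m TPI outer radius on the 10 m ArcticDEM datacube.
tpi_outer_radius = 100  # meters
resolution = 10         # meters per pixel

# Convert the radius to pixels and pad by sqrt(2) so the TPI kernel still has
# valid data at the corners of the (reprojected) region of interest.
buffer = ceil(tpi_outer_radius / resolution * sqrt(2))
print(buffer)  # -> 15 extra pixels around the geobox
```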
30 changes: 26 additions & 4 deletions darts-acquisition/src/darts_acquisition/tcvis.py
@@ -162,25 +162,47 @@ def procedural_download_datacube(storage: zarr.storage.Store, geobox: GeoBox):

def load_tcvis(
geobox: GeoBox,
data_dir: Path,
data_dir: Path | str,
buffer: int = 0,
persist: bool = True,
) -> xr.Dataset:
"""Load the Landsat Trends (TCVIS) from Google Earth Engine.
"""Load the TCVIS for the given geobox, fetch new data from GEE if necessary.
Args:
geobox (GeoBox): The geobox to load the data for.
data_dir (Path): The directory to store the downloaded data for faster access for consecutive calls.
data_dir (Path | str): The directory to store the downloaded data for faster access for consecutive calls.
buffer (int, optional): The buffer around the geobox in pixels. Defaults to 0.
persist (bool, optional): If the data should be persisted in memory.
If not, this will return a Dask backed Dataset. Defaults to True.
Returns:
xr.Dataset: The TCVIS dataset.
"""
Usage:
Since the API of `load_tcvis` is based on a GeoBox, one can load a specific ROI matching an existing Xarray DataArray:
```python
import xarray as xr
import odc.geo.xr
from darts_acquisition import load_tcvis
# Assume "optical" is an already loaded s2 based dataarray
tcvis = load_tcvis(
optical.odc.geobox,
"/path/to/tcvis-parent-directory",
)
# Now we can for example match the resolution and extent of the optical data:
tcvis = tcvis.odc.reproject(optical.odc.geobox, resampling="cubic")
```
""" # noqa: E501
tick_fstart = time.perf_counter()

data_dir = Path(data_dir) if isinstance(data_dir, str) else data_dir

datacube_fpath = data_dir / "tcvis_2000-2019.zarr"
storage = zarr.storage.FSStore(datacube_fpath)
logger.debug(f"Loading TCVis from {datacube_fpath.resolve()}")
6 changes: 3 additions & 3 deletions darts-segmentation/src/darts_segmentation/segment.py
@@ -130,7 +130,7 @@ def segment_tile(
patch_size (int): The size of the patches. Defaults to 1024.
overlap (int): The size of the overlap. Defaults to 16.
batch_size (int): The batch size for the prediction, NOT the batch_size of input tiles.
The tensor will be sliced into patches, which will then be inferred in batches. Defaults to 8.
The tensor will be sliced into patches, which will then be inferred in batches. Defaults to 8.
reflection (int): Reflection-Padding which will be applied to the edges of the tensor. Defaults to 0.
Returns:
@@ -176,7 +176,7 @@ def segment_tile_batched(
patch_size (int): The size of the patches. Defaults to 1024.
overlap (int): The size of the overlap. Defaults to 16.
batch_size (int): The batch size for the prediction, NOT the batch_size of input tiles.
The tensor will be sliced into patches, which will then be inferred in batches. Defaults to 8.
The tensor will be sliced into patches, which will then be inferred in batches. Defaults to 8.
reflection (int): Reflection-Padding which will be applied to the edges of the tensor. Defaults to 0.
Returns:
@@ -225,7 +225,7 @@ def __call__(
patch_size (int): The size of the patches. Defaults to 1024.
overlap (int): The size of the overlap. Defaults to 16.
batch_size (int): The batch size for the prediction, NOT the batch_size of input tiles.
The tensor will be sliced into patches, which will then be inferred in batches. Defaults to 8.
The tensor will be sliced into patches, which will then be inferred in batches. Defaults to 8.
reflection (int): Reflection-Padding which will be applied to the edges of the tensor. Defaults to 0.
Returns:
6 changes: 2 additions & 4 deletions darts/src/darts/legacy_pipeline/planet_fast.py
@@ -55,12 +55,10 @@ class LegacyNativePlanetPipelineFast(_FastMixin, _PlanetMixin, _BasePipeline):
"""

def _get_data(self, fpath: Path):
from darts_acquisition.arcticdem import load_arcticdem_tile
from darts_acquisition.planet import load_planet_masks, load_planet_scene
from darts_acquisition.tcvis import load_tcvis
from darts_acquisition import load_arcticdem, load_planet_masks, load_planet_scene, load_tcvis

optical = load_planet_scene(fpath)
arcticdem = load_arcticdem_tile(
arcticdem = load_arcticdem(
optical.odc.geobox, self.arcticdem_dir, resolution=2, buffer=ceil(self.tpi_outer_radius / 2 * sqrt(2))
)
tcvis = load_tcvis(optical.odc.geobox, self.tcvis_dir)
6 changes: 2 additions & 4 deletions darts/src/darts/legacy_pipeline/s2_fast.py
@@ -53,12 +53,10 @@ class LegacyNativeSentinel2PipelineFast(_FastMixin, _S2Mixin, _BasePipeline):
"""

def _get_data(self, fpath: Path):
from darts_acquisition.arcticdem import load_arcticdem_tile
from darts_acquisition.s2 import load_s2_masks, load_s2_scene
from darts_acquisition.tcvis import load_tcvis
from darts_acquisition import load_arcticdem, load_s2_masks, load_s2_scene, load_tcvis

optical = load_s2_scene(fpath)
arcticdem = load_arcticdem_tile(
arcticdem = load_arcticdem(
optical.odc.geobox, self.arcticdem_dir, resolution=10, buffer=ceil(self.tpi_outer_radius / 10 * sqrt(2))
)
tcvis = load_tcvis(optical.odc.geobox, self.tcvis_dir)
7 changes: 3 additions & 4 deletions darts/src/darts/legacy_training/preprocess.py
@@ -169,9 +169,8 @@ def preprocess_s2_train_data(
import toml
import torch
import xarray as xr
from darts_acquisition.arcticdem import load_arcticdem_tile
from darts_acquisition.s2 import load_s2_masks, load_s2_scene, parse_s2_tile_id
from darts_acquisition.tcvis import load_tcvis
from darts_acquisition import load_arcticdem, load_s2_masks, load_s2_scene, load_tcvis
from darts_acquisition.s2 import parse_s2_tile_id
from darts_preprocessing import preprocess_legacy_fast
from darts_segmentation.training.prepare_training import create_training_patches
from dask.distributed import Client, LocalCluster
@@ -237,7 +236,7 @@ def preprocess_s2_train_data(
logger.info(f"Found optical tile with size {optical.sizes}")
arctidem_res = 10
arcticdem_buffer = ceil(tpi_outer_radius / arctidem_res * sqrt(2))
arcticdem = load_arcticdem_tile(
arcticdem = load_arcticdem(
optical.odc.geobox, arcticdem_dir, resolution=arctidem_res, buffer=arcticdem_buffer
)
tcvis = load_tcvis(optical.odc.geobox, tcvis_dir)
Binary file added docs/assets/arcticdem_procdownload.png
Binary file added docs/assets/procdownload_comparison.png
Binary file added docs/assets/tcvis_procdownload.png
6 changes: 3 additions & 3 deletions docs/dev/arch.md
@@ -32,7 +32,7 @@ The following modules are planned or potential ideas for future expansion of the
| `darts-train-utils` | Train | Shared utilities for training | PyTorch |

The packages should follow this architecture:
![DARTS nextgen architecture](../assets/darts_nextgen_architecture.png)
![DARTS nextgen architecture](../assets/darts_nextgen_architecture.png){ loading=lazy }

The `darts-nextgen` is planned to utilize [Ray](https://docs.ray.io/en/latest/index.html) to automatically parallelize the different computations.
However, each package should be designed so that one could build their own pipeline without Ray.
@@ -91,7 +91,7 @@ The following things need to be updated:
## APIs between pipeline steps

The following diagram visualizes the steps of the major `packages` of the pipeline:
![DARTS nextgen pipeline steps](../assets/darts_nextgen_pipeline-steps.png)
![DARTS nextgen pipeline steps](../assets/darts_nextgen_pipeline-steps.png){ loading=lazy }

Each Tile should be represented as a single `xr.Dataset` with each feature / band as `DataVariable`.
Each DataVariable should have its `data_source` documented in the `attrs`, as well as `long_name` and `units` (if any) for plotting.
@@ -319,4 +319,4 @@ Ray expects batched data to be in either numpy or pandas format and can't work w
Hence, a wrapper with custom stacking functions is needed.
This tradeoff is not small, however, the benefits in terms of maintainability and readability are worth it.

![Xarray overhead with Ray](../assets/xarray_ray_overhead.png)
![Xarray overhead with Ray](../assets/xarray_ray_overhead.png){ loading=lazy }
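The wrapper idea described above can be sketched as a pair of hypothetical collate helpers: tiles are stacked into a plain numpy batch for Ray and re-wrapped into DataArrays afterwards. Names and signatures here are illustrative, not the actual implementation:

```python
import numpy as np
import xarray as xr


def stack_tiles(tiles: list[xr.DataArray]) -> np.ndarray:
    """Stack equally shaped tiles into a single numpy batch for Ray."""
    return np.stack([tile.values for tile in tiles])


def unstack_tiles(batch: np.ndarray, template: xr.DataArray) -> list[xr.DataArray]:
    """Re-wrap each batch element with the dims and coords of a template tile."""
    return [template.copy(data=arr) for arr in batch]


tiles = [xr.DataArray(np.full((2, 2), i), dims=("y", "x")) for i in range(3)]
batch = stack_tiles(tiles)  # shape (3, 2, 2)
restored = unstack_tiles(batch, tiles[0])
```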
3 changes: 3 additions & 0 deletions docs/dev/config.md
@@ -0,0 +1,3 @@
# Configuration

TODO
117 changes: 117 additions & 0 deletions docs/getting_started.md
@@ -0,0 +1,117 @@
---
hide:
- navigation
---

# Getting Started

This is a guide to help you, as a user / data engineer, get started with the project.

## Installation

To set up the environment for the project, you need to install [Rye](https://rye.astral.sh/) and run the following command:

```sh
UV_INDEX_STRATEGY="unsafe-best-match" rye sync --features cuda12
```

For other CUDA versions or optional GDAL functionality, see the [contribution guide](contribute.md).

## Data Preparation

As of now, none of the implemented pipelines downloads PLANET or Sentinel 2 data automatically.
Hence, you need to download the data manually and place it in a directory of your choice.

Example directory structure of a PLANET Orthotile:

```sh
data/input/planet/PSOrthoTile/
├── 4372514/
│ └── 5790392_4372514_2022-07-16_2459/
│ ├── 5790392_4372514_2022-07-16_2459_BGRN_Analytic_metadata.xml
│ ├── 5790392_4372514_2022-07-16_2459_BGRN_DN_udm.tif
│ ├── 5790392_4372514_2022-07-16_2459_BGRN_SR.tif
│ ├── 5790392_4372514_2022-07-16_2459_metadata.json
│ ├── 5790392_4372514_2022-07-16_2459_udm2.tif
│ └── Thumbs.db
└── 4974017/
└── 5854937_4974017_2022-08-14_2475/
├── 5854937_4974017_2022-08-14_2475_BGRN_Analytic_metadata.xml
├── 5854937_4974017_2022-08-14_2475_BGRN_DN_udm.tif
├── 5854937_4974017_2022-08-14_2475_BGRN_SR.tif
├── 5854937_4974017_2022-08-14_2475_metadata.json
├── 5854937_4974017_2022-08-14_2475_udm2.tif
└── Thumbs.db
```

Example directory structure of a PLANET Scene:

```sh
data/input/planet/PSScene/
├── 20230703_194241_43_2427/
│ ├── 20230703_194241_43_2427.json
│ ├── 20230703_194241_43_2427_3B_AnalyticMS_metadata.xml
│ ├── 20230703_194241_43_2427_3B_AnalyticMS_SR.tif
│ ├── 20230703_194241_43_2427_3B_udm2.tif
│ └── 20230703_194241_43_2427_metadata.json
└── 20230703_194243_54_2427/
├── 20230703_194243_54_2427.json
├── 20230703_194243_54_2427_3B_AnalyticMS_metadata.xml
├── 20230703_194243_54_2427_3B_AnalyticMS_SR.tif
├── 20230703_194243_54_2427_3B_udm2.tif
└── 20230703_194243_54_2427_metadata.json
```

Example directory structure of a Sentinel 2 Scene:

!!! warning
For backward-compatibility reasons, the Sentinel 2 data is expected to follow the same directory structure as the PLANET data.
    Hence, data from Google Earth Engine or from the Copernicus Cloud needs to be adjusted accordingly.
    This will probably change in the future.

```sh
data/input/sentinel2/
├── 20210818T223529_20210818T223531_T03WXP/
│ ├── 20210818T223529_20210818T223531_T03WXP_SCL_clip.tif
│ └── 20210818T223529_20210818T223531_T03WXP_SR_clip.tif
└── 20220826T200911_20220826T200905_T17XMJ/
├── 20220826T200911_20220826T200905_T17XMJ_SCL_clip.tif
└── 20220826T200911_20220826T200905_T17XMJ_SR_clip.tif
```

## Create a config file

An example config file can be found in the root of this repository called `config.toml.example`.
You can copy this file to either `configs/` or copy and rename it to `config.toml`, so that your personal config will be ignored by git.

Please change `sentinel2-dir`, `orthotiles-dir` and `scenes-dir` according to your PLANET or Sentinel 2 download directory.

You also need to specify the paths to the model checkpoints (`model-dir`, `tcvis-model-name` and `notcvis-model-name`) you want to use.
Note that only specific checkpoints can be used, as described in the [architecture guide](dev/arch.md).

Auxiliary data (TCVIS and ArcticDEM) will be downloaded on demand into datacubes, whose paths need to be specified as well (`arcticdem-dir` and `tcvis-dir`).

Finally, specify an output directory (`output-dir`), where you want to save the results of the pipeline.

Of course, you can tweak all other options as well, also via the CLI.
A list of all options can be found in the [config guide](dev/config.md) or by running a command with the `--help` parameter.
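A minimal sketch of such a config could look as follows. The keys mirror the options mentioned above; the exact layout and any additional options are defined by `config.toml.example`, and all paths and model names here are placeholders:

```toml
# Hypothetical config.toml - copy config.toml.example and adjust it instead.
sentinel2-dir = "data/input/sentinel2"
orthotiles-dir = "data/input/planet/PSOrthoTile"
scenes-dir = "data/input/planet/PSScene"

model-dir = "models"
tcvis-model-name = "model_tcvis.pt"       # placeholder name
notcvis-model-name = "model_notcvis.pt"   # placeholder name

arcticdem-dir = "data/datacubes/arcticdem"
tcvis-dir = "data/datacubes/tcvis"
output-dir = "data/output"
```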

## Run a pipeline

Example for PLANET

```sh
rye run darts run-native-planet-pipeline-fast --config-file path/to/your/config.toml
```

Example for Sentinel 2

```sh
rye run darts run-native-sentinel2-pipeline-fast --config-file path/to/your/config.toml
```

## Getting help

The CLI provides a help view with short explanations on the input settings with the `--help` parameter.

Of course, you are also welcome to contact the developers.