Merge pull request #6 from genematx/composite
Composite
danielballan authored Jan 13, 2025
2 parents 2548cea + 0b2f013 commit a074b4f
Showing 78 changed files with 2,460 additions and 1,300 deletions.
22 changes: 20 additions & 2 deletions .github/workflows/publish-image.yml
@@ -3,7 +3,7 @@
# separate terms of service, privacy policy, and support
# documentation.

name: Create and publish image
name: Create and publish image and chart

on:
push:
@@ -15,7 +15,7 @@ env:
IMAGE_NAME: ${{ github.repository }}

jobs:
build-and-push-image:
build-and-push:
runs-on: ubuntu-latest
permissions:
contents: read
@@ -39,6 +39,8 @@ jobs:
uses: docker/metadata-action@98669ae865ea3cffbcbaa878cf57c20bbf1c6c38
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=semver,pattern={{version}}
- name: Build and push Docker image
uses: docker/build-push-action@ad44023a93711e3deb337508980b4b5e9bcdc5dc
@@ -48,3 +50,19 @@ jobs:
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}

- name: Install Helm
uses: Azure/setup-helm@v3
with:
token: ${{ secrets.GITHUB_TOKEN }}
id: install

- name: Log in to the Chart registry
run: |
echo ${{ secrets.GITHUB_TOKEN }} | helm registry login ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }} --username ${{ github.repository_owner }} --password-stdin
- name: Package and Push chart
run: |
helm dependencies update helm/tiled
helm package helm/tiled --version ${{ steps.meta.outputs.version }} --app-version ${{ steps.meta.outputs.version }} -d /tmp/
helm push /tmp/tiled-${{ steps.meta.outputs.version }}.tgz oci://ghcr.io/bluesky/charts
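The chart step reuses `steps.meta.outputs.version`, which the `type=semver,pattern={{version}}` rule derives from the pushed Git tag without its leading `v`, and Helm requires chart versions to follow SemVer 2. The sketch below mimics that tag-to-version path to show which tags yield a usable chart version and where `helm package ... -d /tmp/` leaves the archive. The helper names are hypothetical, and the pattern is a simplified SemVer 2 check rather than the one either tool uses internally.

```python
import re

# Simplified SemVer 2 check: MAJOR.MINOR.PATCH without leading zeros, plus an
# optional "-prerelease" and "+build" suffix. The official semver.org pattern
# is stricter about numeric pre-release identifiers.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
)


def is_semver(version: str) -> bool:
    """Return True if `version` matches the simplified SemVer 2 pattern."""
    return SEMVER.fullmatch(version) is not None


def chart_version_from_tag(tag: str) -> str:
    """Drop a leading "v" (as {{version}} does) and require SemVer."""
    version = tag[1:] if tag.startswith("v") else tag
    if not is_semver(version):
        raise ValueError(f"{tag!r} does not yield a SemVer 2 version")
    return version


def chart_archive(name: str, version: str, out_dir: str = "/tmp") -> str:
    """Path that `helm package` writes for chart `name` at `version` into `out_dir`."""
    return f"{out_dir}/{name}-{version}.tgz"


version = chart_version_from_tag("v0.1.0-b13")
print(version)                          # 0.1.0-b13
print(chart_archive("tiled", version))  # /tmp/tiled-0.1.0-b13.tgz
print(is_semver("0.1.0b12"))            # False
```

Under this check, the tag `v0.1.0-b13` (the one now referenced in `docker-compose.yml`) produces a valid chart version, while a PEP 440-style string such as `0.1.0b12` does not.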
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -11,6 +11,8 @@ repos:
- id: check-merge-conflict
- id: check-symlinks
- id: check-yaml
# These yaml files output valid yaml only after templating
exclude: ^helm/tiled/templates/
- id: debug-statements

- repo: https://github.com/pycqa/flake8
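pre-commit treats `exclude` as a Python regular expression searched against each file's repository-relative path. The sketch below reproduces that filter for the `check-yaml` hook. It approximates the hook's YAML file-type detection by file extension, which is an assumption (pre-commit actually classifies files with the `identify` library), and the paths are illustrative.

```python
import re

# Same pattern as the `exclude:` entry added to the check-yaml hook.
EXCLUDE = re.compile(r"^helm/tiled/templates/")


def checked_by_check_yaml(path: str) -> bool:
    """Would `path` still be handed to check-yaml after the exclude filter?"""
    is_yaml = path.endswith((".yaml", ".yml"))  # stand-in for types: [yaml]
    return is_yaml and EXCLUDE.search(path) is None


for path in ["helm/tiled/Chart.yaml", "helm/tiled/templates/deployment.yaml"]:
    print(path, checked_by_check_yaml(path))
# helm/tiled/Chart.yaml True
# helm/tiled/templates/deployment.yaml False
```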
42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,48 @@ Write the date in place of the "Unreleased" in the case a new version is release

# Changelog

## Unreleased

### Maintenance

- Addressed DeprecationWarnings from Python and dependencies

## v0.1.0-b13 (2025-01-09)

### Added

- `docker-compose.yml` now uses the healthcheck endpoint `/healthz`
- In client, support specifying API key expiration time as string with
units, like `"7d"` or `"10m"`.
- Add metadata-based access control to SimpleAccessPolicy
- Add example test of metadata-based allowed_scopes which requires the path to the target node
- Added Helm chart with deployable default configuration

### Fixed

- Bug in Python client resulted in error when accessing data sources on a
just-created object.
- Fix bug where access policies were not applied to child nodes during request

### Changed

- The argument `prompt_for_reauthentication` is now ignored and warns.
Tiled will never prompt for reauthentication after the client is constructed;
if a session expires or is revoked, it will raise `CannotRefreshAuthentication`.
- The arguments `username` and `password` have been removed from the client
constructor functions. Tiled will always prompt for these interactively.
See the Authentication How-to Guide for more information, including on
how applications built on Tiled can customize this.
- The argument `remember_me` has been added to the client constructor
functions and to `Context.authenticate` and its alias `Context.login`.
This can be used to clear and avoid storing any tokens related to
the session.
- Change access policy API to be async for filters and allowed_scopes
- Pinned zarr to `<3` because Zarr 3 is still working on adding support for
certain features that we rely on from Zarr 2.


## 2024-12-09

### Added
8 changes: 7 additions & 1 deletion docker-compose.yml
@@ -1,12 +1,18 @@
version: "3.2"
services:
tiled:
image: ghcr.io/bluesky/tiled:v0.1.0b12
image: ghcr.io/bluesky/tiled:v0.1.0-b13
environment:
- TILED_SINGLE_USER_API_KEY=${TILED_SINGLE_USER_API_KEY}
ports:
- 8000:8000
restart: unless-stopped
healthcheck:
test: curl --fail http://localhost:8000/healthz || exit 1
interval: 60s
timeout: 10s
retries: 3
start_period: 30s

# Below we additionally configure monitoring with Prometheus and Grafana.
# This is optional; it is not required for Tiled to function.
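The healthcheck runs `curl --fail` against `/healthz`, so the probe fails when the server answers with an HTTP error or cannot be reached at all. A minimal standard-library sketch of the same probe, for example for a script that waits for the container to come up, might look like this. `is_healthy` is a hypothetical name, and treating any 2xx answer as healthy is this sketch's choice.

```python
import urllib.request


def is_healthy(url: str = "http://localhost:8000/healthz", timeout: float = 10.0) -> bool:
    """Return True if `url` answers with a 2xx status, like `curl --fail`."""
    try:
        # urlopen raises HTTPError (a subclass of OSError) for 4xx/5xx answers.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers HTTPError, URLError (e.g. connection refused) and timeouts.
        return False
```

The `timeout` default mirrors the `timeout: 10s` in the Compose healthcheck; retries and intervals are left to the caller.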
2 changes: 1 addition & 1 deletion docs/source/explanations/catalog.md
@@ -55,7 +55,7 @@ and `assets`, describes the format, structure, and location of the data.
- `management` --- enum indicating whether the data is registered `"external"` data
or `"writable"` data managed by Tiled
- `structure_family` --- enum of structure types (`"container"`, `"array"`, `"table"`,
etc. -- except for `consolidated`, which cannot be assigned to a Data Source)
etc. -- except for `composite`, which cannot be assigned to a Data Source)
- `structure_id` --- a foreign key to the `structures` table
- `node_id` --- foreign key to `nodes`
- `id` --- integer primary key
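The rule that `composite` cannot be assigned to a Data Source amounts to a small validation step. The sketch below uses a local enum that lists only the structure families named in these docs; it is a stand-in rather than an import from Tiled, and `is_allowed_for_data_source` is a hypothetical helper.

```python
from enum import Enum


class StructureFamily(str, Enum):
    """Local stand-in listing the structure families named in these docs."""

    array = "array"
    awkward = "awkward"
    composite = "composite"
    container = "container"
    sparse = "sparse"
    table = "table"


# Every family except `composite` may be recorded on a Data Source.
DATA_SOURCE_FAMILIES = frozenset(StructureFamily) - {StructureFamily.composite}


def is_allowed_for_data_source(value: str) -> bool:
    """Return True if `value` names a family a Data Source may carry."""
    try:
        family = StructureFamily(value)
    except ValueError:  # not a known structure family at all
        return False
    return family in DATA_SOURCE_FAMILIES


print(is_allowed_for_data_source("table"))      # True
print(is_allowed_for_data_source("composite"))  # False
```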
23 changes: 12 additions & 11 deletions docs/source/explanations/structures.md
@@ -10,12 +10,12 @@ potentially any language.
The structure families are:

* array --- a strided array, like a [numpy](https://numpy.org) array
* awkward --- nested, variable-sized data (as implemented by [AwkwardArray](https://awkward-array.org/))
* consolidated --- a container-like structure to combine tables and arrays in a common namespace
* container --- a collection of other structures, akin to a dictionary or a directory
* sparse --- a sparse array (i.e. an array which is mostly zeros)
* table --- tabular data, as in [Apache Arrow](https://arrow.apache.org) or
[pandas](https://pandas.pydata.org/)
* container --- a collection of other structures, akin to a dictionary or a directory
* composite --- a container-like structure to combine table columns and arrays in a common namespace
* sparse --- a sparse array (i.e. an array which is mostly zeros)
* awkward --- nested, variable-sized data (as implemented by [AwkwardArray](https://awkward-array.org/))

## How structure is encoded

@@ -577,22 +577,22 @@ response.
}
```

### Consolidated
### Composite

This is a specialized container-like structure designed to link together multiple tables and arrays that store
related scientific data. It does not support nesting but provides a common namespace across all columns of the
contained tables along with the arrays (thus, name collisions are forbidden). This makes it possible to abstract away
the disparate internal storage mechanisms (e.g. Parquet for tables and zarr for arrays) and present the user with a
smooth homogeneous interface for data access. Consolidated structures do not support pagination and are not
smooth homogeneous interface for data access. Composite structures do not support pagination and are not
recommended for "wide" datasets with more than ~1000 items (columns and arrays) in the namespace.

Below is an example of a Consolidated structure that describes two tables and two arrays of various sizes. Their
respective structures are specified in the `parts` list, and `all_keys` defines the internal namespace of directly
addressable columns and arrays.
Below is an example of a Composite structure that describes two tables and two arrays of various sizes. It is very
similar to a usual Container structure, where `contents` lists the structures of its constituents; additionally,
`flat_keys` defines the internal namespace of directly addressable columns and arrays.

```json
{
"parts": [
"contents": [
{
"structure_family": "table",
"structure": {
@@ -646,6 +646,7 @@ addressable columns and arrays.
"name": "G"
}
],
"all_keys": ["A", "B", "C", "D", "E", "F", "G"]
"flat_keys": ["A", "B", "C", "D", "E", "F", "G"],
"count": 7
}
```
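One way to read `flat_keys` is as every table's column names followed by the array names, where any repeated name is an error. The minimal sketch below derives it from `contents`. It assumes that each table part lists its column names under `structure["columns"]` and that each array part carries a `name`, which simplifies the full structures in the example above. The table names and the split of columns between the two tables are hypothetical.

```python
from collections import Counter


def flat_keys(contents: list) -> list:
    """Collect table columns and array names into one flat namespace."""
    keys = []
    for part in contents:
        if part["structure_family"] == "table":
            keys.extend(part["structure"]["columns"])  # each column is addressable
        else:
            keys.append(part["name"])  # arrays are addressed by their own name
    collisions = sorted(key for key, n in Counter(keys).items() if n > 1)
    if collisions:
        raise ValueError(f"name collisions in composite namespace: {collisions}")
    return keys


example = [
    {"structure_family": "table", "name": "table_1", "structure": {"columns": ["A", "B", "C"]}},
    {"structure_family": "table", "name": "table_2", "structure": {"columns": ["D", "E"]}},
    {"structure_family": "array", "name": "F", "structure": {}},
    {"structure_family": "array", "name": "G", "structure": {}},
]
print(flat_keys(example))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```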
7 changes: 5 additions & 2 deletions docs/source/how-to/api-keys.md
@@ -167,14 +167,17 @@ as the user who it is for. If an API key will be used for a specific task, it is
good security hygiene to give it only the privileges it needs for that task. It
is also recommended to set a limited lifetime so that if the key is
unknowingly leaked it will not continue to work forever. For example, this
command creates an API key that will expire in 10 minutes (600 seconds) and can
command creates an API key that will expire in 10 minutes and can
search/list metadata but cannot download array data.

```
$ tiled api_key create --expires-in 600 --scopes read:metadata
$ tiled api_key create --expires-in 10m --scopes read:metadata
ba9af604023a829ab22edb786168d6e1b97cef68c54c6d95d7fad5e3e6347fa131263581
```

Expiration can be given in units of years `y`, days `d`, hours `h`, minutes
`m`, or seconds `s`.
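Both commands request a ten-minute lifetime: the old form passed seconds (`600`), and the new one passes `10m`. The sketch below shows one way such strings can be converted to seconds. `parse_duration` is a hypothetical helper, not Tiled's parser, and counting a year as 365 days is an assumption.

```python
import re

# Seconds per unit. A year is counted as 365 days here; that is an
# assumption of this sketch, not necessarily how Tiled converts it.
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86_400, "y": 365 * 86_400}


def parse_duration(text: str) -> int:
    """Convert strings such as "10m" or "7d" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhdy])", text.strip())
    if match is None:
        raise ValueError(f"expected <integer><unit> with unit in 'smhdy', got {text!r}")
    amount, unit = match.groups()
    return int(amount) * SECONDS_PER_UNIT[unit]


print(parse_duration("10m"))  # 600
print(parse_duration("7d"))   # 604800
```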

See {doc}`../reference/scopes` for the full list of scopes and their capabilities.

```
158 changes: 158 additions & 0 deletions docs/source/how-to/authentication.md
@@ -0,0 +1,158 @@
# Python Client Authentication

This covers authentication from the user (client) perspective. To learn how to
_deploy_ authenticated Tiled servers, see {doc}`../explanations/security`.

## Interactive Login

Some Tiled servers are configured to let users connect anonymously without
authenticating.

```py
>>> from tiled.client import from_uri
>>> client = from_uri("https://...")
<Container ...>
```

Logging in may give you access to datasets that are not public.
Login works in one of two ways, depending on the server.

1. Username and password ("OAuth2 password grant")

```py
>>> client.login()
Username: ...
Password:
```

2. Via a web browser ("OAuth2 device code grant")

```py
>>> client.login()
You have 15 minutes to visit this URL

https://...

and enter the code: XXXX-XXXX
```

On later connections, Tiled will log you in to this server automatically,
without re-prompting for credentials, until your session expires.

```py
>>> from tiled.client import from_uri
>>> client = from_uri("https://...")
# Automatically logged in!

# This is a quick way to verify whether you are already logged in
>>> client.context
<Context authenticated as '...'>
```

To opt out of this, set `remember_me=False`:

```py
>>> from tiled.client import from_uri
>>> client = from_uri("https://...", remember_me=False)
```

```{note}
Tiled stores OAuth2 tokens (it _never_ stores your password) in files
with properly restricted permissions under `$XDG_CACHE_DIR/tiled/tokens`,
typically `~/.config/tiled/tokens` on Linux and macOS.
To customize the location of this storage, set the environment variable
`TILED_CACHE_DIR`.
```

Some Tiled servers are configured to always require login, disallowing any
anonymous access. For those, the client will prompt immediately, such as:

```py
>>> from tiled.client import from_uri
>>> client = from_uri("https://...")
Username:
```

## Noninteractive Authentication (API keys)

There are environments where logging in interactively is not possible,
such as running a batch script. For these applications, we recommend
using an API key. These can be created from the CLI:

```sh
$ tiled login
$ tiled api_key create --expires-in 7d --note "for this week's experiment"
```

or from an interactive Python session:

```py
>>> client = from_uri("https://...")
>>> client.login()
>>> client.create_api_key(expires_in="7d", note="for this week's experiment")
{"secret": ...}
```

The expiration and note are optional, but recommended. Expiration can be given
in units of years `y`, days `d`, hours `h`, minutes `m`, or seconds `s`.

The best way to provide an API key is to set the environment variable
`TILED_API_KEY`. A script like this:

```py
from tiled.client import from_uri
client = from_uri("https://....")
```

will detect that `TILED_API_KEY` is set and use that API key for
authentication with Tiled. This is equivalent to:

```py
import os
from tiled.client import from_uri

client = from_uri("https://....", api_key=os.environ["TILED_API_KEY"])
```

Avoid typing the API key into the code:

```py
from_uri("https://...", api_key="secret!") # DON'T
```

as it is easy to accidentally share or leak.
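The lookup described above (an explicit `api_key` argument, otherwise `TILED_API_KEY`) can be sketched as a small helper. Here `resolve_api_key` is hypothetical. The precedence of an explicit argument over the environment is an assumption consistent with the equivalence shown above, and this sketch treats an empty variable as unset.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return an explicit key if given, else TILED_API_KEY, else None."""
    if api_key is not None:
        return api_key
    environment = os.environ if env is None else env
    # An empty TILED_API_KEY is treated as unset in this sketch.
    return environment.get("TILED_API_KEY") or None


print(resolve_api_key(env={"TILED_API_KEY": "example-key"}))  # example-key
```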

## Custom Applications

Custom applications, such as graphical interfaces that wrap Tiled, may not be
able to use Tiled's command-line prompts. They should avoid using the
convenience functions `tiled.client.constructors.from_uri` and
`tiled.client.constructors.from_profile`.

They may implement their own interfaces for collecting credentials (for
password grants) or launching a browser and waiting for the user to authorize a
session (for device code grants). The functions
`tiled.client.context.password_grant` and
`tiled.client.context.device_code_grant` may be useful building blocks. The
tokens obtained from this process may then be passed directly into the Tiled
client like so:


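```py
from tiled.client import Context, from_context

URI = "https://..."
context, node_path_parts = Context.from_any_uri(URI)
# launch_custom_interface() stands for the application's own login UI
# (password grant or device code grant); it is not part of Tiled.
tokens, remember_me = launch_custom_interface()
context.configure_auth(tokens, remember_me=remember_me)
client = from_context(context, node_path_parts=node_path_parts)
```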

The client will transparently handle OAuth2 refresh flow. If the session is
revoked or expires, and an attempt at refreshing the tokens is thus rejected
by the server, the exception `tiled.client.auth.CannotRefreshAuthentication`
will be raised. The application should be prepared to catch that exception
and reinitiate authentication.
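An application can centralize that catch-and-reauthenticate step. In the sketch below, `call_with_reauth` is a hypothetical helper, and `CannotRefreshAuthentication` is a local stand-in so that the example runs without Tiled installed. A real application would import the exception from `tiled.client.auth` and have `reauthenticate` rerun its own login interface, then pass the new tokens to `context.configure_auth`.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CannotRefreshAuthentication(Exception):
    """Local stand-in for tiled.client.auth.CannotRefreshAuthentication."""


def call_with_reauth(operation: Callable[[], T], reauthenticate: Callable[[], None]) -> T:
    """Run `operation`; if the session can no longer be refreshed,
    re-authenticate once and retry. A second failure propagates."""
    try:
        return operation()
    except CannotRefreshAuthentication:
        reauthenticate()  # e.g. rerun the application's own login interface
        return operation()


# Demonstration with a fake session that starts out expired.
session = {"valid": False, "logins": 0}


def fetch_metadata() -> str:
    if not session["valid"]:
        raise CannotRefreshAuthentication("session expired or revoked")
    return "metadata"


def login_again() -> None:
    session["valid"] = True
    session["logins"] += 1


print(call_with_reauth(fetch_metadata, login_again))  # metadata
print(session["logins"])                              # 1
```

Retrying only once keeps a persistently failing login from looping forever; the application decides what to do if the second attempt also fails.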