(GPU accelerated) Multi-arch (linux/amd64, linux/arm64/v8) Data Science dev containers for R, Python, Julia and MAX/Mojo

b-data/data-science-devcontainers

(CUDA-based) Data Science dev containers

minimal-readme compliant. Project Status: Active (the project has reached a stable, usable state and is being actively developed).

(GPU accelerated) Multi-arch (linux/amd64, linux/arm64/v8) Data Science dev containers:

  • (CUDA) Julia base, pubtools
  • MAX/Mojo base, scipy
  • CUDA MAX base, scipy
  • (CUDA) Python base, scipy
  • (CUDA) R base, tidyverse, verse, geospatial, qgisprocess

Dev containers considered stable for

  • Julia versions ≥ 1.7.3
  • Mojo versions ≥ 24.3.0
    • MAX versions ≥ 24.5.0
  • Python versions ≥ 3.10.5
  • R versions ≥ 4.2.0

CUDA Screenshot

Parent images

Extended to match the (CUDA-based) JupyterLab docker stacks, except that

  • GPU accelerated dev containers are based on the NVIDIA CUDA runtime flavoured image.
    • The JupyterLab docker stacks are based on the NVIDIA CUDA devel flavoured image.
  • Dev containers' Oh My Zsh uses the devcontainers theme + default font.
    • The JupyterLab docker stacks' Oh My Zsh uses Powerlevel10k theme + MesloLGS NF font.

Features

  • JupyterLab: A web-based interactive development environment for Jupyter notebooks, code, and data.
  • Git: A distributed version-control system for tracking changes in source code.
  • Git LFS: A Git extension for versioning large files.
  • GRASS GIS: A free and open source Geographic Information System (GIS).
    ℹ️ R qgisprocess image
  • Orfeo Toolbox: An open-source project for state-of-the-art remote sensing.
    ℹ️ R qgisprocess image (amd64 only)
  • Julia[1]: A high-level, high-performance dynamic language for technical computing.
  • MAX[1]: A high-performance generative AI framework.
  • Mojo[1]: A programming language for AI developers.
  • Pandoc: A universal markup converter.
  • Python: An interpreted, object-oriented, high-level programming language with dynamic semantics.
  • QGIS: A free, open source, cross platform (lin/win/mac) geographical information system (GIS).
    ℹ️ R qgisprocess image
  • Quarto: A scientific and technical publishing system built on Pandoc.
    ℹ️ Julia pubtools, MAX/Mojo/Python scipy, R verse+ images
  • R[1]: A language and environment for statistical computing and graphics.
  • SAGA GIS: A Geographic Information System (GIS) software with immense capabilities for geodata processing and analysis.
    ℹ️ R qgisprocess image
  • TinyTeX: A lightweight, cross-platform, portable, and easy-to-maintain LaTeX distribution based on TeX Live.
    ℹ️ Julia pubtools, MAX/Mojo/Python scipy, R verse+ images
  • Zsh: A shell designed for interactive use, although it is also a powerful scripting language.

👉 See the Version Matrices for detailed information:

Pre-installed extensions

Prerequisites

Dev containers require either Docker or Podman[2] to be installed. CUDA-based versions additionally require:

  • NVIDIA GPU
  • NVIDIA Linux driver
  • NVIDIA Container Toolkit

ℹ️ The host running the GPU accelerated dev containers only requires the NVIDIA driver; the CUDA toolkit does not have to be installed.

Use driver version 535 (Long Term Support Branch) with NVIDIA Data Center GPUs or select NGC-Ready NVIDIA RTX boards to ensure forward compatibility until June 2026.
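
To see which driver the host is actually running, the version can be queried with nvidia-smi, which ships with the NVIDIA Linux driver. The following is a minimal sketch; the function names driver_branch and report_driver are made up for illustration and are not part of this project.

```shell
#!/bin/sh
# Print the major branch of an NVIDIA driver version string,
# e.g. 535.129.03 -> 535. (Hypothetical helper, for illustration only.)
driver_branch() {
  printf '%s\n' "${1%%.*}"
}

# Report the host's driver version, or explain why it cannot be determined.
report_driver() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo 'nvidia-smi not found: is the NVIDIA Linux driver installed?'
    return 1
  fi
  # One line is printed per GPU; the driver is shared, so the first suffices.
  version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  echo "driver version: $version (branch $(driver_branch "$version"))"
}
```

Run report_driver on the host. Once the NVIDIA Container Toolkit is installed, running nvidia-smi inside a container started with Docker's --gpus all option should report the same driver version.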

Install

Codespaces require no installation, but do not currently offer machines with NVIDIA GPUs.

Docker

To install Docker, follow the instructions for your platform:

Podman

To install Podman, follow the instructions for your platform:

CUDA

To install the NVIDIA Container Toolkit, follow the instructions for your platform:

Usage

The default dev container is meant for working on this repository itself.

Every other configuration is a custom Data Science dev container with the following behaviour:

  1. Default mount[3]:
    • source: empty directory
    • target: /home/vscode
    • type: volume
  2. Codespace only mount:
    • source: root of this repository
    • target: /workspaces
    • type: misc
  3. Default path: /home/vscode
  4. Default user[4]: vscode
    • uid: 1000 (auto-assigned)
    • gid: 1000 (auto-assigned)
  5. Lifecycle scripts:
    • onCreateCommand: home directory setup
    • postStartCommands
      • docker: Silently remove all unused images and all build cache (Codespace only)
      • julia: Copy user-specific startup files
      • r: Copy QGIS stuff from skeleton directory; Create R user library
    • postAttachCommand: Codespace only: Check for dev container updates
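
The user and path defaults listed above can be checked from a terminal inside a running dev container. This is only a sketch: the helper names check and report_defaults are hypothetical, and the expected values are copied from the list above.

```shell
#!/bin/sh
# Compare an observed value with the documented one.
# $1 = label, $2 = actual value, $3 = documented value
check() {
  if [ "$2" = "$3" ]; then
    printf 'ok    %s: %s\n' "$1" "$2"
  else
    printf 'DIFF  %s: %s (documented: %s)\n' "$1" "$2" "$3"
    return 1
  fi
}

# Check the default user, uid, gid and home directory of the current session.
report_defaults() {
  check 'user' "$(id -un)" 'vscode'
  check 'uid'  "$(id -u)"  '1000'
  check 'gid'  "$(id -g)"  '1000'
  check 'home' "$HOME"     '/home/vscode'
}
```

Calling report_defaults prints one line per setting. A DIFF line is not necessarily an error, for example when the container is run as root.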

To disable the postStartCommand or the postAttachCommand, comment out line 8 in ~/.local/bin/dockerSystemPrune.sh or ~/.local/bin/checkForUpdates.sh, respectively.
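
Instead of opening an editor, line 8 can be commented out with sed. The sketch below assumes that line 8 is still the relevant line in your copy of the script, so inspect it first (for example with sed -n 8p); the function name comment_out_line_8 is hypothetical.

```shell
#!/bin/sh
# Prefix line 8 of the given script with "# " so that the shell skips it.
comment_out_line_8() {
  file=$1
  tmp=$(mktemp) || return 1
  # Write the edited copy to a temporary file, then copy it back with cat
  # so that the original file keeps its permissions.
  sed '8s/^/# /' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, comment_out_line_8 ~/.local/bin/checkForUpdates.sh would disable the update check. Running the function twice on the same file adds a second "# " prefix, which is harmless.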

Codespace

  1. Click the <> Code button, then click the Codespaces tab.
    A message is displayed at the bottom of the dialog telling you who will pay for the codespace.
  2. Create your codespace after configuring advanced options:
    • Configure advanced options
      To configure advanced options for your codespace, such as a different machine type or a particular devcontainer.json file:
      • At the top right of the Codespaces tab, select ... and click New with options....
      • On the options page for your codespace, choose your preferred options from the dropdown menus.
      • Click Create codespace.

Creating a codespace for a repository - GitHub Docs

To open your codespace in JupyterLab:

  1. Execute

    jupyter-lab \
      --ServerApp.allow_origin='*' \
      --ServerApp.cookie_options="{'SameSite': 'None', 'Secure': True}" \
      --ServerApp.tornado_settings="{'headers':{'Content-Security-Policy':\"frame-ancestors 'self' https://*.github.dev\", 'Access-Control-Allow-Headers': 'accept, content-type, authorization, x-xsrftoken, x-github-token'}}" \
      --notebook-dir=/home/vscode \
      --no-browser
    
  2. Ctrl+click on one of the URLs shown in the Terminal.

ℹ️ Opening your codespace in JupyterLab according to the GitHub Docs sets the default path to /workspaces/<repository-name>, which you cannot navigate out of.

Local/'Remote SSH'

Use the Dev Containers: Reopen in Container command from the Command Palette (F1, or ⇧⌘P on macOS, Ctrl+Shift+P on Windows/Linux).

To start JupyterLab:

  1. Execute

    jupyter-lab
    
  2. Ctrl+click on one of the URLs shown in the Terminal.

Persistence

Data in the following locations is persisted:

  1. The user's home directory (/home/vscode)[5]
  2. The dev container's workspace (/workspaces)

This is accomplished either via a volume or bind mount (or loop device on Codespaces) and is preconfigured.
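
One way to confirm that a path sits on its own mount, rather than in the container's writable layer, is to ask df where it is mounted. A minimal sketch; the function name mount_point_of is hypothetical, and the parsing assumes the mount point contains no spaces.

```shell
#!/bin/sh
# Print the mount point of the filesystem that holds the given path.
mount_point_of() {
  # df -P prints one header line and one data line per path; the last
  # field of the data line is the mount point.
  df -P "$1" | awk 'NR == 2 { print $NF }'
}
```

Inside a dev container, mount_point_of /home/vscode is expected to print /home/vscode when the volume is in place. If it prints / instead, the directory lives in the container's own filesystem and would probably not survive a rebuild.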

Codespaces: A 'Full Rebuild Container' resets the home directory!
ℹ️ This is never necessary unless you want exactly that.

Similar project

What makes this project different:

  1. Multi-arch: linux/amd64, linux/arm64/v8
    ℹ️ Runs on Apple M series using Docker Desktop.
  2. Base image: Debian instead of Ubuntu
    ℹ️ CUDA-based images use Ubuntu.
  3. IDE: JupyterLab next to VS Code
  4. Just Python – no Conda / Mamba

CUDA-based images:

  1. Derived from nvidia/cuda:12.6.3-runtime-ubuntu22.04
  2. TensorRT and TensorRT plugin libraries
    • Dropped for images with Python versions ≥ 3.13.0

See Notes and CUDA Notes for tweaks, settings, etc.

Contributing

PRs accepted.

This project follows the Contributor Covenant Code of Conduct.

Support

Community support: Open a new discussion here.

Commercial support: Contact b-data by email.

License

Copyright © 2023 b-data GmbH

Distributed under the terms of the MIT License.

Footnotes

  1. Depending on which dev container configuration is selected.

  2. See issue #1 about limitations in Podman.

  3. See issue #2 about changing the mount type.

  4. See issue #3 about running as root.

  5. Alternatively for the root user (/root). Use with Docker/Podman in rootless mode.