25Q1

No due date · 30% complete
  • Cloud auto-scaling using CSP APIs
  • Compose stack (single-session / multi-session) to bundle multiple models and services
  • More RBAC support (introducing new features based on it, such as project administrators)
  • Account manager for SSO
  • More Relay-compliant GraphQL schema (maybe full migration at some point)
  • Raftify-based HA setup
  • Extensive Prometheus/OpenTelemetry integration across entire project
  • Enhanced container registry integration (per-project registry, per-project quota, etc.)
  • Unified storage resource group (storage proxy + storage agent with "direct access" SFTP/filebrowser containers)
  • VFolder abstractions for object storage buckets (Minio / S3)
  • Rolling update of Backend.AI cluster (or at least agents)
  • Multi-license support in a single license server
  • Retirement of keypair resource policies and migration to user resource policies
    • Also need to update the owner-access-key option to use user identities
  • Project-first architecture – per-user "workspace"
  • Project-level sharing of sessions (#2346)
  • Project-level container image visibility (including user-committed images)
  • User, vfolder operation audit logs
  • Migration of resource allocation maps from agent to manager for more holistic scheduling optimization (e.g., guaranteeing no fragmentation of GPUs)
  • Hierarchical managers to parallelize per-resource-group scheduling and idle checks
  • Session template revamps
  • Live propagation of configurations (e.g., fGPU options) via etcd watch
  • Easier (multi-node) installation of the open-source edition
  • Logging contexts and request IDs
  • Make idle checkers scoped within resource groups
  • Optimized App Proxy traffic routing (probably via native modules and/or with Cilium)
  • User-defined network partitions via flexible SDN control plane integration
  • Snapshot and lineage tracking of vfolders (when the underlying storage backend supports it)
  • Virtual agents to proxy external container orchestrators and node pools
  • Experimental agent backends like Singularity and native processes
  • Improved documentation for various plugins and the SDK