From feedb8eb9789b73f87cc3ab3ffa61681ffceba23 Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Tue, 24 Jan 2023 09:23:28 +0100
Subject: [PATCH 01/13] docs(ssh-support): partial text on backend ssh solution

---
 .../011-add-session-ssh-support.md            | 91 +++++++++++++++++++
 .../rfc-template.md                           | 45 +++++++++
 2 files changed, 136 insertions(+)
 create mode 100644 rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
 create mode 100644 rfcs/011-add-session-ssh-support/rfc-template.md
diff --git a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
new file mode 100644
index 0000000..12a9b88
--- /dev/null
+++ b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
@@ -0,0 +1,91 @@
+- Start Date: 2023-01-20
+- Status: Proposed
+
+# Add ssh support to sessions
+
+## Summary
+
+> One paragraph explanation of the change.
+
+An important requirement which must be met by Renku is to enable users to
+log in to their Renku session via SSH; a key driver for this is to support
+working with VS Code within Renku sessions.
+
+This has been discussed within the team and an imterim solution is currently
+being developed; however, that solution is suboptimal as it requires creating
+a jumphost which has access to the cluster and users need to log in via the
+jumphost. This RFC details an alternative solution which is more in line with
+the Renku design and incorporates operational concerns more explicitly.
+
+## Motivation
+
+> Why are we doing this? What use cases does it support? What is the expected
+outcome?
+
+There is a
+[Shape-up](https://www.notion.so/Support-RenkuLab-compute-access-from-local-terminal-SSH-VSCode-f896d3b391c94bcc87c56e375eb531d6)
+document which provides the motivation for introducing this functionality.
+
+## Problem Definition
+
+> This section should include a detailed description of the problem and, importantly,
+include any specific constraints that arise. It can be as technical as required.
+
+The basic problem is to provide low friction, secure access to user sessions
+via ssh. 
+
+The standard ssh keypair workflow is as follows:
+- create keypair
+- ensure public key is stored on server to be accessed and ssh daemon is running
+- log in to session for the first time
+- confirm validity of remote server host key
+- access remote session (if keypair matches)
+
+From a user perspective, we need to take the following into account:
+- user may want to use an existing keypair which is stored in `$HOME/.ssh`
+- user may have existing keypairs but wants to have a new keypair for this scenario
+- user may have a keypair which is accessible via the `ssh-agent`
+- user may want to use PuTTY? (I've no idea how this intersects with VS Code)
+- user should be able to log into the session with a simple `ssh` command
+  - a standard `ssh <username>@<session>` should enable the user to log in
+- user should obtain a sensible shell configuration when they log in
+  - (not sure if something like devcontainer is required here)
+
+***(It may be necessary to differentiate the above into mandatory and nice to
+have).***
+
+From a system perspective, the following needs to be taken into account:
+- 
+
+***(It may be necessary to differentiate the above into mandatory and nice to
+have).***
+
+We need to de
+
+## Possible solutions
+
+
+## Drawbacks
+
+> Why should we *not* do this? Please consider the impact on users,
+on the integration of this change with other existing and planned features etc.
+
+> There are tradeoffs to choosing any path, please attempt to identify them here.
+
+## Rationale and Alternatives
+
+> Why is this design the best in the space of possible designs?
+
+> What other designs have been considered and what is the rationale for not choosing them?
+
+Two other designs have been considered:
+
+> What is the impact of not doing this?
+
+## Unresolved questions
+
+> What parts of the design do you expect to resolve through the RFC process before this gets merged?
+
+> What parts of the design do you expect to resolve through the implementation of this feature before stabilisation?
+
+> What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
diff --git a/rfcs/011-add-session-ssh-support/rfc-template.md b/rfcs/011-add-session-ssh-support/rfc-template.md
new file mode 100644
index 0000000..0d2925a
--- /dev/null
+++ b/rfcs/011-add-session-ssh-support/rfc-template.md
@@ -0,0 +1,45 @@
+- Start Date: (fill me in with today's date, DD-MM-YYYY)
+- Status: (One of Proposed, Accepted or Rejected)
+
+# (RFC title goes here)
+
+## Summary
+
+> One paragraph explanation of the change.
+
+## Motivation
+
+> Why are we doing this? What use cases does it support? What is the expected
+outcome?
+
+## Design Detail
+
+> This is the bulk of the RFC.
+
+> Explain the design in enough detail for somebody
+familiar with the infrastructure to understand. This should get into specifics and corner-cases,
+and include examples of how the service is used. Any new terminology should be
+defined here.
+
+## Drawbacks
+
+> Why should we *not* do this? Please consider the impact on users,
+on the integration of this change with other existing and planned features etc.
+
+> There are tradeoffs to choosing any path, please attempt to identify them here.
+
+## Rationale and Alternatives
+
+> Why is this design the best in the space of possible designs?
+
+> What other designs have been considered and what is the rationale for not choosing them?
+
+> What is the impact of not doing this?
+
+## Unresolved questions
+
+> What parts of the design do you expect to resolve through the RFC process before this gets merged?
+
+> What parts of the design do you expect to resolve through the implementation of this feature before stabilisation?
+
+> What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?

From ab9e2598ac3bf6ed0c2b30b324f7445e1afc0b63 Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Tue, 24 Jan 2023 17:28:19 +0100
Subject: [PATCH 02/13] docs(ssh-sessions): update content on ssh session
 support

---
 .../011-add-session-ssh-support.md            | 129 ++++++++++++++++--
 1 file changed, 117 insertions(+), 12 deletions(-)

diff --git a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
index 12a9b88..ae3702e 100644
--- a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
+++ b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
@@ -11,11 +11,10 @@ An important requirement which must be met by Renku is to enable users to
 log in to their Renku session via SSH; a key driver for this is to support
 working with VS Code within Renku sessions.
 
-This has been discussed within the team and an imterim solution is currently
-being developed; however, that solution is suboptimal as it requires creating
-a jumphost which has access to the cluster and users need to log in via the
-jumphost. This RFC details an alternative solution which is more in line with
-the Renku design and incorporates operational concerns more explicitly.
+This has been discussed within the team and work is ongoing on an interim
+solution which meets the basic requirements. This RFC provides the context,
+supports discussion of possible solutions and will be used to document the
+final agreed solution.
 
 ## Motivation
 
@@ -34,14 +33,14 @@ include any specific constraints that arise. It can be as technical as required.
 The basic problem is to provide low friction, secure access to user sessions
 via ssh. 
 
-The standard ssh keypair workflow is as follows:
+The standard ssh (keypair-based) workflow is as follows:
 - create keypair
 - ensure public key is stored on server to be accessed and ssh daemon is running
 - log in to session for the first time
 - confirm validity of remote server host key
 - access remote session (if keypair matches)
 
-From a user perspective, we need to take the following into account:
+From a user perspective, the following needs to be taken into account:
 - user may want to use an existing keypair which is stored in `$HOME/.ssh`
 - user may have existing keypairs but wants to have a new keypair for this scenario
 - user may have a keypair which is accessible via the `ssh-agent`
@@ -51,19 +50,125 @@ From a user perspective, we need to take the following into account:
 - user should obtain a sensible shell configuration when they log in
   - (not sure if something like devcontainer is required here)
 
-***(It may be necessary to differentiate the above into mandatory and nice to
-have).***
-
 From a system perspective, the following needs to be taken into account:
-- 
+- Unlike HTTP/S, SSH does not have support for server identification and hence
+  it is not trivial to perform aggregation of SSH sessions; further, this
+  carries non-negligible security risks, so it is not a standard approach.
+- The Renku platform should never see user private keys
+- A solution which does not have many moving parts is definitely preferred
+  (e.g. we should avoid publishing DNS entries dynamically if possible)
+- Solutions which consume large amounts of IP addresses should be avoided as
+  they can be costly
+- A solution which supports monitoring in a convenient manner is desirable
 
 ***(It may be necessary to differentiate the above into mandatory and nice to
 have).***
 
-We need to de
+## Key Assumptions
+
+The following key assumptions apply:
+- a solution is required which can be deployed both in the cloud provider
+  context and in the Switch/Openstack context
+- we are not targetting the most basic users; we assume they understand the
+  need for some security and are willing to put the time/energy required into a
+  reasonably sensible ssh configuration
+- password authentication to ssh sessions exposed to the Internet is not
+  permitted
 
 ## Possible solutions
 
+As this work has some kind of history, three solutions have been discussed in
+various conversations - these are:
+- an approach in which an ssh port is exposed on each session directly to the
+Internet;
+- an approach in which a dedicated jumphost is used and the standard ssh
+proxying mechanisms are used to access the session via the jumphost;
+- an approach in which a dedicated proxy is provided as part of the Renku
+platform which provides ssh access without needing to use the ssh client
+proxy capabilities.
+Each of these approaches is discussed in more detail below.
+
+Other solutions could be envisaged: it could be possible to provide a tunnel
+over HTTP/S into the session and access it via ssh locally or perhaps some
+wireguard solution might work but these approaches would probably encounter
+issues with different OS versions, would not be easy to configure and
+ultimately would likely have a detrimental impact on user experience and hence
+are not considered further.
+
+### Exposing SSH port from each session directly
+
+In this approach, each user session exposes at least 2 ports: an HTTP port
+for the Jupyter lab server and an SSH port for the SSH session. These ports
+have corresponding services running inside the pod.
+
+Services are then linked to these sessions with the ports on the services
+mapping to the ports exposed by the pod.
+
+In this solution, an ingress is mapped directly to the SSH services with a
+dedicated ingress for each SSH session. Each ingress should have a unique name 
+and also a unique IP address as the SSH protocol operates
+on an IP address basis (i.e., it has no specific server identification such as
+SNI which can enable a single server to handle requests for multiple endpoints). 
+This solution can suffer from DNS propagation delays, depending on the DNS
+provider and DNS configuration. Further, exposing ssh servers running inside
+user sessions directly to the Internet brings some security risks and it would
+need to be clear that users cannot easily change the ssh daemon configuration
+within their sessions to make their session very exposed.
+
+A couple of further points relating to this approach:
+- ingresses on cloud providers can typically be expensive; having one per
+  session is likely to incur significant cost;
+- on the Switch deployments, the entire cluster is exposed via a single IP
+  address; as such it is not straightforward to devise a solution in which
+  different sessions would map to different IP addresses.
+
+### Adding a proxy jumphost
+
+Another solution is to have a jumphost and to use this to access the pods. In
+this case, a jumphost must be installed inside the kubernetes cluster as it 
+requires direct access to the ssh server running on the pods.
+
+This requires modifications to the cluster ingress configuration to forward TCP 
+traffic incident on a specific port to the jumphost pod on its SSH port; this
+pod must have its sshd configured to support proxying. Also, as ssh proxying
+approaches require authentication against both the proxy and the destination
+server, each of the two hops must use either no credentials or a keypair for
+which the user possesses the private key.
+
+The most natural solution in this case is to use the user's public key within
+the session and have open access with no ability to obtain a shell within the
+jumphost.
+
+In this case, the command to ssh into a session requires specifying both the
+jumphost and the destination; as such it is a more complex command and introduces
+some user friction when accessing sessions. VSCode supports such proxying
+scenarios without problem.
+
+### Using a Man In the Middle Proxy solution
+
+The third approach is the one followed by gitpod; in this approach a dedicated
+proxy is provided which affords access to the sessions. As with the proxying
+approach above, this must run on the kubernetes cluster.
+
+Unlike the above approach, this requires writing and maintaining a software
+component. The key difference is that this proxy does not use the proxying
+capabilities provided by an SSH client in which one SSH session is embedded
+inside another; effectively this approach concatenates two distinct SSH
+sessions with two distinct authentication flows. The proxy terminates one
+SSH session and forwards all traffic to another SSH session.
+
+This solution has the following benefits:
+- it is possible to log in via a simple ssh command which looks much cleaner/more
+professional;
+- there is a single entry point which reduces the attack surface;
+- it is reasonably straightforward to add instrumentation such that SSH session
+information data can be easily collected.
+
+The primary downside is that it involves writing and maintaining another
+component; however initial work by Tasko indicates that this can be done with
+a modest amount of coding effort.
+
+## Proposed Solution
 
 ## Drawbacks
 

From 63c5efe745e44ac8ab64b2f3cdf81f6408bd552a Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Mon, 20 Mar 2023 13:03:34 +0100
Subject: [PATCH 03/13] WIP(ssh-sessions): add more content on ssh sessions

---
 .../011-add-session-ssh-support.md            | 152 +++++++++++++++---
 1 file changed, 128 insertions(+), 24 deletions(-)

diff --git a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
index ae3702e..b63e96f 100644
--- a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
+++ b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
@@ -16,14 +16,18 @@ solution which meets the basic requirements. This RFC provides the context,
 supports discussion of possible solutions and will be used to document the
 final agreed solution.
 
+Note that this RFC must (somehow) take into account decisions which are
+being made in the current push to deliver a workable solution as soon as
+possible.
+
 ## Motivation
 
 > Why are we doing this? What use cases does it support? What is the expected
 outcome?
 
-There is a
-[Shape-up](https://www.notion.so/Support-RenkuLab-compute-access-from-local-terminal-SSH-VSCode-f896d3b391c94bcc87c56e375eb531d6)
-document which provides the motivation for introducing this functionality.
+There is a [Shape-up
+document](https://www.notion.so/Support-RenkuLab-compute-access-from-local-terminal-SSH-VSCode-f896d3b391c94bcc87c56e375eb531d6)
+which provides the motivation for introducing this functionality.
 
 ## Problem Definition
 
@@ -35,25 +39,40 @@ via ssh.
 
 The standard ssh (keypair-based) workflow is as follows:
 - create keypair
-- ensure public key is stored on server to be accessed and ssh daemon is running
+- ensure public key is stored on server to be accessed and ssh daemon is running there
 - log in to session for the first time
 - confirm validity of remote server host key
 - access remote session (if keypair matches)
 
-From a user perspective, the following needs to be taken into account:
-- user may want to use an existing keypair which is stored in `$HOME/.ssh`
-- user may have existing keypairs but wants to have a new keypair for this scenario
-- user may have a keypair which is accessible via the `ssh-agent`
-- user may want to use PuTTY? (I've no idea how this intersects with VS Code)
-- user should be able to log into the session with a simple `ssh` command
-  - a standard `ssh <username>@<session>` should enable the user to log in
-- user should obtain a sensible shell configuration when they log in
-  - (not sure if something like devcontainer is required here)
+From a user perspective, there are multiple aspects which must be considered:
+- key setup
+  - user may want to use an existing keypair which is stored in `$HOME/.ssh`
+  - user may have existing keypairs but wants to have a new keypair for this scenario
+    - having this generated by renku is probably the best option but the user
+      prob wants to be able to control the naming of the file on the local
+      machine
+  - user may have a password protected key which is accessible via the `ssh-agent`
+  - in all of the above cases, the public key must be present in or accessible to
+    renku 
+    - This may required `ForwardAgent` to be configured on the client side
+- host key verification
+  - if user attempts to log into the server with ssh and it is not a recognized server,
+    the user should add the server to the known host keys
+  - (need to validate how this works with VS Code - it prob adds to the known hosts with some dialog box)
+- interaction modes
+  - primary use case is one in which the user connects via VS Code; this uses
+    `ssh` based tunneling mechanisms to support diverse interactions with the
+    remote server
+  - user should obtain a sensible shell configuration when they log in
+  - user should be able to log into the session with a simple `ssh` command
+    - a standard `ssh <username>@<session>` should enable the user to log in
+  - user may want to use PuTTY? (I've no idea how this intersects with VS Code)
 
 From a system perspective, the following needs to be taken into account:
 - Unlike HTTP/S, SSH does not have support for server identification and hence
-  it is not trivial to perform aggregation of SSH sessions; further, this
-  carries non-negligible security risks, so it is not a standard approach.
+  it is not trivial to perform aggregation (or disaggregation) of SSH sessions; 
+  further, this carries non-negligible security risks, so it is not a standard 
+  approach.
 - The Renku platform should never see user private keys
 - A solution which does not have many moving parts is definitely preferred
   (e.g. we should avoid publishing DNS entries dynamically if possible)
@@ -64,6 +83,33 @@ From a system perspective, the following needs to be taken into account:
 ***(It may be necessary to differentiate the above into mandatory and nice to
 have).***
 
+FIXME - integrate this content in a meaningful way...
+
+- Scenario 1: User is used to using ssh keys - bring you own key
+  - User imports ssh key as is standard in cloud services
+  - (Could even grab key from gitlab if it is there...)
+
+- Scenario 2: User is used to using ssh keys but wants renku to create new key
+  - User asks renku to generate key via cli and it gets stored in the .ssh folder
+  - User uses this key to log in to renku
+   
+- Scenario 3: User is not so familiar with keys; wants renku to generate keys via web interface
+  - (User does this because the key based approaches have less friction) 
+  - Private key generated by web UI can be downloaded once and is not persisted
+  - Public key is retained by renku
+  - (Will need to be able to deal with the case that user loses key - support creation of new key)
+  - (Will probably need the command line to perform some check to see if the configuration is correct)
+
+- Scenario 4: User is not familiar with terminal interaction at all; wants handheld VSCode option
+  - (Add support for OIDC authorization code flow in UI/CLI - code is generated once per login)
+  - Amount of friction depends on the longevity of the authorization code
+  - This will require more functionality on our side as it involves supporting a separate login modus
+  - Can also provide this as a step towards using ssh keys when the user
+    realizes that this approach has friction
+
+- Do not consider using ssh keys for acting on behalf of the user; tokens
+  should be used for that purpose.
+
 ## Key Assumptions
 
 The following key assumptions apply:
@@ -105,15 +151,15 @@ Services are then linked to these sessions with the ports on the services
 mapping to the ports exposed by the pod.
 
 In this solution, an ingress is mapped directly to the SSH services with a
-dedicated ingress for each SSH session. Each ingress should have a unique name 
-and also a unique IP address as the SSH protocol operates
-on an IP address basis (i.e., it has no specific server identification such as
-SNI which can enable a single server to handle requests for multiple endpoints). 
-This solution can suffer from DNS propagation delays, depending on the DNS
-provider and DNS configuration. Further, exposing ssh servers running inside
-user sessions directly to the Internet brings some security risks and it would
-need to be clear that users cannot easily change the ssh daemon configuration
-within their sessions to make their session very exposed.
+dedicated ingress for each SSH session. Each ingress should have a unique name
+and also a unique IP address as the SSH protocol operates on an IP address
+basis (i.e., it has no specific server identification such as SNI which can
+enable a single server to handle requests for multiple endpoints). This
+solution can suffer from DNS propagation delays, depending on the DNS provider
+and DNS configuration. Further, exposing ssh servers running inside user
+sessions directly to the Internet brings some security risks and it would need
+to be clear that users cannot easily change the ssh daemon configuration within
+their sessions to make their session very exposed.
 
 A couple of further points relating to this approach:
 - ingresses on cloud providers can typically be expensive; having one per
@@ -168,8 +214,61 @@ The primary downside is that it involves writing and maintaining another
 component; however initial work by Tasko indicates that this can be done with
 a modest amount of coding effort.
 
+### Using OAuth2/OIDC
+
+After presenting these initial ideas to the team, there were some questions
+regarding whether use of OIDC can reduce friction in this process, making
+for a smoother user experience. OIDC could obviate the need for keys/passwords
+resulting in a simpler user experience.
+
+SSH has support for such mechanisms, providing (a) a means to present
+information to the user about the login process, e.g. providing a web link
+which can be clicked and (b) a mechanism by which another service can be used
+to validate login credentials.
+
+OIDC supports two modes which should be considered in this context:
+- Authorization Code Flow
+- Device Code Flow
+
+The latter is intended for long-lived authentication of a device, such that the
+device can act on behalf of the user - Smart TVs are a typical example. It is
+common that the tokens issued via the Device Code Flow would have no expiry. For
+this reason and for the mechanisms involved, it was not considered further in
+this analysis.
+
+The Authorization Code Flow is intended for use cases such as this, where some
+authentication provider manages user credentials and a token is generated which
+can be used to act on behalf of the user. In this case, the token is generated
+out of band via the web browser or perhaps the renku cli and this is entered
+via SSH. The SSH server then validates the generated token. More specifically,
+the token can be generated via keycloak and in the validation process, the SSH
+server requests keycloak to validate the token provided.
+
 ## Proposed Solution
 
+- Functionalities required:
+  - API:
+    - CRUD operations on public keys for logged in user
+    - Session operations?
+  - CLI
+    - Push public key for logged in user (check if it is valid public key)
+    - Create keypair locally, put key in appropriate folder and push public key
+    - Perform some validation of key setup without needing to launch a user session
+      - (probably needs to have an endpoint which is always available in the absence of a session) 
+    - For OIDC modus:
+      - Support generation of token for logged in user
+      - Support combined token generation and login via pty?
+      - Support login with token generated via web browser
+  - UI
+    - Import public key for logged in user (check if it is valid public key)
+    - Create keypair locally, support download of private key and give user
+      instructions on what to do with this
+    - For OIDC modus:
+      - Support generation of token in browser
+  - SSH Proxy
+    - Develop new proxy which supports public key login
+    - Need to have clarity on the solution for mapping login names to user sessions
+
 ## Drawbacks
 
 > Why should we *not* do this? Please consider the impact on users,
@@ -177,6 +276,7 @@ on the integration of this change with other existing and planned features etc.
 
 > There are tradeoffs to choosing any path, please attempt to identify them here.
 
+
 ## Rationale and Alternatives
 
 > Why is this design the best in the space of possible designs?
@@ -194,3 +294,7 @@ Two other designs have been considered:
 > What parts of the design do you expect to resolve through the implementation of this feature before stabilisation?
 
 > What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
+
+## Extras
+
+

From 9c268a1addebeb52b4339b484ddc4ca21e9c51e3 Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Mon, 20 Mar 2023 16:04:52 +0100
Subject: [PATCH 04/13] WIP(ssh sessions): added more content on approach to
 delivering ssh sessions in renku

---
 .../011-add-session-ssh-support.md            | 148 +++++++++++-------
 1 file changed, 88 insertions(+), 60 deletions(-)

diff --git a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
index b63e96f..bd5d220 100644
--- a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
+++ b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
@@ -8,7 +8,7 @@
 > One paragraph explanation of the change.
 
 An important requirement which must be met by Renku is to enable users to
-log in to their Renku session via SSH; a key driver for this is to support
+log in to their Renku session via ssh; a key driver for this is to support
 working with VS Code within Renku sessions.
 
 This has been discussed within the team and work is ongoing on an interim
@@ -16,9 +16,10 @@ solution which meets the basic requirements. This RFC provides the context,
 supports discussion of possible solutions and will be used to document the
 final agreed solution.
 
-Note that this RFC must (somehow) take into account decisions which are
-being made in the current push to deliver a workable solution as soon as
-possible.
+Note that this RFC must take into account the interim solution used to provide
+ssh access as well as any experience gained including (a) user experience with
+the current solution, (b) experience developing this solution and (c) experience
+operating and managing this solution.
 
 ## Motivation
 
@@ -46,44 +47,48 @@ The standard ssh (keypair-based) workflow is as follows:
 
 From a user perspective, there are multiple aspects which must be considered:
 - key setup
-  - user may want to use an existing keypair which is stored in `$HOME/.ssh`
-  - user may have existing keypairs but wants to have a new keypair for this scenario
+  - user may want to use an existing keypair which is stored in `$HOME/.ssh` on
+    their local machine
+  - user may have existing keypairs but wants to have a new keypair for this
+    scenario
     - having this generated by renku is probably the best option but the user
-      prob wants to be able to control the naming of the file on the local
+      may want to be able to control the naming of the file on the local
       machine
   - user may have a password protected key which is accessible via the `ssh-agent`
-  - in all of the above cases, the public key must be present in or accessible to
-    renku 
-    - This may required `ForwardAgent` to be configured on the client side
+    - this may require `ForwardAgent` to be configured on the client side
 - host key verification
-  - if user attempts to log into the server with ssh and it is not a recognized server,
-    the user should add the server to the known host keys
-  - (need to validate how this works with VS Code - it prob adds to the known hosts with some dialog box)
+  - if user attempts to log into the server with ssh and it is not a recognized
+    server, the user should add the server to the known host keys
+  - (need to validate how this works with VS Code - it probably adds to the
+    known hosts with some dialog box)
 - interaction modes
   - primary use case is one in which the user connects via VS Code; this uses
     `ssh` based tunneling mechanisms to support diverse interactions with the
-    remote server
-  - user should obtain a sensible shell configuration when they log in
+    remote server including access to remote files, a remote terminal and
+    interaction via e.g. a Jupyter Notebook
+  - user should obtain a sensible shell configuration when a terminal is
+    created; for ssh-agent forwarding, this likely involves configuring the
+    `SSH_AUTH_SOCK`
   - user should be able to log into the session with a simple `ssh` command
     - a standard `ssh <username>@<session>` should enable the user to log in
-  - user may want to use PuTTY? (I've no idea how this intersects with VS Code)
 
 From a system perspective, the following needs to be taken into account:
-- Unlike HTTP/S, SSH does not have support for server identification and hence
-  it is not trivial to perform aggregation (or disaggregation) of SSH sessions; 
-  further, this carries non-negligible security risks, so it is not a standard 
-  approach.
+- Unlike HTTP/S, ssh does not have support for server identification and hence
+  it is not trivial to perform aggregation (or disaggregation) of ssh sessions,
+  i.e., it is not so straightforward to have a single ssh server to which users
+  connect and it will proxy/redirect ssh sessions to the appropriate renku user
+  sessions; further, as this carries non-negligible security risks in general,
+  this approach is not widely used and there are few off the shelf software
+  components designed around such an architecture
 - The Renku platform should never see user private keys
 - A solution which does not have many moving parts is definitely preferred
   (e.g. we should avoid publishing DNS entries dynamically if possible)
 - Solutions which consume large amounts of IP addresses should be avoided as
   they can be costly
-- A solution which supports monitoring in a convenient manner is desirable
+- A solution which supports convenient monitoring using the existing Renku
+  monitoring approaches is desirable
 
-***(It may be necessary to differentiate the above into mandatory and nice to
-have).***
-
-FIXME - integrate this content in a meaningful way...
+**FIXME - integrate this content in a meaningful way...**
 
 - Scenario 1: User is used to using ssh keys - bring you own key
   - User imports ssh key as is standard in cloud services
@@ -114,8 +119,10 @@ FIXME - integrate this content in a meaningful way...
 
 The following key assumptions apply:
 - a solution is required which can be deployed both in the cloud provider
-  context and in the Switch/Openstack context
-- we are not targetting the most basic users; we assume they understand the
+  context and in the Switch/Openstack context - assumptions relating to
+  ingress configuration and getting access to ssh services must be clear for
+  both contexts and map to the design of the different platforms
+- we are not targeting the most basic users; we assume they understand the
   need for some security and are willing to put the time/energy required into a
   reasonably sensible ssh configuration
 - password authentication to ssh sessions exposed to the Internet is not
@@ -141,20 +148,19 @@ issues with different OS versions, would not be easy to configure and
 ultimately would likely have a detrimental impact on user experience and hence
 are not considered further.
 
-### Exposing SSH port from each session directly
+### Exposing ssh port from each session directly
 
 In this approach, each user session exposes at least 2 ports: an HTTP port
-for the Jupyter lab server and an SSH port for the SSH session. These ports
+for the Jupyter lab server and an ssh port for the ssh session. These ports
 have corresponding services running inside the pod.
 
 Services are then linked to these sessions with the ports on the services
 mapping to the ports exposed by the pod.
 
-In this solution, an ingress is mapped directly to the SSH services with a
-dedicated ingress for each SSH session. Each ingress should have a unique name
-and also a unique IP address as the SSH protocol operates on an IP address
-basis (i.e., it has no specific server identification such as SNI which can
-enable a single server to handle requests for multiple endpoints). This
+In this solution, an ingress is mapped directly to the ssh services with a
+dedicated ingress for each ssh session. Each ingress should have a unique name
+and also a unique IP address as the ssh protocol operates on an IP address
+basis (i.e., it has no specific server identification as noted above). This
 solution can suffer from DNS propagation delays, depending on the DNS provider
 and DNS configuration. Further, exposing ssh servers running inside user
 sessions directly to the Internet brings some security risks and it would need
@@ -175,11 +181,11 @@ this case, a jumphost must be installed inside the kubernetes cluster as it
 requires direct access to the ssh server running on the pods.
 
 This requires modifications to the cluster ingress configuration to forward TCP 
-traffic incident on a specific port to the jumphost pod on its SSH port; this
-pod must have its sshd configured to support proxying. Also, as ssh proxying
+traffic incident on a specific port to the jumphost pod on its ssh port; this
+pod must have its sshd configured to support proxying. Also, ssh proxying
 approaches require authentication against both the proxy and the destination
-server, an approach in which some combination of either no credentials and
-a keypair for which the user possesses the private key must be performed.
+server; this necessitates an approach in which some combination of no credentials and
+a keypair must be deployed to the proxy and/or user session.
 
 The most natural solution in this case is to use the user's public key within
 the session and have open access with no ability to obtain a shell within the
@@ -188,7 +194,7 @@ jumphost.
 In this case, the command to ssh into a session requires specifying both the
 jumphost and the destination; as such it is a more complex command and introduces
 some user friction when accessing sessions. VSCode supports such proxying
-scenarios without problem.
+scenarios without any problems.
 
 ### Using a Man In the Middle Proxy solution
 
@@ -198,16 +204,16 @@ approach above, this must run on the kubernetes cluster.
 
 Unlike the above approach, this requires writing and maintaining a software
 component. The key difference is that this proxy does not use the proxying
-capabilities provided by an SSH client in which one SSH session is embedded
-inside another; effectively this approach concatenates two distinct SSH
+capabilities provided by an ssh client (in which one ssh session is embedded
+inside another); effectively this approach concatenates two distinct ssh
 sessions with two distinct authentication flows. The proxy terminates one
-SSH session and forwards all traffic to another SSH session.
+ssh session and forwards all traffic to another ssh session.
 
 This solution has the following benefits:
 - it is possible to log in via a simple ssh command which looks much cleaner/more
 professional;
 - there is a single entry point which reduces the attack surface;
-- it is reasonably straightforward to add instrumentation such that SSH session
+- it is reasonably straightforward to add instrumentation such that ssh session
 information data can be easily collected.
 
 The primary downside is that it involves writing and maintaining another
@@ -216,12 +222,12 @@ a modest amount of coding effort.
 
 ### Using OAuth2/OIDC
 
-After presenting these initial ideas to the team, there were some questions
+After presenting the initial ideas above to the team, there were some questions
 regarding whether use of OIDC can reduce friction in this process, making
 for a smoother user experience. OIDC could obviate the need for keys/passwords
 resulting in a simpler user experience.
 
-SSH has support for such mechanisms, providing (a) a means to present
+ssh has support for such mechanisms, providing (a) a means to present
 information to the user about how the login process, eg providing a web link
 which can be clicked and (b) a mechanism by which another service can be used
 to validate login credentials.
@@ -231,21 +237,44 @@ OIDC supports two modes which should be considered in this context:
 - Device Code Flow
 
 The latter is intended for long-lived authentication of a device, such that the
-device can act on behalf of the user - Smart TVs are a typical example. It is
-common that the tokens issued via the Device Code Flow would no expiry. For
-this reasons and for the mechanisms involved, it was not considered further in
-this analysis.
+device can act on behalf of the user - Smart TVs are a typical example. In this
+approach, a device obtains a token from the authentication service; an authentication
+flow takes place *on a different device* which uses this device specific token.
+Once that authentication flow completes, the device can act on behalf of the user.
+In this case, we don't have a specific, fixed device which should be acting on
+behalf of the user; for this reason the Device Code flow is not considered
+further.
 
 The Authorization Code Flow is intended for use cases such as this, where some
 authentication provider manages user credentials and a token is generated which
-can be used to act on behalf of the user. In this case, the token is generated
-out of band via the web browser or perhaps the renku cli and this is entered
-via SSH. The SSH server then validates the generated token. More specifically,
-the token can be generated via keycloak and in the validation process, the SSH
-server requests keycloak to validate the token provided.
+can be used to act on behalf of the user outside the browser. In this case, the
+token is generated out of band via the web browser or perhaps the renku cli and
+this is entered via ssh. The ssh server then validates the generated token.
+More specifically, the token can be generated via keycloak and in the
+validation process, the ssh server requests keycloak to validate the token
+provided.
+
+## Comparison of different approaches
+
+A table showing how the different approaches compare is shown below:
+
+|                                                | Ssh port exposed by session directly | Proxy jumphost    | MITM Proxy         | OAuth2/OIDC       |
+|------------------------------------------------|--------------------------------------|-------------------|--------------------|-------------------|
+| - Efficient IP address usage                   | :x                                   | :heavy_check_mark | i:heavy_check_mark | :heavy_check_mark |
+| - Login controlled by Renku                    | :x                                   | :x                | :heavy_check_mark  | :heavy_check_mark |
+| - Flexible monitoring support                  | :x                                   | :x                | :heavy_check_mark  | :x                |
+| - Key management responsibilities with users   | :x                                   | :heavy_check_mark | :heavy_check_mark  | :x                |
+| - Increased complexity of components to manage | :x                                   | :x                | :x                 | :heavy_check_mark |
+| - Control over session to ssh name             | :x                                   | :heavy_check_mark | :heavy_check_mark  | :heavy_check_mark |
 
 ## Proposed Solution
 
+The proposed solution is based on the MITM Proxy described above. At an
+architectural level, this involves the addition of a component to the Renku
+platform which will terminate ssh sessions and route them to the appropriate
+user session. It will necessitate a solution for mapping between username, 
+keys used for login and the user session.
+
 - Functionalities required:
   - API:
     - CRUD operations on public keys for logged in user
@@ -265,7 +294,7 @@ server requests keycloak to validate the token provided.
       instructions on what to do with this
     - For OIDC modus:
       - Support generation of token in browser
-  - SSH Proxy
+  - ssh Proxy
     - Develop new proxy which supports public key login
     - Need to have clarity on the solution for mapping login names to user sessions
 
@@ -276,6 +305,8 @@ on the integration of this change with other existing and planned features etc.
 
 > There are tradeoffs to choosing any path, please attempt to identify them here.
 
+One reason to not do this is that there is currently an interim ssh solution in
+place which may be adequate/sufficient; this may become a permanent solution.
 
 ## Rationale and Alternatives
 
@@ -283,10 +314,10 @@ on the integration of this change with other existing and planned features etc.
 
 > What other designs have been considered and what is the rationale for not choosing them?
 
-Two other designs have been considered:
-
 > What is the impact of not doing this?
 
+See comparison of approaches above.
+
 ## Unresolved questions
 
 > What parts of the design do you expect to resolve through the RFC process before this gets merged?
@@ -295,6 +326,3 @@ Two other designs have been considered:
 
 > What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
 
-## Extras
-
-

From d753d51e9fc15492192b0f6061b19e4e2d83f240 Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Mon, 20 Mar 2023 16:08:10 +0100
Subject: [PATCH 05/13] WIP(ssh sessions): fix github markdown crosses

---
 .../011-add-session-ssh-support.md                   | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
index bd5d220..c3f2553 100644
--- a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
+++ b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
@@ -260,12 +260,12 @@ A table showing how the different approaches compare is shown below:
 
 |                                                | Ssh port exposed by session directly | Proxy jumphost    | MITM Proxy         | OAuth2/OIDC       |
 |------------------------------------------------|--------------------------------------|-------------------|--------------------|-------------------|
-| - Efficient IP address usage                   | :x                                   | :heavy_check_mark | i:heavy_check_mark | :heavy_check_mark |
-| - Login controlled by Renku                    | :x                                   | :x                | :heavy_check_mark  | :heavy_check_mark |
-| - Flexible monitoring support                  | :x                                   | :x                | :heavy_check_mark  | :x                |
-| - Key management responsibilities with users   | :x                                   | :heavy_check_mark | :heavy_check_mark  | :x                |
-| - Increased complexity of components to manage | :x                                   | :x                | :x                 | :heavy_check_mark |
-| - Control over session to ssh name             | :x                                   | :heavy_check_mark | :heavy_check_mark  | :heavy_check_mark |
+| Efficient IP address usage                   | :x:                                   | :heavy_check_mark | i:heavy_check_mark | :heavy_check_mark |
+| Login controlled by Renku                    | :x:                                   | :x:                | :heavy_check_mark  | :heavy_check_mark |
+| Flexible monitoring support                  | :x:                                   | :x:                | :heavy_check_mark  | :x:                |
+| Key management responsibilities with users   | :x:                                   | :heavy_check_mark | :heavy_check_mark  | :x:                |
+| Increased complexity of components to manage | :x:                                   | :x:                | :x:                 | :heavy_check_mark |
+| Control over session to ssh name             | :x:                                   | :heavy_check_mark | :heavy_check_mark  | :heavy_check_mark |
 
 ## Proposed Solution
 

From ff43a9b03f7cabad1e315fea4d3da6bc8c2d4b03 Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Mon, 20 Mar 2023 16:09:05 +0100
Subject: [PATCH 06/13] WIP(ssh sessions): fix checkmarks

---
 .../011-add-session-ssh-support.md                   | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
index c3f2553..463b74c 100644
--- a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
+++ b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
@@ -260,12 +260,12 @@ A table showing how the different approaches compare is shown below:
 
 |                                                | Ssh port exposed by session directly | Proxy jumphost    | MITM Proxy         | OAuth2/OIDC       |
 |------------------------------------------------|--------------------------------------|-------------------|--------------------|-------------------|
-| Efficient IP address usage                   | :x:                                   | :heavy_check_mark | i:heavy_check_mark | :heavy_check_mark |
-| Login controlled by Renku                    | :x:                                   | :x:                | :heavy_check_mark  | :heavy_check_mark |
-| Flexible monitoring support                  | :x:                                   | :x:                | :heavy_check_mark  | :x:                |
-| Key management responsibilities with users   | :x:                                   | :heavy_check_mark | :heavy_check_mark  | :x:                |
-| Increased complexity of components to manage | :x:                                   | :x:                | :x:                 | :heavy_check_mark |
-| Control over session to ssh name             | :x:                                   | :heavy_check_mark | :heavy_check_mark  | :heavy_check_mark |
+| Efficient IP address usage                   | :x:                                   | :heavy_check_mark: | i:heavy_check_mark: | :heavy_check_mark: |
+| Login controlled by Renku                    | :x:                                   | :x:                | :heavy_check_mark:  | :heavy_check_mark: |
+| Flexible monitoring support                  | :x:                                   | :x:                | :heavy_check_mark:  | :x:                |
+| Key management responsibilities with users   | :x:                                   | :heavy_check_mark: | :heavy_check_mark:  | :x:                |
+| Increased complexity of components to manage | :x:                                   | :x:                | :x:                 | :heavy_check_mark: |
+| Control over session to ssh name             | :x:                                   | :heavy_check_mark: | :heavy_check_mark:  | :heavy_check_mark: |
 
 ## Proposed Solution
 

From 702b595b4ee1f79e082ea7cead33101ea6a6c633 Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Mon, 20 Mar 2023 17:22:51 +0100
Subject: [PATCH 07/13] WIP(ssh sessions): more content on ssh sessions

---
 .../011-add-session-ssh-support.md            | 115 +++++++++---------
 1 file changed, 56 insertions(+), 59 deletions(-)

diff --git a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
index 463b74c..c0d5b74 100644
--- a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
+++ b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
@@ -38,14 +38,7 @@ include any specific constraints that arise. It can be as technical as required.
 The basic problem is to provide low friction, secure access to user sessions
 via ssh. 
 
-The standard ssh (keypair-based) workflow is as follows:
-- create keypair
-- ensure public key is stored on server to be accessed and ssh daemon is running there
-- log in to session for the first time
-- confirm validity of remote server host key
-- access remote session (if keypair matches)
-
-From a user perspective, there are multiple aspects which must be considered:
+From a user perspective, the following needs to be considered:
 - key setup
   - user may want to use an existing keypair which is stored in `$HOME/.ssh` on
     their local machine
@@ -56,23 +49,26 @@ From a user perspective, there are multiple aspects which must be considered:
       machine
   - user may have a password protected key which is accessible via the `ssh-agent`
     - this may require `ForwardAgent` to be configured on the client side
+  - user's key may already be in use within gitlab/github
 - host key verification
   - if user attempts to log into the server with ssh and it is not a recognized
     server, the user should add the server to the known host keys
-  - (need to validate how this works with VS Code - it prob adds to the known
-    hosts with some dialog box)
+    - we should not have too much friction in this process; ie if new host keys
+      are generated for each user session, the user will be asked each time to
+      confirm that the host key is valid
+  - for vscode host key verification is less intrusive
 - interaction modes
   - primary use case is one in which the user connects via VS Code; this uses
     `ssh` based tunneling mechanisms to support diverse interactions with the
     remote server including access to remote files, a remote terminal and
-    interaction via eg a Jupyter Notebook
+    interaction with eg a Jupyter Notebook
   - user should obtain a sensible shell configuration when a terminal is
-    created; for ssh-agent forwarding, this likely involves configuring the
-    `SSH_AUTH_SOCK`
+    created; for ssh-agent forwarding, this likely involves ensuring that the
+    `SSH_AUTH_SOCK` is configured
   - user should be able to log into the session with a simple `ssh` command
     - a standard `ssh <username>@<session>` should enable the user to log in
 
-From a system perspective, the following needs to be taken into account:
+From a system perspective, the following needs to be considered:
 - Unlike HTTP/S, ssh does not have support for server identification and hence
   it is not trivial to perform aggregation (or disaggregation) of ssh sessions,
   ie, it is not so straightforward to have a single ssh server to which users
@@ -88,32 +84,24 @@ From a system perspective, the following needs to be taken into account:
 - A solution which supports convenient monitoring using the existing Renku
   monitoring approaches is desirable
 
-**FIXME - integrate this content in a meaningful way...**
-
-- Scenario 1: User is used to using ssh keys - bring you own key
-  - User imports ssh key as is standard in cloud services
-  - (Could even grab key from gitlab if it is there...)
-
+We consider the following different user scenarios:
+- Scenario 1: User is comfortable using ssh keys - bring your own key
+  - User imports ssh key to Renku as is common with cloud services
 - Scenario 2: User is used to using ssh keys but wants renku to create new key
   - User asks renku to generate key via cli and it gets stored in the .ssh folder
   - User uses this key to log in to renku
-   
 - Scenario 3: User is not so familiar with keys; wants renku to generate keys via web interface
-  - (User does this because the key based approaches have less friction) 
-  - Private key generated by web UI can be downloaded once and is not persisted
+  - Private key generated by web UI can be downloaded once and is not
+    persisted; user is provided information on what to do with the private key
   - Public key is retained by renku
-  - (Will need to be able to deal with the case that user loses key - support creation of new key)
-  - (Will probably need the command line to perform some check to see if the configuration is correct)
-
-- Scenario 4: User is not familiar with terminal interaction at all; wants handheld VSCode option
-  - (Add support for OIDC authorization code flow in UI/CLI - code is generated once per login)
-  - Amount of friction depends on the longevity of the authorization code
-  - This will require more functionality on our side as it involves supporting a separate login modus
-  - Can also provide this as a step towards using ssh keys when the user
-    realizes that this approach has friction
-
-- Do not consider using ssh keys for acting on behalf of the user; tokens
-  should be used for that purpose.
+  - (Will need to be able to deal with the case that user loses key - support
+    creation of new key)
+  - (Will probably need the command line to perform some check to see if the
+    configuration is correct)
+- Scenario 4: User is not familiar with terminal interaction at all; wants
+  handheld VSCode option
+  - It is not clear if VSCode can work with OAuth/OIDC mechanisms in a low
+    friction manner
 
 ## Key Assumptions
 
@@ -131,14 +119,18 @@ The following key assumptions apply:
 ## Possible solutions
 
 As this work has some kind of history, three solutions have been discussed in
-various conversations - these are:
+various conversations - those are the first three options below; a fourth
+approach arose when presenting these ideas to the team and that is the last
+item in the list below:
 - an approach in which an ssh port is exposed on each session directly to the
-Internet;
+  Internet;
 - an approach in which a dedicated jumphost is used and the standard ssh
-proxying mechanisms are used to access the session via the jumphost;
+  proxying mechanisms are used to access the session via the jumphost;
 - an approach in which a dedicated proxy is provided as part of the Renku
-platform which provides ssh access without needing to use the ssh client
-proxy capabilities.
+  platform which provides ssh access without needing to use the ssh client
+  proxy capabilities.
+- an approach which uses OAuth/OIDC to enable login without using either keys
+  or long-term passwords
 Each of these approaches is discussed in more detail below.
 
 Other solutions could be envisaged: it could be possible to provide a tunnel
@@ -150,22 +142,22 @@ are not considered further.
 
 ### Exposing ssh port from each session directly
 
-In this approach, each user session exposes at least 2 ports: an HTTP port
+In this approach, each user session pod exposes at least 2 ports: an HTTP port
 for the Jupyter lab server and an ssh port for the ssh session. These ports
 have corresponding services running inside the pod.
 
-Services are then linked to these sessions with the ports on the services
-mapping to the ports exposed by the pod.
+Kubernetes services are then linked to these sessions with the ports on the
+services mapping to the ports exposed by the pod. An ingress is then mapped to
+the ssh services with a dedicated ingress for each ssh session.
 
-In this solution, an ingress is mapped directly to the ssh services with a
-dedicated ingress for each ssh session. Each ingress should have a unique name
-and also a unique IP address as the ssh protocol operates on an IP address
-basis (i.e., it has no specific server identification as noted above). This
-solution can suffer from DNS propagation delays, depending on the DNS provider
-and DNS configuration. Further, exposing ssh servers running inside user
-sessions directly to the Internet brings some security risks and it would need
-to be clear that users cannot easily change the ssh daemon configuration within
-their sessions to make their session very exposed.
+Each ingress should have a unique name and also a unique IP address as the ssh
+protocol operates on an IP address basis (i.e., it has no specific server
+identification as noted above). This solution can suffer from DNS propagation
+delays, depending on the DNS provider and DNS configuration. Further, exposing
+ssh servers running inside user sessions directly to the Internet brings some
+security risks and it would need to be clear that users cannot easily change
+the ssh daemon configuration within their sessions to make their session very
+exposed.
 
 A couple of further points relating to this approach:
 - ingresses on cloud providers can typically be expensive; having one per
@@ -189,14 +181,16 @@ a keypair must be deployed to the proxy and/or user session.
 
 The most natural solution in this case is to use the user's public key within
 the session and have open access with no ability to obtain a shell within the
-jumphost.
+jumphost. Key management then becomes a user responsibility, with users having
+to do this within their renku projects; as such, keys are project entities
+rather than user entities.
 
 In this case, the command to ssh into a session requires specifying both the
-jumphost and the destination; as such it is a more complex command and introduces
+jumphost and the destination; hence it is a more complex command and introduces
 some user friction when accessing sessions. VSCode supports such proxying
 scenarios without any problems.
 
-### Using a Man In the Middle Proxy solution
+### Using a Man In the Middle (MITM) Proxy solution
 
 The third approach is the one followed by gitpod; in this approach a dedicated
 proxy is provided which affords access to the sessions. As with the proxying
@@ -210,11 +204,11 @@ sessions with two distinct authentication flows. The proxy terminates one
 ssh session and forwards all traffic to another ssh session.
 
 This solution has the following benefits:
-- it is possible to log in via a simple ssh command which looks much cleaner/more
-professional;
+- it is possible to log in via a simple ssh command which looks much
+  cleaner/more professional;
 - there is a single entry point which reduces the attack surface;
 - it is reasonably straightforward to add instrumentation such that ssh session
-information data can be easily collected.
+  information data can be easily collected.
 
 The primary downside is that it involves writing and maintaining another
 component; however initial work by Tasko indicates that this can be done with
@@ -254,14 +248,17 @@ More specifically, the token can be generated via keycloak and in the
 validation process, the ssh server requests keycloak to validate the token
 provided.
 
+Use of these mechanisms is not recommended within VSCode.
+
 ## Comparison of different approaches
 
 A table showing how the different approaches compare is shown below:
 
 |                                                | Ssh port exposed by session directly | Proxy jumphost    | MITM Proxy         | OAuth2/OIDC       |
 |------------------------------------------------|--------------------------------------|-------------------|--------------------|-------------------|
-| Efficient IP address usage                   | :x:                                   | :heavy_check_mark: | i:heavy_check_mark: | :heavy_check_mark: |
+| Efficient IP address usage                   | :x:                                   | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
 | Login controlled by Renku                    | :x:                                   | :x:                | :heavy_check_mark:  | :heavy_check_mark: |
+| Keys linked to user rather than project      | :x:                                   | :x:                | :heavy_check_mark:  | :heavy_check_mark: |
 | Flexible monitoring support                  | :x:                                   | :x:                | :heavy_check_mark:  | :x:                |
 | Key management responsibilities with users   | :x:                                   | :heavy_check_mark: | :heavy_check_mark:  | :x:                |
 | Increased complexity of components to manage | :x:                                   | :x:                | :x:                 | :heavy_check_mark: |

From 692a7ab17d07c80e277d3f4bec231e23cb353c71 Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Wed, 22 Mar 2023 15:03:36 +0100
Subject: [PATCH 08/13] WIP(ssh-sessions): updates to ssh session content

---
 .../011-add-session-ssh-support.md            | 341 +++++++++++-------
 1 file changed, 219 insertions(+), 122 deletions(-)

diff --git a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
index c0d5b74..60735fe 100644
--- a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
+++ b/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
@@ -11,15 +11,10 @@ An important requirement which must be met by Renku is to enable users to
 log in to their Renku session via ssh; a key driver for this is to support
 working with VS Code within Renku sessions.
 
-This has been discussed within the team and work is ongoing on an interim
-solution which meets the basic requirements. This RFC provides the context,
-supports discussion of possible solutions and will be used to document the
-final agreed solution.
-
-Note that this RFC must take into account the interim solution used to provide
-ssh access as well as any experience gained including (a) user experience with
-the current solution, (b) experience developing this solution and (c) experience
-operating and managing this solution.
+This has been discussed within the team and an interim solution has been
+developed which is currently in operation. That solution, however, has some
+limitations; this RFC provides context, documents possible solutions and will
+be used to justify the final agreed solution.
 
 ## Motivation
 
@@ -38,78 +33,89 @@ include any specific constraints that arise. It can be as technical as required.
 The basic problem is to provide low friction, secure access to user sessions
 via ssh. 
 
+Although non-key-based mechanisms are possible and are discussed to some extent
+below, VS Code primarily uses key-based mechanisms; they are generally the most
+widely used with ssh and are known to be trusted and secure - for this reason,
+there is a strong focus on key-based mechanisms here.
+
 From a user perspective, the following needs to be considered:
 - key setup
   - user may want to use an existing keypair which is stored in `$HOME/.ssh` on
     their local machine
   - user may have existing keypairs but wants to have a new keypair for this
     scenario
-    - having this generated by renku is probably the best option but the user
+    - having this generated by Renku is probably the best option but the user
       may want to be able to control the naming of the file on the local
       machine
   - user may have a password protected key which is accessible via the `ssh-agent`
-    - this may require `ForwardAgent` to be configured on the client side
-  - user's key may already be in use within gitlab/github
+    - this may require the solution to support `ssh-agent` forwarding 
+  - user's key may already be in use within gitlab/github and can be easily
+    obtained from those sources
 - host key verification
   - if user attempts to log into the server with ssh and it is not a recognized
     server, the user should add the server to the known host keys
     - we should not have too much friction in this process; ie if new host keys
-      are generated for each user session, the user will be asked each time to
-      confirm that the host key is valid
-  - for vscode host key verification is less intrusive
+      are generated for each user session, the user may be asked each time to
+      confirm that the host key is valid in some configurations and clearly this
+      could become an irritation
+  - for VS Code host key verification is less intrusive
 - interaction modes
-  - primary use case is one in which the user connects via VS Code; this uses
+  - the primary use case is one in which the user connects via VS Code; this uses
     `ssh` based tunneling mechanisms to support diverse interactions with the
     remote server including access to remote files, a remote terminal and
     interaction with eg a Jupyter Notebook
-  - user should obtain a sensible shell configuration when a terminal is
-    created; for ssh-agent forwarding, this likely involves ensuring that the
-    `SSH_AUTH_SOCK` is configured
-  - user should be able to log into the session with a simple `ssh` command
-    - a standard `ssh <username>@<session>` should enable the user to log in
-
-From a system perspective, the following needs to be considered:
-- Unlike HTTP/S, ssh does not have support for server identification and hence
+    - in this case the user should obtain a sensible shell configuration when a
+      terminal is opened (note that in VS Code, this may not be a login shell)
+  - direct `ssh` access is also possible/considered and in this case, the user
+    should also obtain a sensible shell configuration when logged in; for ssh-agent
+    forwarding, this may involve ensuring that the `SSH_AUTH_SOCK` is
+    configured
+    - user login should be quite intuitive with a clear username/hostname
+      pattern for logging in; ssh config should also be configured analogously
+      with ssh login targets 
+
+From a system perspective, the following need to be considered:
+- unlike HTTP/S, ssh does not have support for server identification and hence
   it is not trivial to perform aggregation (or disaggregation) of ssh sessions,
   ie, it is not so straightforward to have a single ssh server to which users
-  connect and it will proxy/redirect ssh sessions to the appropriate renku user
+  connect which will proxy/redirect ssh sessions to the appropriate Renku user
   sessions; further, as this carries non-negligible security risks in general,
   this approach is not widely used and there are few off the shelf software
   components designed around such an architecture
-- The Renku platform should never see user private keys
-- A solution which does not have many moving parts is definitely preferred
+- the Renku platform should never see user private keys
+- a solution which does not have many moving parts is definitely preferred
   (e.g. we should avoid publishing DNS entries dynamically if possible)
-- Solutions which consume large amounts of IP addresses should be avoided as
+- solutions which consume large amounts of IP addresses should be avoided as
   they can be costly
-- A solution which supports convenient monitoring using the existing Renku
+- a solution which supports convenient monitoring using the existing Renku
   monitoring approaches is desirable
+- a solution in which keys are associated with users and are dynamically mapped
+  to Renku user sessions is most sensible
 
 We consider the following different user scenarios:
 - Scenario 1: User is comfortable using ssh keys - bring you own key
   - User imports ssh key to Renku as is common with cloud services
-- Scenario 2: User is used to using ssh keys but wants renku to create new key
-  - User asks renku to generate key via cli and it gets stored in the .ssh folder
-  - User uses this key to log in to renku
-- Scenario 3: User is not so familiar with keys; wants renku to generate keys via web interface
-  - Private key generated by web UI can be downloaded once and is not
-    persisted; user is provided information on what to do with the private key
-  - Public key is retained by renku
+- Scenario 2: User is used to using ssh keys but wants Renku to create new key
+  - User asks Renku to generate key via cli and it gets stored in the .ssh folder
+  - User uses this key to log in to Renku
+- Scenario 3: User is not so familiar with keys; wants Renku to generate keys via web interface
+  - Private key is generated within the browser, is never uploaded to Renku and
+    can be downloaded once - it is not persisted anywhere; user is provided
+    information on what to do with the private key
+  - Public key is retained by Renku
   - (Will need to be able to deal with the case that user loses key - support
     creation of new key)
   - (Will probably need the command line to perform some check to see if the
     configuration is correct)
-- Scenario 4: User is not familiar with terminal interaction at all; wants
-  handheld VSCode option
-  - It is not clear if VSCode can work with OAuth/OIDC mechanisms in a low
-    friction manner
+
+A solution is required which can be deployed both in the cloud provider context
+and in the Switch/Openstack context - assumptions relating to ingress
+configuration and getting access to ssh services must be clear for both
+contexts and map to the design of the different platforms
 
 ## Key Assumptions
 
 The following key assumptions apply:
-- a solution is required which can be deployed both in the cloud provider
-  context and in the Switch/Openstack context - assumptions relating to
-  ingress configuration and getting access to ssh services must be clear for
-  both contexts and map to the design of the different platforms
 - we are not targeting the most basic users; we assume they understand the
   need for some security and are willing to put the time/energy required into a
   reasonably sensible ssh configuration
@@ -131,12 +137,13 @@ item in the list below:
   proxy capabilities.
 - an approach which uses OAuth/OIDC to enable login without using either keys
   or long-term passwords
+
 Each of these approaches is discussed in more detail below.
 
-Other solutions could be envisaged: it could be possible to provide a tunnel
-over HTTP/S into the session and access it via ssh locally or perhaps some
-wireguard solution might work but these approaches would probably encounter
-issues with different OS versions, would not be easy to configure and
+Other solutions could be envisaged: it could be possible, for example, to
+provide a tunnel over HTTP/S into the session and access it via ssh locally or
+perhaps some wireguard solution might work but these approaches would probably
+encounter issues with different OS versions, would not be easy to configure and
 ultimately would likely have a detrimental impact on user experience and hence
 are not considered further.
 
@@ -168,21 +175,24 @@ A couple of further points relating to this approach:
 
 ### Adding a proxy jumphost
 
-Another solution is to have a jumphost and to use this to access the pods. In
-this case, a jumphost must be installed inside the kubernetes cluster as it 
-requires direct access to the ssh server running on the pods.
+Another solution, and the one which is currently deployed, is to have a jumphost
+and to use this to access the pods. In this case, a jumphost must be installed
+inside the kubernetes cluster as it requires direct access to the ssh server
+running on the pods. The jumphost runs a standard ssh server with proxying
+enabled.
 
-This requires modifications to the cluster ingress configuration to forward TCP 
-traffic incident on a specific port to the jumphost pod on its ssh port; this
-pod must have its sshd configured to support proxying. Also, ssh proxying
-approaches require authentication against both the proxy and the destination
-server; this necessitates an approach in which some combination of no credentials and
-a keypair must be deployed to the proxy and/or user session.
+For the Switch/Openstack deployments, this requires modifications to the
+cluster ingress configuration to forward TCP traffic incident on a specific
+port to the jumphost pod on its ssh port. For cloud providers, a new ingress
+pointing at the ssh proxy is straightforward. ssh proxying approaches require
+authentication against both the proxy and the destination server; this
+necessitates an approach in which some combination of no credentials and a
+keypair must be deployed to the proxy and/or user session.
 
 The most natural solution in this case is to use the user's public key within
 the session and have open access with no ability to obtain a shell within the
-jumphost. Key management then becomes a user responsibility, with user's having
-to do this within their renku projects; as such keys are project entities,
+jumphost. Key management then becomes a user responsibility, with users having
+to do this within their Renku projects; as such, keys are project entities,
 rather than user entities.
 
 In this case, the command to ssh into a session requires specifying both the
@@ -196,29 +206,30 @@ The third approach is the one followed by gitpod; in this approach a dedicated
 proxy is provided which affords access to the sessions. As with the proxying
 approach above, this must run on the kubernetes cluster.
 
-Unlike the above approach, this requires writing and maintaining a software
-component. The key difference is that this proxy does not use the proxying
-capabilities provided by an ssh client (in which one ssh session is embedded
-inside another); effectively this approach concatenates two distinct ssh
-sessions with two distinct authentication flows. The proxy terminates one
-ssh session and forwards all traffic to another ssh session.
+Unlike the above approach, this requires writing and maintaining a new software
+component. The key difference is that this proxy does not use the standard ssh
+proxying capabilities provided by an ssh client (in which one ssh session is
+essentially embedded inside another); effectively this approach concatenates
+two distinct ssh sessions with two distinct authentication flows. The proxy
+terminates one ssh session, relaying all traffic to another ssh session.
 
 This solution has the following benefits:
-- it is possible to log in via a simple ssh command which looks much
-  cleaner/more professional;
-- there is a single entry point which reduces the attack surface;
+- keys can be clearly bound to users and separated from projects
 - it is reasonably straightforward to add instrumentation such that ssh session
-  information data can be easily collected.
+  information data can be easily collected with the current data collection
+  mechanisms.
+- there is greater control over the ssh command which users can use - the mapping
+  between username, keypair and Renku session is more configurable
 
 The primary downside is that it involves writing and maintaining another
 component; however initial work by Tasko indicates that this can be done with
-a modest amount of coding effort.
+a modest amount of effort.
 
 ### Using OAuth2/OIDC
 
 After presenting initial ideas above to the team, there were some questions
-regarding whether use of OIDC can reduce friction in this process, making
-for a smoother user experience. OIDC could obviate the need for keys/passwords
+regarding whether use of OAuth2/OIDC can reduce friction in this process, making
+for a smoother user experience. It could obviate the need for keys/passwords
 resulting in a simpler user experience.
 
 ssh has support for such mechanisms, providing (a) a means to present
@@ -226,43 +237,71 @@ information to the user about how the login process, eg providing a web link
 which can be clicked and (b) a mechanism by which another service can be used
 to validate login credentials.
 
-OIDC supports two modes which should be considered in this context:
-- Authorization Code Flow
+OAuth2/OIDC supports two modes which can be considered in this context:
 - Device Code Flow
+- Authorization Code Flow
 
-The latter is intended for long-lived authentication of a device, such that the
-device can act on behalf of the user - Smart TVs are a typical example. In this
-approach, a device obtains a token from the authentication service; an authentication
-flow takes place *on a different device* which uses this device specific token.
-Once that authentication flow completes, the device can act on behalf of the user.
-In this case, we don't have a specific, fixed device which should be acting on
-behalf of the user; for this reason the Device Code flow is not considered
-further.
-
-The Authorization Code Flow is intended for use cases such as this, where some
-authentication provider manages user credentials and a token is generated which
-can be used to act on behalf of the user outside the browser. In this case, the
-token is generated out of band via the web browser or perhaps the renku cli and
-this is entered via ssh. The ssh server then validates the generated token.
-More specifically, the token can be generated via keycloak and in the
-validation process, the ssh server requests keycloak to validate the token
-provided.
-
-Use of these mechanisms is not recommended within VSCode.
+The Device Code Flow is intended for long-lived authentication of a device,
+such that the device can act on behalf of the user - Smart TVs are a typical
+example. In this approach, a device obtains a token from the authentication
+service; an authentication flow takes place *on a different device* which uses
+this device specific token. Once that authentication flow completes, the device
+can act on behalf of the user. In the Renku case, there is no specific, fixed
+device which should be acting on behalf of the user; as such, this approach is
+not relevant here and is not considered further.
+
+The Authorization Code Flow is intended for use cases such as ssh
+authentication, where some authentication provider manages user credentials and
+a token is generated which can be used to act on behalf of the user outside the
+browser. In this case, the token is generated out of band via the web browser
+or perhaps the Renku CLI and this is entered via ssh. The ssh server then
+validates the generated token. More specifically, the token can be generated
+via keycloak and in the validation process, the ssh server requests keycloak to
+confirm the token provided is valid.
+
+Some open source software exists which supports ssh login via the Authorization
+Code Flow; interestingly it was developed in the context of HPC compute resource
+access. It is a modest component which could conceivably be supported if Renku
+opts to support this; however, it does not appear to have a large backing and it
+is written primarily in C.
+
+It is also worth noting that use of these mechanisms is not recommended within
+VS Code; VS Code indicates that standard key-based login mechanisms are preferred.
 
 ## Comparison of different approaches
 
+The following comparison criteria have been chosen to compare the different
+approaches:
+
+- VS Code support: whether the solution has good integration with VS Code ssh
+  mechanisms
+- Efficient IP address usage: whether the solution avoids IP address usage
+  growing with the number of user sessions
+- Keys linked to user rather than project: whether each user has a dedicated
+  key/set of keys which can be arbitrarily mapped to user sessions or whether
+  keys are bound to projects rather than users
+- Flexible monitoring support: whether we can easily collect user session
+  data to perform analysis of user behaviour
+- Key management responsibilities with users: whether users need to take
+  responsibility for managing keys
+- New Renku components to be developed/supported: whether the solution involves
+  development/support of new entities in the Renku platform
+
 A table showing how the different approaches compare is shown below:
 
 |                                                | Ssh port exposed by session directly | Proxy jumphost    | MITM Proxy         | OAuth2/OIDC       |
 |------------------------------------------------|--------------------------------------|-------------------|--------------------|-------------------|
-| Efficient IP address usage                   | :x:                                   | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| Login controlled by Renku                    | :x:                                   | :x:                | :heavy_check_mark:  | :heavy_check_mark: |
-| Keys linked to user rather than project      | :x:                                   | :x:                | :heavy_check_mark:  | :heavy_check_mark: |
-| Flexible monitoring support                  | :x:                                   | :x:                | :heavy_check_mark:  | :x:                |
-| Key management responsibilities with users   | :x:                                   | :heavy_check_mark: | :heavy_check_mark:  | :x:                |
-| Increased complexity of components to manage | :x:                                   | :x:                | :x:                 | :heavy_check_mark: |
-| Control over session to ssh name             | :x:                                   | :heavy_check_mark: | :heavy_check_mark:  | :heavy_check_mark: |
+| VS Code support                                | :heavy_check_mark:                   | :heavy_check_mark: | :heavy_check_mark: | :x: |
+| Efficient IP address usage                     | :x:                                  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| Keys linked to user rather than project        | :x:                                  | :x:                | :heavy_check_mark:  | N/A |
+| Flexible monitoring support                    | :x:                                  | :x:                | :heavy_check_mark:  | :heavy_check_mark:                |
+| Key management responsibilities with users     | :heavy_check_mark:                   | :heavy_check_mark: | :heavy_check_mark:  | :x:                |
+| New Renku components to be developed/supported | :x:                                  | :x:                | :heavy_check_mark:  | :heavy_check_mark: |
+
+Based on the above comparison, the MITM proxy best fits the requirements
+even though it requires the development of a dedicated, if modest, component as
+part of Renku; it is also worth reiterating that this is the solution on which
+gitpod converged.
 
 ## Proposed Solution
 
@@ -272,28 +311,86 @@ platform which will terminate ssh sessions and route them to the appropriate
 user session. It will necessitate a solution for mapping between username, 
 keys used for login and the user session.
 
-- Functionalities required:
-  - API:
-    - CRUD operations on public keys for logged in user
-    - Session operations?
-  - CLI
-    - Push public key for logged in user (check if it is valid public key)
-    - Create keypair locally, put key in appropriate folder and push public key
-    - Perform some validation of key setup without needing to launch a user session
-      - (probably needs to have an endpoint which is always available in the absence of a session) 
-    - For OIDC modus:
-      - Support generation of token for logged in user
-      - Support combined token generation and login via pty?
-      - Support login with token generated via web browser
-  - UI
-    - Import public key for logged in user (check if it is valid public key)
-    - Create keypair locally, support download of private key and give user
-      instructions on what to do with this
-    - For OIDC modus:
-      - Support generation of token in browser
-  - ssh Proxy
-    - Develop new proxy which supports public key login
-    - Need to have clarity on the solution for mapping login names to user sessions
+### Mapping between ssh login and Renku user session
+
+Given that a single ssh pubkey could apply to more than one Renku session, it
+is not realistic to use the ssh key itself as the mechanism which maps the
+ssh session to the Renku user session; this means it is not possible to have
+a solution in which the login is `ssh Renku@<ssh-proxy>` and the session is
+uniquely identified by the ssh key. The most straightforward solution is one
+in which the username maps uniquely to a Renku user session.
+
+A mechanism is then required which maps ssh username and ssh public key to a
+Renku user session.
+
+### Session creation
+
+Session creation starts in the standard way: the Renku notebooks service
+creates a `JupyterServer` custom resource.
+
+In this case, however, a Kubernetes watcher also listens for creation and deletion
+of these resources and updates a database accordingly. This database keeps track of
+the current set of active sessions and is used to support queries which map
+ssh username to keypair and Renku session. (This database can easily be
+modified/augmented to support querying of historical data relating to Renku
+sessions.)
+
+More specifically, the database table will contain the following three elements:
+- keycloak-user-id
+- Renku-session-name
+- ssh username
+
+While we could devise a solution in which the ssh username is derived exactly
+from the gitlab username and the Renku-session-name, these can consist of
+strings of somewhat random characters and, as such, can look somewhat ugly.
+Storing the ssh username explicitly adds an extra level of flexibility which
+will allow us to consider alternative ssh username to Renku session name mappings.
+
+### SSH key management
+
+SSH keys will be stored in the Renku data store. As it is possible that users
+can have multiple SSH keys either concurrently or over time, it will be necessary
+to have a 1:many mapping between user-id and SSH keys.
+
+The Renku data store will support CRUD operations for key management.
+
+### Illustration of the processes
+
+...add mermaid diagrams here...
+
+```mermaid
+flowchart TD
+    Launch[Launch a session] --> Access{Is the user requesting a<br>non-default resource pool?}
+    Access --> |Yes| CheckAccess[Get resource pools for user]
+    Access --> |No| Run[Start session<br>with default resource pool]
+    CheckAccess --> IsAllowed{Is the user allowed<br>to use the resource pool?}
+    IsAllowed --> |Yes| PoolSetup[Add optional node affinity<br>Add optional toleration<br>Add priority]
+    IsAllowed --> |No| NotAllowed[Fail and inform user]
+    PoolSetup --> RunPool[Start session<br>in resource pool]
+```
+
+### Required modifications to Renku
+
+The following modifications to Renku will also be required:
+- API:
+  - CRUD operations on public keys for logged in user
+- CLI
+  - Push public key for logged in user (check if it is valid public key)
+  - Create keypair locally, put key in appropriate folder and push public key
+  - Perform some validation of key setup without needing to launch a user session
+    - (probably needs to have an endpoint which is always available in the absence of a session) 
+  - For OIDC modus:
+    - Support generation of token for logged in user
+    - Support combined token generation and login via pty?
+    - Support login with token generated via web browser
+- UI
+  - Import public key for logged in user (check if it is valid public key)
+  - Create keypair locally, support download of private key and give user
+    instructions on what to do with this
+  - For OIDC modus:
+    - Support generation of token in browser
+- ssh Proxy
+  - Develop new proxy which supports public key login
 
 ## Drawbacks
 

From 89dcf26da841c356c0161c5561b02e0313c44f53 Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Wed, 22 Mar 2023 18:02:08 +0100
Subject: [PATCH 09/13] WIP(ssh sessions): add figures to ssh session rfc

---
 .../012-add-session-ssh-support.md}           | 54 ++++++++++++-------
 .../rfc-template.md                           |  0
 2 files changed, 35 insertions(+), 19 deletions(-)
 rename rfcs/{011-add-session-ssh-support/011-add-session-ssh-support.md => 012-add-session-ssh-support/012-add-session-ssh-support.md} (89%)
 rename rfcs/{011-add-session-ssh-support => 012-add-session-ssh-support}/rfc-template.md (100%)

diff --git a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md b/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md
similarity index 89%
rename from rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
rename to rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md
index 60735fe..4182f82 100644
--- a/rfcs/011-add-session-ssh-support/011-add-session-ssh-support.md
+++ b/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md
@@ -313,15 +313,16 @@ keys used for login and the user session.
 
 ### Mapping between ssh login and Renku user session
 
-Given that a single ssh pubkey could apply to more than one Renku session, it
-is not realistic to use the ssh key itself as the mechanism which maps the
-ssh session to the Renku user session; this means it is not possible to have
-a solution in which the login is `ssh Renku@<ssh-proxy>` and the session is
-uniquely identified by the ssh key. The most straightforward solution is one
-in which the username maps uniquely to a Renku user session.
+Given that multiple Renku sessions could potentially use the same (user) ssh
+public key, it is not possible to use the ssh key itself as the mechanism
+which maps the ssh session to the Renku user session; this means it is not
+possible to have a solution in which the login is `ssh renku@<ssh-proxy>` and
+the session is uniquely identified by the ssh key. The most straightforward
+solution is one in which the username maps uniquely to a Renku user session.
 
-A mechanism is then required which maps ssh username and ssh public key to a
-Renku user session.
+A mechanism is then required which maps the ssh username to a Renku user session
+and checks that the ssh challenge succeeds with one of the user's registered ssh
+public keys.
 
 ### Session creation
 
@@ -356,17 +357,31 @@ The Renku data store will support CRUD operations for key management.
 
 ### Illustration of the processes
 
-...add mermaid diagrams here...
+Creation of a Renku session:
 
 ```mermaid
 flowchart TD
-    Launch[Launch a session] --> Access{Is the user requesting a<br>non-default resource pool?}
-    Access --> |Yes| CheckAccess[Get resource pools for user]
-    Access --> |No| Run[Start session<br>with default resource pool]
-    CheckAccess --> IsAllowed{Is the user allowed<br>to use the resource pool?}
-    IsAllowed --> |Yes| PoolSetup[Add optional node affinity<br>Add optional toleration<br>Add priority]
-    IsAllowed --> |No| NotAllowed[Fail and inform user]
-    PoolSetup --> RunPool[Start session<br>in resource pool]
+    Launch[Launch a session] --> NotebookCreatesServer[Renku notebook service creates JupyterServer]
+    NotebookCreatesServer --> RenkuDataStoreNotified[Renku Data Store watcher updates list of active sessions with ssh username, renku session and user id]
+```
+
+Termination of a Renku session:
+
+```mermaid
+flowchart TD
+    Terminate[Terminate a session] --> NotebookTerminatesServer[Renku notebook service deletes JupyterServer object]
+    NotebookTerminatesServer --> RenkuDataStoreNotified[Renku Data Store removes session from list of active sessions]
+```
+
+Login to Renku session:
+
+```mermaid
+flowchart TD
+    SshInitiated["User attempts ssh login with ssh session-name@renkulab.io"] --> RenkuSshProxyReceivesSshConnectionRequest["Renku SSH Proxy receives session initiation request"]
+    RenkuSshProxyReceivesSshConnectionRequest --> RenkuSshProxyGetsRenkuSessionInfo["Renku SSH Proxy queries Renku Data Store for session, user and valid key(s)"]
+    RenkuSshProxyGetsRenkuSessionInfo --> RenkuSshProxyDetermineIfKeyValid{"Renku SSH Proxy determines if the private key <br> used to initiate the session matches (one of) <br> the available public keys"}
+    RenkuSshProxyDetermineIfKeyValid --> |No| SshPermissionDenied["SSH Permission Denied"]
+    RenkuSshProxyDetermineIfKeyValid --> |Yes| SshConnectionProxied["Renku SSH Proxy proxies connection to Renku session"]
 ```
 
 ### Required modifications to Renku
@@ -374,14 +389,15 @@ flowchart TD
 The following modifications to Renku will also be required:
 - API:
   - CRUD operations on public keys for logged in user
+- Renku Data Store service
+  - Support storage of ssh keys bound to users
+  - Support queries linking ssh-username, renku-session name, user-id and valid user ssh keys
+  - Add watcher which tracks running Jupyter Servers and adds them to currently active sessions
 - CLI
   - Push public key for logged in user (check if it is valid public key)
   - Create keypair locally, put key in appropriate folder and push public key
   - Perform some validation of key setup without needing to launch a user session
     - (probably needs to have an endpoint which is always available in the absence of a session) 
-  - For OIDC modus:
-    - Support generation of token for logged in user
-    - Support combined token generation and login via pty?
     - Support login with token generated via web browser
 - UI
   - Import public key for logged in user (check if it is valid public key)
diff --git a/rfcs/011-add-session-ssh-support/rfc-template.md b/rfcs/012-add-session-ssh-support/rfc-template.md
similarity index 100%
rename from rfcs/011-add-session-ssh-support/rfc-template.md
rename to rfcs/012-add-session-ssh-support/rfc-template.md

From de754a5f92aafb870fc6c05a57beeb4e709e38b8 Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Wed, 3 May 2023 18:11:28 +0200
Subject: [PATCH 10/13] chore(feedback): improve proposal based on feedback
 received

---
 .../012-add-session-ssh-support.md            | 122 +++++++++---------
 1 file changed, 58 insertions(+), 64 deletions(-)

diff --git a/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md b/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md
index 4182f82..684f369 100644
--- a/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md
+++ b/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md
@@ -21,9 +21,8 @@ be used to justify the final agreed solution.
 > Why are we doing this? What use cases does it support? What is the expected
 outcome?
 
-There is a [Shape-up
-document](https://www.notion.so/Support-RenkuLab-compute-access-from-local-terminal-SSH-VSCode-f896d3b391c94bcc87c56e375eb531d6)
-which provides the motivation for introducing this functionality.
+This functionality has been motivated through the Shape Up process and is
+documented [here](https://github.com/SwissDataScienceCenter/renku-design-docs/blob/main/feature-pitches/003-ssh-into-sessions/ssh-into-sessions.md).
 
 ## Problem Definition
 
@@ -38,41 +37,38 @@ below, VS Code primarily uses key based mechansims; they are generally the most
 widely used with ssh and are known to be trusted and secure - for this reason,
 there is a strong focus on key-based mechanisms here.
 
-From a user perspective, the following needs to be considered:
-- key setup
-  - user may want to use an existing keypair which is stored in `$HOME/.ssh` on
-    their local machine
-  - user may have existing keypairs but wants to have a new keypair for this
-    scenario
-    - having this generated by Renku is probably the best option but the user
-      may want to be able to control the naming of the file on the local
-      machine
-  - user may have a password protected key which is accessible via the `ssh-agent`
-    - this may require the solution to support `ssh-agent` forwarding 
-  - user's key may already be in use within gitlab/github and can be eaily
-    obtained from those sources
-- host key verification
-  - if user attempts to log into the server with ssh and it is not a recognized
-    server, the user should add the server to the known host keys
-    - we should not have too much friction in this process; ie if new host keys
-      are generated for each user session, the user may be asked each time to
-      confirm that the host key is valid in some configurations and clearly this
-      could become an irritation
-  - for VS Code host key verification is less intrusive
-- interaction modes
-  - the primary use case is one in which the user connects via VS Code; this uses
-    `ssh` based tunneling mechanisms to support diverse interactions with the
-    remote server including access to remote files, a remote terminal and
-    interaction with eg a Jupyter Notebook
-    - in this case the user should obtain a sensible shell configuration when a
-      terminal is opened (note that in VS Code, this may not be a login shell)
-  - direct `ssh` access is also possible/considered and in this case, the user
-    should also obtain a sensible shell configuration when logged in; for ssh-agent
-    forwarding, this may involve ensuring that the `SSH_AUTH_SOCK` is
-    configured
-    - user login should be quite intuitive with a clear username/hostname
-      pattern for logggin in; ssh config should also be configured analogously
-      with ssh login targets 
+From a user perspective, a number of different possibilities can be envisaged,
+which depend on how experienced the users are with SSH key handling - this can
+include different types of SSH keys (RSA, ED25519, etc), password encrypted SSH
+keys, use of `ssh-agent`, use of a priori generated keys etc. It is not
+realistic to support all possible options and hence we focus here on a
+straightforward approach.
+
+The basic approach considered here is one in which Renku is generally
+responsible for generating keypairs; Renku will generate a private key which is
+kept local to the user and a public key which is pushed to the Renku service.
+Another option which is straightforward to provide and would be nice to have
+is to enable the user to specify an existing key which can be used; in this
+case, no keypair is generated but the public key of the existing keypair
+is uploaded to the Renku service.
+
+Host key verification is another issue which needs to be borne in mind - 
+how this is perceived on the client side depends on both the client
+configuration as well as the server side implementation. Generally, there
+should not be too much friction in this process which means that host keys
+should not be changing very dynamically (eg every time a session is launched).
+It is worth noting that for VS Code host key verification is less intrusive.
+
+It is envisaged that the primary use case to be supported is one in which the
+user connects via VS Code; this uses `ssh` tunneling mechanisms to support
+diverse interactions with the remote server including access to remote files, a
+remote terminal and interaction with eg a Jupyter Notebook. In this case the
+user should obtain a sensible shell configuration when a terminal is opened
+(note that in VS Code, this may not be a login shell). Direct `ssh` access is
+also considered and in this case, the user should also obtain a sensible shell
+configuration when logged in. User login should be quite intuitive with a clear
+username/hostname pattern for logging in; ssh config should also be configured
+analogously with ssh login targets.
 
 From a system perspective, the following need to be considered:
 - unlike HTTP/S, ssh does not have support for server identification and hence
@@ -92,26 +88,10 @@ From a system perspective, the following need to be considered:
 - a solution in which keys are associated with users and are dynamically mapped
   to Renku user sessions is most sensible
 
-We consider the following different user scenarios:
-- Scenario 1: User is comfortable using ssh keys - bring you own key
-  - User imports ssh key to Renku as is common with cloud services
-- Scenario 2: User is used to using ssh keys but wants Renku to create new key
-  - User asks Renku to generate key via cli and it gets stored in the .ssh folder
-  - User uses this key to log in to Renku
-- Scenario 3: User is not so familiar with keys; wants Renku to generate keys via web interface
-  - Private key generated within browser, never uploaded to Renku and can be
-    downloaded once - it is not persisted anywhere persisted; user is provided
-    information on what to do with the private key
-  - Public key is retained by Renku
-  - (Will need to be able to deal with the case that user loses key - support
-    creation of new key)
-  - (Will probably need the command line to perform some check to see if the
-    configuration is correct)
-
 A solution is required which can be deployed both in the cloud provider context
 and in the Switch/Openstack context - assumptions relating to ingress
 configuration and getting access to ssh services must be clear for both
-contexts and map to the design of the different platforms
+contexts and map to the design of the different platforms.
 
 ## Key Assumptions
 
@@ -167,8 +147,8 @@ the ssh daemon configuration within their sessions to make their session very
 exposed.
 
 A couple of further points relating to this approach:
-- ingresses on cloud providers can typically be expensive; having one per
-  session is likely to incur significant cost;
+- ingresses mapped to load balancers on cloud providers can typically be
+  expensive; having one per session is likely to incur significant cost;
 - on the Switch deployments, the entire cluster is exposed via a single IP
   address; as such it is not straightforward to devise a solution in which
   different sessions would map to different IP addresses.
@@ -195,10 +175,12 @@ jumphost. Key management then becomes a user responsibility, with users having
 to do this within their Renku projects; as such, keys are project entities,
 rather than user entities.
 
-In this case, the command to ssh into a session requires specifying both the
-jumphost and the destination; hence it is a more complex command and introduces
-some user friction when accessing sessions. VSCode supports such proxying
-scenarios without any problems.
+In this case, the basic command to ssh into a session is more complex, requiring
+specification of both the jumphost and the destination; this is straightforward to
+deal with from the user perspective by adding entries to the ssh config for the
+sessions such that the users can simply enter the session name and the ssh
+config contains the necessary username, key, proxy and hostname. VS code can
+use the same information.
 
 ### Using a Man In the Middle (MITM) Proxy solution
 
@@ -214,12 +196,16 @@ two distinct ssh sessions with two distinct authentication flows. The proxy
 terminates one ssh session, relaying all traffic to another ssh session.
 
 This solution has the following benefits:
-- keys can be clearly bound to users and separated from projects
+- keys can be clearly bound to users and separated from projects;
 - it is reasonably straightforward to add instrumentation such that ssh session
   information data can be easily collected with the current data collection
-  mechanisms.
+  mechanisms;
 - there is greater control over the ssh command which users can use - the mapping
-  between username, keypair and Renku session is more configurable
+  between username, keypair and Renku session is more configurable;
+- it could be possible to use ssh as a means to launch Renku sessions; if an
+  ssh login request is received for a specific session and that session is not
+  currently running, the platform could launch the session and permit the user
+  to log in once the session is running.
 
 The primary downside is that it involves writing and maintaining another
 component; however initial work by Tasko indicates that this can be done with
@@ -432,7 +418,15 @@ See comparison of approaches above.
 
 > What parts of the design do you expect to resolve through the RFC process before this gets merged?
 
+This document assumes the existence of somewhere to store information required
+to support this service. In the design of the CRAC service, the need for such a
+store was also identified and this has been evolving. The requirements identified
+in the CRAC service are more comprehensive; as such it makes sense to consider
+the design of that service in the CRAC context; for handling SSH sessions, it
+is simply necessary to be able to link SSH keys to users.
+
 > What parts of the design do you expect to resolve through the implementation of this feature before stabilisation?
 
 > What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
 
+

From 8c1caf02ef30a3082c11ded9bc66a4dc950df3ab Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Wed, 3 May 2023 18:15:10 +0200
Subject: [PATCH 11/13] chore(tidying): remove rfc-template.md

---
 .../rfc-template.md                           | 45 -------------------
 1 file changed, 45 deletions(-)
 delete mode 100644 rfcs/012-add-session-ssh-support/rfc-template.md

diff --git a/rfcs/012-add-session-ssh-support/rfc-template.md b/rfcs/012-add-session-ssh-support/rfc-template.md
deleted file mode 100644
index 0d2925a..0000000
--- a/rfcs/012-add-session-ssh-support/rfc-template.md
+++ /dev/null
@@ -1,45 +0,0 @@
-- Start Date: (fill me in with today's date, DD-MM-YYYY)
-- Status: (One of Proposed, Accepted or Rejected)
-
-# (RFC title goes here)
-
-## Summary
-
-> One paragraph explanation of the change.
-
-## Motivation
-
-> Why are we doing this? What use cases does it support? What is the expected
-outcome?
-
-## Design Detail
-
-> This is the bulk of the RFC.
-
-> Explain the design in enough detail for somebody
-familiar with the infrastructure to understand. This should get into specifics and corner-cases,
-and include examples of how the service is used. Any new terminology should be
-defined here.
-
-## Drawbacks
-
-> Why should we *not* do this? Please consider the impact on users,
-on the integration of this change with other existing and planned features etc.
-
-> There are tradeoffs to choosing any path, please attempt to identify them here.
-
-## Rationale and Alternatives
-
-> Why is this design the best in the space of possible designs?
-
-> What other designs have been considered and what is the rationale for not choosing them?
-
-> What is the impact of not doing this?
-
-## Unresolved questions
-
-> What parts of the design do you expect to resolve through the RFC process before this gets merged?
-
-> What parts of the design do you expect to resolve through the implementation of this feature before stabilisation?
-
-> What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?

From fe88280c3e4f103850897093e4e3ff162960f76f Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Thu, 4 May 2023 15:07:54 +0200
Subject: [PATCH 12/13] chore(table): fix table based on feedback

---
 rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md b/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md
index 684f369..8abff6e 100644
--- a/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md
+++ b/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md
@@ -282,7 +282,7 @@ A table showing how the different approaches compare is shown below:
 | Keys linked to user rather than project        | :x:                                  | :x:                | :heavy_check_mark:  | N/A |
 | Flexible monitoring support                    | :x:                                  | :x:                | :heavy_check_mark:  | :heavy_check_mark:                |
 | Key management responsibilities with users     | :heavy_check_mark:                   | :heavy_check_mark: | :heavy_check_mark:  | :x:                |
-| New Renku components to be developed/supported | :x:                                  | :x:                | :heavy_check_mark:  | :heavy_check_mark: |
+| Implementation possible without developing new components | :heavy_check_mark:        | :heavy_check_mark: | :x:                 | :heavy_check_mark: |
 
 Based on the above comparison, the MITM proxy has the best fit with the requirements
 even though it requires the development of a dedicated, if modest component as

From e4a3f4b1fe399ebfb9ba356d0c404ff43956c6fa Mon Sep 17 00:00:00 2001
From: Sean Murphy <sean.murphy@sdsc.ethz.ch>
Date: Thu, 4 May 2023 15:35:00 +0200
Subject: [PATCH 13/13] chore(feedback): make modifications incorporating
 feedback

---
 .../012-add-session-ssh-support.md            | 40 ++++++++++---------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md b/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md
index 8abff6e..835e889 100644
--- a/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md
+++ b/rfcs/012-add-session-ssh-support/012-add-session-ssh-support.md
@@ -171,16 +171,19 @@ keypair must be deployed to the proxy and/or user session.
 
 The most natural solution in this case is to use the user's public key within
 the session and have open access with no ability to obtain a shell within the
-jumphost. Key management then becomes a user responsibility, with users having
-to do this within their Renku projects; as such, keys are project entities,
-rather than user entities.
-
-In this case, the basic command to ssh into a session is more complex, requiring
-specification of both the jumphost and the destination; this is straightforward to
-deal with from the user perspective by adding entries to the ssh config for the
-sessions such that the users can simply enter the session name and the ssh
-config contains the necessary username, key, proxy and hostname. VS code can
-use the same information.
+jumphost. This can be achieved in a straightforward manner today by adding
+user public keys to their projects which then get picked up in the user
+sessions. This approach effectively binds keys to projects. It is also
+possible to envisage an approach in which the keys could be retrieved from a
+service which persists SSH keys enabling keys to be bound to users rather than
+projects.
+
+In this case, the command to ssh into a session is more complex than the basic
+ssh command, requiring specification of both the jumphost and the destination;
+this is straightforward to deal with from the user perspective by adding
+entries to the ssh config for the sessions such that the users can simply enter
+the session name and the ssh config contains the necessary username, key, proxy
+and hostname. VS Code can use the same information.
 
 ### Using a Man In the Middle (MITM) Proxy solution
 
@@ -279,7 +282,7 @@ A table showing how the different approaches compare is shown below:
 |------------------------------------------------|--------------------------------------|-------------------|--------------------|-------------------|
 | VS Code support                                | :heavy_check_mark:                   | :heavy_check_mark: | :heavy_check_mark: | :x: |
 | Efficient IP address usage                     | :x:                                  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| Keys linked to user rather than project        | :x:                                  | :x:                | :heavy_check_mark:  | N/A |
+| Keys linked to user rather than project        | :x:                                  | Possible           | :heavy_check_mark:  | N/A |
 | Flexible monitoring support                    | :x:                                  | :x:                | :heavy_check_mark:  | :heavy_check_mark:                |
 | Key management responsibilities with users     | :heavy_check_mark:                   | :heavy_check_mark: | :heavy_check_mark:  | :x:                |
 | Implementation possible without developing new components | :heavy_check_mark:        | :heavy_check_mark: | :x:                 | :heavy_check_mark: |
@@ -304,7 +307,7 @@ public key, it is not possible to use the ssh key itself as the mechanism
 which maps the ssh session to the Renku user session; this means it is not
 possible to have a solution in which the login is `ssh renku@<ssh-proxy>` and
 the session is uniquely identified by the ssh key. The most straightforward
-solution is one in which the username maps uniquely to a Renku user session.
+solution is one in which the ssh username maps uniquely to a Renku user session.
 
 A mechanism is then required which maps ssh username to a Renku user session
 and the ssh challenge is successful using one of the user's registered ssh
@@ -315,12 +318,13 @@ public keys.
 The standard approach to session creation occurs in which the Renku notebooks
 service creates a `JupyterServer` CRD.
 
-In this case, however, a Kubernetes watcher listens for creation and deletion
-of these CRDs and updates a database accordingly. This database keeps track of
-the current set of active sessions and is used to support queries which map
-ssh username to keypair and Renku session. (This database can easily be 
-modified/augmented to support querying of historical data relating to Renku
-sessions).
+This needs to be augmented with an additional mechanism which persists session
+information to a database which can then be queried easily (a side effect of
+this is that there will be a persisted record of session creation and
+termination). Here, a Kubernetes watcher should listen for creation and
+deletion of these CRDs and update a database accordingly. This database keeps
+track of the current set of active sessions and is used to support queries
+which map ssh username to keypair and Renku session.
 
 More specifically, the database table will contain the following three elements:
 - keycloak-user-id