Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-40906: Add encapsulation flag to IPsecConfig struct #1472

Merged
merged 1 commit into from
Dec 18, 2024

Conversation

mccv1r0
Copy link
Contributor

@mccv1r0 mccv1r0 commented May 25, 2023

Add encapsulation flag to IPsecConfig struct.

This flag allows UDP encapsulation of IPsec related packets for things like firewall and NAT traversal

See RFC 3948 for specific uses

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 25, 2023

Hello @mccv1r0! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 25, 2023
@openshift-ci openshift-ci bot requested review from adambkaplan and eparis May 25, 2023 15:55
@openshift-bot
Copy link

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 24, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 31, 2023
@openshift-bot
Copy link

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 1, 2023
@openshift-bot
Copy link

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Oct 31, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 31, 2023

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yocum137
Copy link

Please re-open this PR for IPSec encapsulation on IBM VPC cloud.

@pperiyasamy
Copy link
Member

/reopen

@openshift-ci openshift-ci bot reopened this Nov 6, 2024
Copy link
Contributor

openshift-ci bot commented Nov 6, 2024

@pperiyasamy: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 6, 2024
@openshift-ci openshift-ci bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 6, 2024
@pperiyasamy
Copy link
Member

/retitle OCPBUGS-40906: Add encapsulation flag to IPsecConfig struct

@openshift-ci openshift-ci bot changed the title Add encapsulation flag to IPsecConfig struct OCPBUGS-40906: Add encapsulation flag to IPsecConfig struct Nov 6, 2024
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 6, 2024
@openshift-ci-robot
Copy link

@mccv1r0: This pull request references Jira Issue OCPBUGS-40906, which is invalid:

  • expected the bug to target the "4.18.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Add encapsulation flag to IPsecConfig struct.

This flag allows UDP encapsulation of IPsec related packets for things like firewall and NAT traversal

See RFC 3948 for specific uses

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@pperiyasamy
Copy link
Member

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Nov 6, 2024
@openshift-ci-robot
Copy link

@pperiyasamy: This pull request references Jira Issue OCPBUGS-40906, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.0) matches configured target version for branch (4.18.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 8, 2024
Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New user facing features should have an enhancement and a feature gate for their introduction. Do you have any design docs for this yet, or thoughts on feature gating this and what you'll need to do to be able to declare this stable?

Comment on lines 567 to 568
// This will be effective only when mode is configured with `Full`,
// otherwise it will be ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add CEL to the parent struct to prevent this being set when the mode is not full.
Base this field on the OptionalMember example please

Suggested change
// This will be effective only when mode is configured with `Full`,
// otherwise it will be ignored.
// This is permitted only when mode is configured with `Full`,
// and forbidden otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoelSpeed thanks for the reference. added CEL to validate full parameter, is that fine ?

type IPsecFullModeConfig struct {
// encapsulation option to enabled or disable rfc3949 encapsulation of IKE and IPsec
// encryption packets
// +kubebuilder:validation:Enum:=yes;no;""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and No are as bad as true and false. What does this actually do for an end user? What kind of encapsulation is this? Think about this from an end user perspective.

What difference does it make to me, as an end user, if I do, or do not have this field set to yes or no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option is going to set into OVN/OVS something like this: https://docs.openvswitch.org/en/latest/tutorials/ipsec/#custom-options and finally going to be set as ipsec config option for an ipsec tunnel.

encapsulation

In some cases, for example when ESP packets are filtered or when a broken IPsec peer does not properly recognise NAT,
it can be useful to force RFC-3948 encapsulation. In other cases, where IKE is NAT'ed but ESP packets can or should
flow without encapsulation, it can be useful to ignore the NAT-Traversal auto-detection. encapsulation=yes forces
the NAT detection code to lie and tell the remote peer that RFC-3948 encapsulation (ESP in port 4500 packets) is
required. encapsulation=no ignores the NAT detection causing ESP packets to send send without encapsulation.
The default value of encapsulation=auto follows the regular outcome of the NAT auto-detection code performed
in IKE. This option replaced the obsoleted forceencaps option.

https://libreswan.org/man/ipsec.conf.5.html

So I don't see any other values to represent this. Please suggest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoelSpeed here is how implementation looks like for this new API: openshift/cluster-network-operator#2573.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to me like there are three modes if I read this correctly? Force, Auto, Disable? With the default being Auto?

Could you briefly explain to me, what actually is encapsulation in this context? Does AH or ESP, transport or tunnel come into this in any way? https://support.huawei.com/enterprise/en/doc/EDOC1100301600/c5d5a9b/understanding-ipsec-encapsulation-modes

I'm still trying to understand how we explain this to an end user in terms that they can easily understand

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ESP (IP Protocol 50) is not supported this uses UDP encapsulation. This is required for IBMCloud for example.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your network or firewall blocks less common IP protocols like IP Protocol 50, this setting allows the use of standard UDP packets on ports 500 and 4500 instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IBMCloud IPI already uses UDP encap, it's just hardcoded. This supports platform: none or other edge cases.

We should probably add UDP port 4500 somewhere in descriptions.

Other UPI Docs already indicate UDP port 4500 is required, so we should be covered.
e.g. AWS UPI:

Name Description Protocol Port
MasterIngressIpsecNat IPsec NAT-T packets udp 4500
MasterIngressWorkerIpsecNat IPsec NAT-T packets udp 4500

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rbbratta about firewall use case, The NAT-T is very much needed there.

@JoelSpeed The IPsec tunnel is configured between OCP nodes in tunnel mode on the Geneve overlay traffic which carries pod payload traffic. Once IPsec SAs are established between nodes, then Gevenve traffic is encrypted with ESP header. So NAT-T (UDP) encapsulation happens on ESP payload so that it can pass firewalls and also compatible with intermediate NAT devices.

updated with 3 encapsulation modes (yes, no and auto) now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please avoid the use of yes and no as enum values, they are as bad as true and false. Enum values should be descriptive where possible, I still think Force and Disable are better

Think of it this way:

  • Yes encapsulation
  • No encapsulation
    vs
  • Force encapsulation
  • Disable encapsulation

The latter two flow better in a sentence at a minimum

Also, enum values should be PascalCase

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes @JoelSpeed , done.

// This will be effective only when mode is configured with `Full`,
// otherwise it will be ignored.
// +optional
Full IPsecFullModeConfig `json:"full,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a pointer, you can't have an omitempty with a non-pointer struct

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@pperiyasamy
Copy link
Member

New user facing features should have an enhancement and a feature gate for their introduction. Do you have any design docs for this yet, or thoughts on feature gating this and what you'll need to do to be able to declare this stable?

This is a minor change w.r.t implementation and addressing this for a bug OCPBUGS-40906, do you still recommend to have feature gate and design doc update for this ?

@pperiyasamy
Copy link
Member

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 21, 2024
// This is permitted only when mode is configured with `Full`,
// and forbidden otherwise.
// +unionMember,optional
Full *IPsecFullModeConfig `json:"full,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs an explicit optional tag too please

Suggested change
Full *IPsecFullModeConfig `json:"full,omitempty"`
// +optional
Full *IPsecFullModeConfig `json:"full,omitempty"`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

type IPsecFullModeConfig struct {
// encapsulation option to enabled or disable rfc3949 encapsulation of IKE and IPsec
// encryption packets
// +kubebuilder:validation:Enum:=yes;no;""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to me like there are three modes if I read this correctly? Force, Auto, Disable? With the default being Auto?

Could you briefly explain to me, what actually is encapsulation in this context? Does AH or ESP, transport or tunnel come into this in any way? https://support.huawei.com/enterprise/en/doc/EDOC1100301600/c5d5a9b/understanding-ipsec-encapsulation-modes

I'm still trying to understand how we explain this to an end user in terms that they can easily understand

@pperiyasamy
Copy link
Member

/retest-required

@pperiyasamy
Copy link
Member

/retest

Copy link
Contributor

openshift-ci bot commented Nov 27, 2024

@mccv1r0: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 93e98d6 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +589 to +593
// encapsulation option to configure libreswan on how inter-pod traffic across nodes
// are encapsulated to handle NAT traversal. When configured it uses UDP port 4500
// for the encapsulation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the purpose of this field is documented well in this GoDoc, but I would like to see more information outlined on:

  • Limitations of this field. I would recommend explicitly calling out the allowed values in the GoDoc and describing how each setting influences the behavior
  • As of writing, this field is optional and doesn't look to have a default - what does not setting this field mean for encapsulation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Bryce says, we need to add godoc explaining the options, this typically takes the form of

// Valid values are Force, Disable, Auto and omitted.
// Force means ... 
// Disable means ...
// Auto means ...
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
// The current default is ...

Also, given this is optional/omitempty you don't need "" as a valid value in the enum

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, updated godoc with above info, hope it's fine now.

// for the encapsulation.
// +kubebuilder:validation:Enum:=Force;Disable;Auto;""
// +optional
Encap Encapsulation `json:"encapsulation,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason why the struct field name is Encap and not Encapsulation? Generally, I would recommend having the Go struct field name match the JSON field name where possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, changed it.

Comment on lines 582 to 583
// DisableEncapsulation disables UDP encapsulation even if NAT is present.
DisableEncapsulation = "Disable"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "disable" mean here? Is it semantically equivalent to "no encapsulation"? Would using something like None instead of Disable be a bit more intuitive for the semantic meaning of "no encapsulation"?

I know that generally we try to avoid the terms "enable" and "disable" because they are often overloaded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I think we were aiming for the antonym of Force.

There are three states for the encapsulation. Either you make sure it's on (force), you make sure it's off (disable), or you let the connection negotiate based on the other end (auto).

Given that context, I leant towards disable over none, wdyt?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like Always, Never, Auto would help avoid "Disabled" and be more naturally descriptive?

Thinking from a UX perspective, I think Always and Never would be easier to remember as "always encapsulate" and "never encapsulate" than something like Force and Disabled.

Provided the additional context, I don't think Force and Disabled are bad options

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the suggestion of Always and Never, what do you think @pperiyasamy ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
updated it with Always and Never options now.

// for the encapsulation.
// +kubebuilder:validation:Enum:=Force;Disable;Auto;""
// +optional
Encap Encapsulation `json:"encapsulation,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be required when Full is provided? Do you expect new parameters to be added here in the future?

Trying to work out if we can make it a validation error to set

ipsecConfig:
  mode: Full
  full: {}

Since this would be semantically equivalent to

ipsecConfig:
  mode: Full

We could use +kubebuilder:validation:MinProperties:=1 to make sure that at least one property is added, if you don't think we should make this field required when full: {}, WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at present full option may not be configured at all for most of the clusters, so let it be compatible with OCP upgrades as well.
when customer wants to configure NAT-T encapsulation option other than default value, then they are free to configure it via full struct parameter.
of course there can be future requirements to customize ipsec connection on Full mode.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what Joel is getting at here is that making it so there is only one way to represent the "no configuration" option is a better UX choice as mentioned in the OpenShift API conventions doc here: https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#only-one-phrasing-for-each-idea

It seems like there is currently two ways to set the "I have no opinion as to the configuration when mode is set to Full" idea by doing

ipsecConfig:
  mode: Full

and

ipsecConfig:
  mode: Full
  full: {}

In order to achieve this single representation, I think Joel is suggesting to either:

  • Make the ipsecConfig.full.encapsulation field required
  • Make it such that when ipsecConfig.full is specified, at least one field member of the struct must also be specified

Either of these would make it so the only valid way to set the "no configuration" option is to not include the ipsecConfig.full field in your YAML that you apply to the cluster even when ipsecConfig.mode is set to Full.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes @everettraven , I can understand it now and the suggestion makes sense.
adding +kubebuilder:validation:MinProperties:=1 on the struct to support:

  • Make it such that when ipsecConfig.full is specified, at least one field member of the struct must also be specified

Comment on lines 582 to 583
// DisableEncapsulation disables UDP encapsulation even if NAT is present.
DisableEncapsulation = "Disable"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I think we were aiming for the antonym of Force.

There are three states for the encapsulation. Either you make sure it's on (force), you make sure it's off (disable), or you let the connection negotiate based on the other end (auto).

Given that context, I leant towards disable over none, wdyt?

Comment on lines +589 to +593
// encapsulation option to configure libreswan on how inter-pod traffic across nodes
// are encapsulated to handle NAT traversal. When configured it uses UDP port 4500
// for the encapsulation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Bryce says, we need to add godoc explaining the options, this typically takes the form of

// Valid values are Force, Disable, Auto and omitted.
// Force means ... 
// Disable means ...
// Auto means ...
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
// The current default is ...

Also, given this is optional/omitempty you don't need "" as a valid value in the enum

@pperiyasamy
Copy link
Member

/assign @huiran0826

Signed-off-by: Michael Cambria <[email protected]>
Signed-off-by: Periyasamy Palanisamy <[email protected]>
@JoelSpeed
Copy link
Contributor

/test all
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 18, 2024
Copy link
Contributor

openshift-ci bot commented Dec 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed, mccv1r0

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit a3fb7a2 into openshift:master Dec 18, 2024
11 checks passed
@openshift-ci-robot
Copy link

@mccv1r0: Jira Issue OCPBUGS-40906: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-40906 has not been moved to the MODIFIED state.

In response to this:

Add encapsulation flag to IPsecConfig struct.

This flag allows UDP encapsulation of IPsec related packets for things like firewall and NAT traversal

See RFC 3948 for specific uses

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@pperiyasamy
Copy link
Member

Thanks @JoelSpeed and @everettraven for the reviews and merging it finally !
It was a good exercise for me from API perspective.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.