End-to-end tests are essential to ensure that an operator works as intended in real-world scenarios. The Operator SDK includes a testing framework to make writing tests simpler and quicker by removing boilerplate code and providing common test utilities. The Operator SDK ships the test framework as a library under pkg/test, and the e2e tests are written as standard Go tests.
The test framework includes a few components. The most important are Framework and Context.
Framework contains all global variables, such as the kubeconfig, kubeclient, scheme, and dynamic client (provided via the controller-runtime project). It is initialized by the MainEntry function and can be used anywhere in the tests.
Note: several required arguments are initialized and added by MainEntry(). Do not attempt to use testing.M directly.
Context is a local context that stores important information for each test, such as the namespace for that test and the cleanup functions. By handling namespace and resource initialization through Context, we can make sure that all resources are properly handled and removed after the test finishes.
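Here is a minimal sketch of how a typical test function uses both, with names matching the memcached walkthrough below:

func TestMemcached(t *testing.T) {
	// Context scopes a namespace and a set of cleanup functions to this test.
	ctx := framework.NewContext(t)
	defer ctx.Cleanup()

	// Framework exposes the global clients initialized by MainEntry.
	f := framework.Global
	_ = f.KubeClient // typed client-go clientset
	_ = f.Client     // controller-runtime dynamic client with cleanup-aware Create
}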
In this section, we will be walking through writing the e2e tests of the sample memcached-operator.
The first step to writing a test is to create the main_test.go file. The main_test.go file simply calls the test framework's main entry, which sets up the framework and then starts the tests. It should be pretty much identical for all operators. This is what it looks like for the memcached-operator:
package e2e

import (
	"testing"

	f "github.com/operator-framework/operator-sdk/pkg/test"
)

func TestMain(m *testing.M) {
	f.MainEntry(m)
}
In this section, we will be designing a test based on the memcached_test.go file from the memcached-operator sample.
Once MainEntry sets up the framework, it runs the remainder of the tests. First, make sure to import testing, the operator-sdk test framework (pkg/test), as well as your operator's libraries:
import (
	"testing"

	cachev1alpha1 "github.com/operator-framework/operator-sdk-samples/go/memcached-operator/pkg/apis/cache/v1alpha1"
	"github.com/operator-framework/operator-sdk-samples/go/memcached-operator/pkg/apis"
	framework "github.com/operator-framework/operator-sdk/pkg/test"
)
The next step is to register your operator's scheme with the framework's dynamic client. To do this, pass the CRD's AddToScheme function and its List type object to the framework's AddToFrameworkScheme function. For our example memcached-operator, it looks like this:
memcachedList := &cachev1alpha1.MemcachedList{}
err := framework.AddToFrameworkScheme(apis.AddToScheme, memcachedList)
if err != nil {
	t.Fatalf("failed to add custom resource scheme to framework: %v", err)
}
We pass in the CR List object memcachedList as an argument to AddToFrameworkScheme() because the framework needs to ensure that the dynamic client has the REST mappings to query the API server for the CR type. The framework will keep polling the API server for the mappings and time out after 5 seconds, returning an error if the mappings were not discovered in that time.
The next step is to create a Context for the current test and defer its cleanup function:
ctx := framework.NewContext(t)
defer ctx.Cleanup()
Now that there is a Context, the test's Kubernetes resources (specifically the test namespace, Service Account, RBAC, and Operator deployment in local testing; just the Operator deployment in cluster testing) can be initialized:
err := ctx.InitializeClusterResources(&framework.CleanupOptions{TestContext: ctx, Timeout: cleanupTimeout, RetryInterval: cleanupRetryInterval})
if err != nil {
	t.Fatalf("failed to initialize cluster resources: %v", err)
}
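The cleanupTimeout and cleanupRetryInterval values are not defined in the snippet itself; in the memcached-operator sample they are package-level constants. A minimal sketch, assuming the same durations used elsewhere in this walkthrough:

const (
	cleanupRetryInterval = time.Second * 1
	cleanupTimeout       = time.Second * 5
)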
The InitializeClusterResources function uses the custom Create function in the framework client to create the resources provided in your namespaced manifest. The custom Create function uses the controller-runtime's client to create resources and then registers a cleanup function, called by ctx.Cleanup, which deletes the resource and then waits for it to be fully deleted before returning. This is configurable with CleanupOptions. For info on how to use CleanupOptions, see this section.
If you want to make sure the operator's deployment is fully ready before moving on to the next part of the test, the WaitForOperatorDeployment function from e2eutil (in the SDK under pkg/test/e2eutil) can be used:
// get namespace
namespace, err := ctx.GetOperatorNamespace()
if err != nil {
	t.Fatal(err)
}

// get global framework variables
f := framework.Global

// wait for memcached-operator to be ready
err = e2eutil.WaitForOperatorDeployment(t, f.KubeClient, namespace, "memcached-operator", 1, time.Second*5, time.Second*30)
if err != nil {
	t.Fatal(err)
}
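This snippet assumes that the e2eutil package (under pkg/test/e2eutil, as noted above) and the standard time package have been imported:

import (
	"time"

	"github.com/operator-framework/operator-sdk/pkg/test/e2eutil"
)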
Since the controller-runtime's dynamic client uses Go contexts, make sure to import the Go context library. In this example, we imported it as goctx:
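import goctx "context"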
The test framework provides Client, which exposes most of the controller-runtime's client unmodified, but the Create function has added functionality to create cleanup functions for these resources as well. To manage how cleanup is handled, we use a CleanupOptions struct. Here are some examples of how to use it:
// Create with no cleanup
Create(goctx.TODO(), exampleMemcached, &framework.CleanupOptions{})
Create(goctx.TODO(), exampleMemcached, nil)
// Create with cleanup but no polling for resources to be deleted
Create(goctx.TODO(), exampleMemcached, &framework.CleanupOptions{TestContext: ctx})
// Create with cleanup and polling wait for resources to be deleted
Create(goctx.TODO(), exampleMemcached, &framework.CleanupOptions{TestContext: ctx, Timeout: timeout, RetryInterval: retryInterval})
This is how we can create a Memcached custom resource with a size of 3:
// create memcached custom resource
exampleMemcached := &cachev1alpha1.Memcached{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "example-memcached",
		Namespace: namespace,
	},
	Spec: cachev1alpha1.MemcachedSpec{
		Size: 3,
	},
}
err = f.Client.Create(goctx.TODO(), exampleMemcached, &framework.CleanupOptions{TestContext: ctx, Timeout: time.Second * 5, RetryInterval: time.Second * 1})
if err != nil {
	return err
}
Now we can check if the operator successfully worked. In the case of the memcached operator, it should have created a deployment called "example-memcached" with 3 replicas. To check, we use the WaitForDeployment function, which is the same as WaitForOperatorDeployment except that WaitForOperatorDeployment will skip waiting for the deployment if the test is run locally and the --up-local flag is set; the WaitForDeployment function always waits for the deployment:
// wait for example-memcached to reach 3 replicas
err = e2eutil.WaitForDeployment(t, f.KubeClient, namespace, "example-memcached", 3, time.Second*5, time.Second*30)
if err != nil {
	return err
}
We can also test that the deployment scales correctly when the CR is updated:
err = f.Client.Get(goctx.TODO(), types.NamespacedName{Name: "example-memcached", Namespace: namespace}, exampleMemcached)
if err != nil {
	return err
}

exampleMemcached.Spec.Size = 4
err = f.Client.Update(goctx.TODO(), exampleMemcached)
if err != nil {
	return err
}

// wait for example-memcached to reach 4 replicas
err = e2eutil.WaitForDeployment(t, f.KubeClient, namespace, "example-memcached", 4, time.Second*5, time.Second*30)
if err != nil {
	return err
}
Once the end of the function is reached, the Context's cleanup functions will automatically be run since they were deferred when the Context was created.
To make running the tests simpler, the operator-sdk CLI tool has a test subcommand that can configure default test settings, such as the location of your global resource manifest file (by default deploy/crd.yaml) and your namespaced resource manifest file (by default deploy/service_account.yaml concatenated with deploy/rbac.yaml and deploy/operator.yaml), and allows the user to configure runtime options.
To run the tests, run the operator-sdk test local command in your project root and pass the location of the tests as an argument. You can use --help to view the other configuration options and use --go-test-flags to pass in arguments to go test. Here is an example command:
$ operator-sdk test local ./test/e2e --go-test-flags "-v -parallel=2"
If you wish to specify a different operator image than the one specified in your operator.yaml file (or a user-specified namespaced manifest file), you can use the --image flag:
$ operator-sdk test local ./test/e2e --image quay.io/example/my-operator:v0.0.2
If you wish to run all the tests in a single namespace (which also forces -parallel=1), you can use the --operator-namespace flag:
$ kubectl create namespace operator-test
$ operator-sdk test local ./test/e2e --operator-namespace operator-test
To run the operator itself locally during the tests instead of starting a deployment in the cluster, you can use the --up-local flag. This mode will still create global resources, but by default will not create any in-cluster namespaced resources unless the user specifies one through the --namespaced-manifest flag.
NOTE: The --up-local flag requires the --operator-namespace flag, and the command will NOT create the namespace, so be sure that you specify a valid, existing namespace.
$ kubectl create namespace operator-test
$ operator-sdk test local ./test/e2e --operator-namespace operator-test --up-local
If you would prefer to create the resources yourself and skip the framework's automatic resource creation, you can use the --no-setup flag:
$ kubectl create namespace operator-test
$ kubectl create -f deploy/crds/cache.example.com_memcacheds_crd.yaml
$ kubectl create -f deploy/service_account.yaml --namespace operator-test
$ kubectl create -f deploy/role.yaml --namespace operator-test
$ kubectl create -f deploy/role_binding.yaml --namespace operator-test
$ kubectl create -f deploy/operator.yaml --namespace operator-test
$ operator-sdk test local ./test/e2e --operator-namespace operator-test --no-setup
Executing e2e tests requires the permission to access, create, and delete resources on your cluster. Depending on what kind of Kubernetes cluster you are using, this may require some manual setup. For example, OpenShift users are not created with cluster-admin access by default, so you would have to manually add permissions to access these resources.
The simplest way to accomplish this is to bind the cluster-admin Cluster Role to the Service Account you will run the tests under (a kubectl one-liner for this is shown after the example Cluster Role below). If you are unable or unwilling to grant such access, a more limited permission set can be created and bound to your Service Account. A good place to start would be the Role bound to your operator itself, such as this role for the memcached operator example. In addition, you might have to create a Cluster Role to allow your tests to create namespaces, like so:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: testuser
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - create
  - delete
  - get
  - list
  - watch
  - update
Note that this isn't an exhaustive permission set, and the e2e tests you write might require more or less access.
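If you do take the simpler cluster-admin route mentioned above, the binding can be created with kubectl. The binding name, namespace, and Service Account here are only examples; substitute the account your tests run under:

$ kubectl create clusterrolebinding testuser-admin --clusterrole=cluster-admin --serviceaccount=operator-test:default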
For more documentation on the operator-sdk test local command, see the SDK CLI Reference doc.
If the tests encounter an error, it is possible to tell the framework not to delete the resources. This behavior is enabled with the --skip-cleanup-error flag:
$ operator-sdk test local ./test/e2e --skip-cleanup-error
This is useful if after the error happens, you need to inspect the resources that were created in the test or if you have automated scripts that download all the logs from the pods at the end of the test run.
NOTE: The created resources will be deleted if the tests pass.
For advanced use cases, it is possible to run the tests via go test directly. As long as all flags defined in MainEntry are declared, the tests will run correctly. Running the tests directly with missing flags will result in undefined behavior. This is an example go test equivalent to the operator-sdk test local example above:
# Combine service_account, role, role_binding, and operator manifests into namespaced manifest
$ cp deploy/service_account.yaml deploy/namespace-init.yaml
$ echo -e "\n---\n" >> deploy/namespace-init.yaml
$ cat deploy/role.yaml >> deploy/namespace-init.yaml
$ echo -e "\n---\n" >> deploy/namespace-init.yaml
$ cat deploy/role_binding.yaml >> deploy/namespace-init.yaml
$ echo -e "\n---\n" >> deploy/namespace-init.yaml
$ cat deploy/operator.yaml >> deploy/namespace-init.yaml
# Run tests
$ go test ./test/e2e/... -root=$(pwd) -kubeconfig=$HOME/.kube/config -globalMan deploy/crds/cache.example.com_memcacheds_crd.yaml -namespacedMan deploy/namespace-init.yaml -v -parallel=2
While the test framework provides utilities that allow the test to automatically be cleaned up when done, it is possible that an error in the test code could cause a panic, which would stop the test without running the deferred cleanup. To clean up manually, check which namespaces currently exist in your cluster. You can do this with kubectl:
$ kubectl get namespaces
Example Output:
NAME                                            STATUS    AGE
default                                         Active    2h
kube-public                                     Active    2h
kube-system                                     Active    2h
main-1534287036                                 Active    23s
memcached-memcached-group-cluster-1534287037    Active    22s
memcached-memcached-group-cluster2-1534287037   Active    22s
The names of the namespaces will either start with main or with the name of the tests, and the suffix will be a Unix timestamp (the number of seconds since January 1, 1970 00:00 UTC). kubectl can be used to delete these namespaces and the resources in those namespaces:
$ kubectl delete namespace main-1534287036
Since the CRD is not namespaced, it must be deleted separately. Clean up the CRD created by the tests using the CRD manifest deploy/crds/cache.example.com_memcacheds_crd.yaml:
$ kubectl delete -f deploy/crds/cache.example.com_memcacheds_crd.yaml