diff --git a/docs/proposals/e2e_test.md b/docs/proposals/e2e_test.md
new file mode 100644
index 000000000..b3c4b7228
--- /dev/null
+++ b/docs/proposals/e2e_test.md
@@ -0,0 +1,175 @@
+# Support E2E Testing
+
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+  - [Test Scope](#test-scope)
+  - [Implementation Details](#implementation-details)
+  - [User Stories (Optional)](#user-stories-optional)
+    - [Story 1](#story-1)
+    - [Story 2](#story-2)
+    - [Story 3](#story-3)
+  - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+
+## Summary
+
+This KEP proposes adding end-to-end (E2E) testing for HAMi to verify its functionality and compatibility
+within the Kubernetes ecosystem.
+
+It introduces mechanisms to validate the entire workflow of HAMi and
+to check that it meets production-level requirements.
+
+## Motivation
+
+End-to-end (E2E) tests validate the complete functionality of a system, ensuring that the end-user experience
+aligns with developer specifications.
+
+While unit and integration tests provide valuable feedback, they are often insufficient in distributed systems.
+Minor changes may pass unit and integration tests but still introduce unforeseen issues at the system level.
+
+Comprehensive E2E test coverage is essential to mitigate the risk of regressions, improve reliability,
+and maintain confidence in the system's integration with Kubernetes.
+Without it, HAMi's robustness and user trust may be compromised.
+
+### Goals
+
+- Set up a basic E2E testing environment.
+- Define the scope and scenarios for E2E testing of HAMi.
+- Implement E2E tests that cover key workflows and edge cases.
+- Ensure compatibility with Kubernetes.
+- Establish a reliable and repeatable test framework for future enhancements.
+
+### Non-Goals
+
+- Unit or integration testing of the feature (covered elsewhere).
+- Performance benchmarking beyond basic scenarios.
+
+## Proposal
+
+This proposal integrates E2E testing into HAMi. Tests will be built on Ginkgo,
+the framework used by the Kubernetes E2E tests, and will adhere to community best practices.
+
+### Test Scope
+
+- Core functionality: Validate the basic operations and workflows of HAMi.
+- Edge cases: Test unusual scenarios and invalid inputs to ensure robustness.
+- Compatibility:
+  - Verify that HAMi works with different heterogeneous devices.
+  - Verify that HAMi works with different Kubernetes versions.
+  - Verify that HAMi works with different CUDA versions (optional).
+- Error handling: Ensure appropriate error messages and recovery mechanisms are in place.
+
+### Implementation Details
+
+- The test environment will run locally.
+- Tests will be written using the [Ginkgo](https://onsi.github.io/ginkgo/) testing framework.
+- All tests will use isolated namespaces to avoid conflicts.
+- Resource cleanup will be automated after each test run.
+- CI integration will ensure tests run against PRs, daily builds, and releases.
+
+### User Stories (Optional)
+
+#### Story 1
+Automating E2E testing with Helm deployment
+
+#### Story 2
+Automating E2E testing with resource validation
+
+#### Story 3
+Automating E2E testing with Kubernetes resource deployment
+
+### Risks and Mitigations
+
+- Resource Limitations
+  - During E2E testing, test clusters may encounter resource constraints,
+    such as insufficient CPU, memory, or storage. This could lead to test failures,
+    degraded performance, or timeouts during deployments.
+- Environment Instability
+  - Instabilities in the testing environment, such as network latency, intermittent network failures,
+    or cluster node failures, can cause tests to fail or behave inconsistently.
+
+## Design Details
+
+![gpu_utilization](e2e_test.png)
+
diff --git a/docs/proposals/e2e_test.png b/docs/proposals/e2e_test.png
new file mode 100644
index 000000000..bd129f94a
Binary files /dev/null and b/docs/proposals/e2e_test.png differ