Skip to content

Commit

Permalink
Merge branch 'litmuschaos:master' into master
Browse files Browse the repository at this point in the history
  • Loading branch information
kwx4957 authored Nov 22, 2024
2 parents 57ccc28 + aab8a5e commit 6a5bb75
Show file tree
Hide file tree
Showing 145 changed files with 124,512 additions and 1,105 deletions.
14 changes: 7 additions & 7 deletions .github/workflows/push.yml
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ jobs:
# echo export DEX_SERVER="litmusportal-dex-server" >> env-vars
- name: Uploading envs
uses: actions/upload-artifact@v2
uses: actions/upload-artifact@v4
with:
name: env_artifact
path: chaoscenter/env-vars
Expand All @@ -78,7 +78,7 @@ jobs:
uses: actions/checkout@v4

- name: Downloading image artficate
uses: actions/download-artifact@v2
uses: actions/download-artifact@v4
with:
name: env_artifact
path: chaoscenter
Expand Down Expand Up @@ -116,7 +116,7 @@ jobs:
uses: actions/checkout@v4

- name: Downloading image artficate
uses: actions/download-artifact@v2
uses: actions/download-artifact@v4
with:
name: env_artifact
path: chaoscenter
Expand Down Expand Up @@ -154,7 +154,7 @@ jobs:
uses: actions/checkout@v4

- name: Downloading image artficate
uses: actions/download-artifact@v2
uses: actions/download-artifact@v4
with:
name: env_artifact
path: chaoscenter
Expand Down Expand Up @@ -192,7 +192,7 @@ jobs:
uses: actions/checkout@v4

- name: Downloading image artficate
uses: actions/download-artifact@v2
uses: actions/download-artifact@v4
with:
name: env_artifact
path: chaoscenter
Expand Down Expand Up @@ -243,7 +243,7 @@ jobs:
uses: actions/checkout@v4

- name: Downloading image artficate
uses: actions/download-artifact@v2
uses: actions/download-artifact@v4
with:
name: env_artifact
path: chaoscenter
Expand Down Expand Up @@ -279,4 +279,4 @@ jobs:
source env-vars
FRONTEND_IMAGE=${{ matrix.frontend.image_name }}
timestamp=`date "+%s"`
make push-frontend
make push-frontend
155 changes: 77 additions & 78 deletions ADOPTERS.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion CODE_OF_CONDUCT.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ representative at an online or offline event.

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
prithvi.raj@harness.io.
sayan.mondal@harness.io.
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
Expand Down
11 changes: 5 additions & 6 deletions MAINTAINERS.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,8 @@ chaos-sdk |go/python/ansible sdk |litmus-go,litmus-python,litmu
e2e |e2e-suite, e2e-dashboard |litmus-e2e |@uditgaurav, @Jonsy13 |@neelanjan00, @S-ayanide, @avaakash |
integrations |CI/CD plugins, wrappers |chaos-ci-lib, gitlab-templates, github-actions |@uditgaurav, @ksatchit |@ispeakc0de, @Adarshkumar14 |
helm-charts |control-plane, agent, experiments|litmus-helm |@Jasstkn, @ispeakc0de, @imrajdas, @Jonsy13 |@ksatchit, @uditgaurav |
documentation |platform-docs, experiment-docs |litmus-docs, mkdocs |@neelanjan00, @umamukkara, @ispeakc0de |@ksatchit, @ajeshbaby, @amityt, @uditgaurav |
websites |project website, chaoshub, documentation |litmus-website, charthub, litmus-docs |@umamukkara, @arkajyotiMukherjee, @S-ayanide |@SahilKr24, @hrishavjha, @ajeshbaby |

documentation |platform-docs, experiment-docs |litmus-docs, mkdocs |@neelanjan00, @umamukkara, @ispeakc0de |@ksatchit, @ajeshbaby, @amityt, @uditgaurav |websites |project website, chaoshub, documentation |litmus-website, charthub, litmus-docs |@umamukkara, @arkajyotiMukherjee, @S-ayanide |@SahilKr24, @hrishavjha, @ajeshbaby |
websites |project website, chaoshub, documentation |litmus-website, charthub, litmus-docs |@SahilKr24, @hrishavjha, @ajeshbaby |@umamukkara, @S-ayanide |
### Consolidated Maintainers List

```
Expand All @@ -36,16 +35,16 @@ websites |project website, chaoshub, documentation |litmus-website, cha
"Udit Gaurav",@uditgaurav,[email protected]
"Vedant Shrotria",@Jonsy13,[email protected]
"Uma Mukkara",@umamukkara,[email protected]
"Sahil KR",@SahilKr24,[email protected]
"Ajesh Baby",@ajeshbaby,[email protected]
"Hrishav Jha",@hrishavjha,[email protected]
```

### Consolidated Reviewers List

```
"Adarsh Kumar",@Adarshkumar14,[email protected]
"Akash Srivastava",@avaakash,[email protected]
"Ajesh Baby",@ajeshbaby,[email protected]
"Sahil Kumar",@SahilKr24,[email protected]
"Hrishav Kumar Jha",@hrishavjha,[email protected]
```

### Emeritus Maintainers
Expand Down
12 changes: 9 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -105,9 +105,18 @@ Fill out the [LitmusChaos Meetings invite form](https://forms.gle/xYZyZ2gTWMqz7x

### Videos

- [What if Your System Experiences an Outage? Let's Build a Resilient Systems with Chaos Engineering](https://www.youtube.com/watch?v=3mjGEh905u4&t=1s) @ [CNCF](https://www.youtube.com/@cncf)
- [Enhancing Cyber Resilience Through Zero Trust Chaos Experiments in Cloud Native Environments](https://youtu.be/BelNIk4Bkng) @ [CNCF](https://www.youtube.com/@cncf)
- [LitmusChaos, with Karthik Satchitanand](https://www.youtube.com/watch?v=ks2R57hhFZk&t=503s) @ [The Kubernetes Podcast from Google](https://www.youtube.com/@TheKubernetesPodcast)
- [Cultural Shifts: Fostering a Chaos First Mindset in Platform Engineering](https://www.youtube.com/watch?v=WUXFKxgZRsk) @ [CNCF](https://www.youtube.com/@cncf)
- [Fire in the Cloud: Bringing Managed Services Under the Ambit of Cloud-Native Chaos Engineering](https://www.youtube.com/watch?v=xCDQp5E3VUs) @ [CNCF](https://www.youtube.com/@cncf)
- [Security Controls for Safe Chaos Experimentation](https://www.youtube.com/watch?v=whCkvLKAw74) @ [CNCF](https://www.youtube.com/@cncf)
- [Chaos Engineering For Hybrid Targets With LitmusChaos](https://www.youtube.com/watch?v=BZL-ngvbpbU&t=751s) @ [CNCF](https://www.youtube.com/@cncf)
- [Cloud Native Live: Litmus Chaos Engine and a microservices demo app](https://youtu.be/hOghvd9qCzI)
- [Chaos Engineering hands-on - An SRE ideating Chaos Experiments and using LitmusChaos | July 2022](https://youtu.be/_x_7SiesjF0)
- [Achieve Digital Product Resiliency with Chaos Engineering](https://youtu.be/PQrmBHgk0ps)
- [Case Study: Bringing Chaos Engineering to the Cloud Native Developers](https://youtu.be/KSl-oKk6TPA) @ [CNCF](https://www.youtube.com/@cncf)
- [Cloud Native Chaos Engineering with LitmusChaos](https://www.youtube.com/watch?v=ItUUqejdXr0) @ [CNCF](https://www.youtube.com/@cncf)
- [How to create Chaos Experiments with Litmus | Litmus Chaos tutorial](https://youtu.be/mwu5eLgUKq4) @ [Is it Observable](https://www.youtube.com/c/IsitObservable)
- [Cloud Native Chaos Engineering Preview With LitmusChaos](https://youtu.be/pMWqhS-F3tQ)
- [Get started with Chaos Engineering with Litmus](https://youtu.be/5CI8d-SKBfc) @ [Containers from the Couch](https://www.youtube.com/c/ContainersfromtheCouch)
Expand All @@ -129,7 +138,6 @@ Fill out the [LitmusChaos Meetings invite form](https://forms.gle/xYZyZ2gTWMqz7x

Community Blogs:

- Daniyal Rayn: [Do I need Chaos Engineering on my environment? Trust me you need it!](https://maveric-systems.com/blog/do-i-need-chaos-engineering-on-my-environment-trust-me-you-need-it/)
- LiveWyer: [LitmusChaos Showcase: Chaos Experiments in a Helm Chart Test Suite](https://livewyer.io/blog/2021/03/22/litmuschaos-showcase-chaos-experiments-in-a-helm-chart-test-suite/)
- Jessica Cherry: [Test Kubernetes cluster failures and experiments in your terminal](https://opensource.com/article/21/6/kubernetes-litmus-chaos)
- Yang Chuansheng(KubeSphere): [KubeSphere 部署 Litmus 至 Kubernetes 开启混沌实验](https://kubesphere.io/zh/blogs/litmus-kubesphere/)
Expand All @@ -138,8 +146,6 @@ Community Blogs:
- Akram Riahi(WeScale):[Chaos Engineering : Litmus sous tous les angles](https://blog.wescale.fr/2021/03/11/chaos-engineering-litmus-sous-tous-les-angles/)
- Prashanto Priyanshu(LensKart):[Lenskart’s approach to Chaos Engineering-Part 2](https://blog.lenskart.com/lenskarts-approach-to-chaos-engineering-part-2-6290e4f3a74e)
- DevsDay.ru(Russian):[LitmusChaos at Kubecon EU '21](https://devsday.ru/blog/details/40746)
- Ryan Pei(Armory): [LitmusChaos in your Spinnaker Pipeline](https://www.armory.io/blog/litmuschaos-in-your-spinnaker-pipeline/)
- David Gildeh(Zebrium): [Using Autonomous Monitoring with Litmus Chaos Engine on Kubernetes](https://www.zebrium.com/blog/using-autonomous-monitoring-with-litmus-chaos-engine-on-kubernetes)


## Adopters
Expand Down
49 changes: 32 additions & 17 deletions ROADMAP.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,47 +11,62 @@ This document captures only the high level roadmap items. For the detailed backl
- Per-experiment minimal RBAC permissions definition
- Creation of 'scenarios' involving multiple faults via Argo-based Chaos Workflows (with examples for microservices apps like podtato-head and sock-shop)
- Cross-Cloud Control Plane (Litmus Portal) to perform chaos against remote clusters
- Helm3 charts for LitmusChaos (control plane and experiments)
- Helm charts for LitmusChaos control plane
- Helm Chart for LitmusChaos execution Plane
- Support for admin mode (centralized chaos management) as well as namespaced mode (multi-tenant clusters)
- Continuous chaos via flexible schedules, with support to halt/resume or (manual/conditional) abort experiments
- Provide complete workflow termination/abort capability
- Generation of observability data via Prometheus metrics and Kubernetes chaos events for experiments
- Steady-State hypothesis validation before, during and after chaos injection via different probe types
- Support for Docker, Containerd & CRI-O runtime
- Support for scheduling policies (nodeSelector, tolerations) and resource definitions for chaos pods
- ChaosHub refactor for 2.x user flow
- Support for ARM64 nodes
- Minimized role permissions for Chaos Service Accounts
- Scaffolding scripts (SDK) to help bootstrap a new chaos experiment in Go, Python, Ansible
- Support orchestration of non-native chaos libraries via the BYOC (Bring-Your-Own-Chaos) model
- Support for OpenShift platform
- Workflow YAML linter addition
- Integration tests & e2e framework creation for control plane components and chaos experiments
- Documentation (usage guide for chaos operator, resources & developer guide for new experiment creation)
- Improved documentation and tutorials for Litmus Portal based execution flow
- Add architecture details & design resources
- Define community sync up cadence and structure

------

### In-Progress (Under Active Development)
### In-Progress (Under Design OR Active Development)

- Support for all ChaosEngine schema elements within workflow wizard
- Workflow YAML linter addition
- Minimized role permissions for Chaos Service Accounts
- Native Chaos Workflows with redesigned subscriber to improve resource delegation, enabling seamless and efficient execution of chaos workflows within Kubernetes clusters.
- Introduce transient runners to improve resource efficiency during chaos experiments by dynamically creating and cleaning up chaos runner instances.
- Implement Kubernetes connectors to enable streamlined integration with Kubernetes clusters, providing simplified authentication and configuration management.
- Integrate with tools like K8sGPT to generate insightful reports that identify potential weaknesses in your Kubernetes environment before executing chaos experiments.
- Add Terraform support for defining and executing chaos experiments on infrastructure components, enabling infrastructure-as-code-based chaos engineering.
- Add SDK support for Python and Java, with potential extensions to other programming languages based on community interest.
- Include in-product documentation, such as tooltips, to improve user experience and ease of adoption.
- Implement the litmus-java-sdk with a targeted v1.0.0 release by Q1.
- Integrate distributed tracing by adding attributes or events to spans, and create an OpenTelemetry demo showcasing chaos engineering observability.
- Enhance the exporter to function as an OpenTelemetry collector, providing compatibility with existing observability pipelines.
- Add support for DocumentDB by replacing certain MongoDB operations, improving flexibility for database chaos.
- Upgrade Kubernetes SDK from version 1.21 to 1.26 to stay aligned with the latest Kubernetes features and enhancements.
- Refactor the chaos charts to:
- Replace latest tags with specific, versioned image tags.
- Consolidate multiple images into a single optimized image.
- Update GraphQL and authentication API documentation for improved clarity and user guidance.
- Add comprehensive unit and fuzz tests to enhance code reliability and robustness.
- Implement out-of-the-box Slack integration for better collaboration and monitoring during chaos experiments.

------

### Backlog

- Validation support for all ChaosEngine schema elements within workflow wizard
- Chaos-center users account to chaosService account map
- Provide complete workflow termination/abort capability
- Cross-hub experiment support within a Chaos Workflow
- Helm Chart for Chaos Execution Plane
- Enhanced CRD schema for ChaosEngine to support advanced CommandProbe configuration
- Support for S3 artifact sink (helps performance/benchmark runs)
- ChaosHub refactor for 2.x user flow
- Chaos experiments against virtual machines and cloud infrastructure (AWS, GCP, Azure, VMWare, Baremetal)
- Improved documentation and tutorials for Litmus Portal based execution flow
- Off the shelf chaos-integrated monitoring dashboards for application chaos categories
- Support for user defined chaos experiment result definition
- Increased fault injection types (IOChaos, HTTPChaos, JVMChaos)
- Special Interest Groups (SIGs) around specific areas in the project to take the roadmap forward

------

### Backlog

- Pre-defined chaos workflows to inject chaos during application benchmark runs
- Support for cloudevents compliant chaos events
- Improved application Chaos Suites for various CNCF projects
13 changes: 13 additions & 0 deletions adopters/organizations/emirates-nbd.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
## Emirates NBD

[Emirates NBD](https://www.emiratesnbd.com) is Dubai's government-owned bank and is one of the largest banking groups in the Middle East in terms of assets.

### **Why do we use Litmus.**

Resilience is a key aspect in creating fault-tolerant environments, and leveraging tools like Litmus has been instrumental in automating resilience testing. Litmus has enabled us to simulate real-time chaos scenarios, allowing us to thoroughly verify the robustness of both our infrastructure and applications.

### **How do we use Litmus.**

We began with a proof of concept (POC) on a playground cluster. While we explored other tools during this process, Litmus stood out significantly, not only in its capabilities but also due to its excellent user interface. Although we faced a few challenges during the initial setup of Litmus on OpenShift, the team provided timely support, helping us overcome these obstacles and successfully complete the POC.

Now, we've successfully deployed Litmus in a non-production cluster environment, and our SRE team is in the process of transitioning from manual chaos testing to automated chaos tests. This shift will enable us to schedule, automate, and efficiently track the outcomes of these tests, enhancing the resilience of our systems.
30 changes: 30 additions & 0 deletions adopters/organizations/outsystems.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
## OutSystems

[OutSystems](https://www.outsystems.com/) is a low-code development platform which provides tools for companies to develop, deploy and manage omnichannel enterprise applications. OutSystems was founded in 2001 in Lisbon, Portugal. In June 2018 OutSystems secured a $360M round of funding from KKR and Goldman Sachs and reached the status of Unicorn.

### **Leveraging Litmus Chaos Engineering in Kubernetes Infrastructure:**

We have a Kubernetes-based infrastructure pivotal to our operations, where reliability and resilience are paramount. Recognizing the need for robust testing methodologies, we turned to Litmus Chaos Engineering to fortify our systems against potential failures and to ensure seamless operations even under adverse conditions.

### **Why do we use Litmus:**

Litmus emerged as our tool of choice due to its comprehensive suite of chaos engineering capabilities tailored specifically for Kubernetes environments. Its versatility in orchestrating controlled chaos experiments aligns perfectly with our commitment to enhancing system reliability while maintaining agility.

### **Use Case and Implementation:**

We have seamlessly integrated Litmus Chaos Engineering into various stages of our development and deployment pipeline, spanning from development and testing to staging and production environments. Leveraging Litmus, we meticulously craft and execute chaos experiments, meticulously observing how our infrastructure behaves under stress, and ensuring it meets our predefined Service Level Objectives (SLOs) and Service Level Indicators (SLIs).

### **Achievements:**

Our journey with Litmus Chaos Engineering has been marked by significant milestones:

- Successful deployment of Chaos Center and Litmus Delegate, empowering us with centralized chaos management capabilities.
- Establishment of secure access to Chaos Center through HTTPS, coupled with domain customization for enhanced usability.
- Implementation of WAF ACL to restrict access to Chaos Center, ensuring secure interactions.
- Integration of Azure SSO for streamlined user management and authentication.
- Seamless connectivity between Chaos Center and target nodes, facilitating efficient chaos experimentation.
- Execution of numerous successful experiments, validating the resilience and scalability of our infrastructure.

### **Next Steps:**

As we continue to harness the power of Litmus Chaos Engineering, we remain committed to expanding our chaos engineering initiatives, further refining our chaos experiments, and continually enhancing the resilience of our Kubernetes infrastructure.
Loading

0 comments on commit 6a5bb75

Please sign in to comment.