Test individual rules without dry-running the whole cluster #903

mterhar · 2023-11-15T15:00:17Z

Is your feature request related to a problem? Please describe.

Currently, the DryRun configuration applies to all traces in all environments across an entire Refinery cluster.

If someone wants to test rules in a specific environment, or a specific rule in a high-traffic environment, there is no way to do so.

Describe the solution you'd like

Create a configuration in the rules file that acts similarly to DryRun. Since the run is mostly wet, I suggest a different name, such as Test: true or Mode: test.

Within each rule there's a Drop: true or SampleRate: 1 that can be set. I'd put this configuration at the same level, so that the rule's conditions are still evaluated but the keep/drop/sample-rate math doesn't change what is actually sent.
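A minimal sketch of how such a per-rule flag might behave, written in Python for brevity and not taken from Refinery's code. The `Rule` class, the `test` attribute, the `meta.test.*` field names, and the "keep when nothing matches" fallback are all assumptions made for illustration; only the Drop and SampleRate ideas come from the existing rule options.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # returns True when the trace matches
    sample_rate: int = 1               # keep 1 in N matching traces
    drop: bool = False                 # drop every matching trace
    test: bool = False                 # hypothetical per-rule test flag


def decide(trace: dict, rules: list, rng: Optional[random.Random] = None) -> bool:
    """Return True if the trace should be sent. The first matching rule wins."""
    rng = rng or random.Random()
    for rule in rules:
        if not rule.condition(trace):
            continue
        # The normal keep/drop/sample-rate math still runs in test mode.
        if rule.drop:
            keep = False
        else:
            keep = rng.randint(1, rule.sample_rate) == 1
        if rule.test:
            # Test mode: record what would have happened, then send anyway.
            trace["meta.test.rule"] = rule.name
            trace["meta.test.kept"] = keep
            trace["meta.test.sample_rate"] = rule.sample_rate
            return True
        return keep
    return True  # assumption: traces that match no rule are kept
```

With this shape, a single rule can be flipped into test mode while every other rule in the cluster keeps sampling normally, which is the difference from a cluster-wide DryRun.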

Describe alternatives you've considered

Running the whole cluster in dry-run mode increases event count far more than large, complex deployments want.

Running a separate Refinery to test rules, or only testing them in lower environments, means generating fake data, and pushing the new rules to production is still risky afterwards.

Additional context

Naming is hard:

Debug: true may imply that you can get more than just the decision information.
- It won't help with debugging the conditions, for example.
- Debug sounds more like a development activity than an operating mode.

Mode: test may be good if we have multiple modes in the future.
- I can imagine modes like "active", "passive", "test", "debug", "strict", or "send-all".
- Honestly, I'm not sure of the benefit of multiple operating modes on a per-rule basis.

Test: true is not particularly descriptive.
- It's really only sending everything and moving the sample rate decision to a secondary field.
- Test could also be construed to mean something about integration tests rather than sample-rate-math evaluation.

Decision: could default to apply, which is the standard processing mode. Other values:
- report-send, which is dry-run-like
- count-drop, which would record statistics but not actually deliver the spans
- report-skip, which would add a field to the span saying whether it matched the rule, then process it through the rest of the rules
- It's clear that it only impacts the decision part of the process, but do operators know what that even means?
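To make the Decision option more concrete, here is a rough Python sketch of how the four values might differ once a rule has been evaluated. This illustrates the proposal only and is not existing Refinery behavior: the `apply_decision` function, the `stats` counter, the `meta.rule.*` field names, and the "send"/"drop"/"continue" return values are all hypothetical.

```python
from collections import Counter
from enum import Enum


class Decision(Enum):
    APPLY = "apply"              # standard processing
    REPORT_SEND = "report-send"  # dry-run-like: annotate and always send
    COUNT_DROP = "count-drop"    # record statistics, deliver nothing
    REPORT_SKIP = "report-skip"  # annotate, then fall through to later rules


stats = Counter()  # hypothetical in-memory statistics store


def apply_decision(trace, rule_name, mode, matched, would_keep):
    """Return "send", "drop", or "continue" (try the next rule)."""
    if mode is Decision.REPORT_SKIP:
        # Record whether the rule matched, but never act on it.
        trace["meta.rule.matched." + rule_name] = matched
        return "continue"
    if not matched:
        return "continue"
    if mode is Decision.APPLY:
        return "send" if would_keep else "drop"
    if mode is Decision.REPORT_SEND:
        # Move the sampling decision into a field and send regardless.
        trace["meta.rule.would_keep"] = would_keep
        return "send"
    if mode is Decision.COUNT_DROP:
        stats[(rule_name, would_keep)] += 1
        return "drop"
    raise ValueError(f"unknown decision mode: {mode!r}")
```

Only apply lets the sampling math change what is delivered; the other three either annotate the span or count the outcome, which is the sense in which the option affects only the decision part of the process.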

mterhar added the type: enhancement New feature or request label Nov 15, 2023
