-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Valkey triggers proposal #9
base: main
Are you sure you want to change the base?
Changes from all commits
3feff43
037ec1e
d3db587
969c451
eb0ffe7
2ee0409
0ee10e8
5f6640b
5aa4d29
5bab53b
321b769
c37e26a
cd4cfef
b905acf
57a0df4
3f720ce
4329189
3273a7f
efd9a2a
4eca10c
cebd676
5db2bbc
2ea37a9
2868333
3becac9
0b74502
c682a75
059dede
4768e32
c9db2e5
d1af248
cbc13b6
bb1bddf
85d89e7
1c9181e
8f1bfe2
9fb30ae
0570091
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,234 @@ | ||||||
--- | ||||||
RFC: 9 | ||||||
Status: <Proposed> | ||||||
--- | ||||||
|
||||||
# Valkey Triggers RFC | ||||||
|
||||||
## Abstract | ||||||
Valkey triggers are persistent procedures which are executed in response to some keyspace event. | ||||||
Today [Valkey keyspace Notifications](https://valkey.io/topics/notifications/) offer the ability to publish events to dedicated pub/sub channels. | ||||||
Valkey triggers add the ability to place extending logic on such keyspace events and to localize actions on the server side reducing the need to monitor and react to events on the application side. | ||||||
We propose to extend the existing [Valkey Functions](https://valkey.io/topics/functions-intro/) with the ability to define trigger function calls on specific keyspace events. | ||||||
As they are being integrated into the Valkey Functions infrastructure, Triggers are persisted as part of the user data. | ||||||
|
||||||
## Motivation | ||||||
|
||||||
1. ***Localized actions.*** | ||||||
Today different application use cases rely on [Valkey keyspace notifications](https://valkey.io/topics/notifications/) in order to manage the database. For example, consider a case were a key is referenced in different places (like SortedSet and lists). In order to support TTL logic a cleanup needs to be made when the key is evicted to remove it from the different lists or sets. It is possible to have the application listen for eviction events and perform the cleanup, however this would require the application to rely on non-persistent subscribe connections, implement application side logic and extra network hops, which can cause the cleanup to be performed long after the key was evicted. | ||||||
For example lets take a commonly used pattern of scheduled tasks. | ||||||
In such cases the user usually registers tasks in a ZSET with the matching execution time as the score. | ||||||
Usually what is being done is setting a puller job in the application side which periodically reads all items in the set which have scores smaller than the current timestamp, and issue eval/fcall commands back to Valkey with the relevant tasks to run. | ||||||
This pattern introduces some waste as the application needs to maintain a puller job and perform the round trip back to the application in order to execute the scheduled operations. | ||||||
With Valkey triggers this can be achieved without the need for a puller job by the user. | ||||||
In order to achieve that we can use: | ||||||
- ZSET z for holding scheduled tasks | ||||||
- Key k to manage the next task execution time | ||||||
|
||||||
The external application code will only have to add tasks to the ZSET **z** with score matching their required execution time. | ||||||
1. once the task is added to **z** a trigger code will be executed which will take the minimal score from the ZSET **z**, and apply the diff to the current time to the TTL of key **k**. | ||||||
2. once key **k** has expired, a trigger will be executed which will remove and execute the tasks from the zset **z** that have scores lower than the current time. | ||||||
|
||||||
Here is an example library code to schedule tasks to be executed after/every several seconds | ||||||
|
||||||
![image](https://user-images.githubusercontent.com/88133677/225652613-aa85beee-05f1-4c5e-abe7-1d334b0e882f.png) | ||||||
|
||||||
So in order to use them the user can simply register operations to be triggered: | ||||||
|
||||||
``` | ||||||
fcall schedule 0 'server.call("PUBLISH", "scheduled", "this is a msg from the past")' "3" | ||||||
``` | ||||||
will cause the message to be published 3 seconds after the `fcall` is processed. | ||||||
|
||||||
``` | ||||||
fcall every 0 "3" 'redis.call("PUBLISH", "scheduled", "this is an annoying msg")' | ||||||
``` | ||||||
will cause the specified message to be published every 3 seconds. | ||||||
Note that the same concept can be used to implement HASH members eviction! | ||||||
|
||||||
2. ***Flexibility of extensibility.*** | ||||||
There are some cases were application needs to extend the logic of server side operations. In some of the cases it might be problematic making the change on the application | ||||||
side, either because of the risk to deploy the application part or since the operation is not triggered by the application (eg evictions, expirations etc...). | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Something went wrong with the line wrapping here. |
||||||
For example, consider the current implementation of [Valkey keyspace notifications](https://valkey.io/topics/notifications/). The existing mechanism relies on non-persistent pub/sub notifications. | ||||||
In order to persist the notifications it is possible to write them into a sorted set. | ||||||
Triggers can provide this ability in a very straightforward way. For example, consider the following function library: | ||||||
|
||||||
![image](https://user-images.githubusercontent.com/88133677/225616988-b9bb8730-59c1-478b-ae21-ece3b8b3a617.png) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See comment on previous image. The function is |
||||||
|
||||||
In this implementation we will capture each keyspace event and place the relevant key name and event on a dedicated stream called | ||||||
"keyevent-notifications". | ||||||
|
||||||
``` | ||||||
> set a b | ||||||
OK | ||||||
> lpush mylist f | ||||||
(integer) 1 | ||||||
> xread streams keyevent-notifications 0 | ||||||
1) 1) "keyevent-notifications" | ||||||
2) 1) 1) "1678974201461-0" | ||||||
2) 1) "a" | ||||||
2) "new" | ||||||
2) 1) "1678974201461-1" | ||||||
2) 1) "a" | ||||||
2) "set" | ||||||
3) 1) "1678974210181-0" | ||||||
2) 1) "mylist" | ||||||
2) "new" | ||||||
4) 1) "1678974210181-1" | ||||||
2) 1) "mylist" | ||||||
2) "lpush" | ||||||
``` | ||||||
|
||||||
[Valkey keyspace notifications](https://valkey.io/topics/notifications/) has another major disadvantage: they are not reported on the cluster-bus. While this might have good reasoning (eg avoid cluster-bus load, duplication of events etc...), this makes Valkey clients almost unable to manage key event listeners on Valkey clusters. Valkey triggers can enable the user to decide to publish keyspace events on the cluster bus. This way the trigger code can make smart decisions about publishing of the event and then use the 'PUBLISH' call in order to propagate the event to all cluster nodes. | ||||||
|
||||||
3. ***Security.*** | ||||||
In some cases there is a need to limit the access to the cache. Currently Valkey provides restricting functionality via [Access Control Lists](https://valkey.io/topics/acl/). | ||||||
However what if there is a need to access restricted parts of the dataset without granting real permissions to clients? | ||||||
Since a trigger can operate without explicit user request, it can potentially be operated under any defined ACL user which will enable it to operate actions on restricted parts of the dataset. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a very important and subtle issue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think users who are using ACLs are probably used to to taking such decisions. It's not like the user will define a trigger and will just not consider the access requirements |
||||||
|
||||||
### Can't the same be done with modules? | ||||||
Modules have the ability to build logic placed on top of keyspace events. However Modules provide a much more restrictive form of logic which is much harder to implement and dynamically replace. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How is the logic more restrictive? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe I used the wrong phrase. Managing modules require much more effort to maintain. |
||||||
By building this new logic inside the Valkey function library users will have a much simpler way to create stored procedures which are persistent, managed and replicated. | ||||||
|
||||||
|
||||||
## Design and Specification | ||||||
|
||||||
### Triggers as part of function libraries | ||||||
Valkey triggers will be implemented as an extension of the Valkey [Function Libraries](https://valkey.io/topics/functions-intro/) . | ||||||
A trigger is basically a Valkey function which will be called when some specific keyspace event is issued. | ||||||
In order to register a specific function as trigger, one will simply have to use the new **"register_trigger"** API which will be available for the different library engines. | ||||||
For example for the (currently only) LUA engine the registration code will look like: | ||||||
|
||||||
![image](https://user-images.githubusercontent.com/88133677/225090299-1cf05508-75e6-43dd-a90f-05ca91dd3a6f.png) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See comments on previous images. |
||||||
|
||||||
### Register trigger with named arguments form | ||||||
In order to provide some extra arguments to the trigger, the new API will also support the named arguments form. | ||||||
For example: | ||||||
|
||||||
![image](https://user-images.githubusercontent.com/88133677/225531522-d3f66a00-bd90-490b-98c8-11b62c3f3eba.png) | ||||||
|
||||||
### The Trigger code block | ||||||
As stated before, a trigger is basically a specific engine function. | ||||||
The function synopsis will be identical for all triggers which will get the following parameters: | ||||||
|
||||||
**a. key -** the key for which the event was triggered | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, I'm a little confused here. It appears that you're using something different from the keyspace/keyevent notification format? Why not the same? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The event description is identical to the channel name used to report the keyspace events. |
||||||
|
||||||
**b. event -** a string specifying the specific event (ie. “expire ”, “set”, “move”, “store” etc...) | ||||||
|
||||||
**c. db -** the id of the db context for which the event is triggered (note it is not necessarily the current db scope the trigger is executed in, ie. MOVE event). | ||||||
|
||||||
For LUA engine based functions, the key will be supplied as the first argument in the keys array (i.e. KEYS[1]). | ||||||
The **event** and **db** will be supplied in the ARGS array. (event will be ARGS[1] and db will be ARGS[2]). | ||||||
|
||||||
|
||||||
### Triggering Events | ||||||
|
||||||
Valkey already has a feature to report different [keyspace notifications](https://valkey.io/topics/notifications/) which currently provides the ability to publish keyspace events to registered subscribers. The implementation should introduce the ability to set a trigger function to run on **ANY** event which is defined for [keyspace notifications](https://valkey.io/topics/notifications/). | ||||||
|
||||||
### Triggers configuration | ||||||
By design Triggers will not require ANY configuration in order to work. The Triggers will be part of the loadable function libraries and should be able to run on the target events even if the user **DID NOT** explicitly configure keyspace-notifications. | ||||||
|
||||||
### Atomicity and triggers | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I feel that this deserves a much more detailed discussion. What is atomicity in general? What atomicity promises does Valkey currently make? How will these promises change when triggers are being used? According to Atomicity, "Atomicity is the guarantee that series of database operations in an atomic transaction will either all occur (a successful operation), or none will occur (an unsuccessful operation)." My understanding is that Valkey only offers a very limited form of atomicity. For simple commands, Valkey matches the general definition. MULTI-EXEC transactions can fail in the middle, so that changes prior to the failure happen, while changes after the failure do not. Can a trigger cause a MULTI-EXEC transaction to fail in the middle? Could a trigger cause a simple command to fail in the middle? For example, could a trigger be used to check that each new element of list For example, would it be possible to define a trigger that causes It seems to me that in order to achieve any kind of atomicity, the trigger must be executed inside the same lock that guards the database update and scheduling of pubsub event notifications. Note: modules provide callbacks that will also be executed before this lock is released while pubsub event notifications are delivered after this lock is released. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It is not just about "real fail" it is about logical failure. MULTI-EXEC can assume nothing changed once they were issued (using the key watch) however the key might still be changed via triggers while the transaction is executed. Maybe this is wanted and maybe not.
So it is true that the module notifications are "delivered" once the action is performed. the keyspace notifications are also "written" at the same time, however the application receives them only later on and needs to handle the potential lagging in it's knowledge of the data state. In both options I described, the triggers are executed under the same "lock" meaning as part of the same "ExecutionUnit", the only question is if it should be done in the middle of the ExecutionUnit or at the end of it. IMO keeping the same way application reacted to keyspace operations without having to operate a complete late roundtrip following keyspace notification would greatly simplify the understanding of this feature, while also providing the ability to make sure it will be executed in the same ExecutionUnit. |
||||||
All keyspace events are triggered immediately after the operation which triggered them. If we allow the triggers to be executed when the keyspace event has been triggered, | ||||||
It means allowing a write call performed in the middle of operating another command. Another (technical) issue is related to nested scripts. Currently there is a [protection](https://github.com/valkey-io/valkey/blame/unstable/src/script_lua.c#L891) in Valkey code preventing to execute lua script from a scripts. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
You wrote: "keyspace events are triggered immediately after the operation which triggered them". So, "If we allow the triggers to be executed when the keyspace event has been triggered", then the trigger is executed after the operation. That means any writes inside the trigger-function are not "in the middle of executing another command". Perhaps I have misunderstood something. Please provide more details. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe I was not clear. Lets take the msetnx command (for example) the command will issue keyspace event after each key set. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How does this work with [ There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was hoping to discuss this section. IMO we should NOT break atomicity with triggers, although this was the way I implemented the PoC. allowing triggers during scripts and transactions can indeed lead to much confusion but alternatively, queuing them and allow them to be executed after the transaction can be problematic since the trigger context might be lagging behind the data. I agree this part is the most problematic. |
||||||
|
||||||
There are 2 design options: | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we want both pre and post triggers? A pre-trigger would be helpful for enforcing constraints. Should we provide access to the value of the key, not just its name, either before or after the command was executed (or possibly both)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I originally thought so, but for the first stage we can consider simplifying things and evolve later? |
||||||
1. Post execution of triggers. Much like what was provided for modules with `VM_AddPostNotificationJob`, which was introduced to allow modules to | ||||||
perform write operations on keyspace events as part of postExecutionUnitOperations. It is possible to follow the same logic as modules and share the same notification mechanism ie. to execute the trigger during the unit operation completion. | ||||||
This will help prevent breaking atomicity of operations and better follows the way applications are currently reacting to keyspace events. The downside of this option is that since there is no | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Based on you examples, "the way applications are currently reacting to keyspace events" is completely external to the Valkey server. I think we need an in-depth discussion of how closely we want triggers to "follow" that. |
||||||
grantee about the state of the data after the operation is completed. | ||||||
|
||||||
2. Allow nested script calls for trigger scripts. This suggestion will require some handling of nested lua run contexts, but during PoC was tested to work. | ||||||
|
||||||
### recursive trigger flow or "should trigger trigger trigger?" | ||||||
|
||||||
We do not want to reach a nested call of triggers and a potential endless loops. According to this design a trigger execution as a result of some keyspace event is a terminal state. | ||||||
we could keep some recursion depth counter that will stop triggering once it reaches some depth, but that would cause some unexpected behavior for the library developers, and difficulty debugging cases where triggers where not executed. | ||||||
Another important aspect to consider is if modules based keyspace events calbacks should be triggered from trigger actions and vice versa. | ||||||
In the scope of this document I will assume that the trigger based actions **WILL NOT** cause matching module calbacks to be called and that module actions performed during it's callback operation **WILL NOT** cause triggering triggers, but each will be executed only in 1 level of nesting. | ||||||
|
||||||
So for example lets say I have a module callback **cb** set on some keyspace event **e1** and a trigger action **t** registered on some event **e2**. | ||||||
when the event **e1** is issued the **cb** will be executed and will cause event **e2** which will **NOT** trigger **t**. | ||||||
Also in case **e2** is issued the **t** is being executed and cause event **e1** which will **NOT** trigger **cb**. | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am uncomfortable with this approach. The idea that a trigger will sometimes not be executed seems very confusing. I agree that fully recursive triggers offer yet another way to shoot yourself in the foot, but I would like a clearer idea of who we expect to be writing triggers and how they will be used. It would be helpful to have a much richer set of examples that go into the same kind of depth as your scheduling example. If you simply want to prevent loops, another option (which still makes me uncomfortable) would be to track the call stack and refuse to execute a trigger that is already executing. No matter what we do, as long as the language used to define triggers is sufficiently powerful, there will be crashes, deadlocks, and other very bad things. It seems to me that we must draw a distinction between triggers in development and triggers in production. We should expect and allow developers to code bad triggers while they are building and testing an application and provide good support (as you describe below) for debugging. If a bad trigger reaches production (perhaps it triggers itself, causing an infinite loop), what should happen? If it simply aborts the current transaction, will it leave the data in a consistent state? Should it save the data, then terminate the server? When a bad module is loaded in production we make no promises at all - should we do the same for bad triggers? If we are making promises with respect to bad triggers, what exactly are those promises (e.g., they will never crash the server, or they will never enter infinite loops, etc.)? |
||||||
### Scope of a trigger | ||||||
|
||||||
Whenever some event triggers an action, we will need this action to be processed in the scope of a specific database and ACL user. | ||||||
|
||||||
#### Database context | ||||||
Since Valkey triggers are built on keyspace events, the DB scope will be the same as the keyspace the event was issued on. | ||||||
|
||||||
#### ACL User scope | ||||||
By default the scope of the user will by the superuser which will not be limited by any authentication restrictions. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
This seems exceedingly dangerous. I would be more comfortable with no default, so that the developer is more likely to be aware of the risks. Or we could default to a user without write permissions, so that the developer must do something to explicitly authorize entry into a higher-risk domain. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should default to least privilege rather than greatest. |
||||||
This makes sense as the triggers might be needed in order to operate a managed operation using some commands which should be otherwise restricted. | ||||||
However in some cases the user might want the trigger to operate under some ACL restrictions. | ||||||
The problem is that triggers might not operate in the scope of a specific external client (eg expiration events), so we suggest during trigger registration it will be possible to specify | ||||||
the name of the ACL user to attach to the trigger run context. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What should happen if the trigger issues a command that is denied by the ACL restrictions? The register_trigger function looks like this in one of the images:
(Note: because it is an image, you cannot search for it. register_trigger could look at the keyspace-event and require that an ACL user is specified when the event does not have an associated ACL user (e.g., eviction). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
So the trigger will fail in this case.
I am sorry - you are right, these images were taken from over 2 years ago when I created a PoC which did not have the ACL addition yet. You are right that the register_trigger should be able to accept a username to run in it's context. this does create some dependency on the ACL and function loading though.... |
||||||
|
||||||
### New Commands | ||||||
1. `CLIENT TRIGGERS [ON|OFF]` new subcommand. Sometimes it will be required to mute trigger execution in some context (say administrative fix to the dataset) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not simply How about a command to enable or disable specific triggers? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wanted this as a way to prevent triggering on a specific client which is probably an admin client or something called from a script. in order to generically disable triggers we can have a global config. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We can extend that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. All or nothing? What if someone wanted to disable some triggers? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can extend this in the future if we want. the point of this ability is that the developer/admin wants to access the data and have no interference of triggers while doing so. |
||||||
In order to support it the specific client will be able to mute triggers execution. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is there a "specific client" in this sentence? Is it intended to only disable triggers that would have been fired by the current client? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes |
||||||
Response will always be "O.K". | ||||||
2. The `FUNCTION LIST` command output will now include a new part named "triggers". | ||||||
Much like functions the triggers part will describe a per-trigger information: | ||||||
|
||||||
``` | ||||||
"library_name" | ||||||
"mylib" | ||||||
"engine" | ||||||
"LUA" | ||||||
"functions" | ||||||
1) "name" | ||||||
2) "myfunc" | ||||||
3) "description" | ||||||
4) <some description> | ||||||
5) "flags" | ||||||
6) <relevant flags> | ||||||
"triggers" | ||||||
1) "name" | ||||||
2) "trigger_name" | ||||||
3) "event" | ||||||
4) "__keyspace@0__:*" | ||||||
7) "calls" | ||||||
8) "<the total number of times the trigger was called>" | ||||||
9) "errors" | ||||||
10) "<the total number of errors issued during all executions of this trigger>" | ||||||
11) "total_duration" | ||||||
12) <the aggregated total duration time of all executions of this trigger> | ||||||
``` | ||||||
|
||||||
### Benchmarking (Optional) | ||||||
|
||||||
TBD | ||||||
|
||||||
### Testing | ||||||
TBD | ||||||
|
||||||
### Observability | ||||||
|
||||||
#### general statistics | ||||||
The "Stats" info section will be added with the following statistics: | ||||||
- **total_trigger_calls -** the number of general trigger calls. this will not be a teardown list of calls per trigger function (this can be taken from the function list command as will be explained later) | ||||||
- **total_trigger_errors -** The total number of errors during trigger execution (an error is counted for each time the afterErrorReply is called). | ||||||
- **total_trigger_duration -** The total time spent running trigger related code. | ||||||
|
||||||
### Per trigger statistics | ||||||
The same global statistics will be available on a per-trigger resolution via the `FUNCTION LIST` command: | ||||||
- "calls" - the total number of times the trigger has called | ||||||
- "errors" - the total number of errors issued during this trigger run | ||||||
- "total_duration" - the aggregated total duration time this trigger run | ||||||
|
||||||
*** Open question *** | ||||||
should `CONFIG RESETSTAT` also reset the triggers statistics, or should that only be done when functions lib is reloaded? | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think they should be reset at the same time that function-statistics are reset. |
||||||
|
||||||
### Debug mechanism | ||||||
|
||||||
When Valkey functions and scripts are executed via FCALL/EVAL, it is possible to report the errors back to the calling client. | ||||||
Triggers, however, are silently executing which makes it hard for the user to understand why the trigger did not work and what errors occurred during execution of the trigger. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In this context I take the user to be a developer. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes |
||||||
The suggestion here is to enable reporting error msg on a predefined pub/sub channel: | ||||||
1. All error msgs will be intercepted as deferred errors (much like modules do). | ||||||
2. After trigger completes run, in case errors were issued, it will update the stats with the number of errors. | ||||||
3. The trigger will report the errors to a dedicated channel in the following structure: | ||||||
`__triggers@errors__:<trigger-name>` | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why isn't the log sufficient? Based on how errors are counted, it seems that a triggered function might attempt to report an error that should be delivered to the client. This suggests that the message will not normally be delivered. What would be delivered? I have not worked with LUA or EVAL in the past. It seems likely that the answers to some of my questions would be obvious if I had. Please point me at any reading you think would help. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since the trigger execution is hidden from the user we cannot just return it as an error. For the case when the trigger is issued based on internal events (eg evictions) there is no request context to issue the error to.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This image could easily be replaced by the actual code. Then the code would be searchable. Replace instances of "redis" in the code with "server".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please. The use of images is also inaccessible.