-
Notifications
You must be signed in to change notification settings - Fork 702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
move clientCron onto a separate timer #1387
base: unstable
Are you sure you want to change the base?
Conversation
Signed-off-by: Jim Brunner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems right to me. Are there any possible unintended behavior change due to decoupling this from serverCron? Would be good to callout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to me. The code looks correct to me
Co-authored-by: Binbin <[email protected]> Signed-off-by: Jim Brunner <[email protected]>
Signed-off-by: Jim Brunner <[email protected]>
Signed-off-by: Jim Brunner <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #1387 +/- ##
============================================
- Coverage 70.68% 70.67% -0.01%
============================================
Files 118 118
Lines 63550 63565 +15
============================================
+ Hits 44919 44924 +5
- Misses 18631 18641 +10
|
* - The minimum rate will be defined by server.hz | ||
* - At least CLIENTS_CRON_MIN_ITERATIONS will be performed each cycle | ||
*/ | ||
#define CLIENTS_CRON_MIN_ITERATIONS 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's better to align the name with MAX_CLIENTS_PER_CLOCK_TICK
and put it to server.h
together, we have the chance now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
server.h
is a big dumping ground - and included almost everywhere. I don't see a good reason to move an existing define which is already localized and move it into a global namespace.
Note: I wanted to leave as much code untouched as possible (including this define). If that isn't a goal, I'd prefer making this a local const
inside the function rather than moving the existing define to a more global namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we wanna minimize the scope, we can put MAX_CLIENTS_PER_CLOCK_TICK
here and change CLIENTS_CRON_MIN_ITERATIONS
to MIN_CLIENTS_PER_CLOCK_TICK
src/server.c
Outdated
} | ||
/* A separate timer for client maintenance. Runs at a variable speed depending | ||
* on the client count. */ | ||
if (aeCreateTimeEvent(server.el, 1, clientsTimerProc, NULL, NULL) == AE_ERR) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd love to reuse the clientsCron
, just like serverCron
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intentionally didn't use "cron" because the name is ambiguous. This function IS a time proc as defined by AE.
Naming is consistent with the defrag time proc (activeDefragTimeProc
)
We have other functions called "cron" that are simply functions called by a timer proc. Like replicationCron()
, databasesCron()
.
@@ -5549,7 +5555,7 @@ sds genValkeyInfoString(dict *section_dict, int all_sections, int everything) { | |||
"uptime_in_seconds:%I\r\n", (int64_t)uptime, | |||
"uptime_in_days:%I\r\n", (int64_t)(uptime / (3600 * 24)), | |||
"hz:%i\r\n", server.hz, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's better to show clients_hz
here, since it is out of the hz control
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you're asking/suggesting here. I don't think we can just change the definition of the existing metric.
I think maybe you're suggesting a new metric? Currently the required hz for the clientsTimeProc
is computed locally to the function, in each iteration. Are you suggesting adding a new field to server...
and adding a metric above hz
with clients_hz
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, a new metric, since the clients_hz
can be changed and affect the poll_wait's sleep time, this will further affect the CPU usage
src/server.c
Outdated
* - At least CLIENTS_CRON_MIN_ITERATIONS will be performed each cycle | ||
*/ | ||
#define CLIENTS_CRON_MIN_ITERATIONS 5 | ||
long long clientsTimerProc(struct aeEventLoop *eventLoop, long long id, void *clientData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should also count the time into server.el_cron_duration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently server.el_cron_duration
is set to the duration of the most recent serverCron
. It's not cumulative or additive. Then, in beforeSleep()
, we add (locals) duration_before_aof
& duration_after_write
and record the sum with durationAddSample
which both accumulates a total duration AND increments a counter of samples taken.
I think you are suggesting that inside serverCron
we should alter the existing code to do server.el_cron_duration += ...
and also increment server.el_cron_duration
in this function also.
If we do this, it will affect the samples taken counter as we will get samples for both serverCron
and the clientTimeProc
. Usually, these won't happen in the same event loop cycle, but at times they might align.
Thoughts?
As a side note, I notice that server.el_cron_duration
is defined as monotime
. That is not the intended usage for the type. monotime
is intended as a timestamp (a point in time). Durations should not be monotime
as they are not timestamps. This is shown in monotonic.h
in that all of the duration functions return uint64_t
, not monotime
.
Signed-off-by: Jim Brunner <[email protected]>
Signed-off-by: Jim Brunner <[email protected]>
Signed-off-by: Jim Brunner <[email protected]>
Signed-off-by: Jim Brunner <[email protected]>
Signed-off-by: Jim Brunner <[email protected]>
Signed-off-by: Jim Brunner <[email protected]>
Signed-off-by: Jim Brunner <[email protected]>
Signed-off-by: Jim Brunner <[email protected]>
The
serverCron()
function contains a variety of maintenance functions and is set up as a timer job, configured to run at a certain rate (hz). The default rate is 10hz (every 100ms).One of the things that
serverCron()
does is to perform maintenance functions on connected clients. Since the number of clients is variable, and can be very large, this could cause latency spikes when the 100msserverCron()
task gets invoked. To combat those latency spikes, a feature called "dynamic-hz" was introduced. This feature will runserverCron()
more often, if there are more clients. The clients get processed up to 200 at a time. The delay forserverCron()
is shortened with the goal of processing all of the clients every second.The result of this is that some of the other (non-client) maintenance functions also get (unnecessarily) run more often. Like
cronUpdateMemoryStats()
anddatabasesCron()
. Logically, it doesn't make sense to run these functions more often, just because we happen to have more clients attached.This PR separates client activities onto a separate, variable, timer. The "dynamic-hz" feature is eliminated. Now,
serverCron
will run at a standard configured rate. The separate clients cron will automatically adjust based on the number of clients. This has the added benefit that often, the 2 crons will fire during separate event loop invocations and will usually avoid the combined latency impact of doing both maintenance activities together.The new timer follows the same rules which were established with the dynamic HZ feature.
MAX_CLIENTS_PER_CLOCK_TICK
)CLIENTS_CRON_MIN_ITERATIONS
)The delay (ms) for the new timer is also more precise, computing the number of milliseconds needed to achieve the goal of reaching all of the clients every second. The old dynamic-hz feature just performs a doubling of the HZ until the clients processing rate is achieved (i.e. delays of 100ms, 50ms, 25ms, 12ms...)