RFD 159 Discussion #123
Thanks again for writing this up! Based on talking to a few people about these issues, I think it would be useful to start with a clearer separation of the policy choice from the mechanism used to implement it. I think the policy we want is something like:
The mechanism is more complicated:
I know none of this is news to you, and much of it is reflected later in the RFD, but I think it would be useful to highlight the distinction. I say that because of the confusion I've seen around these issues.

Some people think that if X=95, then we should just set the quota to 95% of the box's available storage [and pretend there's no MUSKIE_MAX_UTILIZATION_PCT]. We should explain here why doing that would result in lots of request failures as zones fill up. We also may not be able to read from a totally full zone because nginx can't write to its request log. On the other hand, as the RFD mentions, we can't rely solely on MUSKIE_MAX_UTILIZATION_PCT because of the cases in production where that went wrong and we needed a backstop. These two considerations lead to the non-obvious result that if the quota is 95% and MUSKIE_MAX_UTILIZATION_PCT limits us to 95%, then we wind up using only about 90% of the box (0.95 * 0.95 = 0.9025).

My suggested policy above is itself somewhat tied to the mechanism (because the idea of a target percentage is based in part on having implemented that), but I still think some distinction here is useful. Regarding the suggestion here:
I think it will under-use the box by about MUSKIE_MAX_UTILIZATION_PCT * 1 TiB.
Is that right, or is that backwards? I think the behavior here is that if the zone is 93.0001% full, Muskie treats that as 94% full (because Minnow uses a ceiling calculation when it computes the utilization percentage). If we want the target to be 95%, I think we'd set MUSKIE_MAX_UTILIZATION_PCT = 96. Then we'd be using 95% of the quota, which itself would be 1 TiB less than 95% of the box. I think that means we'd under-use each zone by MUSKIE_MAX_UTILIZATION_PCT * 1 TiB. Is that difference significant?
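To make the arithmetic above concrete, here is a small standalone sketch (not Minnow or Muskie code; the figures are just the ones used in this discussion) of how the quota and MUSKIE_MAX_UTILIZATION_PCT compound, and of how a ceiling on the reported percentage turns 93.0001% into 94%:

/* Illustration only: compounding limits and ceiling-based rounding. */
#include <math.h>
#include <stdio.h>

int
main(void)
{
	double quota_pct = 0.95;	/* quota as a fraction of the box */
	double muskie_max = 0.95;	/* MUSKIE_MAX_UTILIZATION_PCT / 100 */

	/* The two limits compound: only ~90% of the box is actually used. */
	printf("fraction of box used: %.4f\n", quota_pct * muskie_max);

	/* A ceiling rounds a 93.0001%-full zone up to 94%. */
	printf("reported utilization: %.0f%%\n", ceil(93.0001));

	return (0);
}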
I've been wondering if we'd need to do this. Have you given much thought to how to do it? I imagine that we would have a SAPI tunable at the storage zone for the target fill (either as a percentage or a byte count), we'd have Minnow report this with the capacity record, and Muskie would use the value in the record as the target instead of its global limit.
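A minimal sketch of how that might look on the consuming side, assuming (hypothetically) that the capacity record grows an optional per-zone target field populated by Minnow from the SAPI tunable; the type, field, and function names here are invented for illustration:

/*
 * Hypothetical sketch: if a capacity record carries a per-zone target
 * (populated by Minnow from a SAPI tunable), use it; otherwise fall back
 * to the global MUSKIE_MAX_UTILIZATION_PCT. Names are illustrative only.
 */
#include <stdbool.h>

typedef struct capacity_record {
	double	cr_percent_used;	/* utilization reported by Minnow */
	bool	cr_has_target;		/* per-zone target present? */
	double	cr_target_pct;		/* per-zone target fill (percent) */
} capacity_record_t;

static bool
zone_is_writable(const capacity_record_t *cr, double global_max_pct)
{
	double limit = cr->cr_has_target ? cr->cr_target_pct : global_max_pct;

	return (cr->cr_percent_used < limit);
}

Muskie's real picker logic is more involved; the point is only that a per-record value, when present, would take precedence over the global limit.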
While hanging out in the OpenZFS project Slack thing, the subject of SPA slop space came up. Looking at it, I suspect it accounts for the described 3-4% disparity between what space exists at the pool level and what shows up as available to the filesystems. There is a comment around the definition of spa_slop_shift:

/*
* Normally, we don't allow the last 3.2% (1/(2^spa_slop_shift)) of space in
* the pool to be consumed. This ensures that we don't run the pool
* completely out of space, due to unaccounted changes (e.g. to the MOS).
* It also limits the worst-case time to allocate space. If we have
* less than this amount of free space, most ZPL operations (e.g. write,
* create) will return ENOSPC.
*
* Certain operations (e.g. file removal, most administrative actions) can
* use half the slop space. They will only return ENOSPC if less than half
* the slop space is free. Typically, once the pool has less than the slop
* space free, the user will use these operations to free up space in the pool.
* These are the operations that call dsl_pool_adjustedsize() with the netfree
* argument set to TRUE.
*
* Operations that are almost guaranteed to free up space in the absence of
* a pool checkpoint can use up to three quarters of the slop space
* (e.g zfs destroy).
*
* A very restricted set of operations are always permitted, regardless of
* the amount of free space. These are the operations that call
* dsl_sync_task(ZFS_SPACE_CHECK_NONE). If these operations result in a net
* increase in the amount of space used, it is possible to run the pool
* completely out of space, causing it to be permanently read-only.
*
* Note that on very small pools, the slop space will be larger than
* 3.2%, in an effort to have it be at least spa_min_slop (128MB),
* but we never allow it to be more than half the pool size.
*
* See also the comments in zfs_space_check_t.
*/
int spa_slop_shift = 5;
uint64_t spa_min_slop = 128 * 1024 * 1024;

There are also comments (referred to above) around the zfs_space_check_t definitions. When this came up in the channel, there was some discussion of capping this value, since 3.2% of a 250TB pool is 8TB, which feels like a lot of space to discard for all of the economic reasons mentioned in this RFD.
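To put rough numbers on that, here is a small standalone sketch of the rule as described in the comment (not the actual ZFS implementation), showing that 1/32 of a 250 TiB pool is roughly 8 TiB:

/*
 * Sketch of the slop-space rule described in the comment above (not the
 * actual ZFS code): reserve 1/(2^spa_slop_shift) of the pool, but at
 * least spa_min_slop, and never more than half the pool.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define	MIN(a, b)	((a) < (b) ? (a) : (b))
#define	MAX(a, b)	((a) > (b) ? (a) : (b))

int
main(void)
{
	int slop_shift = 5;					/* 1/32, ~3.2% */
	uint64_t min_slop = 128ULL * 1024 * 1024;		/* 128 MiB */
	uint64_t pool = 250ULL * 1024 * 1024 * 1024 * 1024;	/* ~250 TiB */

	uint64_t slop = pool >> slop_shift;
	slop = MIN(MAX(slop, min_slop), pool / 2);

	/* Prints roughly 7.8 TiB for a 250 TiB pool. */
	printf("slop = %" PRIu64 " bytes (~%.1f TiB)\n",
	    slop, (double)slop / (double)(1ULL << 40));
	return (0);
}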
Dave,
Ah. That's my mistake. I didn't realize Minnow used a ceiling operation when it computed its usage percentage. That's confusing behavior.
While the Minnow code uses a ceiling calculation here to compute utilization, the picker code that determines whether a storage zone is too full to write to uses a
Issue for discussing RFD 159 on Manta storage zone capacity