RFD 170 Manta Picker Component #132
Thanks! I'm curious to see what other things are added to this RFD. Meanwhile, here are some questions to consider.
Thanks for the comments @KodyKantor. I'm interested to hear your thoughts on my responses below.
Once picker is registered with DNS/zookeeper muskie can look it up at
The picker queries the shard 1 moray. Regardless, I believe today if the picker can't provide updated data it uses stale information (I need to double check that). The picker component could return a timestamp with each request, letting the requester know how stale the data is.
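To make that concrete, here's a minimal sketch (TypeScript, with hypothetical field names; this is not the actual RFD 170 API) of a picker response carrying such a timestamp:

```typescript
// Hypothetical response shape for a picker query; field names are
// illustrative, not the RFD 170 API.
interface PickerResponse {
  // When the picker last refreshed its view from the shard 1 moray.
  lastRefreshed: string; // ISO 8601 timestamp
  sharks: Array<{
    manta_storage_id: string;
    datacenter: string;
    availableMB: number;
  }>;
}

// A requester can compute how stale the data is relative to its own clock.
function stalenessMs(resp: PickerResponse): number {
  return Date.now() - Date.parse(resp.lastRefreshed);
}
```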
This brings up a good point. It would probably make sense to allow for versioning via the header or the API path.
This would still be an improvement, but it would provide us less flexibility. I cannot be certain but I doubt we need the same number of pickers as we need muskies.
These questions are more about how muskie would leverage a separate picker component. I was originally thinking this would be a separate RFD. My immediate goal here is to provide a picker that can be used with the rebalancer (RFD 162). However, I do see the additional risk here and perhaps, in the short to medium term, breaking out the picker as a separate SMF instance in the same muskie zone will satisfy the rebalancer requirements, provide some relief to MANTA-4091, and reduce overall risk. A couple of considerations for this approach:
If we do ultimately decide to use the separate picker zone approach we could, in much the same way that GETs are handled, query 2 picker zones at once for each PUT and simply choose the first one that responds. If one is delayed or unavailable, it is unlikely that two of, say, 50 pickers (a 90% reduction in our current deployment) are both pathological. Another idea would be to leave the choosing algorithm as part of the muskie zone, and only query the picker zone for an updated view of the storage nodes. This would essentially reduce the risk to the same one we have today, where muskie uses stale data when the shard 1 moray is overloaded. The rebalancer is going to have to use its own choosing algorithm anyway (or add additional filtering to the current choosing algorithm in the picker). One final alternative is to abandon this work completely and simply have the rebalancer query the shard 1 moray directly.
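As a rough illustration of the "query 2 picker zones at once" idea, here is a sketch in TypeScript; the hostnames and the queryPicker() helper are hypothetical stand-ins, not an existing client:

```typescript
// Sketch only: fan a PUT's storage-node query out to two picker instances
// and use whichever answers first. queryPicker() and the hostnames are
// hypothetical stand-ins for a real picker client.
function queryPicker(host: string, sizeMB: number): Promise<string[]> {
  // Stand-in for an HTTP request to a picker; simulated with a random delay.
  const delayMs = Math.random() * 100;
  return new Promise((resolve) =>
    setTimeout(() => resolve([`node-chosen-by-${host}-for-${sizeMB}MB`]), delayMs)
  );
}

async function chooseStorageNodes(sizeMB: number): Promise<string[]> {
  const pickers = ['picker-a.example.com', 'picker-b.example.com'];
  // Promise.any resolves with the first picker that responds successfully,
  // so one slow or unavailable picker does not stall the PUT.
  return Promise.any(pickers.map((host) => queryPicker(host, sizeMB)));
}
```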
I was thinking of adding versioning to the header, but I'd be interested to hear your thoughts on path versioning vs header versioning.
I thought that muskie didn't expose picker APIs, but I could be mistaken.
I don't have a strong opinion on headers vs path, more that where we have had versioned APIs it has generally been a source of regret. Path seems the more common approach, but where we do use versioning today (by way of restify) I think it is by header.
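For reference, a minimal sketch of the two styles being compared; the route shape and handler are illustrative only (a bare Node HTTP handler rather than the real muskie/restify stack):

```typescript
// Illustrative only: the two versioning styles under discussion, shown with
// a bare Node HTTP handler rather than the real muskie/restify stack.
import { IncomingMessage, ServerResponse } from 'node:http';

function handleVersioned(req: IncomingMessage, res: ServerResponse): void {
  // Path versioning: the version is embedded in the URL, e.g.
  //   GET /v1/storagenodes
  const pathVersion = req.url?.match(/^\/v(\d+)\//)?.[1];

  // Header versioning: the client asks for a version via a request header
  // (restify's versioned routes, for example, use accept-version).
  const headerVersion = req.headers['accept-version'];

  res.end(JSON.stringify({ pathVersion, headerVersion }));
}
```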
Thanks for starting this up Rui. I have a couple of initial thoughts and some notes on the discussion that's happened so far.
Right now the RFD has the following assumption:
It'd be useful to discuss how we come to that. I agree that in theory it should be true, but I guess it depends a lot on the implementation. Is there a reason that this is essential to the RFD? If this becomes false at some point, does that invalidate anything that we have or are planning? Is it just the issue of overload or is there something else here?
If we go down this route, how would you propose a client handle that stale data? At what point should it cut off from using a given picker versus asking another? Recording the timestamp is kind of useful, but then we should also suggest a policy for what that is. I think a lot of this will depend on what the interfaces will be, how many pickers are asked in parallel about information, and what the muskies and rebalancers end up needing to do. One gotcha with going to multiple pickers in parallel is it does mean that the pickers need to be able to handle 2-3x the reqs/sec that all the deployed muskies would if we do choosing in the pickers. That drops dramatically based on some of the other discussion points about how muskie uses picker.
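One way to picture such a staleness policy, purely as a sketch (the threshold and helper names below are assumptions, not anything the RFD specifies):

```typescript
// Sketch of one possible client-side staleness policy; the threshold and
// helper names are assumptions, not something the RFD specifies.
interface NodeView {
  lastRefreshed: string; // ISO 8601 timestamp reported by the picker
  sharks: string[];
}

const MAX_STALENESS_MS = 60_000; // policy knob each consumer would choose

function freshEnough(view: NodeView): boolean {
  return Date.now() - Date.parse(view.lastRefreshed) <= MAX_STALENESS_MS;
}

// Ask pickers in turn and fall back to the next one when the data returned
// is older than this consumer is willing to tolerate.
async function getView(pickers: Array<() => Promise<NodeView>>): Promise<NodeView> {
  for (const query of pickers) {
    const view = await query();
    if (freshEnough(view)) {
      return view;
    }
  }
  throw new Error('no picker returned sufficiently fresh data');
}
```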
If we're going to break it out, I think it's probably better that it just become its own service rather than a separate process. The main question is basically whether this element of scale (the picker) should be tied directly to another one (muskie). In general, if the pieces aren't implemented together for an explicit reason, then we should probably say no. This is part of what makes it complicated to scale binders, as they always embed a zookeeper. Another way to look at this is: should the life cycle of a picker be the same as the muskie's? By default, I don't know of any reason it needs to be or that they'd be directly related. If we wanted to update one of the two, it'd be nice to do so without impacting the other. If we go down the path of the picker being delivered in the muskie zone, we should still probably think of them as discrete components, and a given muskie zone should be able to use all of the picker services and not rely on the in-zone one, even if for some reason it prefers it.
@cburroughs I'd be interested to hear more about what has caused the regret around versioning. Is it around the maintenance burden it requires, or that it just didn't turn out to serve the intended purpose, or something else? Thanks!
Oh, I'm very sorry for causing confusion. I meant to say that unversioned APIs in Triton (that is, not having a version) have been a source of regret, as it fossilizes both client and server.
In general I like the separate service idea. The picker inside muskie already functions independently of the HTTP request path handling. I think it'd be fine to have the picker service just offer a current view of the available storage nodes rather than doing the actual choosing. The view should include a last-modified timestamp, and that would allow consumers of the service to make decisions about what to do if the data becomes too stale; the definition of "too stale" can be made by each consumer. If we go that route then it doesn't really make sense to call it the picker, but the point is that the source of contention is the querying of the shard 1 morays, so that is what the new service primarily needs to be concerned with to address any potential scaling problems. The muskie change could be very small in that the picker module could remain mostly intact, but using a client for this new service versus the moray client used today. One thing to note in the RFD is that the picker could be scaled up just like many other services such as muskie. It doesn't really matter which instance of picker a consumer's request is serviced by, because they are all presenting a view of the same data.
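To sketch what that separation might look like (all interface and client names here are hypothetical, not an existing module):

```typescript
// Rough sketch of the "view only" service shape described above; all names
// are hypothetical. The choosing logic stays with each consumer, which only
// swaps its data source from a moray client to a client for this service.
interface StorageNodeView {
  lastModified: string; // when the service last polled the shard 1 morays
  nodes: Array<{
    manta_storage_id: string;
    datacenter: string;
    availableMB: number;
  }>;
}

interface StorageViewClient {
  // Stands in for the moray client the picker module uses today.
  getView(): Promise<StorageNodeView>;
}

// Each consumer (muskie, the rebalancer, ...) applies its own filtering and
// staleness policy to the shared view.
async function chooseCandidates(client: StorageViewClient, minAvailableMB: number) {
  const view = await client.getView();
  return view.nodes.filter((n) => n.availableMB >= minAvailableMB);
}
```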
Ah ok, great. Thanks for the clarification!
@rmustacc Thanks for your comments. Responses inline.
I am not convinced there is a need for a
For the future, when muskie switches over to the separate picker model, it could either:
1. have the picker do the choosing and return only the chosen storage nodes, or
2. have the picker return its view of the storage nodes and leave the choosing to the consumer.
One advantage to #1 is that there is almost certainly less data going over the wire, but longer processing time on the picker zone. Also, the rebalancer needs (highly desires) #2.
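Purely for illustration, the two options might have request/response shapes along these lines (TypeScript, hypothetical names):

```typescript
// Hypothetical shapes for the two options above; names are illustrative.

// Option 1: the picker does the choosing and returns only the selected
// storage nodes, so less data crosses the wire but more work happens on
// the picker zone per request.
interface ChooseRequest {
  sizeMB: number;
  copies: number;
}
interface ChooseResponse {
  sharks: string[];
}

// Option 2: the picker returns its full view of the storage nodes and the
// consumer (muskie or the rebalancer) does its own choosing/filtering.
interface ViewResponse {
  lastModified: string;
  nodes: Array<{ manta_storage_id: string; availableMB: number }>;
}
```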
That's a good point. I think breaking chunks at the server boundary will be sufficient. Muskie currently operates off of data that is 30s old. I don't think the changes between when the first and last chunks arrive would be any more significant than they currently are.
Do you mean to say that this RFD should cover all consolidation of the shard 1 moray's consumers?
If the assumption doesn't hold then there is significantly less value in splitting off the picker. The overall additional load to ap-southeast (for example) would be about 0.15% if we just had the rebalancer query the shard 1 moray directly. (There are currently 656 muskie processes in ap-southeast, each with its own picker; adding the rebalancer as one more consumer makes it 1 of 657, or roughly 0.15%.)
I'd like this to be covered in a separate RFD. I think the answer here depends on the consumer, as well as the scale and workload of the region. Perhaps providing the timestamp is not valuable, but I could foresee a circumstance where we may regret not providing some primitive quantification of staleness.
That is a good point. We could impose a delay instead of a parallel request: say, if you haven't gotten a response within some interval, query another picker and use the data that you get back first; that muskie could then update its preferred picker based on the result.
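A rough sketch of that hedged-request idea (the delay value and helper names below are assumptions, not a proposal):

```typescript
// Sketch of the "impose a delay instead of a parallel request" idea: ask the
// preferred picker first and only fan out to a second picker if the first has
// not answered within some hedge delay. All names and values are assumptions.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error('hedge timeout')), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); }
    );
  });
}

async function hedgedQuery<T>(
  preferred: () => Promise<T>,
  fallback: () => Promise<T>,
  hedgeDelayMs = 50 // illustrative value only
): Promise<T> {
  const first = preferred();
  try {
    return await withTimeout(first, hedgeDelayMs);
  } catch {
    // The preferred picker is slow (or failed): also ask a second picker and
    // take whichever responds first. The caller could then update which
    // picker it prefers based on who won.
    return Promise.any([first, fallback()]);
  }
}
```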
This is what I am currently working on as we iterate on this discussion.
This is an issue to discuss RFD 170 Manta Picker Component.