RFD 168 Bootstrapping a Manta Buckets deployment #130
Hi Kelly,
Thanks for putting this together. I have a few different thoughts,
though I imagine that a large chunk of this stems from my own lack of
understanding. All in all, this seems like a good direction.
I think it might be useful to include in the bootstrap process when the
various shards are created and how they are created. The RFD talks about
using manta-shardadm to specify '1.boray' and '2.boray', but it's not
immediately clear what is actually creating them. Similarly, I think
it'd be useful to indicate in the RFD how we know that enough of the
system has been updated such that we know we can actually start setting
this up. It'd be useful if this could also talk about whether or not it
will verify things like how manta-shardadm will know that the '1.boray'
is actually a boray related shard and has the right postgres versions,
etc.
Something I found confusing during this part as well is that it seems we
need to set the shards and their names before we've deployed any of the
images or things related to them. I guess there's a chicken-and-egg
problem, but how do we make sure that when we later deploy, they match and
are consistent?
Another thing: rather than just creating another one-off
`manta-create-buckets-topology.sh`, maybe we should look at folding all of
this into either manta-shardadm or manta-adm as more first-class
functionality? I feel like we've gained more centralization and more
observability by moving things out of one-off bash scripts, and that helps.
I'm not sure that it makes sense to serialize things to stdout by default,
as it may then be more challenging to actually automate this. Ideally, we'd
love to be able to use some kind of manta-adm tool to actually initialize
all of the buckets subsystem, installing the various pieces, rather than
having a half dozen manual steps.
Regarding the electric-boray setup and the fact that the image startup will
check and verify what's going on: can you explain a bit more about the
rationale for having this behavior? My main concern is that someone will
set the setting to auto-update and then this will cascade. Say we have
something where we restart all the electric-borays, but they then all try
to update because we forgot this was set; that could lead to more downtime
or issues. I'm sure there's a bit more to this, and a bunch of questions
about the right way to update all the rings in a deployment so that they
have the same version. I'm just not sure how much more benefit there is if
the service is blocked while updating in that state versus provisioning
something new and tearing down the other one. Maybe we could talk through a
bit about how we'll make sure those updates are atomic and can't be
corrupted or lead us to operating with a bunch of electric-boray instances
in different states.
Thanks,
Robert
Thanks for the comments, Robert. I apologize for my delayed response.
This sounds good. I'll add some description about how the shards get created and the intended order of the related steps in the process. I left it out on the first pass because I wasn't sure if it would be useful, but your feedback tells me it is. Thanks!
This is a good point. I am not sure we do anything in this regard with moray shards today, but maybe prior to registering them with SAPI we need to ensure they exist and do a basic sanity check.
The process with shardadm is intended to happen after the shards have been provisioned. I'll mention that to clear it up and hopefully it will make more sense.
Ok, so maybe instead of creating yet another bash script we create a new program to encompass the function that both scripts serve. That sounds fine to me.
I think if we're replacing the scripts with a new program as previously mentioned then we can do all of this inside that program. I just mentioned stdout because node-fash exposes a command line utility for dumping the serialized ring to stdout, but if we're creating a program there's no reason to do that.
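Just to make that concrete, here's a rough sketch of what the operator-facing side of such a program could look like. Everything here (the program name, subcommands, flags, and paths) is a placeholder I'm inventing for illustration, not an interface that exists today:

```sh
# Purely hypothetical invocation of the proposed replacement program; the
# program name, subcommands, flags, and paths are placeholders.

# Build the consistent hash ring for the buckets shards and write it to a
# file, rather than serializing it to stdout the way the node-fash CLI does.
manta-buckets-topology create -p '1.boray,2.boray' -v 1000000 \
    -o /var/tmp/buckets_hash_ring

# The same program could then upload the ring image and record its location
# in SAPI, instead of leaving those as separate manual steps.
manta-buckets-topology publish -f /var/tmp/buckets_hash_ring
```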
I was trying to take the first steps towards some of the support for changing the topology of the metadata system (i.e. relocating vnode data to new shards) with no system downtime or degradation in processing capability. I think there is a good case to make for having electric-boray update its config while running versus only updating the ring data through reprovisioning, but you are 100% correct that there are risks and there must be very good safeguards. I don't think it's necessary to couple this with the bootstrapping process. I was trying to get a bit of a head start, but thinking more about it, it's not necessary to do right now. I'll edit the text to remove this part of the discussion and we can flesh it out separately at a later time.
On 6/17/19 11:04, Kelly McLaughlin wrote:
> It'd be useful if this could also talk about whether or not it will verify things like how manta-shardadm will know that the '1.boray' is actually a boray related shard and has the right postgres versions, etc.
This is a good point. I am not sure we do anything in this regard with moray shards today, but maybe prior to registering them with SAPI we need to ensure they exist and do a basic sanity check.
I don't think we do today. I suspect that's partially just because
there's only ever really been one kind of thing, so it's been easier to
get away with. I think if we add some kind of check as part of the SAPI
registration process that'll help.
Maybe the boray service could advertise something about it being boray and
some kind of semantic API version that we could use and check? I'm only
thinking about that because then a boray shard would answer the check with
something meaningful, rather than us just getting an ECONNREFUSED if we
pointed it at a moray or other service zone.
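Roughly the kind of check I'm imagining, sketched from the headnode with sdc-sapi and json; the "boray" service name is whatever we end up registering, and the version probe at the end is entirely hypothetical since nothing like it exists today:

```sh
# Sketch of a registration-time sanity check run from the headnode.  The
# service name and especially the version probe are assumptions, not an
# existing interface.

shard=1.boray

# Confirm SAPI knows about a boray service and list its instances, so we
# can tell the shard really maps to boray zones and not something else.
svc_uuid=$(sdc-sapi "/services?name=boray" | json -Ha uuid)
if [[ -z "$svc_uuid" ]]; then
    echo "no boray service registered in SAPI" >&2
    exit 1
fi
sdc-sapi "/instances?service_uuid=${svc_uuid}" | json -Ha uuid params.alias

# Entirely hypothetical: if boray advertised a semantic API version, the
# check could ask for it directly instead of inferring the zone type from
# whether the connection is refused.  The port and endpoint are made up.
boray_port=2021
curl -sS "http://${shard}:${boray_port}/version"
```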
> Something I found confusing during this part as well is that it seems we need to set the shards and their names before we've deployed any of the images or things related to them. I guess there's a chicken-and-egg problem, but how do we make sure that when we later deploy, they match and are consistent?
The process with shardadm is intended to happen after the shards have been provisioned. I'll mention that to clear it up and hopefully it will make more sense.
Ah, gotcha. That makes sense. Thanks!
> Another thing: rather than just creating another one-off `manta-create-buckets-topology.sh`, maybe we should look at folding all of this into either manta-shardadm or manta-adm as more first-class functionality? I feel like we've gained more centralization and more observability by moving things out of one-off bash scripts, and that helps.
Ok, so maybe instead of creating yet another bash script we create a new program to encompass the function that both scripts serve. That sounds fine to me.
> I'm not sure that it makes sense to serialize things to stdout by default, as it may then be more challenging to actually automate this. Ideally, we'd love to be able to use some kind of manta-adm tool to actually initialize all of the buckets subsystem, installing the various pieces, rather than having a half dozen manual steps.
I think if we're replacing the scripts with a new program as previously mentioned then we can do all of this inside that program. I just mentioned stdout because node-fash exposes a command line utility for dumping the serialized ring to stdout, but if we're creating a program there's no reason to do that.
OK, that makes sense. The only real reason I brought this up is the general
goal over time to consolidate setup so it could eventually be as simple as
something like 'manta-adm deploy buckets', or something similar that makes
more sense in manta-adm. I know it won't be that way initially and that's
fine.
> Regarding the electric-boray setup and the fact that the image startup will check and verify what's going on: can you explain a bit more about the rationale for having this behavior? My main concern is that someone will set the setting to auto-update and then this will cascade. Say we have something where we restart all the electric-borays, but they then all try to update because we forgot this was set; that could lead to more downtime or issues. I'm sure there's a bit more to this, and a bunch of questions about the right way to update all the rings in a deployment so that they have the same version. I'm just not sure how much more benefit there is if the service is blocked while updating in that state versus provisioning something new and tearing down the other one. Maybe we could talk through a bit about how we'll make sure those updates are atomic and can't be corrupted or lead us to operating with a bunch of electric-boray instances in different states.
I was trying to take the first steps towards some of the support for changing the topology of the metadata system (i.e. relocating vnode data to new shards) with no system downtime or degradation in processing capability. I think there is a good case to make for having electric-boray update its config while running versus only updating the ring data through reprovisioning, but you are 100% correct that there are risks and there must be very good safeguards. I don't think it's necessary to couple this with the bootstrapping process. I was trying to get a bit of a head start, but thinking more about it, it's not necessary to do right now. I'll edit the text to remove this part of the discussion and we can flesh it out separately at a later time.
OK. I see where this is coming from. I agree that getting out of the
reprovision path and trying to drive towards something that we can do with
even less downtime would be great. I'm happy to help brainstorm on that any
time if it'd be useful.
I was thinking about this more and how to update the RFD to describe what we might do. I first thought we could check a minimum version of postgres that supports the required features. The buckets system will actually work with postgres 9.5 and later. We're targeting postgres 11, but there is no feature it provides that binds us to it. We do need the `hstore` postgres extension present, so we could check that, but this feels like it's moving towards a set of validations that easily gets stale and is forgotten about. A lot of these minimum requirements we are already encapsulating in the new `buckets-postgres` manta service, so perhaps it's sufficient if the shardadm check verifies the shards are reachable on the network (similar to how `manta-adm show` might query services) and also ensures that each is an instance of the `buckets-postgres` service. Then it's left up to the service definition to describe what it needs to be valid and we don't have to expose those details to shardadm. What do you think of that approach?
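For reference, the per-shard checks I was originally picturing would have looked roughly like this (the hostname and postgres user are placeholders); the point of the approach above is that shardadm wouldn't need to know any of these details, since they'd live with the `buckets-postgres` service definition:

```sh
# Illustrative only: the per-shard validation I first had in mind.  The
# hostname and user are placeholders; under the proposal above these checks
# would live with the buckets-postgres service, not shardadm.

shard_host=1.buckets-postgres.example.com

# Postgres 9.5 or later (server_version_num is e.g. 90500, 110005, ...).
pg_ver=$(psql -h "$shard_host" -U postgres -At -c 'SHOW server_version_num;')
if [[ "$pg_ver" -lt 90500 ]]; then
    echo "postgres on $shard_host is too old: $pg_ver" >&2
    exit 1
fi

# The hstore extension must be installed in the target database.
have_hstore=$(psql -h "$shard_host" -U postgres -At \
    -c "SELECT count(*) FROM pg_extension WHERE extname = 'hstore';")
if [[ "$have_hstore" -ne 1 ]]; then
    echo "hstore extension missing on $shard_host" >&2
    exit 1
fi
```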
On 6/18/19 10:44, Kelly McLaughlin wrote:
I was thinking about this more and how to update the RFD to describe what we might do. I first thought we could check a minimum version of postgres that supports the required features. The buckets system will actually work with postgres 9.5 and later. We're targeting postgres 11, but there is no feature it provides that binds us to it. We do need the `hstore` postgres extension present, so we could check that, but this feels like it's moving towards a set of validations that easily gets stale and is forgotten about. A lot of these minimum requirements we are already encapsulating in the new `buckets-postgres` manta service, so perhaps it's sufficient if the shardadm check verifies the shards are reachable on the network (similar to how `manta-adm show` might query services) and also ensures that each is an instance of the `buckets-postgres` service. Then it's left up to the service definition to describe what it needs to be valid and we don't have to expose those details to shardadm. What do you think of that approach?
I'm less familiar with how everything is organized, so if it's all being
encapsulated by the buckets-postgres service, then that seems reasonable
at first blush.
This is for discussion of RFD 168 Bootstrapping a Manta Buckets deployment.