RFD 168 Bootstrapping a Manta Buckets deployment #130
Hi Kelly,
Thanks for putting this together. I have a few different thoughts,
though I imagine that a large chunk of this stems from my own lack of
understanding. All in all, this seems like a good direction.
I think it might be useful to include in the bootstrap process when the
various shards are created and how they are created. The RFD talks about
using manta-shardadm to specify '1.boray' and '2.boray', but it's not
immediately clear what is actually creating them. Similarly, I think
it'd be useful to indicate in the RFD how we know that enough of the
system has been updated such that we know we can actually start setting
this up. It'd be useful if this could also talk about whether or not it
will verify things like how manta-shardadm will know that the '1.boray'
is actually a boray related shard and has the right postgres versions,
etc.
Something I found confusing during this part as well is that it seems we
need to set the shards and their names before we've deployed any of the
images or things related to them. I guess there's a chicken-and-egg
problem, but how do we make sure that when we later deploy, they match and
are consistent?
Another thing: rather than just creating another one-off
`manta-create-buckets-topology.sh`, maybe we should look at folding all of
this into either manta-shardadm or manta-adm as more first-class
functionality? I feel like we've gained more centralization and more
observability by moving things out of one-off bash scripts, and that helps.
I'm not sure that it makes sense to serialize things to stdout by default,
as it may then be more challenging to actually automate this. Ideally, we'd
love to be able to use some kind of manta-adm tool to actually initialize
all of the buckets subsystem, installing the various pieces, rather than
having a half dozen manual steps.
Regarding the electric-boray setup and the fact that the image startup will
check and verify what's going on: can you explain a bit more about the
rationale for having this behavior? My main concern is that someone will
set the setting to auto-update and then this will cascade. Say we have
something where we restart all the electric-borays, but they then all try
to update because we forgot this was set; that could lead to more downtime
or issues. I'm sure there's a bit more to this, and a bunch of questions
about the right way to update all the rings in a deployment so that they
have the same version. I'm just not sure how much more benefit there is if
the service is blocked while updating in that state versus provisioning
something new and tearing down the other one. Maybe we could talk through a
bit about how we'll make sure those updates are atomic and can't be
corrupted or lead us to operating with a bunch of electric-boray instances
in different states.
Thanks,
Robert
Thanks for the comments, Robert. I apologize for my delayed response.
This sounds good. I'll add some description about how the shards get created and the intended order of the related steps in the process. I left it out on the first pass because I wasn't sure if it would be useful, but your feedback tells me it is. Thanks!
This is a good point. I am not sure we do anything in this regard with moray shards today, but maybe prior to registering them with SAPI we need to ensure they exist and do a basic sanity check.
The process with shardadm is intended to happen after the shards have been provisioned. I'll mention that to clear it up and hopefully it will make more sense.
Ok, so maybe instead of creating yet another bash script we create a new program to encompass the function that both scripts serve. That sounds fine to me.
I think if we're replacing the scripts with a new program as previously mentioned then we can do all of this inside that program. I just mentioned stdout because node-fash exposes a command line utility for dumping the serialized ring to stdout, but if we're creating a program there's no reason to do that.
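Just to make that concrete, here's a rough sketch of what the operator-facing side of such a program could look like. Everything here (the program name, subcommands, flags, and paths) is a placeholder I'm inventing for illustration, not an interface that exists today:

```sh
# Purely hypothetical invocation of the proposed replacement program; the
# program name, subcommands, flags, and paths are placeholders.

# Build the consistent hash ring for the buckets shards and write it to a
# file, rather than serializing it to stdout the way the node-fash CLI does.
manta-buckets-topology create -p '1.boray,2.boray' -v 1000000 \
    -o /var/tmp/buckets_hash_ring

# The same program could then upload the ring image and record its location
# in SAPI, instead of leaving those as separate manual steps.
manta-buckets-topology publish -f /var/tmp/buckets_hash_ring
```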
I was trying to take the first steps towards some of the support for changing the topology of the metadata system (i.e. relocating vnode data to new shards) with no system downtime or degradation in processing capability. I think there is a good case to make for having electric-boray update its config while running versus only updating the ring data through reprovisioning, but you are 100% correct that there are risks and there must be very good safeguards. I don't think it's necessary to couple this with the bootstrapping process. I was trying to get a bit of a head start, but thinking more about it, it's not necessary to do right now. I'll edit the text to remove this part of the discussion and we can flesh it out separately at a later time.
On 6/17/19 11:04, Kelly McLaughlin wrote:
> It'd be useful if this could also talk about whether or not it will verify things like how manta-shardadm will know that the '1.boray' is actually a boray related shard and has the right postgres versions, etc.
This is a good point. I am not sure we do anything in this regard with moray shards today, but maybe prior to registering them with SAPI we need to ensure they exist and do a basic sanity check.
I don't think we do today. I suspect that's partially just because
there's only ever really been one kind of thing, so it's been easier to
get away with. I think if we add some kind of check as part of the SAPI
registration process that'll help.
Maybe the boray service could advertise something about it being boray and
some kind of semantic API version that we could use and check? I'm only
thinking about that because then a boray shard would answer the check with
something meaningful, rather than us just getting an ECONNREFUSED if we
pointed it at a moray or other service zone.
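Roughly the kind of check I'm imagining, sketched from the headnode with sdc-sapi and json; the "boray" service name is whatever we end up registering, and the version probe at the end is entirely hypothetical since nothing like it exists today:

```sh
# Sketch of a registration-time sanity check run from the headnode.  The
# service name and especially the version probe are assumptions, not an
# existing interface.

shard=1.boray

# Confirm SAPI knows about a boray service and list its instances, so we
# can tell the shard really maps to boray zones and not something else.
svc_uuid=$(sdc-sapi "/services?name=boray" | json -Ha uuid)
if [[ -z "$svc_uuid" ]]; then
    echo "no boray service registered in SAPI" >&2
    exit 1
fi
sdc-sapi "/instances?service_uuid=${svc_uuid}" | json -Ha uuid params.alias

# Entirely hypothetical: if boray advertised a semantic API version, the
# check could ask for it directly instead of inferring the zone type from
# whether the connection is refused.  The port and endpoint are made up.
boray_port=2021
curl -sS "http://${shard}:${boray_port}/version"
```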
> Something I found confusing during this part as well is that it seems we need to set the shards and their names before we've deployed any of the images or things related to them. I guess there's a chicken-and-egg problem, but how do we make sure that when we later deploy, they match and are consistent?
The process with shardadm is intended to happen after the shards have been provisioned. I'll mention that to clear it up and hopefully it will make more sense.
Ah, gotcha. That makes sense. Thanks!
> Another thing: rather than just creating another one-off `manta-create-buckets-topology.sh`, maybe we should look at folding all of this into either manta-shardadm or manta-adm as more first-class functionality? I feel like we've gained more centralization and more observability by moving things out of one-off bash scripts, and that helps.
Ok, so maybe instead of creating yet another bash script we create a new program to encompass the function that both scripts serve. That sounds fine to me.
> I'm not sure that it makes sense to serialize things to stdout by default, as it may then be more challenging to actually automate this. Ideally, we'd love to be able to use some kind of manta-adm tool to actually initialize all of the buckets subsystem, installing the various pieces, rather than having a half dozen manual steps.
I think if we're replacing the scripts with a new program as previously mentioned then we can do all of this inside that program. I just mentioned stdout because node-fash exposes a command line utility for dumping the serialized ring to stdout, but if we're creating a program there's no reason to do that.
OK, that makes sense. The only real reason I brought this up is the general
goal over time to consolidate setup so it could eventually be as simple as
something like 'manta-adm deploy buckets', or something similar that makes
more sense in manta-adm. I know it won't be that way initially and that's
fine.
> Regarding the electric-boray setup and the fact that the image startup will check and verify what's going on: can you explain a bit more about the rationale for having this behavior? My main concern is that someone will set the setting to auto-update and then this will cascade. Say we have something where we restart all the electric-borays, but they then all try to update because we forgot this was set; that could lead to more downtime or issues. I'm sure there's a bit more to this, and a bunch of questions about the right way to update all the rings in a deployment so that they have the same version. I'm just not sure how much more benefit there is if the service is blocked while updating in that state versus provisioning something new and tearing down the other one. Maybe we could talk through a bit about how we'll make sure those updates are atomic and can't be corrupted or lead us to operating with a bunch of electric-boray instances in different states.
I was trying to take the first steps towards some of the support for changing the topology of the metadata system (i.e. relocating vnode data to new shards) with no system downtime or degradation in processing capability. I think there is a good case to make for having electric-boray update its config while running versus only updating the ring data through reprovisioning, but you are 100% correct that there are risks and there must be very good safeguards. I don't think it's necessary to couple this with the bootstrapping process. I was trying to get a bit of a head start, but thinking more about it, it's not necessary to do right now. I'll edit the text to remove this part of the discussion and we can flesh it out separately at a later time.
OK. I see where this is coming from. I agree that getting out of the
reprovision path and trying to drive towards something that we can do with
even less downtime would be great. I'm happy to help brainstorm on that any
time if it'd be useful.
I was thinking about this more and how to update the RFD to describe what we might do. I first thought we could check a minimum version of postgres that supports the required features. The buckets system will actually work with postgres 9.5 and later. We're targeting postgres 11, but there is no feature it provides that binds us to it. We do need the `hstore` postgres extension present, so we could check that, but this feels like it's moving towards a set of validations that easily gets stale and is forgotten about. A lot of these minimum requirements we are already encapsulating in the new `buckets-postgres` manta service, so perhaps it's sufficient if the shardadm check verifies the shards are reachable on the network (similar to how `manta-adm show` might query services) and also ensures that each is an instance of the `buckets-postgres` service. Then it's left up to the service definition to describe what it needs to be valid and we don't have to expose those details to shardadm. What do you think of that approach?
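For reference, the per-shard checks I was originally picturing would have looked roughly like this (the hostname and postgres user are placeholders); the point of the approach above is that shardadm wouldn't need to know any of these details, since they'd live with the `buckets-postgres` service definition:

```sh
# Illustrative only: the per-shard validation I first had in mind.  The
# hostname and user are placeholders; under the proposal above these checks
# would live with the buckets-postgres service, not shardadm.

shard_host=1.buckets-postgres.example.com

# Postgres 9.5 or later (server_version_num is e.g. 90500, 110005, ...).
pg_ver=$(psql -h "$shard_host" -U postgres -At -c 'SHOW server_version_num;')
if [[ "$pg_ver" -lt 90500 ]]; then
    echo "postgres on $shard_host is too old: $pg_ver" >&2
    exit 1
fi

# The hstore extension must be installed in the target database.
have_hstore=$(psql -h "$shard_host" -U postgres -At \
    -c "SELECT count(*) FROM pg_extension WHERE extname = 'hstore';")
if [[ "$have_hstore" -ne 1 ]]; then
    echo "hstore extension missing on $shard_host" >&2
    exit 1
fi
```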
On 6/18/19 10:44, Kelly McLaughlin wrote:
I was thinking about this more and how to update the RFD to describe what we might do. I first thought we could check a minimum version of postgres that supports the required features. The buckets system will actually work with postgres 9.5 and later. We're targeting postgres 11, but there is no feature it provides that binds us to it. We do need the `hstore` postgres extension present, so we could check that, but this feels like it's moving towards a set of validations that easily gets stale and is forgotten about. A lot of these minimum requirements we are already encapsulating in the new `buckets-postgres` manta service, so perhaps it's sufficient if the shardadm check verifies the shards are reachable on the network (similar to how `manta-adm show` might query services) and also ensures that each is an instance of the `buckets-postgres` service. Then it's left up to the service definition to describe what it needs to be valid and we don't have to expose those details to shardadm. What do you think of that approach?
I'm less familiar with how everything is organized, so if it's all being
encapsulated by the buckets-postgres service, then that seems reasonable
at first blush.
This is for discussion of RFD 168 Bootstrapping a Manta Buckets deployment.