[RFE] Reset cloud allocation API #561
Hi @josecastillolema, I think all of this is already covered by additional processes or is not in scope.
You should just use Foreman to do this; a self-scheduled environment is functionally the exact same as a deliberately scheduled one, and this isn't really in scope of the API. Foreman is a complete third-party platform, and those actions are better handled there.
Just re-provision via Foreman and skip this node. We would have no way of knowing what you chose to use as your bastion node anyway.
You will keep your environment name, cloud # and everything. I think this RFE isn't in scope for being managed by QUADS. We do have a Foreman library, but we do not want to try to operate Foreman from QUADS.
@josecastillolema to add here: we will be providing an RFE shortly to allow you to choose your OS, so that may let you skip having to re-provision if you were just doing this to get a newer OS than the lab default, so long as that operating system is present in that QUADS Foreman. This would be an API option for self-scheduling. (WIP patchset)
You can achieve this via the Foreman REST API like this:
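The inline example was not preserved in this thread. As a hedged sketch, marking a host for rebuild typically means setting the `build` attribute via Foreman's hosts endpoint (`PUT /api/hosts/:id`); the hostname, server URL and token below are placeholders:

```python
# Hedged sketch: mark a Foreman host for rebuild via its REST API.
# FOREMAN_URL, host_id and token are placeholders, not real values.
import json
import urllib.request

FOREMAN_URL = "https://foreman.example.com"  # placeholder


def build_request(host_id, token):
    """Build the PUT request that sets build=true on a host."""
    payload = json.dumps({"host": {"build": True}}).encode()
    return urllib.request.Request(
        f"{FOREMAN_URL}/api/hosts/{host_id}",
        data=payload,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Sending is left to the caller, e.g.:
# urllib.request.urlopen(build_request("f01-h01-r640.example.com", token))
```

After this call the host PXE-boots into a fresh install on its next reboot, which you can also trigger through Foreman's power API.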
@josecastillolema given the details above, and since choosing the Foreman OS got merged in, we'll be adding this to the self-service API via #563. It's not exactly what you're asking for here, but I think it saves the need for re-provisioning systems with a newer OS (e.g. EL9 vs. an EL8 lab default) when you get systems. We will close this RFE, as QUADS won't be performing Foreman actions beyond the initial provisioning workflows it already does with the QUADS Foreman library. Foreman already provides a robust API for what you're trying to do, and that's the best, most direct place to do it, for example marking something for build.
Thanks for taking a look @sadsfae @grafuls
Re-provisioning the host via Foreman (programmatically from the CI) has proven to be extremely challenging. Not the API call to Foreman itself, that seems to be the easy part, but rebooting the server on the proper interface to pick up the PXE boot from Foreman and then re-establishing settings in a way that OCP deployments don't break afterwards. Some challenges we have experienced:
It has worked sometimes on performancelab cloud31, but never on cloud19 without @QuantumPosix manually logging in on the server and making some adjustments.
About (maybe unrealistic 😁) expectations:
```mermaid
flowchart TD
    A[CI/Prow] --> B[QUADS]
    B --> C[Foreman]
```
And finally about alternatives:
Looking forward to discussing this further.
@josecastillolema Thanks for such a detailed response and some great information here, I'll try to respond in-line below:
This is just life with bare-metal, it's always a difficult challenge when you are at the mercy of so many vendors, firmware versions and hardware configurations. Things get especially interesting when complex application stacks also want to do varied things to the hardware too.
This seems like a hardware / lab configuration challenge. If it works reliably in one place and not the other, it's likely not anything we can solve programmatically; there are just intrinsic differences between environments. Take the RDU3 Performance lab for example: there are 6 x different R740XD models and at least 3 x different R750 models. This is by design, some of it historical, but mainly because each model has a differing hardware design which introduces device-name-changing attributes due to slot placements, different mainboards, different components and so on. The best we can hope for here is to abstract enough of these differences away so things can be installed and operated in a repeatable, reliable fashion.

Customers don't have it any easier: they may have even more variety in their fleet, likely don't have something useful like Jetlag or QUADS, and fight a hard fight against the changing landscape of application installers to boot. The onus to sort this out is still going to be on your own automation and what model(s) you receive; the best we can provide here is some reliable designator that maps to XY hardware or XZ hardware config. We can build on what the API provides to make this easier and more turn-key, but there's already a lot provided too.

In QUADS 2.x and above, hardware differences can be filtered via the API. For example, you can filter based on model in the RDU3 performance lab. There are a number of models there, each corresponding to a sub-variety of a major server model; within these sub-models the systems should be identical.
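As a rough illustration of the model filtering described above, the query might look like the following; the exact endpoint path and the `model` query parameter are assumptions inferred from this discussion, not verified against the QUADS API docs:

```python
# Hedged sketch: query a QUADS 2.x-style API for hosts of one hardware
# model. QUADS_URL is a placeholder; the /api/v3/hosts path and "model"
# parameter are assumptions, so check them against your QUADS version.
import urllib.parse

QUADS_URL = "https://quads.example.com"  # placeholder


def hosts_by_model_url(model):
    """URL that would list hosts filtered to a single hardware model."""
    query = urllib.parse.urlencode({"model": model})
    return f"{QUADS_URL}/api/v3/hosts?{query}"


# e.g. json.load(urllib.request.urlopen(hosts_by_model_url("R640")))
```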
If the model(s) you're using are not one of those, then you need to work with the admin of that lab (Chris) to get them added and queryable. Go to the QUADS wiki for that lab and check which models are listed. There has to be an assumption on the part of the lab that there are no differences within a given sub-model.

The second big thing here is how complex application stacks like OCP or OSP want to interact with, fiddle with or otherwise manipulate BIOS boot order. This is also a changing landscape we have no control over: it's something your automation needs to handle, something we need to provide feedback to installer teams about, and something to track and address if things change.
No, it will not for every case, but I think utilizing things like the above will help.
I'd have to look more at what you're trying to do here, but you shouldn't need to modify any of the PXE loader stuff; perhaps this model isn't integrated correctly and needs more steps or changes in Foreman. I don't see R760 models listed at all on the RDU3 Performance Lab page. Are these models assigned only to some people and not generally schedulable? Only Chris manages this lab, so we'd have to take a look with him or see if this is the case. Have you tried the same thing without modifying the PXE loader stuff in the Scale Lab on R660? They are also EFI by default and we can lend you some to test. This reads like more of a lab / Foreman / configuration issue and nothing to do with QUADS or even Badfish. Let's take a look and chat about it internally.
Sure. QUADS already does this; what kind of hardware details are you looking for that are not provided already? We have a flexible, extensible and growing metadata model; we can add almost anything and keep it in QUADS for each individual server so the API can query it: https://github.com/redhat-performance/quads/blob/latest/docs/quads-host-metadata-search.md#how-to-import-host-metadata and adding other hardware details to filter for is a great RFE we'd love to tackle.
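To illustrate the kind of per-host metadata filtering that model enables, here is a small sketch; the field names (`disks`, `disk_type`) and sample hosts are hypothetical, not the real QUADS schema:

```python
# Hedged illustration: filtering host records on an extensible per-host
# metadata model, as described above. Field names and values below are
# hypothetical examples, not the actual QUADS metadata schema.
hosts = [
    {"name": "host1.example.com", "model": "R640",
     "disks": [{"disk_type": "nvme", "size_gb": 1600}]},
    {"name": "host2.example.com", "model": "R750",
     "disks": [{"disk_type": "sata", "size_gb": 480}]},
]


def with_disk_type(records, disk_type):
    """Names of hosts having at least one disk of the given type."""
    return [h["name"] for h in records
            if any(d["disk_type"] == disk_type for d in h["disks"])]
```

The point of an extensible schema is that a new hardware attribute only needs to be imported once per host for every API consumer to be able to filter on it.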
The QUADS Foreman library only toggles systems for build, sets OS, sets a few host-level parameters, manages Foreman RBAC for cloud user access and, with a more recent RFE, will let you set a default OS per cloud: 189cd16

The technical reason why this isn't possible is that QUADS has no RBAC / admin-token API-level ability for individual cloud users to do these things outside of self-scheduling, where it is token/bearer based (because it has to be). Deliberately scheduled assignment RBAC and access is handled within Foreman (e.g. cloud02, cloud03, cloud04 users, views, permissions and so on). Foreman is a lifecycle management platform for systems, and thus for access to them and their administration (rebuilding, installing another OS, etc.).

QUADS is not a provisioner; it's a scheduler first and foremost. It calls other provisioners to do the things they do best. Right now that's Foreman, for a lot of good reasons, but it's not limited to that in the future (think AWS, hybrid cloud, etc.). From a design-principle perspective we would not want to overlap with the mature, robust RESTful API that Foreman already provides for this or any number of things in this category. We do not want to handle RBAC in two places, or act as some kind of API proxy / translation layer to Foreman either; that's just beyond the scope of what QUADS does and does not do right now. I could foresee moving cloud/environment-based RBAC to QUADS in the future as things evolve (or syncing it with Foreman), but it's handled well by Foreman right now for tenant machine operations and would be a complex undertaking to re-design. I hope this explains our approach better here.
Without cloud-user RBAC in QUADS, so far as rebooting systems goes you have plenty of other ways to do this directly. You can power cycle the system through the Foreman API, through curl, through Badfish, ipmitool, the Python redfish library, the Ansible uri module, sushy, native Python with urllib3, and likely others. I don't think talking to an API to talk to an API is a sustainable design in this aspect, not when RBAC is handled at the Foreman level anyway for IPMI/OOB.

So far as marking for build, the same design principle applies: talk directly to any number of APIs via hammer, curl, urllib3 or the uri module in Ansible. There is a trivial facility in Ansible to wait_for another webservice's response to gate and enact other automation, and there are half a dozen or more direct avenues for you to do this without relying on QUADS, beyond the duplication of RBAC we'd need to maintain to allow for that level of API POST/PUT (which would in turn just talk to the same service you would talk to directly). Now, if you're doing at least one of these against the same source, you should just do both to keep things simple.

Any feature that's already handled better by the Foreman API is just not something we'll likely be able to implement and maintain in QUADS to effectively proxy for you while Foreman still manages machine-lifecycle RBAC. If in the future we move our assignment RBAC services directly to QUADS this can change, and it might make sense to do it there, but that's not the case today. Foreman does an excellent job of handling systems lifecycle management and provisioning, has a full-featured RESTful API, and is already set up with granular RBAC for these needs, so it just makes sense that that's where you do it right now. It's also not uncommon or complex to expect a CI-driven workflow to talk to more than one API.
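As one example of the direct power-cycle avenues listed above, the standard DMTF Redfish `ComputerSystem.Reset` action can be driven with plain urllib; the BMC URL and system ID below are placeholders, and credentials/TLS handling are omitted:

```python
# Hedged sketch: power cycle a server via the DMTF Redfish standard
# ComputerSystem.Reset action. BMC_URL and system_id are placeholders;
# auth and TLS verification are left out for brevity.
import json
import urllib.request

BMC_URL = "https://mgmt-host.example.com"  # placeholder OOB address


def reset_request(system_id, reset_type="ForceRestart"):
    """Build the Redfish POST that power-cycles a system."""
    payload = json.dumps({"ResetType": reset_type}).encode()
    return urllib.request.Request(
        f"{BMC_URL}/redfish/v1/Systems/{system_id}"
        f"/Actions/ComputerSystem.Reset",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# e.g. urllib.request.urlopen(reset_request("System.Embedded.1"))
```

The same action is what Badfish, sushy and the Ansible uri module ultimately issue under the hood when targeting Redfish-capable BMCs.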
The caveat here is that with self-scheduled assignments we handle token/bearer auth in QUADS to authorize admin-level scope for only the systems inclusive of that temporary, self-scheduled assignment, enabling the API calls necessary to complete the assignment allocation workflow; systems management is still handled by Foreman like any other assignment. Similarly, when we start talking to public cloud provider APIs we'll likely abstract that in a similar fashion. The best way to think about the lines of delineation between QUADS and any other infrastructure platform in play is the following:
Likewise, let me re-open this RFE; we will keep it open due to the useful information and discussion here. We can always change the title and scope, and use it to further discussion or carve out a related RFE from this.
Is your feature request related to a problem? Please describe.
The feature request is related to CI usage. Most of our tooling (jetlag, jetski) assumes a clean deployment as a prerequisite.
Describe the solution you'd like
A reset allocation API that will:
Describe alternatives you've considered
Manually `hammer host update` all of the nodes of the allocation
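That manual alternative could be scripted as a loop over the allocation's hosts; the hostnames are placeholders, and the exact hammer flags (`--build yes`) should be checked against your Foreman/hammer version:

```python
# Hedged sketch of the manual alternative above: run `hammer host update`
# for each node in the allocation to mark it for build. Hostnames are
# placeholders; verify the hammer flags against your Foreman version.
import subprocess  # used by the commented-out run loop below


def mark_for_build(hostnames):
    """Command lines that would mark each host for build via hammer."""
    return [["hammer", "host", "update", "--name", h, "--build", "yes"]
            for h in hostnames]


# for cmd in mark_for_build(["host1.example.com", "host2.example.com"]):
#     subprocess.run(cmd, check=True)
```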