
Add support for Portworx and future storage modules #37

Merged: 12 commits into equinix:master on Feb 24, 2021

Conversation

@bikashrc25 (Contributor) commented Jan 20, 2021

Addresses #24 by adding support for Portworx. Portworx is a software-defined storage layer that provisions persistent storage and provides data management capabilities for stateful applications. It runs on the GKE/Anthos cluster on Equinix Metal, and this change automates the installation of the complete stack.
Included is a script that detects the best disk on each worker node to act as the Portworx KVDB drive. This script is needed because disk device assignments are not consistent across worker nodes.
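A minimal sketch of the detection idea, assuming GNU lsblk and illustrative filtering (the actual script shipped in templates/ is the source of truth, and part of it is quoted in the review below):

```sh
#!/bin/sh
# List whole disks with their sizes in bytes, drop any that already carry a
# filesystem, then pick one candidate by size.
lsblk -d -b -n -o NAME,SIZE,TYPE | awk '$3 == "disk" { print $1, $2 }' | while read -r disk size; do
    if ! lsblk -f -b -n -o NAME,FSTYPE -i "/dev/$disk" | grep -Eq "xfs|ext3|ext4|btrfs"; then
        echo "$disk $size"
    fi
done | sort -n -k2 | head -n1 | cut -f1 -d" "
```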

main.tf (outdated; resolved)
@displague (Member) left a comment

Per our conversation, we'll want to add lvm2 to https://github.com/equinix/terraform-metal-anthos-on-baremetal/blob/master/templates/pre_reqs.sh#L11 and https://github.com/equinix/terraform-metal-anthos-on-baremetal/blob/master/templates/pre_reqs.sh#L31 (I don't know the package name for Fedora-based systems, nor if the package is preinstalled or must be added).
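A hedged sketch of what that addition might look like in pre_reqs.sh; the package is assumed to be named lvm2 on both Debian- and Fedora-based images (unverified for the latter):

```sh
# Install the LVM userspace tools; package name assumed identical across families.
if command -v apt-get >/dev/null 2>&1; then
    apt-get update && apt-get install -y lvm2
elif command -v dnf >/dev/null 2>&1; then
    dnf install -y lvm2
else
    yum install -y lvm2
fi
```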

@c0dyhi11 (Collaborator)

Hey folks. I'm opinionated on how we should do this. I think we should have a variable labeled something like csi_provider and have it default to none or something like that.

Then we should be able to write different CSI modules that we apply based on the CSI they choose.
I was in the process of writing one for Rook.

I would like to see a CSI directory, then a provider directory, and then a TF file and whatever else is needed in there to execute on (see the sketch below). I'm not sure how to optionally include a module in TF like this at the moment, but I think that would be the most flexible approach.
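One way to picture the proposed layout (directory and file names here are illustrative, not the structure the PR ultimately adopted):

```sh
# Per-provider storage modules under a common directory; each provider keeps
# its own Terraform files and helper scripts.
mkdir -p modules/storage/portworx modules/storage/rook
touch modules/storage/portworx/main.tf modules/storage/rook/main.tf
```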

@displague (Member)

I completely agree with keeping storage providers abstract in this project. While these changes are being developed (and the PR is in draft), they take an opinionated stance. We'll definitely want to see more generic definitions before this can be merged.

Signed-off-by: Bikash Roy Choudhury <[email protected]>
@displague (Member) commented Jan 27, 2021

@c0dyhi11 There are some changes coming through in this PR that may be generally applicable to Portworx, Rook, Longhorn, or other storage providers. We could accept all of those changes until there is a conflict between storage providers or overhead introduced in the dependencies.

Longhorn, for example, requires open-iscsi and one way to accomplish that is through Kubernetes, rather than Terraform provisioning: https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/iscsi/longhorn-iscsi-installation.yaml (from longhorn/longhorn#1741)
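For reference, the Kubernetes route would be a one-liner applied after the cluster is up, using the manifest linked above, rather than baking open-iscsi into the Terraform provisioning scripts:

```sh
# Installs open-iscsi on every node via a DaemonSet shipped by the Longhorn project.
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/iscsi/longhorn-iscsi-installation.yaml
```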

templates/pre_reqs.sh (outdated; resolved)
templates/device_name.sh (outdated; resolved)
```sh
if ! lsblk -f -b -n -oNAME,SIZE,FSTYPE -i /dev/$disk | egrep "xfs|ext3|ext4|btrfs|sr0" >/dev/null; then
    echo -en "$disk $size"
fi
done | sort -n -k2 | head -n1 | cut -f1 -d" "
```
Review comment (Member):

This should be sort -n -k2 -r, so the largest disk is first (or change head to tail).

@displague (Member) commented Jan 28, 2021

I recall conversations about using the smallest/fastest disk for the KVDB, but I don't see anything here to support that.

If we do copy this function to get the smallest disk, and then pvcreate/vgcreate/lvcreate that disk, we would need to be cautious about cases where there are only 2 disks and the smallest and largest free disk are the same disk.
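A sketch of the follow-on LVM steps under discussion, with a guard for the two-disk case; the helper names are hypothetical, while pwx_vg/pwxkvdb match the names that appear later in this thread:

```sh
# smallest_free_disk / largest_free_disk are hypothetical helpers built on the
# lsblk filter quoted above (one sorting ascending, one descending).
KVDB_DISK="/dev/$(smallest_free_disk)"
DATA_DISK="/dev/$(largest_free_disk)"

if [ "$KVDB_DISK" = "$DATA_DISK" ]; then
    # Only one free disk (e.g. a 2-disk plan): skip the dedicated KVDB volume
    # and let Portworx share the data disk instead.
    echo "only one free disk; not creating a dedicated KVDB volume" >&2
else
    pvcreate "$KVDB_DISK"
    vgcreate pwx_vg "$KVDB_DISK"
    lvcreate -l 100%FREE -n pwxkvdb pwx_vg
fi
```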

@displague (Member) commented Jan 29, 2021

We'll want some README.md notes about how to enable the use of Portworx in this project.

We will also need to know what the installed solution looks like to the user (in terms of which StorageClasses are available and how to interact with the UI). Should the user customize the StorageClasses (manually, or does Portworx do this automatically) so that NVMe storage can be chosen over HDD storage (priority_io: high?)?
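For illustration, a StorageClass steering volumes toward the high-priority pools might look something like this (a hedged sketch: the class name is made up, and the parameters follow the in-tree Portworx provisioner conventions rather than anything this module ships):

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: portworx-high-io          # illustrative name
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"
  priority_io: "high"
EOF
```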

We should also point out which plans will work with the Portworx installation, for example, the 2-disk c3.small plan may not work with the Portworx installation that this module provides.

And we'll also need to explain what happens when the trial period ends: what are the user's options, what happens to existing volumes and snapshots, and which capabilities are lost? Does this effectively become the OSS version?

@displague (Member)

The README should also include notes about how to interact with the Portworx UI within the cluster.

bikashrc25 and others added 3 commits February 3, 2021 10:36
@displague (Member) commented Feb 11, 2021

I'm attempting to modularize this PR here:
https://github.com/bikashrc25/terraform-metal-anthos-on-baremetal/pull/1/files#r574256949

I haven't fully tested this yet because I ran into a limitation: for_each within the module cannot use metal_device.worker_nodes.*.access_public_ipv4 because:

The "for_each" value depends on resource attributes that cannot be determined
until apply,

I'm looking into other ways to accomplish this or work around the problem.


Update: a simple count / count.index worked.

@displague (Member)

@bikashrc25 Can you pull the latest version of this branch and give it a try?

You'll need to add the following to terraform.tfvars:

```hcl
storage_module = "portworx"

/* optionally, set this:
storage_options = {
    version = "0.26"
}
*/
```

@c0dyhi11 this is ready for a review.

README.md (outdated)
```hcl
storage_module = "portworx"
storage_options = {
  portworx_version = "2.6"
}
```
Review comment (Member):

Suggested change:

```diff
-portworx_version = "2.6"
+version = "2.6"
```

modules/storage/main.tf (outdated; resolved)
@displague changed the title from "install portworx" to "Add support for Portworx and future storage modules" on Feb 11, 2021

Log in to any one of the Anthos cluster nodes and run `pxctl status` to check the Portworx state, or run `kubectl get pods -lapp=portworx -n kube-system` to check whether the Portworx pods are running. Portworx logs can be viewed by running `kubectl logs -lapp=portworx -n kube-system --all-containers`.

By default, Portworx 2.6 is installed in the Anthos Cluster. The version of Portworx can be changed using the `portworx_version` variable.
@displague (Member) commented Feb 11, 2021

Let's add a link to https://docs.portworx.com/reference/knowledge-base/px-licensing/#trial-license because I think it is helpful for users to know in advance what will happen when the trial ends.

We can also introduce a variable for the portworx_license so the user can activate Portworx immediately upon deployment (or by setting the variable and running terraform apply later).
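If such a variable is added, activation could also be done by hand from any node; a hedged example (the placeholder key is not real, and the exact pxctl subcommand should be checked against the installed version):

```sh
# Activate a purchased license key on the running cluster.
pxctl license activate <license-key>
```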

Signed-off-by: Marques Johansson <[email protected]>
@bikashrc25 (Contributor, Author) commented Feb 12, 2021

@displague I created a new cluster this morning, after doing a git pull yesterday to pick up the code changes you had made.

Here are some of the inconsistencies:

  1. 2 out of 3 nodes did not create an LVM volume for the PX KVDB. Only one node did so successfully.
  2. The third node, which did create the pwx_vg, picked up the larger 480GB drive instead of the 240GB one.

I am including some of the snippets.

This is worker node 1, where it could not create the pwx_vg; you can clearly see the warning message.

root@equinix-metal-gke-cluster-yk9or-worker-01:~# pxctl status
Status: PX is operational
License: Trial (expires in 31 days)
Node ID: 1534165d-4b6b-41df-b8e1-03e8c8d5c4d1
    IP: 145.40.77.105 
    Local Storage Pool: 2 pools
    POOL    IO_PRIORITY RAID_LEVEL  USABLE  USED    STATUS  ZONE    REGION
    0   HIGH        raid0       447 GiB 10 GiB  Online  default default
    1   HIGH        raid0       224 GiB 10 GiB  Online  default default
    Local Storage Devices: 2 devices
    Device  Path        Media Type      Size        Last-Scan
    0:1 /dev/sdb    STORAGE_MEDIUM_SSD  447 GiB     12 Feb 21 17:34 UTC
    1:1 /dev/sdc    STORAGE_MEDIUM_SSD  224 GiB     12 Feb 21 17:34 UTC
    * Internal kvdb on this node is sharing this storage device /dev/sdc  to store its data.
    total       -   671 GiB
    Cache Devices:
     * No cache devices
Cluster Summary
    Cluster ID: equinix-metal-gke-cluster-yk9or
    Cluster UUID: 47eb0c51-b2c1-456b-a254-e5c849a7d1db
    Scheduler: kubernetes
    Nodes: 3 node(s) with storage (3 online)
    IP      ID                  SchedulerNodeName               StorageNode Used    Capacity    Status  StorageStatus   Version     Kernel          OS
    145.40.77.101   9afd9a30-0eb3-4a8d-937f-86f5cf63c4bc    equinix-metal-gke-cluster-yk9or-worker-03   Yes     20 GiB  671 GiB Online  Up      2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    145.40.77.211   99a6f578-6c6f-4b09-b516-8dd332beef7e    equinix-metal-gke-cluster-yk9or-worker-02   Yes     20 GiB  668 GiB Online  Up      2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    145.40.77.105   1534165d-4b6b-41df-b8e1-03e8c8d5c4d1    equinix-metal-gke-cluster-yk9or-worker-01   Yes     20 GiB  671 GiB Online  Up (This node)  2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    Warnings: 
         WARNING: Internal Kvdb is not using dedicated drive on nodes [145.40.77.105]. This configuration is not recommended for production clusters.
Global Storage Pool
    Total Used      :  60 GiB
    Total Capacity  :  2.0 TiB
root@equinix-metal-gke-cluster-yk9or-worker-01:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 447.1G  0 disk 
sdb      8:16   0 447.1G  0 disk 
sdc      8:32   0 223.6G  0 disk 
sdd      8:48   0 223.6G  0 disk 
├─sdd1   8:49   0     2M  0 part 
├─sdd2   8:50   0   1.9G  0 part 
└─sdd3   8:51   0 221.7G  0 part /
root@equinix-metal-gke-cluster-yk9or-worker-01:~# 

This is worker node 2, where there is no pwx_vg for the KVDB.

root@equinix-metal-gke-cluster-yk9or-worker-02:~# pxctl status
Status: PX is operational
License: Trial (expires in 31 days)
Node ID: 99a6f578-6c6f-4b09-b516-8dd332beef7e
    IP: 145.40.77.211 
    Local Storage Pool: 2 pools
    POOL    IO_PRIORITY RAID_LEVEL  USABLE  USED    STATUS  ZONE    REGION
    0   HIGH        raid0       447 GiB 10 GiB  Online  default default
    1   HIGH        raid0       221 GiB 10 GiB  Online  default default
    Local Storage Devices: 2 devices
    Device  Path        Media Type      Size        Last-Scan
    0:1 /dev/sdb    STORAGE_MEDIUM_SSD  447 GiB     12 Feb 21 17:47 UTC
    1:1 /dev/sdc2   STORAGE_MEDIUM_SSD  221 GiB     12 Feb 21 17:47 UTC
    * Internal kvdb on this node is sharing this storage device /dev/sdc2  to store its data.
    total       -   668 GiB
    Cache Devices:
     * No cache devices
    Journal Device: 
    1   /dev/sdc1   STORAGE_MEDIUM_SSD
Cluster Summary
    Cluster ID: equinix-metal-gke-cluster-yk9or
    Cluster UUID: 47eb0c51-b2c1-456b-a254-e5c849a7d1db
    Scheduler: kubernetes
    Nodes: 3 node(s) with storage (3 online)
    IP      ID                  SchedulerNodeName               StorageNode Used    Capacity    Status  StorageStatus   Version     Kernel          OS
    145.40.77.101   9afd9a30-0eb3-4a8d-937f-86f5cf63c4bc    equinix-metal-gke-cluster-yk9or-worker-03   Yes     20 GiB  671 GiB Online  Up      2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    145.40.77.211   99a6f578-6c6f-4b09-b516-8dd332beef7e    equinix-metal-gke-cluster-yk9or-worker-02   Yes     20 GiB  668 GiB Online  Up (This node)  2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    145.40.77.105   1534165d-4b6b-41df-b8e1-03e8c8d5c4d1    equinix-metal-gke-cluster-yk9or-worker-01   Yes     20 GiB  671 GiB Online  Up      2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    Warnings: 
         WARNING: Internal Kvdb is not using dedicated drive on nodes [145.40.77.105 145.40.77.211]. This configuration is not recommended for production clusters.
Global Storage Pool
    Total Used      :  60 GiB
    Total Capacity  :  2.0 TiB
root@equinix-metal-gke-cluster-yk9or-worker-02:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 447.1G  0 disk 
sdb      8:16   0 447.1G  0 disk 
sdc      8:32   0 223.6G  0 disk 
├─sdc1   8:33   0     3G  0 part 
└─sdc2   8:34   0 220.6G  0 part 
sdd      8:48   0 223.6G  0 disk 
├─sdd1   8:49   0     2M  0 part 
├─sdd2   8:50   0   1.9G  0 part 
└─sdd3   8:51   0 221.7G  0 part /
root@equinix-metal-gke-cluster-yk9or-worker-02:~# 

Finally this is worker node 3. This node creates the pwx_vg on the larger capacity drive.

root@equinix-metal-gke-cluster-yk9or-worker-03:~# pxctl status
Status: PX is operational
License: Trial (expires in 31 days)
Node ID: 9afd9a30-0eb3-4a8d-937f-86f5cf63c4bc
    IP: 145.40.77.101 
    Local Storage Pool: 2 pools
    POOL    IO_PRIORITY RAID_LEVEL  USABLE  USED    STATUS  ZONE    REGION
    0   HIGH        raid0       447 GiB 10 GiB  Online  default default
    1   HIGH        raid0       224 GiB 10 GiB  Online  default default
    Local Storage Devices: 2 devices
    Device  Path        Media Type      Size        Last-Scan
    0:1 /dev/sdb    STORAGE_MEDIUM_SSD  447 GiB     12 Feb 21 17:34 UTC
    1:1 /dev/sdc    STORAGE_MEDIUM_SSD  224 GiB     12 Feb 21 17:34 UTC
    total           -           671 GiB
    Cache Devices:
     * No cache devices
    Kvdb Device:
    Device Path     Size
    /dev/pwx_vg/pwxkvdb 447 GiB
     * Internal kvdb on this node is using this dedicated kvdb device to store its data.
Cluster Summary
    Cluster ID: equinix-metal-gke-cluster-yk9or
    Cluster UUID: 47eb0c51-b2c1-456b-a254-e5c849a7d1db
    Scheduler: kubernetes
    Nodes: 3 node(s) with storage (3 online)
    IP      ID                  SchedulerNodeName               StorageNode Used    Capacity    Status  StorageStatus   Version     Kernel          OS
    145.40.77.101   9afd9a30-0eb3-4a8d-937f-86f5cf63c4bc    equinix-metal-gke-cluster-yk9or-worker-03   Yes     20 GiB  671 GiB Online  Up (This node)  2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    145.40.77.211   99a6f578-6c6f-4b09-b516-8dd332beef7e    equinix-metal-gke-cluster-yk9or-worker-02   Yes     20 GiB  668 GiB Online  Up      2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
    145.40.77.105   1534165d-4b6b-41df-b8e1-03e8c8d5c4d1    equinix-metal-gke-cluster-yk9or-worker-01   Yes     20 GiB  671 GiB Online  Up      2.6.3.0-4419aa4 5.4.0-52-generic    Ubuntu 20.04.1 LTS
Global Storage Pool
    Total Used      :  60 GiB
    Total Capacity  :  2.0 TiB
root@equinix-metal-gke-cluster-yk9or-worker-03:~# lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                8:0    0 447.1G  0 disk 
└─pwx_vg-pwxkvdb 253:0    0 447.1G  0 lvm  
sdb                8:16   0 447.1G  0 disk 
sdc                8:32   0 223.6G  0 disk 
sdd                8:48   0 223.6G  0 disk 
├─sdd1             8:49   0     2M  0 part 
├─sdd2             8:50   0   1.9G  0 part 
└─sdd3             8:51   0 221.7G  0 part /
root@equinix-metal-gke-cluster-yk9or-worker-03:~# 

Any thoughts regarding these inconsistencies?

@bikashrc25 (Contributor, Author) commented Feb 15, 2021

@displague Thanks for fixing the Terraform script to create pwx_vg on the worker nodes. However, the script should pick the smallest disk for creating the pwx_vg for the KVDB. I am attaching the output of `lsblk` from all 3 worker nodes. You will notice that worker nodes 2 and 3 are using a 480GB drive instead of a 240GB drive.

Worker node1:

root@equinix-metal-gke-cluster-oywa2-worker-01:~# lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                8:0    0 223.6G  0 disk 
└─pwx_vg-pwxkvdb 253:0    0 223.6G  0 lvm  
sdb                8:16   0 223.6G  0 disk 
├─sdb1             8:17   0     2M  0 part 
├─sdb2             8:18   0   1.9G  0 part 
└─sdb3             8:19   0 221.7G  0 part /
sdc                8:32   0 447.1G  0 disk 
├─sdc1             8:33   0     3G  0 part 
└─sdc2             8:34   0 444.1G  0 part 
sdd                8:48   0 447.1G  0 disk 
root@equinix-metal-gke-cluster-oywa2-worker-01:~# 

Worker node 2:

root@equinix-metal-gke-cluster-oywa2-worker-02:~# lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                8:0    0 447.1G  0 disk 
└─pwx_vg-pwxkvdb 253:0    0 447.1G  0 lvm  
sdb                8:16   0 447.1G  0 disk 
sdc                8:32   0 223.6G  0 disk 
sdd                8:48   0 223.6G  0 disk 
├─sdd1             8:49   0     2M  0 part 
├─sdd2             8:50   0   1.9G  0 part 
└─sdd3             8:51   0 221.7G  0 part /
root@equinix-metal-gke-cluster-oywa2-worker-02:~# 

Worker node 3:

root@equinix-metal-gke-cluster-oywa2-worker-03:~# lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                8:0    0 447.1G  0 disk 
└─pwx_vg-pwxkvdb 253:0    0 447.1G  0 lvm  
sdb                8:16   0 447.1G  0 disk 
sdc                8:32   0 223.6G  0 disk 
sdd                8:48   0 223.6G  0 disk 
├─sdd1             8:49   0     2M  0 part 
├─sdd2             8:50   0   1.9G  0 part 
└─sdd3             8:51   0 221.7G  0 part /
root@equinix-metal-gke-cluster-oywa2-worker-03:~# 

The Portworx licensing step can fail if applied too soon. The documentation is updated to reflect that, providing remediations.
@displague (Member)

@bikashrc25 I created #45 to track the issue you mentioned about the disk choices.

In recent commits I documented that the Portworx license options should only be used after the Portworx install is ready. If used too early, the failure can be fixed with another `terraform apply` (assuming we don't have any unrelated state-drift problems in this project).

@c0dyhi11 We followed the pattern of offering generic storage options. I think this is ready to go, so we can build upon this pattern. Awaiting your review before a merge.

@c0dyhi11 (Collaborator) left a comment

There are a few changes. Other than that... I'm good!

main.tf (resolved)
templates/pre_reqs.sh (outdated; resolved)
deploy_anthos_cluster is run on the first control plane node, but it
triggers ssh provisioning from that node to the workers outside of what
Terraform is doing.

Signed-off-by: Marques Johansson <[email protected]>
@displague displague merged commit 0e89860 into equinix:master Feb 24, 2021