
Kubernetes Providers connection refused when trying to turn on Auto Mode for existing cluster #3263

moore-nathan commented Dec 31, 2024

Description

When attempting to enable Auto Mode for an already existing cluster (by adding the cluster_compute_config input), I start getting the error dial tcp 127.0.0.1:80: connect: connection refused for resources that use either the kubernetes provider or the helm provider.
This only happens when turning on Auto Mode via the cluster_compute_config input; any changes without this input still work as expected.
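For clarity, the only change made to an otherwise working configuration is the block below (excerpted from the full reproduction code further down); everything else is untouched.

  # Added to the existing module "eks" block to enable EKS Auto Mode
  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose", "system"]
  }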

  • ✋ I have searched the open/closed issues and my issue is not listed.

⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (ONLY if state is stored remotely, which is the recommended best practice): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists

Versions

  • Module version [Required]: 20.31.6

  • Terraform version: 1.5.2

  • Provider version(s):
+ provider registry.terraform.io/hashicorp/aws v5.82.2
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.5
+ provider registry.terraform.io/hashicorp/helm v2.17.0
+ provider registry.terraform.io/hashicorp/http v3.4.5
+ provider registry.terraform.io/hashicorp/kubernetes v2.35.1
+ provider registry.terraform.io/hashicorp/local v2.5.2
+ provider registry.terraform.io/hashicorp/null v3.2.3
+ provider registry.terraform.io/hashicorp/time v0.12.1
+ provider registry.terraform.io/hashicorp/tls v4.0.6
+ provider registry.terraform.io/integrations/github v6.4.0

Reproduction Code [Required]

module "eks" {
  source                        = "terraform-aws-modules/eks/aws"
  version                       = "~> 20.31"
  cluster_name                  = var.cluster_name
  cluster_version               = var.cluster_version
  iam_role_permissions_boundary = var.permissions_boundary_arn
  cluster_tags                  = local.all_eks_tags
  tags                          = local.all_eks_tags
  vpc_id                        = var.vpc_id
  subnet_ids                    = var.subnet_ids
  # control_plane_subnet_ids      = var.control_plane_subnet_ids # Optional
  create_kms_key = false

  cluster_encryption_config = {
    ...
  }

  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose", "system"]
  }

  # EKS Managed Node Group(s)
  eks_managed_node_group_defaults = {
    ...
  }

  eks_managed_node_groups = local.updated_node_groups

  node_security_group_enable_recommended_rules = true

  cluster_security_group_additional_rules = {
    ...
  }

  node_security_group_additional_rules = {
    ...
  }

  node_security_group_tags = merge(local.all_eks_tags, {
    # NOTE - if creating multiple security groups with this module, only tag the
    # security group that Karpenter should utilize with the following tag
    # (i.e. - at most, only one security group should have this tag in your account)
    "karpenter.sh/discovery" = var.cluster_name
  })

  # Enable automatic admin permissions for the cluster creator
  enable_cluster_creator_admin_permissions = false

  # Define access entries
  access_entries = {
    ...
  }
  cluster_addons = {
    kube-proxy = {
      resolve_conflicts_on_update = "OVERWRITE"
      addon_version               = try(var.cluster_addons_versions["kube-proxy"].version, "")
    }
    vpc-cni = {
      resolve_conflicts_on_update = "OVERWRITE"
      addon_version               = try(var.cluster_addons_versions["vpc-cni"].version, "")
      service_account_role_arn    = module.vpc_cni_irsa.iam_role_arn
      configuration_values        = try(jsonencode(var.cluster_addons_versions["vpc-cni"].configuration_values), "{}")
    }
    coredns = {
      resolve_conflicts_on_update = "OVERWRITE"
      addon_version               = try(var.cluster_addons_versions["coredns"].version, "")
      configuration_values = jsonencode({
        tolerations = [
          # Allow CoreDNS to run on the same nodes as the Karpenter controller
          # for use during cluster creation when Karpenter nodes do not yet exist
          {
            key    = "karpenter.sh/controller"
            value  = "true"
            effect = "NoSchedule"
          }
        ]
      })
    }
    aws-ebs-csi-driver = {
      resolve_conflicts_on_update = "OVERWRITE"
      addon_version               = try(var.cluster_addons_versions["aws-ebs-csi-driver"].version, "")
      service_account_role_arn    = module.ebs_csi_driver.iam_role_arn
    }

    eks-pod-identity-agent = {
      resolve_conflicts_on_update = "OVERWRITE"
      addon_version               = try(var.cluster_addons_versions["eks-pod-identity-agent"].version, "")
    }
  }

}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name, "--profile", "xxx"]
  }
}

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", var.cluster_name, "--profile", "xxx"]
    }
  }
}

Steps to reproduce the behavior:

Expected behavior

EKS auto mode is turned on

Actual behavior

Terraform errors out, unable to connect to the EKS cluster (dial tcp 127.0.0.1:80: connect: connection refused).

Terminal Output Screenshot(s)

Additional context

Add-on versions of existing cluster:

  • Amazon VPC CNI: v1.19.2-eksbuild.1
  • kube-proxy: v1.30.7-eksbuild.2
  • Amazon EBS CSI Driver: v1.37.0-eksbuild.1
  • Amazon EKS Pod Identity Agent: v1.3.4-eksbuild.1
  • CoreDNS: v1.11.4-eksbuild.1

The connection errors occur on a kubernetes_namespace resource and a helm_release resource.
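For reference, a minimal sketch of the kind of resources that fail; the resource names, namespace, and chart below are placeholders, not the actual configuration:

# Hypothetical stand-ins for the failing resources (names and chart are placeholders)
resource "kubernetes_namespace" "example" {
  metadata {
    name = "example" # fails with: dial tcp 127.0.0.1:80: connect: connection refused
  }
}

resource "helm_release" "example" {
  name       = "example"
  repository = "https://charts.example.com" # placeholder repository
  chart      = "example"
  namespace  = kubernetes_namespace.example.metadata[0].name
}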

@andrewhharmon

I think this is because it is trying to drop/recreate the cluster, which causes an issue if you have the helm or kubernetes provider in the same Terraform project. If you set bootstrap_self_managed_addons = true, your error may go away, because I think that is what is causing the drop/recreate. However, I'm not sure of the consequences of doing this. I spent a good part of a day trying to switch an existing cluster to Auto Mode and ended up breaking things so badly that I had to manually delete my cluster and recreate it. More broadly, it isn't very clear how to migrate an existing cluster to Auto Mode in either the AWS docs or this repo. Any insights there would be great.
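A minimal sketch of the suggestion above, assuming the module's bootstrap_self_managed_addons input is what forces the replacement; this is unverified, so check the terraform plan output for a forced replacement of the cluster before applying:

module "eks" {
  # ... existing configuration from the reproduction code above ...

  # Unverified workaround: pin bootstrap_self_managed_addons explicitly so that
  # adding cluster_compute_config does not force the EKS cluster to be replaced
  bootstrap_self_managed_addons = true

  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose", "system"]
  }
}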
