
Kubernetes Providers connection refused when trying to turn on Auto Mode for existing cluster #3263

moore-nathan commented Dec 31, 2024

Description

When attempting to enable Auto Mode for an already existing cluster (by adding the cluster_compute_config input), I start getting the error dial tcp 127.0.0.1:80: connect: connection refused for resources that use either the kubernetes provider or the helm provider.
This only happens when turning on Auto Mode via the cluster_compute_config input; any changes without this input still work as expected.
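For clarity, the only change made to an otherwise working configuration is the block below (excerpted from the full reproduction code further down); everything else is untouched.

  # Added to the existing module "eks" block to enable EKS Auto Mode
  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose", "system"]
  }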

  • ✋ I have searched the open/closed issues and my issue is not listed.

⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (ONLY if state is stored remotely, which is the recommended best practice): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists

Versions

  • Module version [Required]: 20.31.6

  • Terraform version: 1.5.2

  • Provider version(s):
+ provider registry.terraform.io/hashicorp/aws v5.82.2
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.5
+ provider registry.terraform.io/hashicorp/helm v2.17.0
+ provider registry.terraform.io/hashicorp/http v3.4.5
+ provider registry.terraform.io/hashicorp/kubernetes v2.35.1
+ provider registry.terraform.io/hashicorp/local v2.5.2
+ provider registry.terraform.io/hashicorp/null v3.2.3
+ provider registry.terraform.io/hashicorp/time v0.12.1
+ provider registry.terraform.io/hashicorp/tls v4.0.6
+ provider registry.terraform.io/integrations/github v6.4.0

Reproduction Code [Required]

module "eks" {
  source                        = "terraform-aws-modules/eks/aws"
  version                       = "~> 20.31"
  cluster_name                  = var.cluster_name
  cluster_version               = var.cluster_version
  iam_role_permissions_boundary = var.permissions_boundary_arn
  cluster_tags                  = local.all_eks_tags
  tags                          = local.all_eks_tags
  vpc_id                        = var.vpc_id
  subnet_ids                    = var.subnet_ids
  # control_plane_subnet_ids      = var.control_plane_subnet_ids # Optional
  create_kms_key = false

  cluster_encryption_config = {
    ...
  }

  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose", "system"]
  }

  # EKS Managed Node Group(s)
  eks_managed_node_group_defaults = {
    ...
  }

  eks_managed_node_groups = local.updated_node_groups

  node_security_group_enable_recommended_rules = true

  cluster_security_group_additional_rules = {
    ...
  }

  node_security_group_additional_rules = {
    ...
  }

  node_security_group_tags = merge(local.all_eks_tags, {
    # NOTE - if creating multiple security groups with this module, only tag the
    # security group that Karpenter should utilize with the following tag
    # (i.e. - at most, only one security group should have this tag in your account)
    "karpenter.sh/discovery" = var.cluster_name
  })

  # Enable automatic admin permissions for the cluster creator
  enable_cluster_creator_admin_permissions = false

  # Define access entries
  access_entries = {
    ...
  }
  cluster_addons = {
    kube-proxy = {
      resolve_conflicts_on_update = "OVERWRITE"
      addon_version               = try(var.cluster_addons_versions["kube-proxy"].version, "")
    }
    vpc-cni = {
      resolve_conflicts_on_update = "OVERWRITE"
      addon_version               = try(var.cluster_addons_versions["vpc-cni"].version, "")
      service_account_role_arn    = module.vpc_cni_irsa.iam_role_arn
      configuration_values        = try(jsonencode(var.cluster_addons_versions["vpc-cni"].configuration_values), "{}")
    }
    coredns = {
      resolve_conflicts_on_update = "OVERWRITE"
      addon_version               = try(var.cluster_addons_versions["coredns"].version, "")
      configuration_values = jsonencode({
        tolerations = [
          # Allow CoreDNS to run on the same nodes as the Karpenter controller
          # for use during cluster creation when Karpenter nodes do not yet exist
          {
            key    = "karpenter.sh/controller"
            value  = "true"
            effect = "NoSchedule"
          }
        ]
      })
    }
    aws-ebs-csi-driver = {
      resolve_conflicts_on_update = "OVERWRITE"
      addon_version               = try(var.cluster_addons_versions["aws-ebs-csi-driver"].version, "")
      service_account_role_arn    = module.ebs_csi_driver.iam_role_arn
    }

    eks-pod-identity-agent = {
      resolve_conflicts_on_update = "OVERWRITE"
      addon_version               = try(var.cluster_addons_versions["eks-pod-identity-agent"].version, "")
    }
  }

}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name, "--profile", "xxx"]
  }
}

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", var.cluster_name, "--profile", "xxx"]
    }
  }
}

Steps to reproduce the behavior:

Expected behavior

EKS auto mode is turned on

Actual behavior

Terraform errors out, unable to connect to the EKS cluster (dial tcp 127.0.0.1:80: connect: connection refused).

Terminal Output Screenshot(s)

Additional context

Add-on versions of existing cluster:

  • Amazon VPC CNI: v1.19.2-eksbuild.1
  • kube-proxy: v1.30.7-eksbuild.2
  • Amazon EBS CSI Driver: v1.37.0-eksbuild.1
  • Amazon EKS Pod Identity Agent: v1.3.4-eksbuild.1
  • CoreDNS: v1.11.4-eksbuild.1

The connection errors occur on a kubernetes_namespace resource and a helm_release resource.
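For reference, a minimal sketch of the kind of resources that fail; the resource names, namespace, and chart below are placeholders, not the actual configuration:

# Hypothetical stand-ins for the failing resources (names and chart are placeholders)
resource "kubernetes_namespace" "example" {
  metadata {
    name = "example" # fails with: dial tcp 127.0.0.1:80: connect: connection refused
  }
}

resource "helm_release" "example" {
  name       = "example"
  repository = "https://charts.example.com" # placeholder repository
  chart      = "example"
  namespace  = kubernetes_namespace.example.metadata[0].name
}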

@andrewhharmon

I think this is because it is trying to drop/recreate the cluster, which causes an issue if you have the helm or kubernetes provider in the same Terraform project. If you set bootstrap_self_managed_addons = true, your error may go away, because I think that is what is causing the drop/recreate. However, I'm not sure of the consequences of doing this. I spent a good part of a day trying to switch an existing cluster to Auto Mode and ended up breaking things so badly that I had to manually delete my cluster and recreate it. More broadly, it isn't very clear how to migrate an existing cluster to Auto Mode in either the AWS docs or this repo. Any insights there would be great.
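A minimal sketch of the suggestion above, assuming the module's bootstrap_self_managed_addons input is what forces the replacement; this is unverified, so check the terraform plan output for a forced replacement of the cluster before applying:

module "eks" {
  # ... existing configuration from the reproduction code above ...

  # Unverified workaround: pin bootstrap_self_managed_addons explicitly so that
  # adding cluster_compute_config does not force the EKS cluster to be replaced
  bootstrap_self_managed_addons = true

  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose", "system"]
  }
}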
