How to Deploy a GKE Cluster with Pulumi (Python)

Deploying a GKE cluster with Pulumi Python means provisioning the GKE control plane, a separately managed node pool, Workload Identity for keyless pod-to-GCP auth, and a typed kubeconfig output — part of the broader GCP Provider Configuration workflow. The pattern that matters is removing the default node pool and managing nodes as their own resource, so you can resize and recreate nodes without touching the control plane.

This guide builds a VPC-native GKE cluster with the default node pool stripped out, a dedicated node pool with autoscaling, Workload Identity enabled end to end, and a kubeconfig assembled from the GKE cluster outputs.

Context

The default GKE cluster is convenient and wrong for production: its node pool is coupled to the GKE cluster resource, so changing node config can force the whole cluster to recreate. Separating the node pool and enabling Workload Identity from the start avoids both an expensive recreate later and the anti-pattern of mounting node service-account keys into pods. Nodes provisioned this way can read a secured GCS bucket through Workload Identity instead of static keys, and the GKE cluster inherits its credential routing from the parent provider configuration.

Prerequisites

  • Python 3.9+ with pulumi>=3.0 and pulumi-gcp>=7.0.
  • A GCP project with the Kubernetes Engine API enabled and GCP_PROJECT / GCP_REGION set, or the provider configured per the parent guide.
  • IAM permissions for container.clusters.create, container.clusters.get, and iam.serviceAccounts.create.
  • An existing VPC-native network and subnetwork with secondary ranges for pods and services, or permission to create them.
  • mypy for static checking; kubectl to verify the kubeconfig.

Implementation

1. Define typed cluster configuration

# infra/gke_config.py
# CLI: mypy --strict infra/
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class GkeConfig:
    name: str
    location: str = "us-central1"
    network: str = "default"
    subnetwork: str = "default"
    node_machine_type: str = "e2-standard-4"
    min_nodes: int = 1
    max_nodes: int = 3
    # State implication: changing `location` (region vs zone) forces
    # replacement of the cluster.
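
The dataclass accepts any values, so a mistake such as min_nodes exceeding max_nodes would only surface when the GKE API rejects the node pool. A minimal sketch of catching that before pulumi preview, assuming a hypothetical validate_gke_config helper (not part of Pulumi or the module above) and a trimmed stand-in for GkeConfig with only the fields being checked:

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class GkeConfig:
    # Stand-in mirroring the relevant fields of infra/gke_config.py.
    name: str
    location: str = "us-central1"
    min_nodes: int = 1
    max_nodes: int = 3


def validate_gke_config(cfg: GkeConfig) -> list[str]:
    """Return human-readable problems; an empty list means the config looks sane."""
    problems: list[str] = []
    if not cfg.name:
        problems.append("name must not be empty")
    if cfg.min_nodes < 0:
        problems.append("min_nodes must be >= 0")
    if cfg.max_nodes < cfg.min_nodes:
        problems.append("max_nodes must be >= min_nodes")
    return problems
```

Returning a list rather than raising on the first problem lets the caller report every issue at once; whether to raise, log, or fail the preview is left to the program.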

2. Create the GKE cluster with the default node pool removed

remove_default_node_pool=True plus initial_node_count=1 tells GKE to bootstrap and then delete the default pool, leaving you free to attach a managed one. workload_identity_config binds the GKE cluster to the project workload identity pool.

# infra/gke.py
# CLI: pulumi preview --diff
from __future__ import annotations
import pulumi_gcp as gcp
from infra.gke_config import GkeConfig

def build_cluster(cfg: GkeConfig, project: str) -> gcp.container.Cluster:
    return gcp.container.Cluster(
        cfg.name,
        name=cfg.name,
        location=cfg.location,
        network=cfg.network,
        subnetwork=cfg.subnetwork,
        # Provider note: remove the default pool so nodes are managed
        # independently of the control plane.
        remove_default_node_pool=True,
        initial_node_count=1,
        ip_allocation_policy=gcp.container.ClusterIpAllocationPolicyArgs(),  # VPC-native
        workload_identity_config=gcp.container.ClusterWorkloadIdentityConfigArgs(
            workload_pool=f"{project}.svc.id.goog",
        ),
        deletion_protection=False,  # State implication: set True in prod to block accidental destroy
    )

3. Attach an autoscaling node pool with Workload Identity

The node pool sets workload_metadata_config with mode="GKE_METADATA", which routes pod metadata requests through the GKE metadata server so pods cannot reach the node's service-account token. This is the prerequisite for keyless Workload Identity.

# infra/gke.py (continued)
# CLI: pulumi up
import pulumi_gcp as gcp
from infra.gke_config import GkeConfig

def build_node_pool(cfg: GkeConfig, cluster: gcp.container.Cluster) -> gcp.container.NodePool:
    return gcp.container.NodePool(
        f"{cfg.name}-pool",
        cluster=cluster.name,
        location=cfg.location,
        autoscaling=gcp.container.NodePoolAutoscalingArgs(
            min_node_count=cfg.min_nodes,
            max_node_count=cfg.max_nodes,
        ),
        node_config=gcp.container.NodePoolNodeConfigArgs(
            machine_type=cfg.node_machine_type,
            oauth_scopes=["https://www.googleapis.com/auth/cloud-platform"],
            # Provider note: GKE_METADATA enforces Workload Identity on pods.
            workload_metadata_config=gcp.container.NodePoolNodeConfigWorkloadMetadataConfigArgs(
                mode="GKE_METADATA",
            ),
        ),
    )

4. Assemble and export a typed kubeconfig

Build the kubeconfig from the GKE cluster endpoint and CA certificate using Output.all().apply() so the credentials resolve only after the GKE cluster exists.

# __main__.py
# CLI: pulumi up && pulumi stack output kubeconfig --show-secrets > kubeconfig.yaml
import pulumi
import pulumi_gcp as gcp
from infra.gke_config import GkeConfig
from infra.gke import build_cluster, build_node_pool

project = gcp.config.project or ""
cfg = GkeConfig(name="apps")
cluster = build_cluster(cfg, project)
node_pool = build_node_pool(cfg, cluster)

def kubeconfig(name: str, endpoint: str, ca: str) -> str:
    return f"""apiVersion: v1
clusters:
- cluster: {{certificate-authority-data: {ca}, server: https://{endpoint}}}
  name: {name}
contexts:
- context: {{cluster: {name}, user: {name}}}
  name: {name}
current-context: {name}
users:
- name: {name}
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: gke-gcloud-auth-plugin
      provideClusterInfo: true
"""

kc = pulumi.Output.all(cluster.name, cluster.endpoint,
                       cluster.master_auth.cluster_ca_certificate).apply(
    lambda a: kubeconfig(a[0], a[1], a[2])
)
pulumi.export("kubeconfig", pulumi.Output.secret(kc))  # State implication: secret

Verification

Assert Workload Identity is configured, then confirm the kubeconfig actually reaches the GKE cluster.

# tests/test_gke.py
# CLI: pytest tests/test_gke.py
from __future__ import annotations
import pulumi
from typing import Any, Dict, Tuple

class Mocks(pulumi.runtime.Mocks):
    def new_resource(self, args: pulumi.runtime.MockResourceArgs) -> Tuple[str, Dict[str, Any]]:
        outs = {**args.inputs, "endpoint": "10.0.0.1",
                "masterAuth": {"clusterCaCertificate": "QQ=="}}
        return (f"{args.name}-id", outs)
    def call(self, args: pulumi.runtime.MockCallArgs) -> Dict[str, Any]:
        return {}

pulumi.runtime.set_mocks(Mocks(), preview=False)

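# Load the Pulumi program by file path, after set_mocks. Under pytest,
# import_module("__main__") would return the test runner, not __main__.py.
import importlib.util
from pathlib import Path

_spec = importlib.util.spec_from_file_location(
    "pulumi_program", Path(__file__).resolve().parents[1] / "__main__.py")
assert _spec is not None and _spec.loader is not None
main = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(main)

def _assert_pool(w: Any) -> None:
    assert w and w.workload_pool, "WI not set"

@pulumi.runtime.test
def test_workload_identity() -> pulumi.Output:
    return main.cluster.workload_identity_config.apply(_assert_pool)

# CLI: use the exported kubeconfig against the live cluster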
pulumi stack output kubeconfig --show-secrets > kubeconfig.yaml
KUBECONFIG=kubeconfig.yaml kubectl get nodes

Gotchas & Edge Cases

A regional location triples your node count and cost. Setting location to a region (e.g. us-central1) spreads nodes across three zones, so min_node_count=1 means three nodes total — one per zone. Use a zonal location (e.g. us-central1-a) for cheaper dev clusters, and know that switching between regional and zonal forces a GKE cluster replacement.
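
The arithmetic can be sketched as a small helper. This is an illustration under two assumptions: that a zone name is a region name followed by a single-letter suffix (as in us-central1-a), and that a regional node pool spans three zones, which is the case described above but can differ if node locations are set explicitly. The names is_zonal and total_node_range are hypothetical.

```python
import re

# Assumption: zone names look like "<region>-<letter>", e.g. "us-central1-a".
_ZONE_RE = re.compile(r"^[a-z]+-[a-z]+\d+-[a-z]$")


def is_zonal(location: str) -> bool:
    """True if the location string looks like a zone rather than a region."""
    return bool(_ZONE_RE.match(location))


def total_node_range(location: str, min_nodes: int, max_nodes: int,
                     zones_per_region: int = 3) -> tuple[int, int]:
    """Cluster-wide (min, max) node counts given per-zone autoscaling limits."""
    zones = 1 if is_zonal(location) else zones_per_region
    return (min_nodes * zones, max_nodes * zones)
```

With the defaults from GkeConfig (min 1, max 3), a regional location yields a range of 3 to 9 nodes, while a zonal one stays at 1 to 3.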

Workload Identity needs the IAM binding too, not just GKE_METADATA. Enabling workload_identity_config and GKE_METADATA is only half the setup. Each Kubernetes service account must be annotated and bound to a GCP service account via roles/iam.workloadIdentityUser before pods can impersonate it. Without that binding, pod auth fails with a metadata-server permission error.
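
A minimal sketch of that second half, assuming a hypothetical bind_workload_identity helper and a Google service account named after the Kubernetes service account (neither is part of the original program). The IAM member format and the annotation key follow the Workload Identity convention; the resource names are placeholders.

```python
from __future__ import annotations


def workload_identity_member(project: str, namespace: str, ksa: str) -> str:
    """IAM member string that identifies a Kubernetes service account."""
    return f"serviceAccount:{project}.svc.id.goog[{namespace}/{ksa}]"


def ksa_annotation(gsa_email: str) -> dict[str, str]:
    """Annotation the Kubernetes service account must carry."""
    return {"iam.gke.io/gcp-service-account": gsa_email}


def bind_workload_identity(project: str, namespace: str, ksa: str):
    # Imported lazily so the pure helpers above work without Pulumi installed.
    import pulumi_gcp as gcp

    # Hypothetical GSA; GCP requires account_id to be 6-30 characters.
    gsa = gcp.serviceaccount.Account(f"{ksa}-gsa", account_id=f"{ksa}-gsa")
    # Allow the KSA to impersonate the GSA.
    gcp.serviceaccount.IAMMember(
        f"{ksa}-wi",
        service_account_id=gsa.name,
        role="roles/iam.workloadIdentityUser",
        member=workload_identity_member(project, namespace, ksa),
    )
    return gsa
```

The annotation from ksa_annotation still has to be applied to the Kubernetes ServiceAccount object itself, for example through a Kubernetes provider or a manifest; this sketch covers only the GCP side.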

deletion_protection defaults to blocking pulumi destroy. Recent provider versions default cluster deletion_protection to True, so pulumi destroy fails until you set it to False and run pulumi up first. For dev stacks set it False explicitly; for production leave it on and disable deliberately when you mean to tear down.
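
One way to make that choice explicit is to derive the flag from the stack name. The sketch below assumes production stacks are named prod or production, which is a naming convention invented for this example; deletion_protection_for is a hypothetical helper.

```python
from typing import Iterable


def deletion_protection_for(stack: str,
                            protected: Iterable[str] = ("prod", "production")) -> bool:
    """Return True for stacks that should refuse accidental destroys."""
    return stack in set(protected)
```

In build_cluster this could replace the hard-coded value, for example deletion_protection=deletion_protection_for(pulumi.get_stack()), so dev stacks tear down cleanly while production keeps the guard.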

Frequently Asked Questions

Why remove the default node pool? The default node pool is part of the GKE cluster resource, so changing node configuration can force the entire cluster to recreate. Setting remove_default_node_pool=True and managing a separate NodePool resource lets you resize, upgrade, or replace nodes without disturbing the control plane.

What does Workload Identity actually give me? It lets a Kubernetes service account impersonate a GCP service account, so pods authenticate to GCP APIs with short-lived tokens instead of mounted node keys. You enable workload_pool on the GKE cluster, GKE_METADATA on the node pool, and then bind each KSA to a GSA with roles/iam.workloadIdentityUser.

How do I authenticate kubectl with the exported kubeconfig? The kubeconfig uses the gke-gcloud-auth-plugin exec credential. Install it (gcloud components install gke-gcloud-auth-plugin), then KUBECONFIG=kubeconfig.yaml kubectl get nodes authenticates with your gcloud identity automatically.

Should I use a regional or zonal cluster? Regional clusters replicate the control plane and nodes across zones for high availability at roughly triple the node cost. Zonal clusters are cheaper and fine for development. Choose at creation — switching forces a GKE cluster replacement.

How do I keep the kubeconfig out of plaintext state? Wrap the assembled kubeconfig in pulumi.Output.secret() before exporting, as shown. Pulumi then stores it encrypted in state and masks it in pulumi stack output unless you pass --show-secrets.