Testing Python Infrastructure Code

Testing Python infrastructure code means checking that your resource definitions produce the configuration you intend before any cloud API is touched, and it is a primary reason engineers choose Python over a DSL. This page is part of the broader Python IaC fundamentals and strategy section, and it lays out a testing pyramid that moves from fast unit checks up through snapshot, integration, and policy gates.

[Figure: the IaC testing pyramid. Four stacked layers, from the wide base to the narrow top: unit tests with mocks (fast, no network), snapshot tests (synthesized JSON), integration tests (LocalStack / moto), and policy checks. A side scale runs from fast at the base to slow at the top.]
The IaC testing pyramid: many fast unit tests at the base, fewer slow integration and policy checks at the top.

Why Untested Infrastructure Code Breaks

Infrastructure expressed as Python is still code, and code that is never asserted against drifts from intent the moment it grows past a single file. A mistyped CIDR block, a security group rule opened to 0.0.0.0/0, or a missing parent on a child resource will all synthesize cleanly and only surface as an incident after deployment. The cost of catching these defects rises sharply the later you find them: a failed assertion in pytest costs seconds, a failed apply against production costs an outage.
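
As a minimal sketch of how cheaply two of these defects can be caught, the checks below use only the standard library. The `IngressRule` shape and both helper names are hypothetical and not part of any provider SDK; a real suite would read the same fields from its own resource arguments.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical stand-in for a security group ingress rule."""
    port: int
    cidr: str


def is_valid_cidr(cidr: str) -> bool:
    """Return True if the string parses as a network with no host bits set."""
    try:
        ipaddress.ip_network(cidr, strict=True)
    except ValueError:
        return False
    return True


def open_to_world(rule: IngressRule) -> bool:
    """Return True if the rule admits traffic from every IPv4 or IPv6 address."""
    return ipaddress.ip_network(rule.cidr, strict=False).prefixlen == 0


def test_cidr_typo_is_rejected() -> None:
    assert is_valid_cidr("10.0.0.0/16")
    assert not is_valid_cidr("10.0.0.1/16")  # host bits set
    assert not is_valid_cidr("10.0.0.0/33")  # prefix out of range


def test_ssh_not_open_to_world() -> None:
    assert open_to_world(IngressRule(port=22, cidr="0.0.0.0/0"))
    assert not open_to_world(IngressRule(port=22, cidr="10.0.0.0/8"))
```

Both tests run in-process, with no provider or credentials involved.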

The Python ecosystem is what makes IaC testing tractable. You reuse the pytest runner, fixtures, mocks, and CI runners you already operate for application code. That parity is the differentiator over HCL, where testing has historically been bolted on through separate tooling such as Terratest, or more recently through the HCL-specific terraform test framework. Each layer of the pyramid trades speed for fidelity, and a healthy suite weights heavily toward the fast base.

The Four Layers

Unit tests with mocks

Unit tests run your resource graph in-process with the cloud API replaced by a mock, so they execute in milliseconds and need no credentials. For Pulumi this means pulumi.runtime.set_mocks; the dedicated walkthrough lives in Unit Testing Pulumi Programs with Mocks. When your IaC helper code calls boto3 directly for a lookup, you mock the AWS SDK itself with moto, covered in Mocking AWS Services with moto in pytest.

# CLI: pytest tests/test_tags.py -v
# State implication: pure in-memory assertion, no provider call, no state mutation.
from dataclasses import dataclass

@dataclass(frozen=True)
class TagPolicy:
    cost_center: str
    environment: str

def required_tags(policy: TagPolicy) -> dict[str, str]:
    return {"CostCenter": policy.cost_center, "Environment": policy.environment}

def test_required_tags_complete() -> None:
    tags = required_tags(TagPolicy(cost_center="cc-42", environment="prod"))
    assert set(tags) == {"CostCenter", "Environment"}

Snapshot tests

Snapshot tests synthesize the stack to its final declarative form and compare it against a stored golden file, so any unintended change to the generated configuration fails the diff. This is the natural fit for CDKTF, whose synthesis step emits Terraform JSON you can assert against; the full pattern is in Snapshot Testing CDKTF Stacks with pytest.
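
The golden-file comparison itself can be sketched with the standard library alone. Here `synthesize` is a hypothetical stand-in for whatever synthesis step your tool provides, and `matches_snapshot` is an illustrative helper, not a CDKTF API. The temporary directory only keeps the sketch self-contained; in a real suite the golden file would be committed next to the tests.

```python
import json
import tempfile
from pathlib import Path


def synthesize() -> dict:
    """Hypothetical stand-in for a synth step that returns Terraform-style JSON."""
    return {
        "resource": {
            "aws_s3_bucket": {
                "logs": {"bucket": "acme-logs", "tags": {"Environment": "prod"}}
            }
        }
    }


def render(config: dict) -> str:
    """Serialize deterministically so key order never causes a false diff."""
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


def matches_snapshot(config: dict, golden: Path, update: bool = False) -> bool:
    """Compare against the golden file; record it when missing or when update is set."""
    rendered = render(config)
    if update or not golden.exists():
        golden.write_text(rendered)
        return True
    return golden.read_text() == rendered


def test_stack_matches_snapshot() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        golden = Path(tmp) / "stack.snapshot.json"
        assert matches_snapshot(synthesize(), golden)  # first run records the snapshot
        assert matches_snapshot(synthesize(), golden)  # unchanged output passes
        drifted = synthesize()
        drifted["resource"]["aws_s3_bucket"]["logs"]["tags"]["Environment"] = "dev"
        assert not matches_snapshot(drifted, golden)   # any change fails the comparison
```

Recording the snapshot automatically when it is missing is a convenience choice in this sketch. Some teams prefer to fail instead, so that a deleted golden file cannot pass silently.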

Integration tests

Integration tests provision against a real or emulated backend (moto for the AWS API surface, or LocalStack for a fuller emulation) and verify that resources actually come up. They are slower and need a clean teardown, so keep them few and run them on merge rather than on every push.
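
The teardown discipline can be sketched without any cloud dependency. `FakeBackend` is a hypothetical in-memory stand-in for whatever client a real test would point at moto or LocalStack; only the create, verify, destroy shape is the point here.

```python
from contextlib import contextmanager
from typing import Iterator


class FakeBackend:
    """Hypothetical in-memory stand-in for an emulated cloud endpoint."""

    def __init__(self) -> None:
        self.buckets: set[str] = set()

    def create_bucket(self, name: str) -> None:
        if name in self.buckets:
            raise ValueError(f"bucket {name!r} already exists")
        self.buckets.add(name)

    def delete_bucket(self, name: str) -> None:
        self.buckets.discard(name)


@contextmanager
def provisioned_bucket(backend: FakeBackend, name: str) -> Iterator[str]:
    """Create a bucket for the test body and always remove it afterwards."""
    backend.create_bucket(name)
    try:
        yield name
    finally:
        backend.delete_bucket(name)  # runs even if the test body raises


def test_bucket_comes_up_and_is_torn_down() -> None:
    backend = FakeBackend()
    with provisioned_bucket(backend, "it-artifacts") as name:
        assert name in backend.buckets  # the resource actually exists
    assert backend.buckets == set()     # nothing leaks into the next test
```

In a pytest suite the same shape is usually written as a fixture that yields the resource and cleans up after the yield.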

Policy tests

Policy checks enforce organizational rules (no public buckets, mandatory encryption, approved instance types) across the synthesized output. They sit at the narrow top of the pyramid and act as the final gate before deployment.
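
A policy gate over synthesized output can be as small as the sketch below. It is not a replacement for a policy engine; the rule set, the `APPROVED_INSTANCE_TYPES` allow-list, and the `violations` helper are all illustrative assumptions, and the attribute names follow Terraform-style JSON for the AWS provider.

```python
from typing import Iterator

APPROVED_INSTANCE_TYPES = {"t3.micro", "t3.small"}  # assumed allow-list


def violations(config: dict) -> Iterator[str]:
    """Yield one message per rule broken in Terraform-style JSON."""
    resources = config.get("resource", {})
    for name, body in resources.get("aws_s3_bucket", {}).items():
        if body.get("acl") in {"public-read", "public-read-write"}:
            yield f"aws_s3_bucket.{name}: public ACL"
    for name, body in resources.get("aws_ebs_volume", {}).items():
        if body.get("encrypted") is not True:
            yield f"aws_ebs_volume.{name}: encryption not enabled"
    for name, body in resources.get("aws_instance", {}).items():
        if body.get("instance_type") not in APPROVED_INSTANCE_TYPES:
            yield f"aws_instance.{name}: instance type not approved"


def test_policy_flags_each_rule() -> None:
    config = {
        "resource": {
            "aws_s3_bucket": {"site": {"acl": "public-read"}},
            "aws_ebs_volume": {"data": {"size": 100}},
            "aws_instance": {"web": {"instance_type": "m5.24xlarge"}},
        }
    }
    assert list(violations(config)) == [
        "aws_s3_bucket.site: public ACL",
        "aws_ebs_volume.data: encryption not enabled",
        "aws_instance.web: instance type not approved",
    ]
```

Returning messages rather than raising on the first failure lets the gate report every violation in one run.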

Where Each Tool Fits

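| Layer       | Tool                       | Needs cloud creds | Speed           | What it catches                         |
|-------------|----------------------------|-------------------|-----------------|-----------------------------------------|
| Unit        | pulumi.runtime mocks, moto | No                | Milliseconds    | Wrong inputs, missing tags, graph shape |
| Snapshot    | cdktf.Testing.synth        | No                | Fast            | Drift in synthesized JSON               |
| Integration | moto / LocalStack          | Emulated          | Seconds–minutes | Resources that fail to create           |
| Policy      | Checkov, CrossGuard        | No                | Fast            | Compliance violations                   |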

Both major Python IaC tools support this stack. The Pulumi patterns and provider management section uses pulumi.runtime.Mocks for its component tests, and the CDKTF workflows and Terraform synthesis section leans on synthesized-JSON snapshots — same pyramid, different synthesis model.

Wiring It Into CI

Run the fast layers on every push and gate merges on the slow ones:

# CLI: invoked from .github/workflows/test.yml
# Provider note: only the integration stage needs cloud or LocalStack credentials.
pytest tests/unit -q                 # base layer, runs on every push
pytest tests/snapshot -q             # golden-file comparison
pytest tests/integration -q          # gated behind merge to main
checkov -d cdktf.out --quiet         # policy gate before deploy

Keep the unit and snapshot stages under a few seconds total so engineers run them locally without friction. The integration stage belongs in the pipeline, where its latency and credential needs are acceptable. The policy stage needs no credentials, but it runs there too as the last gate before deploy.
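
The gating rule can be written down as a small selector. This is a sketch under stated assumptions: `stages_for` is a hypothetical helper, the ref strings follow git's `refs/heads/<branch>` convention, and a push to main is taken to be what a merge produces. The commands themselves are the ones listed above.

```python
FAST_STAGES = [
    "pytest tests/unit -q",
    "pytest tests/snapshot -q",
]
GATED_STAGES = [
    "pytest tests/integration -q",
    "checkov -d cdktf.out --quiet",
]


def stages_for(ref: str, main_ref: str = "refs/heads/main") -> list[str]:
    """Return the commands to run for a push to the given git ref."""
    if ref == main_ref:
        return FAST_STAGES + GATED_STAGES  # merges run the full pyramid
    return list(FAST_STAGES)               # every other push runs the fast base only


if __name__ == "__main__":
    for command in stages_for("refs/heads/feature/tags"):
        print(command)
```

Most CI systems express the same condition declaratively in the workflow file; the function form is only meant to make the rule explicit.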

Frequently Asked Questions

Do I need real cloud credentials to test Python IaC?
No, for the base of the pyramid. Unit tests with pulumi.runtime.set_mocks and snapshot tests with cdktf.Testing.synth run entirely in-process. Only integration tests need credentials, and even those can target moto or LocalStack instead of a live account.

What is the difference between a snapshot test and an integration test?
A snapshot test compares the synthesized declarative output (Terraform JSON or a Pulumi resource graph) against a stored golden file without deploying anything. An integration test actually provisions resources against an emulated or real backend and checks they come up correctly.

How many integration tests should I write?
Few. They are the slowest and most brittle layer. Cover the critical paths that mocks cannot verify (actual resource creation, IAM evaluation, networking) and push everything else down to unit and snapshot tests.

Can I share one pytest suite across Pulumi and CDKTF projects?
You share the runner, fixtures, and CI conventions, but the assertion style differs: Pulumi tests resolve Output values through mocks, while CDKTF tests assert against synthesized JSON. Keep them in separate test modules under the same tests/ tree.