Idempotency and Drift Detection in Python IaC

Idempotency means running the same infrastructure program twice produces the same cloud state—no duplicate resources, no needless replacements. Drift is the gap that opens when reality diverges from that state through manual console edits or out-of-band automation. This guide shows how Pulumi and CDKTF guarantee convergence, how to detect drift with pulumi refresh and cdktf diff, and how to build a typed helper that compares desired against actual state. It is part of the IaC Design Principles within Python IaC Fundamentals & Strategy.

Context

A correct IaC program is a pure function of its inputs: given the same configuration, it must converge on the same resources regardless of how many times you run it. Idempotency is what makes that safe—re-applying after a failed deploy resumes rather than duplicates. Drift breaks the contract from the other direction: someone widens a security group in the console, and your state file no longer describes the live world. Both engines treat the state file as the source of truth, so detecting and reconciling drift is a first-class operation rather than an afterthought. This builds directly on the typed contracts in Python Typing for Cloud Resource Definitions, which catch schema errors before they ever reach the provider.
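The convergence property described above can be illustrated with a toy model. This is a minimal sketch, not how either engine is implemented: `plan`, `apply`, and the `State` alias are hypothetical names, and resources are reduced to plain dictionaries.

```python
from __future__ import annotations

# Toy model: a "state" maps a logical resource name to its attributes.
State = dict[str, dict[str, str]]


def plan(desired: State, recorded: State) -> list[tuple[str, str]]:
    """Return the (action, name) pairs needed to move `recorded` to `desired`."""
    actions: list[tuple[str, str]] = []
    for name in sorted(desired):
        if name not in recorded:
            actions.append(("create", name))
        elif recorded[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(recorded):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired: State) -> State:
    """Toy deploy: the new recorded state is a copy of the desired state."""
    return {name: dict(attrs) for name, attrs in desired.items()}


desired: State = {"artifacts-dev": {"bucket": "acme-artifacts-dev"}}
first_plan = plan(desired, {})       # one create on the first run
state = apply(desired)
second_plan = plan(desired, state)   # empty: nothing is left to do
```

Running `plan` a second time with the same inputs yields an empty list, which is the idempotency guarantee in miniature. Mutating `state` by hand before planning again stands in for drift and produces an update.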

Prerequisites

  • Python 3.9+ with from __future__ import annotations enabled in modules.
  • Engine SDKs pinned to exact versions in your lockfile, at or above these minimums: pulumi>=3.0 and pulumi-aws>=6.0, or cdktf>=0.20 with the matching cdktf-cli.
  • A remote state backend with locking already configured (S3 + DynamoDB, Pulumi Cloud, or Terraform Cloud) so refresh operations are serialized.
  • IAM credentials with read permission on every resource type under management—refresh and diff call the provider's Describe/Get APIs.
  • pytest>=7 for the verification step.

Implementation

1. Make re-runs converge

The engines achieve idempotency by diffing the desired graph against the recorded state and emitting only the delta. Your job is to keep resource identities stable: never derive a resource_name from a timestamp, a random value, or list ordering. If the logical name changes between runs, the engine treats it as a new resource, creates it, and deletes the one recorded under the old name.

# infra/idempotent_naming.py
# CLI: pulumi up --stack dev   # run twice; the second run should report every resource as unchanged
from __future__ import annotations

import pulumi
import pulumi_aws as aws

# Provider note: a deterministic logical name keeps the resource's URN stable across runs,
# so the engine matches it to existing state instead of creating a duplicate bucket.
def make_bucket(env: str) -> aws.s3.BucketV2:
    return aws.s3.BucketV2(
        f"artifacts-{env}",            # stable: derived only from config input
        bucket=f"acme-artifacts-{env}",
        tags={"managed-by": "pulumi", "env": env},
    )

# State implication: re-running with identical inputs is a no-op; the checkpoint is unchanged.

2. Detect drift against the live cloud

Refresh reconciles the state file with the provider's current view without changing infrastructure. Run it before a deploy in CI and gate on whether anything moved.

# Pulumi: update state from the cloud, then assert nothing drifted
pulumi refresh --yes --stack prod
pulumi preview --diff --expect-no-changes --stack prod   # non-zero exit if drift remains

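# CDKTF: synthesize, then plan against the state backend (the stack name is a positional argument)
cdktf diff prod
# To gate CI on a non-empty plan, run Terraform in the synthesized stack directory after terraform init:
# (cd cdktf.out/stacks/prod && terraform plan -detailed-exitcode)   # exit code 2 means changes are pending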
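If the CI job is itself written in Python, the Pulumi commands above can be wrapped in a small gate. The following is a sketch under stated assumptions: `drift_commands`, `drift_gate`, and the injectable `runner` parameter are hypothetical names introduced here so the logic can be tested without invoking the real CLI.

```python
from __future__ import annotations

import subprocess
from collections.abc import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def drift_commands(stack: str) -> list[list[str]]:
    """The two Pulumi commands from the shell snippet, parameterized by stack."""
    return [
        ["pulumi", "refresh", "--yes", "--stack", stack],
        ["pulumi", "preview", "--diff", "--expect-no-changes", "--stack", stack],
    ]


def _run(cmd: Sequence[str]) -> int:
    # check=False: we want the exit code, not an exception.
    return subprocess.run(list(cmd), check=False).returncode


def drift_gate(stack: str, runner: Runner = _run) -> bool:
    """Return True only when every command exits 0 (no drift reported)."""
    for cmd in drift_commands(stack):
        if runner(cmd) != 0:
            return False
    return True
```

In a pipeline, something like `sys.exit(0 if drift_gate("prod") else 1)` would turn the result into a job status. Passing a fake `runner` keeps unit tests independent of cloud credentials.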

3. Compare desired vs actual with a typed helper

For resources the engine cannot fully model—or when you want a custom alert payload—drop to a small comparator. Keep it typed so the comparison fields are explicit and mypy-checked.

# infra/drift_check.py
# CLI: python -m infra.drift_check
from __future__ import annotations

from dataclasses import dataclass
from collections.abc import Mapping


@dataclass(frozen=True)
class ResourceSnapshot:
    resource_id: str
    attributes: Mapping[str, str]


@dataclass(frozen=True)
class DriftReport:
    resource_id: str
    drifted_keys: tuple[str, ...]

    @property
    def has_drift(self) -> bool:
        return len(self.drifted_keys) > 0


def compare(desired: ResourceSnapshot, actual: ResourceSnapshot) -> DriftReport:
    """Return the attribute keys whose live value diverges from the desired value."""
    # Provider note: `actual` is built from a read-only Describe call (e.g. boto3),
    # so this comparison never mutates cloud state.
    drifted = tuple(
        key
        for key, want in desired.attributes.items()
        if actual.attributes.get(key) != want
    )
    return DriftReport(resource_id=desired.resource_id, drifted_keys=drifted)
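
For the custom alert payload mentioned above, one possible shape is a JSON document listing each drifted key with its desired and live value. This is a sketch only: `alert_payload` and the payload layout are hypothetical, and `ResourceSnapshot` is repeated here so the block runs on its own.

```python
from __future__ import annotations

import json
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSnapshot:  # mirrors the class in infra/drift_check.py
    resource_id: str
    attributes: Mapping[str, str]


def alert_payload(desired: ResourceSnapshot, actual: ResourceSnapshot) -> str:
    """Serialize every drifted key with its desired and live value as JSON."""
    changes = {
        key: {"desired": want, "actual": actual.attributes.get(key)}
        for key, want in desired.attributes.items()
        if actual.attributes.get(key) != want
    }
    # sort_keys keeps the output stable so alerts can be de-duplicated.
    return json.dumps(
        {"resource_id": desired.resource_id, "drift": changes}, sort_keys=True
    )
```

A key that is missing from the live snapshot is reported with an `actual` value of `null`, which distinguishes "removed" from "changed" in the alert.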

Verification

A minimal pytest case proves the comparator flags a widened security-group rule and stays quiet when state matches.

# infra/tests/test_drift_check.py
# CLI: pytest infra/tests/test_drift_check.py -q
from __future__ import annotations

from infra.drift_check import ResourceSnapshot, compare


def test_detects_out_of_band_change() -> None:
    desired = ResourceSnapshot("sg-01", {"ingress_cidr": "10.0.0.0/16"})
    actual = ResourceSnapshot("sg-01", {"ingress_cidr": "0.0.0.0/0"})  # console edit
    report = compare(desired, actual)
    assert report.has_drift
    assert report.drifted_keys == ("ingress_cidr",)


def test_converged_state_reports_no_drift() -> None:
    snap = ResourceSnapshot("sg-01", {"ingress_cidr": "10.0.0.0/16"})
    assert not compare(snap, snap).has_drift

Gotchas & Edge Cases

Refresh can mask an out-of-band deletion. pulumi refresh rewrites state to match reality, so a resource someone deleted in the console is dropped from state and silently re-created on the next up rather than flagged. Run preview --expect-no-changes after refresh and fail the pipeline on a non-empty diff instead of auto-applying.
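
When the pipeline needs to report which operations are pending rather than just fail, it can inspect the preview's change summary. The sketch below assumes the output of `pulumi preview --json` carries a top-level `changeSummary` object that maps operation names to counts; verify that shape against your CLI version. `pending_ops` and `should_block` are hypothetical helper names.

```python
from __future__ import annotations

import json
from collections.abc import Mapping


def pending_ops(change_summary: Mapping[str, int]) -> dict[str, int]:
    """Keep only operations that would change infrastructure (drop 'same')."""
    return {op: n for op, n in change_summary.items() if op != "same" and n > 0}


def should_block(preview_json: str) -> bool:
    """True when the preview proposes anything other than unchanged resources."""
    # Assumption: the summary lives under a top-level "changeSummary" key.
    summary = json.loads(preview_json).get("changeSummary", {})
    return bool(pending_ops(summary))


# After a refresh, a console-deleted bucket shows up as a planned create.
sample = '{"changeSummary": {"same": 4, "create": 1}}'
blocked = should_block(sample)
```

Blocking on any non-"same" operation is deliberately strict: a planned create after a refresh is often the only trace that something was deleted out of band.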

Provider-computed fields create phantom drift. Some attributes are unknown when you write the desired state (auto-assigned ARNs and IDs, default tags injected by the provider) or change between reads (timestamps), so a naive comparator reports them as drift. Leave known computed keys out of the desired.attributes map, or compare only the fields you actually manage.
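
One way to apply that exclusion is to filter both snapshots through the same allow-or-deny step before comparing. This is a minimal sketch: `managed_only` is a hypothetical helper, and the keys in `COMPUTED_KEYS` are illustrative placeholders rather than a list taken from any provider schema.

```python
from __future__ import annotations

from collections.abc import Collection, Mapping

# Hypothetical examples of provider-computed keys; adjust per resource type.
COMPUTED_KEYS: frozenset[str] = frozenset({"arn", "tags_all", "last_modified"})


def managed_only(
    attributes: Mapping[str, str],
    computed: Collection[str] = COMPUTED_KEYS,
) -> dict[str, str]:
    """Drop provider-computed keys so only managed fields are compared."""
    return {key: value for key, value in attributes.items() if key not in computed}
```

Filtering the live snapshot is optional with a comparator that only iterates over desired keys, but doing it on both sides keeps alert payloads free of noise.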

Non-deterministic inputs defeat idempotency silently. A cidr_block pulled from an unsorted set, or a name built from datetime.now(), makes each run look different and triggers replacements that can cause downtime. Sort collections and derive every logical name from configuration, never from runtime entropy.
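
A sketch of the deterministic alternative: sort the collection and derive each logical name from the value itself, so neither iteration order nor position can change a name. `subnet_names` and the `subnet-{env}-...` naming pattern are hypothetical, and the example assumes IPv4 CIDRs only.

```python
from __future__ import annotations

import ipaddress
from collections.abc import Iterable


def subnet_names(env: str, cidrs: Iterable[str]) -> dict[str, str]:
    """Map a stable logical name to each CIDR, independent of input order."""
    # Sorting by network value (not by string) gives a reproducible order.
    ordered = sorted(set(cidrs), key=ipaddress.ip_network)
    return {
        # The name comes from the CIDR, so adding a subnet never renames others.
        f"subnet-{env}-{cidr.replace('.', '-').replace('/', '-')}": cidr
        for cidr in ordered
    }
```

Naming by value instead of by index matters: with index-based names, inserting one CIDR in the middle shifts every later name and triggers the replacements this section warns about.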

Frequently Asked Questions

Does pulumi refresh change my infrastructure? No. Refresh only updates the local/remote state file to match what the provider reports; it issues read calls, not writes. The risk is the opposite: after refreshing, a subsequent up may act on the newly-reconciled state, so always preview before applying.

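How is CDKTF drift detection different from Pulumi? CDKTF delegates to Terraform: cdktf diff synthesizes the stack to Terraform JSON and runs terraform plan, which by default refreshes resource data in memory and compares it with the recorded state and the configuration. Pulumi computes the diff inside its own engine and, unless you request a refresh, compares the program against recorded state only. Both surface out-of-band changes. The difference is that pulumi refresh persists the reconciled state as a separate step, whereas Terraform writes refreshed state only when you run it in refresh-only mode.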

Can strong typing prevent drift on its own? Partially. Types catch schema and configuration errors at edit time, but drift originates outside your code—someone editing the cloud directly. Pair the typed contracts from Python Typing for Cloud Resource Definitions with a scheduled refresh-and-diff job to cover both failure directions.

How often should I run drift detection? Run it on every deploy as a pre-apply gate, and on a schedule (hourly or daily) for production stacks. The scheduled CI job calls pulumi refresh followed by preview --expect-no-changes, or cdktf diff, and alerts whenever the preview or plan reports pending changes.