Idempotency and Drift Detection in Python IaC
Idempotency means running the same infrastructure program twice produces the same cloud state—no duplicate resources, no needless replacements. Drift is the gap that opens when reality diverges from that state through manual console edits or out-of-band automation. This guide shows how Pulumi and CDKTF guarantee convergence, how to detect drift with pulumi refresh and cdktf diff, and how to build a typed helper that compares desired against actual state. It is part of the IaC Design Principles within Python IaC Fundamentals & Strategy.
Context
A correct IaC program is a pure function of its inputs: given the same configuration, it must converge on the same resources regardless of how many times you run it. Idempotency is what makes that safe—re-applying after a failed deploy resumes rather than duplicates. Drift breaks the contract from the other direction: someone widens a security group in the console, and your state file no longer describes the live world. Both engines treat the state file as the source of truth, so detecting and reconciling drift is a first-class operation rather than an afterthought. This builds directly on the typed contracts in Python Typing for Cloud Resource Definitions, which catch schema errors before they ever reach the provider.
Prerequisites
- Python 3.9+ with from __future__ import annotations enabled in modules.
- Pinned engine SDKs: pulumi>=3.0, pulumi-aws>=6.0, or cdktf>=0.20 with the matching cdktf-cli.
- A remote state backend with locking already configured (S3 + DynamoDB, Pulumi Cloud, or Terraform Cloud) so refresh operations are serialized.
- IAM credentials with read permission on every resource type under management, because refresh and diff call the provider's Describe/Get APIs.
- pytest>=7 for the verification step.
Implementation
1. Make re-runs converge
The engines achieve idempotency by diffing the desired graph against the recorded state and emitting only the delta. Your job is to keep resource identities stable: never derive a resource_name from a timestamp, random value, or list ordering. If the name changes between runs, the engine cannot match the resource to existing state, so it plans a new resource and removes the old one.
# infra/idempotent_naming.py
# CLI: pulumi up --stack dev # run twice — second run must report "0 changes"
from __future__ import annotations
import pulumi
import pulumi_aws as aws
# Provider note: a deterministic logical name keeps the resource's URN stable across runs,
# so the engine matches it to existing state instead of creating a duplicate bucket.
def make_bucket(env: str) -> aws.s3.BucketV2:
    return aws.s3.BucketV2(
        f"artifacts-{env}",  # stable: derived only from config input
        bucket=f"acme-artifacts-{env}",
        tags={"managed-by": "pulumi", "env": env},
    )
# State implication: re-running with identical inputs is a no-op; the checkpoint is unchanged.
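The diff-and-emit-delta behavior described in this step can be illustrated with a toy planner. This is a simplified sketch, not how either engine is implemented: the `plan` and `apply` functions and the dictionary-based state are hypothetical stand-ins for the real resource graph and checkpoint.

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical in-memory model: logical name -> resource properties.
Resources = Mapping[str, Mapping[str, str]]
Step = Tuple[str, str]  # (operation, logical name)


def plan(desired: Resources, state: Resources) -> List[Step]:
    """Return only the steps needed to move `state` to `desired`."""
    steps: List[Step] = []
    for name in sorted(desired):
        if name not in state:
            steps.append(("create", name))
        elif dict(desired[name]) != dict(state[name]):
            steps.append(("update", name))
    for name in sorted(state):
        if name not in desired:
            steps.append(("delete", name))
    return steps


def apply(desired: Resources) -> Dict[str, Dict[str, str]]:
    """Pretend to execute the plan and return the new recorded state."""
    return {name: dict(props) for name, props in desired.items()}


desired = {"artifacts-dev": {"bucket": "acme-artifacts-dev"}}
first_plan = plan(desired, state={})   # one create step
state = apply(desired)
second_plan = plan(desired, state)     # empty: the re-run is a no-op

# A timestamp-derived name never matches the recorded state,
# so the planner creates a new resource and deletes the old one.
unstable = {"artifacts-1700000000": {"bucket": "acme-artifacts-dev"}}
unstable_plan = plan(unstable, state)
```

The second plan is empty because the logical name and properties match the recorded state. The unstable name produces a create plus a delete even though the bucket properties are identical.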
2. Detect drift against the live cloud
Refresh reconciles the state file with the provider's current view without changing infrastructure. Run it before a deploy in CI and gate on whether anything moved.
# Pulumi: update state from the cloud, then assert nothing drifted
pulumi refresh --yes --stack prod
pulumi preview --diff --expect-no-changes --stack prod # non-zero exit if drift remains
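# CDKTF: synthesize, then plan against live state (the stack name is a positional argument)
cdktf diff prod
# cdktf diff may exit 0 even when the plan is non-empty; verify this for your cdktf-cli version.
# For an exit-code gate, terraform plan -detailed-exitcode in the synthesized stack directory returns 2 when changes exist.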
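To gate a pipeline from Python rather than shell, the two Pulumi commands above can be wrapped in a small function. This is a sketch under stated assumptions: `pulumi_drift_gate` and the `Runner` type are hypothetical names, and the runner is injectable so the logic can be tested without a Pulumi installation.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical helper type: takes a command, returns its exit code.
Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def pulumi_drift_gate(stack: str, run: Runner = _run) -> int:
    """Return 0 when the stack is converged, 1 on drift or failure."""
    refresh = ["pulumi", "refresh", "--yes", "--stack", stack]
    preview = ["pulumi", "preview", "--diff", "--expect-no-changes", "--stack", stack]
    if run(refresh) != 0:
        return 1  # could not read live state; treat as a failed gate
    return 0 if run(preview) == 0 else 1
```

In CI this could be called as `sys.exit(pulumi_drift_gate("prod"))`, so the job fails whenever the preview reports changes after the refresh.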
3. Compare desired vs actual with a typed helper
For resources the engine cannot fully model—or when you want a custom alert payload—drop to a small comparator. Keep it typed so the comparison fields are explicit and mypy-checked.
# infra/drift_check.py
# CLI: python -m infra.drift_check
from __future__ import annotations
from dataclasses import dataclass
from typing import Mapping
@dataclass(frozen=True)
class ResourceSnapshot:
    resource_id: str
    attributes: Mapping[str, str]


@dataclass(frozen=True)
class DriftReport:
    resource_id: str
    drifted_keys: tuple[str, ...]

    @property
    def has_drift(self) -> bool:
        return len(self.drifted_keys) > 0


def compare(desired: ResourceSnapshot, actual: ResourceSnapshot) -> DriftReport:
    """Return the attribute keys whose live value diverges from the desired value."""
    # Provider note: `actual` is built from a read-only Describe call (e.g. boto3),
    # so this comparison never mutates cloud state.
    drifted = tuple(
        key
        for key, want in desired.attributes.items()
        if actual.attributes.get(key) != want
    )
    return DriftReport(resource_id=desired.resource_id, drifted_keys=drifted)
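For the custom alert payload mentioned at the start of this step, one option is to serialize the drifted keys together with their desired and live values. The sketch below is illustrative only: `alert_payload` and the JSON field names are hypothetical, and it takes plain mappings so it stays self-contained rather than importing the dataclasses above.

```python
import json
from typing import Mapping, Sequence


def alert_payload(
    resource_id: str,
    drifted_keys: Sequence[str],
    desired: Mapping[str, str],
    actual: Mapping[str, str],
) -> str:
    """Serialize drifted attributes as JSON with the wanted and live values."""
    changes = {
        key: {"desired": desired.get(key), "actual": actual.get(key)}
        for key in drifted_keys
    }
    return json.dumps({"resource_id": resource_id, "changes": changes}, sort_keys=True)


payload = alert_payload(
    "sg-01",
    ("ingress_cidr",),
    {"ingress_cidr": "10.0.0.0/16"},
    {"ingress_cidr": "0.0.0.0/0"},
)
```

Sorting the keys keeps the payload byte-for-byte stable across runs, which helps when an alerting system de-duplicates on message content.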
Verification
A minimal pytest case proves the comparator flags a widened security-group rule and stays quiet when state matches.
# infra/tests/test_drift_check.py
# CLI: pytest infra/tests/test_drift_check.py -q
from __future__ import annotations
from infra.drift_check import ResourceSnapshot, compare
def test_detects_out_of_band_change() -> None:
    desired = ResourceSnapshot("sg-01", {"ingress_cidr": "10.0.0.0/16"})
    actual = ResourceSnapshot("sg-01", {"ingress_cidr": "0.0.0.0/0"})  # console edit
    report = compare(desired, actual)
    assert report.has_drift
    assert report.drifted_keys == ("ingress_cidr",)


def test_converged_state_reports_no_drift() -> None:
    snap = ResourceSnapshot("sg-01", {"ingress_cidr": "10.0.0.0/16"})
    assert not compare(snap, snap).has_drift
Gotchas & Edge Cases
Refresh can hide a destructive deploy. pulumi refresh rewrites state to match reality, so a resource someone deleted in the console will be re-created on the next up rather than flagged. Run preview --expect-no-changes after refresh and fail the pipeline on a non-empty diff instead of auto-applying.
Provider-computed fields create phantom drift. Some attributes (auto-assigned ARNs, default tags injected by the provider, timestamps) differ on every read and will show as drift in a naive comparator. Exclude known computed keys from the desired.attributes map, or compare only the fields you actually manage.
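One way to apply the exclusion described above is to filter attribute maps before they reach the comparator. This is a minimal sketch: `managed_only` is a hypothetical helper, and the key names in `COMPUTED_KEYS` are illustrative assumptions that would need adjusting per resource type.

```python
from typing import Dict, FrozenSet, Mapping

# Assumed, illustrative set of provider-computed keys; adjust per resource type.
COMPUTED_KEYS: FrozenSet[str] = frozenset({"arn", "create_date", "tags_all"})


def managed_only(
    attributes: Mapping[str, str],
    computed: FrozenSet[str] = COMPUTED_KEYS,
) -> Dict[str, str]:
    """Drop provider-computed keys so only managed fields are compared."""
    return {k: v for k, v in attributes.items() if k not in computed}


# Placeholder values standing in for a live Describe response.
live = {
    "ingress_cidr": "10.0.0.0/16",
    "arn": "placeholder-arn",
    "create_date": "placeholder-timestamp",
}
filtered = managed_only(live)
```

Filtering both the desired and the live map with the same function keeps the comparison limited to fields the program actually manages.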
Non-deterministic inputs defeat idempotency silently. A cidr_block pulled from an unsorted set, or a name built from datetime.now(), makes each run look different and triggers replacements that can cause downtime. Sort collections and derive every logical name from configuration, never from runtime entropy.
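The two remedies above, sorting collections and deriving names from configuration, can be sketched as small helpers. `stable_cidrs` and `logical_name` are hypothetical names used only for illustration.

```python
from typing import Iterable, List


def stable_cidrs(cidrs: Iterable[str]) -> List[str]:
    """Return CIDR blocks de-duplicated and in a fixed (lexicographic) order."""
    return sorted(set(cidrs))


def logical_name(component: str, env: str) -> str:
    """Build a resource name from configuration only, with no runtime entropy."""
    return f"{component}-{env}"


# Different input orderings and duplicates collapse to the same list.
first = stable_cidrs({"10.0.2.0/24", "10.0.1.0/24"})
second = stable_cidrs(["10.0.1.0/24", "10.0.2.0/24", "10.0.1.0/24"])
name = logical_name("artifacts", "dev")
```

The sort is lexicographic rather than numeric. That is enough for determinism, since the goal is a stable order, not a meaningful one.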
Frequently Asked Questions
Does pulumi refresh change my infrastructure?
No. Refresh only updates the local/remote state file to match what the provider reports; it issues read calls, not writes. The risk is the opposite: after refreshing, a subsequent up may act on the newly-reconciled state, so always preview before applying.
How is CDKTF drift detection different from Pulumi?
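CDKTF delegates to Terraform: cdktf diff synthesizes the stack to Terraform JSON and runs terraform plan, which refreshes resource state in memory and compares the configuration against it without writing the state file. Pulumi computes the diff inside its own engine against the recorded state, so run pulumi refresh first to pull in live values. Both surface out-of-band changes; the difference is that pulumi refresh persists the reconciled state as a distinct step, while a default Terraform plan does not.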
Can strong typing prevent drift on its own?
Partially. Types catch schema and configuration errors at edit time, but drift originates outside your code, when someone edits the cloud directly. Pair the typed contracts from Python Typing for Cloud Resource Definitions with a scheduled refresh-and-diff job to cover both failure directions.
How often should I run drift detection?
Run it on every deploy as a pre-apply gate, and on a schedule (hourly or daily) for production stacks. The scheduled CI job calls pulumi refresh followed by pulumi preview --expect-no-changes, or cdktf diff, and alerts whenever the result reports changes.
Related
- Python Typing for Cloud Resource Definitions — edit-time guarantees that complement runtime drift detection.
- How to Structure Python IaC Projects for Scale — where to wire scheduled drift scans into the project layout and CI gates.
- IaC Design Principles — the parent section covering state, locking, and convergence invariants.