Using boto3 Inside Pulumi and CDKTF

Pulumi and CDKTF providers cover most AWS resources, but not every lookup or imperative step has a provider equivalent — and reaching for boto3 inside a deployment program is safe only if you keep it read-only or strictly idempotent. This page is part of the Cloud Provider SDKs in Python workflow within Python IaC Fundamentals & Strategy, and it shows exactly when to drop to the AWS SDK and how to do it without corrupting framework state.

The core risk is simple: Pulumi and CDKTF track every resource they create in their own state. A boto3 call that creates or mutates a resource is invisible to that state, so the framework will never plan, diff, or destroy it — you have created untracked drift. The safe pattern is to use boto3 for reads (data the provider has no data source for) and to fence off any genuine side effect behind an idempotency guard.

When to reach for boto3

Use a provider resource or data source first — always. Drop to boto3 only when:

  • A data source is missing. You need an attribute the provider does not expose (for example, a quota, a regional service availability flag, or an account-level setting with no Terraform data source).
  • A one-off lookup feeds configuration. You want the default VPC ID, the latest AMI matching a custom filter, or an existing KMS key ARN to wire into a resource argument.
  • An imperative step has no declarative model. Examples: starting a one-time export, triggering a Lambda for a bootstrap check, or reading a parameter that another team manages out-of-band.

If the task creates persistent infrastructure, write a provider resource instead. boto3 creation belongs in Pulumi dynamic providers, not inline in a stack.

Prerequisites

  • Python 3.9+ with boto3 >= 1.34 and botocore pinned in your lockfile.
  • pulumi >= 3.0 with pulumi-aws, or cdktf >= 0.20 with the prebuilt AWS provider package (cdktf-cdktf-provider-aws).
  • AWS credentials resolvable by the default chain — set up per Best practices for managing cloud credentials in Python.
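  • IAM permissions for read-only describe calls (e.g. ec2:DescribeVpcs, ec2:DescribeImages) on the principal running the deployment.
  • moto >= 5.0 and pytest for the verification test (the mock_aws decorator was introduced in moto 5.0).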

Implementation

1. Read-only lookups in a Pulumi program

A boto3 read runs inline as Pulumi executes your program, on every pulumi preview and pulumi up, before the resources that consume it are registered. The returned value is a plain Python value (not an Output), so you can pass it straight into resource arguments. Wrap the client in a typed helper so the call site stays declarative.

# Run: pulumi up
from __future__ import annotations
from dataclasses import dataclass
import boto3
import pulumi
import pulumi_aws as aws

@dataclass(frozen=True)
class NetworkLookup:
    region: str

    def default_vpc_id(self) -> str:
        # State implication: this is a READ. boto3 sees nothing Pulumi created;
        # it only reads pre-existing account state, so it adds no untracked drift.
        client = boto3.client("ec2", region_name=self.region)
        resp = client.describe_vpcs(
            Filters=[{"Name": "isDefault", "Values": ["true"]}]
        )
        vpcs = resp.get("Vpcs", [])
        if not vpcs:
            raise RuntimeError(f"No default VPC in {self.region}")
        return vpcs[0]["VpcId"]

lookup = NetworkLookup(region="us-east-1")
sg = aws.ec2.SecurityGroup(
    "app-sg",
    vpc_id=lookup.default_vpc_id(),  # plain str, resolved synchronously
    description="App tier",
)

Provider note: Prefer the native data source when one exists — aws.ec2.get_vpc(default=True) is equivalent here and keeps the lookup inside Pulumi's own provider plumbing. Use boto3 only when no get_* function covers the attribute you need.

2. Read-only lookups in a CDKTF program

CDKTF evaluates Python at synth time. A boto3 read during synth produces a literal that gets baked into the synthesized Terraform JSON. That is fine for stable values but dangerous for volatile ones: a "latest AMI" lookup changes your plan whenever a newer image is published, with no change to your code. Cache or pin volatile reads.

# Run: cdktf synth
from __future__ import annotations
import boto3
from constructs import Construct
from cdktf import TerraformStack
from cdktf_cdktf_provider_aws.provider import AwsProvider
from cdktf_cdktf_provider_aws.instance import Instance

def latest_ami(region: str, owner: str, name_pattern: str) -> str:
    # State implication: value is frozen into synthesized JSON at synth time.
    # Pin owner+pattern tightly so the resolved AMI is reproducible across CI runs.
    client = boto3.client("ec2", region_name=region)
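    images = client.describe_images(
        Owners=[owner],
        Filters=[{"Name": "name", "Values": [name_pattern]}],
    )["Images"]
    if not images:
        # max() on an empty list raises an opaque ValueError; fail with context.
        raise RuntimeError(f"No AMI owned by {owner} matches {name_pattern!r}")
    # CreationDate is an ISO 8601 string, so lexical order is chronological.
    newest = max(images, key=lambda i: i["CreationDate"])
    return newest["ImageId"]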

class AppStack(TerraformStack):
    def __init__(self, scope: Construct, id_: str) -> None:
        super().__init__(scope, id_)
        AwsProvider(self, "aws", region="us-east-1")
        Instance(
            self, "app",
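            # The amd64 pattern keeps the match compatible with t3.micro (x86_64).
            ami=latest_ami(
                "us-east-1", "099720109477",
                "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*",
            ),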
            instance_type="t3.micro",
        )

Provider note: The native DataAwsAmi data source is resolved by Terraform during plan and apply instead of being frozen at synth time, which is usually safer for CI. Reach for boto3 only when the filter you need cannot be expressed as a data source.
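
The advice to cache or pin volatile reads can be sketched as a small pin file that is committed next to the stack. This is a minimal illustration, not part of CDKTF: pinned_value, the key name, and the file name are hypothetical, and the resolver is any zero-argument callable such as a wrapper around latest_ami.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable


def pinned_value(key: str, resolve: Callable[[], str], pin_file: Path) -> str:
    """Return the pinned value for `key`, resolving and recording it only once."""
    pins: dict[str, str] = {}
    if pin_file.exists():
        pins = json.loads(pin_file.read_text())
    if key not in pins:
        # First synth only: run the volatile lookup and freeze its result.
        pins[key] = resolve()
        pin_file.write_text(json.dumps(pins, indent=2, sort_keys=True) + "\n")
    return pins[key]


# Hypothetical call site inside the stack:
#   ami=pinned_value(
#       "app-ami",
#       lambda: latest_ami("us-east-1", "099720109477", "ubuntu/*22.04*"),
#       Path("ami-pins.json"),
#   )
```

With the pin file under version control, every synth in CI resolves to the same AMI, and rolling forward becomes an explicit, reviewable change: delete the entry and synth again.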

3. Fencing an imperative side effect

If you genuinely must perform a mutation (a bootstrap that no provider models), guard it so re-runs converge. Check current state first; act only when needed. This is the same idempotency contract a provider gives you for free.

# Run: python -m bootstrap
from __future__ import annotations
import boto3

def ensure_account_ebs_encryption(region: str) -> bool:
    """Enable default EBS encryption only if not already on. Returns True if changed."""
    client = boto3.client("ec2", region_name=region)
    # Read before write: the guard that makes this safe to re-run.
    current = client.get_ebs_encryption_by_default()["EbsEncryptionByDefault"]
    if current:
        return False  # already converged, no side effect
    # State implication: this mutation is NOT tracked by Pulumi/CDKTF state.
    # Keep such calls out of the deployment program; run them as a separate step.
    client.enable_ebs_encryption_by_default()
    return True

if __name__ == "__main__":
    # Entry point so that `python -m bootstrap` actually performs the step.
    changed = ensure_account_ebs_encryption("us-east-1")
    print("enabled" if changed else "already enabled")
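
The convergence of a read-before-write guard can be checked without an AWS account by passing the client in as a parameter. The sketch below is an assumed variant of the function above: ensure_ebs_encryption, EbsEncryptionClient, and FakeEc2 are hypothetical names, and the fake only mimics the two EC2 operations the guard uses.

```python
from __future__ import annotations

from typing import Any, Protocol


class EbsEncryptionClient(Protocol):
    """The two EC2 operations the guard needs (same names as on the boto3 client)."""

    def get_ebs_encryption_by_default(self) -> dict[str, Any]: ...
    def enable_ebs_encryption_by_default(self) -> dict[str, Any]: ...


def ensure_ebs_encryption(client: EbsEncryptionClient) -> bool:
    """Read first, write only if needed. Returns True if a change was made."""
    if client.get_ebs_encryption_by_default()["EbsEncryptionByDefault"]:
        return False
    client.enable_ebs_encryption_by_default()
    return True


class FakeEc2:
    """In-memory stand-in (hypothetical) that counts mutating calls."""

    def __init__(self) -> None:
        self.enabled = False
        self.writes = 0

    def get_ebs_encryption_by_default(self) -> dict[str, Any]:
        return {"EbsEncryptionByDefault": self.enabled}

    def enable_ebs_encryption_by_default(self) -> dict[str, Any]:
        self.enabled = True
        self.writes += 1
        return {"EbsEncryptionByDefault": True}


fake = FakeEc2()
first = ensure_ebs_encryption(fake)   # performs the single write
second = ensure_ebs_encryption(fake)  # already converged, no write
```

Running the guard twice against the same fake and asserting that only one write happened is the property that makes the step safe to re-run.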

Verification

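Assert that read helpers are pure reads by mocking AWS with moto and confirming that the set of VPCs is identical before and after the call. This mirrors the testing approach in the parent SDK overview.

# Run: pytest tests/test_lookups.py -v
from moto import mock_aws
import boto3
from infra.lookups import NetworkLookup

def _vpc_ids(client) -> set[str]:
    return {v["VpcId"] for v in client.describe_vpcs()["Vpcs"]}

@mock_aws
def test_default_vpc_lookup_is_read_only() -> None:
    ec2 = boto3.client("ec2", region_name="us-east-1")
    # moto normally seeds a default VPC; create one only if it is missing.
    default = ec2.describe_vpcs(Filters=[{"Name": "isDefault", "Values": ["true"]}])
    if not default["Vpcs"]:
        ec2.create_default_vpc()
    before = _vpc_ids(ec2)
    vpc_id = NetworkLookup(region="us-east-1").default_vpc_id()
    assert vpc_id in before
    # A pure read leaves the account exactly as it found it.
    assert _vpc_ids(ec2) == before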
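
One way to turn "the helper must never call create_*" into a hard failure is to hand the helper a client wrapped in a guard that rejects anything outside a read allow-list. This is a sketch under assumptions: ReadOnlyClient, READ_PREFIXES, and StubEc2 are hypothetical, the prefix list is a heuristic rather than an AWS guarantee, and default_vpc_id is an assumed refactor of the lookup that accepts its client as a parameter.

```python
from __future__ import annotations

from typing import Any

READ_PREFIXES = ("describe_", "get_", "list_")  # assumed allow-list


class ReadOnlyClient:
    """Proxy that forwards read operations and rejects everything else."""

    def __init__(self, client: Any) -> None:
        self._client = client

    def __getattr__(self, name: str) -> Any:
        # Only invoked for attributes not found on the proxy itself.
        if name.startswith("_"):
            raise AttributeError(name)
        if not name.startswith(READ_PREFIXES):
            raise PermissionError(f"{name} is not a read-only operation")
        return getattr(self._client, name)


def default_vpc_id(client: Any) -> str:
    """Lookup variant that takes its client as a parameter (assumed refactor)."""
    vpcs = client.describe_vpcs(
        Filters=[{"Name": "isDefault", "Values": ["true"]}]
    ).get("Vpcs", [])
    if not vpcs:
        raise RuntimeError("No default VPC")
    return vpcs[0]["VpcId"]


class StubEc2:
    """Tiny stand-in for an EC2 client, for illustration only."""

    def describe_vpcs(self, **kwargs: Any) -> dict[str, Any]:
        return {"Vpcs": [{"VpcId": "vpc-0abc"}]}

    def create_vpc(self, **kwargs: Any) -> dict[str, Any]:
        return {"Vpc": {"VpcId": "vpc-new"}}


guarded = ReadOnlyClient(StubEc2())
vpc_id = default_vpc_id(guarded)  # allowed: describe_* passes through
```

In a moto test the same wrapper can go around boto3.client("ec2", ...) so that any accidental create_*, delete_*, or modify_* call inside a helper raises immediately.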

Gotchas & Edge Cases

boto3 reads run before credentials are validated by the provider. If the deployment principal lacks the describe permission, the program crashes at evaluation with a ClientError (UnauthorizedOperation for EC2 describe calls, AccessDenied or AccessDeniedException for most other services) before Pulumi or CDKTF prints a plan, which is confusing because no resource was touched. Grant the read permissions explicitly.
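
A thin wrapper around each lookup can turn that early crash into an actionable message. The sketch below is illustrative: read_or_explain and DENIED_CODES are hypothetical names and the code list is not exhaustive. It inspects the response attribute that botocore's ClientError carries instead of importing botocore, purely to keep the example self-contained; real code would more likely catch botocore.exceptions.ClientError directly.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")

# Error codes AWS services commonly return for a missing permission (not exhaustive).
DENIED_CODES = {"UnauthorizedOperation", "AccessDenied", "AccessDeniedException"}


def read_or_explain(action: str, call: Callable[[], T]) -> T:
    """Run a boto3 read; turn a permissions failure into an actionable message."""
    try:
        return call()
    except Exception as exc:
        # botocore's ClientError exposes the parsed error as `exc.response`.
        response = getattr(exc, "response", None)
        code = None
        if isinstance(response, dict):
            code = response.get("Error", {}).get("Code")
        if code in DENIED_CODES:
            raise RuntimeError(
                f"Pre-deployment lookup failed: the principal lacks {action}. "
                "No resources were changed."
            ) from exc
        raise  # anything else (throttling, network errors) propagates untouched


# Hypothetical call site:
#   vpc_id = read_or_explain("ec2:DescribeVpcs", lookup.default_vpc_id)
```

Naming the missing IAM action in the message saves the reader from tracing a stack trace that appears before any plan output.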

Synth-time reads make CDKTF plans non-deterministic. A "latest" lookup changes the synthesized JSON whenever a newer image is published, so terraform plan shows diffs with no code change and CI snapshot tests flap. Pin the value, cache it, or switch to a data source that Terraform resolves at plan time.

boto3 mutations create invisible drift. Anything you create via boto3 inside a stack is untracked: pulumi destroy and cdktf destroy will leave it behind, and pulumi refresh will never reconcile it. If you need lifecycle management, model it as a provider resource or a Pulumi dynamic provider instead.