Securing Pulumi secrets with AWS KMS and HashiCorp Vault

Production infrastructure demands cryptographic control over state files. Pulumi’s default service-managed encryption lacks audit trails and cross-account portability. Migrating to AWS KMS or HashiCorp Vault enforces compliance boundaries. This guide details atomic provider swaps, strict Python 3.9+ typing patterns, and state recovery workflows.

Environment Isolation & Python 3.9+ Baseline

Virtual Environment & Dependency Pinning

Infrastructure code requires deterministic dependency resolution. Floating versions introduce silent breaking changes during provider upgrades. Pin pulumi, boto3, and hvac in pyproject.toml or requirements.txt. Isolate each stack in a dedicated virtual environment.

CLI: Initialize and activate a clean Python environment.

python3.9 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
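
The pinning rule can be enforced mechanically in CI. The following is a minimal sketch, assuming a simple requirements.txt with one requirement per line; the helper name find_unpinned is hypothetical, and the sketch does not handle environment markers or hash options.

```python
import re
from typing import List

# Matches "name==version" with optional extras, e.g. "pulumi==3.100.0".
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9,._-]+\])?==[^=\s]+$")


def find_unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned: List[str] = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

A CI step could read requirements.txt, call find_unpinned on its contents, and fail the build when the returned list is not empty.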

Strict Type Checking with mypy

Dynamic typing obscures configuration resolution errors until deployment. Enforce mypy --strict in CI pipelines. Annotate all configuration loaders and resource constructors. Catch None propagation before the Pulumi engine evaluates the dependency graph.
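
As an illustration of catching None propagation early, the sketch below uses a plain Mapping as a stand-in for pulumi.Config so that it runs without the Pulumi SDK. The helper names get_optional and require are hypothetical. Under mypy --strict, the explicit None check is what allows the declared str return type to pass.

```python
from typing import Mapping, Optional


def get_optional(settings: Mapping[str, str], key: str) -> Optional[str]:
    """Return the value for key, or None when it is absent."""
    return settings.get(key)


def require(settings: Mapping[str, str], key: str) -> str:
    """Return the value for key, failing fast instead of propagating None."""
    value = get_optional(settings, key)
    if value is None:
        # Narrowing here lets mypy prove the return type is str, not Optional[str].
        raise KeyError(f"Missing required configuration value: {key}")
    return value
```

Removing the None check makes mypy reject the function, which is the failure you want to see in CI rather than during a deployment.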

IAM & Vault Auth Pre-Flight Checks

Authentication deadlocks halt stack operations mid-execution. Validate AWS IAM kms:Decrypt and kms:Encrypt permissions before initializing the provider. For Vault, verify AppRole or TLS certificate validity. Run a dry-run credential fetch to confirm network routing and policy attachment.
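
One way to run the dry-run check for KMS is an encrypt/decrypt round trip on a throwaway value. This is a sketch under assumptions: the function name kms_round_trip_ok is hypothetical, and the client is passed in so the check can be exercised with a fake. The keyword arguments mirror the boto3 KMS client's encrypt and decrypt calls.

```python
from typing import Any


def kms_round_trip_ok(kms_client: Any, key_id: str) -> bool:
    """Encrypt and decrypt a probe value to confirm both permissions work."""
    probe = b"pulumi-preflight"
    try:
        encrypted = kms_client.encrypt(KeyId=key_id, Plaintext=probe)
        decrypted = kms_client.decrypt(
            KeyId=key_id, CiphertextBlob=encrypted["CiphertextBlob"]
        )
    except Exception:
        # Broad on purpose: denied access, bad routing, and expired credentials
        # surface as different exception types, and all mean "not ready".
        return False
    return bool(decrypted["Plaintext"] == probe)
```

With boto3 installed, a call might look like kms_round_trip_ok(boto3.client("kms", region_name="us-east-1"), "alias/pulumi-secrets-key"), run before any pulumi command.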

Migrating to AWS KMS Secrets Provider

CLI Provider Swap Command

State migration must remain atomic. The change-secrets-provider subcommand re-encrypts ciphertext values without altering resource URNs. Target a specific KMS alias and AWS SDK version to avoid legacy API deprecation.

CLI: Execute the atomic migration sequence.

pulumi stack change-secrets-provider "awskms://alias/pulumi-secrets-key?region=us-east-1&awssdk=v2"
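
When the same migration runs across many stacks, building the provider URL in code avoids quoting mistakes. A minimal sketch follows; the helper name awskms_provider_url is hypothetical, and it only covers the alias form and the two query parameters shown in the command above.

```python
from urllib.parse import urlencode


def awskms_provider_url(alias: str, region: str, aws_sdk: str = "v2") -> str:
    """Build an awskms:// secrets provider URL for a KMS alias."""
    if not alias or "/" in alias:
        raise ValueError("alias must be a bare alias name without 'alias/'")
    query = urlencode({"region": region, "awssdk": aws_sdk})
    return f"awskms://alias/{alias}?{query}"
```

Calling awskms_provider_url("pulumi-secrets-key", "us-east-1") reproduces the URL passed to change-secrets-provider above.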

Typed Secret Retrieval in Python

Raw string interpolation bypasses Pulumi’s secret masking engine. Keep sensitive values wrapped in pulumi.Output from the moment they are read, and annotate return types explicitly so that mypy flags any code path that treats a secret as a plain str.

from typing import Dict, Optional, Union

import pulumi
from pulumi import Output


def get_db_credentials(config: pulumi.Config) -> Dict[str, Union[str, Output[str]]]:
    """Retrieve typed database credentials with explicit secret wrapping."""
    # Plain (non-secret) value: require() raises if the key is absent.
    username: str = config.require("db_username")
    # get_secret() returns None when the key is unset, so check before use.
    password: Optional[Output[str]] = config.get_secret("db_password")

    if password is None:
        raise ValueError("Missing required secret: db_password")

    # After the None check, password is narrowed to Output[str].
    return {"user": username, "pass": password}

IAM Policy Scoping & Least Privilege

Broad KMS permissions violate zero-trust architectures. Scope policies to specific key aliases and restrict kms:GenerateDataKey to the Pulumi CLI execution role. Cross-account decryption requires explicit grant propagation. Consult the AWS Provider Deep Dive for granular IAM policy templates and alias routing strategies.
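
A scoped policy can be generated rather than hand-edited. The sketch below is an assumption-laden illustration: the function name, the Sid, and the account and key IDs are hypothetical, and the default action list covers only the kms:Encrypt and kms:Decrypt permissions named in the pre-flight section. Pass a different actions list if your execution role needs more.

```python
import json
from typing import Any, Dict, Optional, Sequence


def pulumi_kms_policy(
    key_arn: str, actions: Optional[Sequence[str]] = None
) -> Dict[str, Any]:
    """Return an IAM policy document limited to one KMS key."""
    if key_arn.strip() == "*":
        raise ValueError("refusing to build a policy for every key")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PulumiSecretsProvider",
                "Effect": "Allow",
                "Action": list(actions or ["kms:Encrypt", "kms:Decrypt"]),
                "Resource": key_arn,  # a single key ARN, never "*"
            }
        ],
    }


# Hypothetical account ID and key ID, for illustration only.
policy = pulumi_kms_policy(
    "arn:aws:kms:us-east-1:111122223333:key/00000000-0000-0000-0000-000000000000"
)
print(json.dumps(policy, indent=2))
```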

Integrating HashiCorp Vault Secrets Provider

Vault Transit Engine Configuration

The transit backend provides encryption-as-a-service without persistent secret storage. Enable the transit secrets engine and create a dedicated named encryption key. Configure key rotation policies to align with organizational compliance windows.

CLI: Provision the transit path and key.

vault secrets enable transit
vault write -f transit/keys/pulumi-stack type=aes256-gcm96
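
Two details of the transit engine matter when auditing rotation: plaintext is submitted base64-encoded, and each ciphertext carries the key version in its prefix (for example vault:v1:...). The helpers below are a small sketch with hypothetical names; they build a request body and read the version from a ciphertext without contacting Vault.

```python
import base64
from typing import Dict


def transit_encrypt_payload(plaintext: bytes) -> Dict[str, str]:
    """Build the body for a transit encrypt request: plaintext must be base64."""
    return {"plaintext": base64.b64encode(plaintext).decode("ascii")}


def transit_key_version(ciphertext: str) -> int:
    """Extract the key version from a transit ciphertext such as 'vault:v1:...'."""
    prefix, version, _payload = ciphertext.split(":", 2)
    if prefix != "vault" or not version.startswith("v"):
        raise ValueError("not a Vault transit ciphertext")
    return int(version[1:])
```

After a rotation, values still reporting an older version were encrypted before the rotation and are candidates for re-encryption.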

Token & Auth Method Mapping

Pulumi requires persistent authentication during stack operations. Map AppRole, TLS, or Kubernetes service accounts to the transit path. Align token TTLs with maximum deployment durations. Short-lived tokens trigger mid-apply 403 Forbidden failures.
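
The TTL alignment can be checked before a deployment starts. This sketch assumes the input is the parsed JSON from vault token lookup -format=json, where data.ttl is the remaining lifetime in seconds; the function name and the five-minute safety margin are hypothetical choices.

```python
from typing import Any, Mapping


def token_outlives_deployment(
    lookup: Mapping[str, Any], max_deploy_seconds: int, margin_seconds: int = 300
) -> bool:
    """Check that a Vault token's remaining TTL covers the longest deployment."""
    ttl = int(lookup["data"]["ttl"])
    if ttl == 0:
        # Assumption: Vault reports ttl 0 for tokens that never expire
        # (for example root tokens), so treat it as sufficient.
        return True
    return ttl >= max_deploy_seconds + margin_seconds
```

Failing this check up front is cheaper than discovering an expired token halfway through an apply.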

Python Fallback Typing Patterns

Dynamic secret resolution often requires conditional fallbacks. Use typing.Optional and typing.Dict[str, Any] for heterogeneous configuration maps. Validate secret presence before passing values to resource constructors.

from typing import Dict, Optional, Any
import pulumi

def resolve_vault_secrets(config: pulumi.Config) -> Dict[str, Any]:
 """Dynamically resolve Vault-backed secrets with safe fallback typing."""
 api_key: Optional[str] = config.get_secret("vault_api_key")
 region: str = config.require("deployment_region")
 
 resolved: Dict[str, Any] = {
 "api_key": api_key,
 "region": region,
 "fallback_enabled": api_key is not None
 }
 return resolved

State Safety, Drift Detection & Safe Rollback

Pre-Migration State Snapshots

Provider transitions introduce cryptographic incompatibilities. Export the current state before executing any migration command. Store snapshots in version-controlled artifact storage. Maintain immutable backups for compliance audits.

CLI: Export stack state to a local artifact.

pulumi stack export --file state-pre-migration.json
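
To support the immutability requirement, a digest of the exported file can be recorded next to the snapshot and compared before any later import. A minimal sketch using only the standard library; the function name snapshot_digest is hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a state snapshot file."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large state files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing the output of snapshot_digest(Path("state-pre-migration.json")) alongside the artifact lets a rollback job detect a modified or truncated snapshot.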

Drift Detection via pulumi refresh

Post-migration state verification prevents silent configuration divergence. Run pulumi refresh to reconcile the stack's recorded state with live infrastructure. Review diff outputs for unexpected resource replacements or property resets.

Atomic State Import & Rollback

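Decryption failures require immediate state restoration. Import the pre-migration snapshot to revert cryptographic bindings. Force overwrite the corrupted state file to unblock subsequent deployments. Restore the matching Pulumi.<stack>.yaml from version control as well, because change-secrets-provider also rewrites the secrets provider settings and encrypted configuration values stored there. Reference Pulumi Patterns & Provider Management for automated stack lifecycle governance and versioned state recovery pipelines.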

CLI: Execute forced state rollback on failure.

pulumi stack import --file state-pre-migration.json --force
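
The export, swap, and rollback commands from this guide can be chained so that a failed swap always triggers the import. The following is a sketch, not a hardened tool: the function names are hypothetical, the command runner is injectable so the flow can be tested without the Pulumi CLI, and it does not restore Pulumi.<stack>.yaml.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_cli(args: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(args), check=False).returncode


def migrate_with_rollback(
    provider_url: str,
    snapshot: str = "state-pre-migration.json",
    run: Runner = run_cli,
) -> bool:
    """Export state, swap the secrets provider, and restore the snapshot on failure."""
    if run(["pulumi", "stack", "export", "--file", snapshot]) != 0:
        raise RuntimeError("export failed; refusing to migrate without a snapshot")
    if run(["pulumi", "stack", "change-secrets-provider", provider_url]) == 0:
        return True
    # The swap failed: put the pre-migration state back.
    if run(["pulumi", "stack", "import", "--file", snapshot, "--force"]) != 0:
        raise RuntimeError("rollback import failed; restore the snapshot manually")
    return False
```

The function returns True on a successful swap and False when the swap failed but the rollback succeeded, so a pipeline can distinguish "migrated" from "safely reverted".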

Testing Boundaries & Secret Masking Validation

pytest Isolation for IaC

Unit tests must never invoke live cloud providers. Isolate configuration parsing from resource provisioning logic. Mock the Pulumi runtime engine to simulate stack evaluation without network calls.

Mocking KMS/Vault Responses

Patch pulumi.runtime.invoke and pulumi.config.Config using unittest.mock. Return deterministic ciphertext payloads during test execution. Validate type coercion and error handling paths without exposing real credentials.
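
A configuration loader can be tested against a stand-in object built with unittest.mock. In this sketch the loader load_api_settings and the factory make_fake_config are hypothetical, and the fake exposes only the require and get_secret methods, so the example runs without the Pulumi SDK or any credentials.

```python
from typing import Any, Dict
from unittest import mock


def load_api_settings(config: Any) -> Dict[str, Any]:
    """Hypothetical loader under test: reads one plain value and one secret."""
    api_key = config.get_secret("vault_api_key")
    if api_key is None:
        raise ValueError("Missing required secret: vault_api_key")
    return {"region": config.require("deployment_region"), "api_key": api_key}


def make_fake_config(values: Dict[str, Any]) -> mock.Mock:
    """Build a stand-in exposing only the two methods the loader calls."""
    fake = mock.Mock(spec=["require", "get_secret"])
    fake.require.side_effect = lambda key: values[key]
    fake.get_secret.side_effect = lambda key: values.get(key)
    return fake
```

Because the fake is limited by spec, a loader that starts calling an unexpected method fails the test with AttributeError instead of silently receiving another mock.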

CLI Output Redaction Verification

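Secret masking relies on Pulumi’s internal serialization layer. Verify that pulumi preview and pulumi up outputs display [secret] placeholders. Assert that converting an Output to a string synchronously, for example with str() or an f-string, never yields the plaintext: current Python SDK versions return a diagnostic message in place of the value, and the plaintext is reachable only inside .apply().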
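
Captured CLI output can be scanned for leaks in a test. This is a minimal sketch with hypothetical function names; it assumes the test already knows the plaintext values it planted in the stack configuration and has the preview or update output as a string.

```python
from typing import Iterable, List


def find_leaked_secrets(cli_output: str, known_secrets: Iterable[str]) -> List[str]:
    """Return every known secret value that appears verbatim in CLI output."""
    return [secret for secret in known_secrets if secret and secret in cli_output]


def is_redacted(cli_output: str, known_secrets: Iterable[str]) -> bool:
    """True when no secret leaked and at least one [secret] placeholder is shown."""
    leaked = find_leaked_secrets(cli_output, known_secrets)
    return not leaked and "[secret]" in cli_output
```

Requiring the placeholder as well as the absence of plaintext guards against a test that passes only because the secret was never rendered at all.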

Common Mistakes & Remediation

| Mistake | Remediation | Impact |
| --- | --- | --- |
| Using pulumi config set without --secret during migration | Always append --secret or enforce config.require_secret() in code. Verify ciphertext format in Pulumi.<stack>.yaml. | Plaintext secrets committed to VCS, triggering compliance violations and audit failures. |
| Skipping IAM policy scoping or Vault transit path validation | Apply least-privilege kms:Decrypt/kms:Encrypt or Vault transit/encrypt/* policies. Validate with aws kms describe-key or vault read transit/keys/pulumi-stack. | CLI hangs on pulumi up with opaque AccessDenied or 403 Forbidden errors. |
| Ignoring Python type hints for pulumi.Output secrets | Wrap secrets in pulumi.Output types. Avoid synchronous string operations on Output objects. Use .apply() for async transformations. | Runtime TypeError during dependency resolution and failed resource graph compilation. |

Frequently Asked Questions

Can I migrate Pulumi secrets to KMS/Vault without recreating resources? Yes. pulumi stack change-secrets-provider only re-encrypts state values. Resource IDs and URNs remain intact. Because the command takes effect as soon as it runs, export the state first and run pulumi preview afterward to confirm that no resource changes are pending.

How does drift detection handle rotated KMS keys or Vault tokens? Pulumi does not auto-detect key rotation. Implement CI/CD checks with pulumi refresh and monitor AWS CloudTrail or Vault audit logs for AccessDenied events during stack operations. Align token lifecycles with deployment windows.

What is the testing boundary for mocking KMS/Vault in Python IaC? Use unittest.mock to patch pulumi.runtime.invoke and pulumi.config.Config. Never mock the actual secrets provider. Test configuration resolution, type safety, and error propagation in strict isolation.