
Terraform + CloudFormation StackSets: Deploying IAM Roles Across Every Account in Your Organization

About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.

Every multi-account AWS organization needs a baseline IAM role in every member account. Cross-account access for security tooling, centralized billing queries, incident response, compliance scanning: the use cases pile up fast. I have deployed this pattern across six enterprise organizations, each with 50 to 400 member accounts. The approach that survives at scale is Terraform managing a CloudFormation StackSet from the management account, with service-managed permissions and auto-deployment enabled. New accounts get the role automatically. No tickets. No manual steps. No drift.

This is the reference for engineers who need to ship this pattern in production. Not a getting-started walkthrough. I assume you already run Terraform against your management account and understand IAM role trust policies. What follows is the architecture, the full Terraform configuration, the CloudFormation template, the operational gotchas, and the failure modes that will bite you if you skip them.

Why StackSets for Cross-Account IAM

The Problem

You have an AWS Organization with dozens or hundreds of member accounts. You need an IAM role in every single one. The role trusts the management account (or a dedicated security account) so that a central automation can assume it. Common scenarios:

  • Security Hub aggregation: a central Lambda assumes a role in each member account to pull findings.
  • Cost Explorer queries: a billing dashboard assumes roles across accounts for consolidated reporting.
  • Incident response: an IR automation assumes roles in compromised accounts to isolate resources.
  • Compliance scanning: AWS Config or a third-party tool needs read access in every account.

You could create these roles manually. You could write a script that iterates over every account and calls aws iam create-role. You could use Terraform with multiple provider aliases. All of these approaches break the moment someone creates a new account and forgets to run the script.

Why Not Pure Terraform

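The natural instinct for a Terraform shop is to declare a provider alias per account, each configured with assume_role into that account, and create an aws_iam_role against each one. Terraform cannot select a provider dynamically inside for_each, so in practice this means one provider block and one resource or module block per account. This works at small scale. At 50+ accounts, it falls apart: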

| Approach | Accounts Supported | Auto-Deploy New Accounts | Terraform State Size | Provider Configuration |
| --- | --- | --- | --- | --- |
| Pure Terraform with provider aliases | ~20 before pain | No | Grows linearly | One provider block per account |
| Terraform + StackSet (self-managed) | 1,000+ | No | Minimal | Single management account provider |
| Terraform + StackSet (service-managed) | 100,000+ | Yes | Minimal | Single management account provider |

The provider alias approach requires a provider block for every account. Terraform has to initialize credentials for every provider on every plan/apply. State files balloon. Adding a new account means editing Terraform code. It does not scale.

StackSets solve this. You define the IAM role once in a CloudFormation template, wrap it in a StackSet, target the entire organization or specific OUs, and every current and future account gets the role. Terraform manages the StackSet itself. CloudFormation handles the fan-out.

Permission Models: Self-Managed vs. Service-Managed

The most important decision when creating a StackSet is the permission model. Get this wrong and you will spend hours debugging role assumption failures.

| Aspect | Self-Managed | Service-Managed |
| --- | --- | --- |
| IAM role creation | You create AWSCloudFormationStackSetAdministrationRole in the management account and AWSCloudFormationStackSetExecutionRole in every target account | AWS creates roles automatically via Organizations integration |
| Target specification | List of account IDs | Organizational Units (OUs) or entire organization |
| Auto-deployment to new accounts | No | Yes (configurable) |
| Delegated administration | No | Yes |
| Terraform permission_model | "SELF_MANAGED" (default) | "SERVICE_MANAGED" |
| Prerequisites | Manual role creation in every target account | Trusted access enabled for CloudFormation in Organizations |

When to Use Self-Managed

Self-managed permissions make sense when you deploy to a small, fixed set of accounts outside your organization, or when you need granular control over which execution role the StackSet assumes in each account. I rarely use this model in production. The overhead of pre-creating roles in every target account defeats the purpose of using StackSets in the first place.

When to Use Service-Managed

Service-managed is the right choice for organizational deployments. AWS handles the role plumbing. You target OUs instead of individual account IDs. New accounts added to the OU get stack instances automatically. This is the model I use for every engagement.

Permission model decision tree (figure): when creating a StackSet, deploying outside an AWS Organization leads to self-managed; inside an organization, needing auto-deploy for new accounts leads to service-managed; otherwise, targeting specific account IDs leads to self-managed and targeting entire OUs leads to service-managed.
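The permission model decision tree reduces to a few branches. Below is a minimal Python sketch of it, the kind of check a platform team might put in design tooling. The function name and the "account_ids"/"ous" encoding are hypothetical, and the branch order follows my reading of the diagram.

```python
def choose_permission_model(in_organization: bool,
                            needs_auto_deploy: bool,
                            targets: str) -> str:
    """Return "SELF_MANAGED" or "SERVICE_MANAGED" for a new StackSet.

    `targets` is a hypothetical encoding: "account_ids" or "ous".
    """
    if not in_organization:
        # Service-managed permissions rely on AWS Organizations, so accounts
        # outside your organization need self-managed roles.
        return "SELF_MANAGED"
    if needs_auto_deploy:
        # Auto-deployment to new accounts exists only in the service-managed model.
        return "SERVICE_MANAGED"
    if targets == "account_ids":
        return "SELF_MANAGED"
    if targets == "ous":
        return "SERVICE_MANAGED"
    raise ValueError(f"unknown target kind: {targets!r}")


# Usage: an organization-wide baseline role that must reach new accounts.
print(choose_permission_model(True, True, "ous"))  # SERVICE_MANAGED
```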

Enabling Trusted Access

Before you can use service-managed permissions, you need to enable trusted access for CloudFormation StackSets in AWS Organizations. This is a one-time operation from the management account:

aws organizations enable-aws-service-access \
  --service-principal member.org.stacksets.cloudformation.amazonaws.com

Or in Terraform:

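# If the organization already exists (it almost always does), import it first:
#   terraform import aws_organizations_organization.this <organization-id>
resource "aws_organizations_organization" "this" {
  # This list is authoritative: include every service principal already
  # enabled in your organization, or Terraform will disable the others.
  aws_service_access_principals = [
    "member.org.stacksets.cloudformation.amazonaws.com",
  ]
  feature_set = "ALL"
}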

Skip this step and every StackSet creation will fail with a cryptic error about insufficient permissions. I have watched three different teams burn hours on this.
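To confirm trusted access took effect before creating any StackSet, you can inspect the JSON that `aws organizations list-aws-service-access-for-organization` prints from the management account. This is a minimal Python sketch that assumes the CLI's default JSON output with an `EnabledServicePrincipals` list; the function name and sample payload are illustrative.

```python
import json

STACKSETS_PRINCIPAL = "member.org.stacksets.cloudformation.amazonaws.com"


def stacksets_trusted_access_enabled(cli_output: str) -> bool:
    """Return True if StackSets trusted access appears in the CLI output.

    `cli_output` is the JSON printed by
    `aws organizations list-aws-service-access-for-organization`.
    """
    data = json.loads(cli_output)
    enabled = {
        entry["ServicePrincipal"]
        for entry in data.get("EnabledServicePrincipals", [])
    }
    return STACKSETS_PRINCIPAL in enabled


# Illustrative payload, not real output from any organization.
sample = json.dumps({
    "EnabledServicePrincipals": [
        {"ServicePrincipal": "config.amazonaws.com"},
        {"ServicePrincipal": STACKSETS_PRINCIPAL},
    ]
})
print(stacksets_trusted_access_enabled(sample))  # True
```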

The Architecture

Here is how the pieces fit together. Terraform runs in the management account, creates a StackSet with a CloudFormation template that defines an IAM role, and targets the root OU (or specific OUs). CloudFormation deploys a stack instance to every member account in those OUs. Each stack instance creates the IAM role locally in that member account.

StackSet deployment architecture (figure): Terraform in the management account creates a CloudFormation StackSet targeting the organization; CloudFormation deploys a stack instance to each member account (Account A, B, C, ... N), and each instance creates the IAM role. A new account added to the OU receives its stack instance through auto-deploy.

The Management Account

Terraform runs here. It creates the aws_cloudformation_stack_set resource and the aws_cloudformation_stack_instances resource. The management account holds the StackSet definition and orchestrates deployments. It does not receive a stack instance itself; StackSets with service-managed permissions skip the management account.

StackSets with service-managed permissions never deploy to the management account, even if the management account is in the targeted OU. If you need the IAM role in the management account too, create it separately with a standard Terraform aws_iam_role resource.

Auto-Deployment

With auto_deployment enabled, any new account added to a targeted OU automatically receives a stack instance. The IAM role appears in the new account within minutes. No Terraform run required. No human intervention. This is the behavior that makes StackSets worth the complexity over pure Terraform.

You also configure what happens when an account leaves the OU. Setting retain_stacks_on_account_removal = true keeps the IAM role in the account even after removal. I recommend this for security roles: you do not want to lose your incident response access just because someone moved an account to a different OU.

The Terraform Configuration

Here is the complete Terraform configuration. I will walk through each resource.

The StackSet Resource

data "aws_caller_identity" "current" {}

resource "aws_cloudformation_stack_set" "cross_account_role" {
  name             = "cross-account-security-role"
  description      = "Deploys a cross-account IAM role to all member accounts"
  permission_model = "SERVICE_MANAGED"

  auto_deployment {
    enabled                          = true
    retain_stacks_on_account_removal = true
  }

  operation_preferences {
    failure_tolerance_percentage = 10
    max_concurrent_percentage    = 25
    region_concurrency_type      = "PARALLEL"
  }

  # Required because the template creates an IAM role with a custom name.
  capabilities = ["CAPABILITY_NAMED_IAM"]

  template_body = file("${path.module}/templates/cross-account-role.yaml")

  # Every template parameter must be set here (or ignored via lifecycle),
  # including those that have a Default in the template.
  parameters = {
    TrustedAccountId = data.aws_caller_identity.current.account_id
    RoleName         = var.role_name
    ExternalId       = var.external_id
  }

  lifecycle {
    # Service-managed StackSets populate this field; ignore it to avoid a perpetual diff.
    ignore_changes = [administration_role_arn]
  }
}

Key decisions in this configuration:

  • permission_model = "SERVICE_MANAGED" enables Organizations integration. AWS handles all the cross-account role plumbing.
  • auto_deployment.enabled = true ensures new accounts get the role automatically.
  • retain_stacks_on_account_removal = true keeps the role when accounts move between OUs.
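  • capabilities = ["CAPABILITY_NAMED_IAM"] is required because the CloudFormation template creates a named IAM role. Omit it and CloudFormation rejects the operation with an InsufficientCapabilitiesException ("Requires capabilities: [CAPABILITY_NAMED_IAM]") instead of creating the role.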
  • lifecycle.ignore_changes on administration_role_arn prevents a Terraform update loop. When using service-managed permissions, AWS sets this field automatically, and Terraform tries to clear it on every apply.
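The capability rule can be checked before Terraform ever calls CloudFormation. The Python sketch below approximates CloudFormation's documented behavior: templates with IAM resources need CAPABILITY_IAM, and IAM resources with custom names need CAPABILITY_NAMED_IAM. It takes the template as an already-parsed dict (parsing the YAML itself would need a loader that understands tags like !Ref), and the resource-to-property table is my own summary, so treat CloudFormation's validation as the final word.

```python
from typing import Optional

# IAM resource types that require a capability acknowledgment, mapped to the
# property that gives them a custom name (None: no name property checked).
IAM_NAME_PROPERTIES = {
    "AWS::IAM::Role": "RoleName",
    "AWS::IAM::User": "UserName",
    "AWS::IAM::Group": "GroupName",
    "AWS::IAM::ManagedPolicy": "ManagedPolicyName",
    "AWS::IAM::InstanceProfile": "InstanceProfileName",
    "AWS::IAM::Policy": None,
    "AWS::IAM::AccessKey": None,
    "AWS::IAM::UserToGroupAddition": None,
}


def required_capability(template: dict) -> Optional[str]:
    """Return the capability a template needs, or None if it needs none."""
    needs_iam = False
    for resource in template.get("Resources", {}).values():
        rtype = resource.get("Type")
        if rtype not in IAM_NAME_PROPERTIES:
            continue
        needs_iam = True
        name_property = IAM_NAME_PROPERTIES[rtype]
        if name_property and name_property in resource.get("Properties", {}):
            # A custom name always requires the stronger acknowledgment.
            return "CAPABILITY_NAMED_IAM"
    return "CAPABILITY_IAM" if needs_iam else None


# The role from this article's template, reduced to the relevant keys.
template = {
    "Resources": {
        "CrossAccountRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {"RoleName": {"Ref": "RoleName"}},
        }
    }
}
print(required_capability(template))  # CAPABILITY_NAMED_IAM
```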

The Stack Instances

data "aws_organizations_organization" "current" {}

resource "aws_cloudformation_stack_instances" "all_accounts" {
  stack_set_name = aws_cloudformation_stack_set.cross_account_role.name

  deployment_targets {
    # Use the configured OUs, or the organization root (every account) if none are set.
    organizational_unit_ids = length(var.target_ou_ids) > 0 ? var.target_ou_ids : [data.aws_organizations_organization.current.roots[0].id]
  }

  regions = var.target_regions

  operation_preferences {
    failure_tolerance_percentage = 10
    max_concurrent_percentage    = 25
    region_concurrency_type      = "PARALLEL"
  }
}

I target the organization root to cover every account. If you only need the role in specific OUs (production accounts, security accounts), replace the root ID with those OU IDs.

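The regions parameter specifies which regions get stack instances. IAM is global, so an IAM-only StackSet needs exactly one region; I use ["us-east-1"]. Listing more is not harmless: each extra stack instance tries to create the same named role in the same account and fails because the role already exists. If your template also includes regional resources (CloudWatch alarms, Config rules), specify every region you operate in and move the IAM role into its own single-region StackSet.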
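To see which stack instances a given set of targets produces, here is a minimal Python model under stated assumptions: targeting an OU covers every account in it and in its child OUs, the management account is always skipped, and each covered account gets one instance per listed region. The OU tree, account IDs, and function names are hypothetical; real targeting is resolved by CloudFormation against your live organization.

```python
from typing import Dict, List, Optional, Tuple


def ou_is_under(ou: str, target: str, parent: Dict[str, Optional[str]]) -> bool:
    """True if `ou` is `target` or nested anywhere below it."""
    current: Optional[str] = ou
    while current is not None:
        if current == target:
            return True
        current = parent.get(current)
    return False


def plan_stack_instances(
    account_ou: Dict[str, str],
    parent: Dict[str, Optional[str]],
    target_ous: List[str],
    regions: List[str],
    management_account: str,
) -> List[Tuple[str, str]]:
    """Return the (account, region) pairs that would get stack instances."""
    plan = []
    for account, ou in sorted(account_ou.items()):
        if account == management_account:
            continue  # service-managed StackSets never deploy here
        if any(ou_is_under(ou, target, parent) for target in target_ous):
            plan.extend((account, region) for region in regions)
    return plan


# Hypothetical organization: root -> security, root -> prod -> prod-eu.
parent = {"r-root": None, "ou-sec": "r-root", "ou-prod": "r-root",
          "ou-prod-eu": "ou-prod"}
accounts = {"111111111111": "r-root",      # management account
            "222222222222": "ou-sec",
            "333333333333": "ou-prod-eu"}

print(plan_stack_instances(accounts, parent, ["ou-prod"], ["us-east-1"],
                           "111111111111"))
# [('333333333333', 'us-east-1')]
```

With a named IAM role, each account should appear in the plan exactly once; a second region for the same account means a second attempt to create the same role.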

Operation Preferences

Operation preferences control how fast the StackSet deploys and how many failures it tolerates before stopping.

| Parameter | Description | Recommended Value |
| --- | --- | --- |
| failure_tolerance_percentage | Percentage of accounts that can fail before the operation stops | 10% for initial deploy, 0% for updates |
| max_concurrent_percentage | Percentage of accounts deployed to simultaneously | 25% for large orgs, 100% for small orgs |
| region_concurrency_type | Deploy regions in parallel or sequentially | PARALLEL for IAM (global resource), SEQUENTIAL for regional resources |
| region_order | Order of region deployment | Only needed with SEQUENTIAL |

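For an IAM-only deployment, I run parallel with 25% concurrency and 10% failure tolerance. This allows up to a quarter of your accounts in flight at once and stops the operation once more than 10% fail. In the default STRICT_FAILURE_TOLERANCE concurrency mode, CloudFormation also caps actual concurrency at the failure tolerance plus one account, so a 200-account org runs about 21 accounts at a time rather than 50. Aggressive enough to finish in minutes for a 200-account org; conservative enough to catch systemic problems before they hit every account.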
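The percentages translate into account counts at run time. This Python sketch shows the arithmetic as I understand it from the AWS StackSets documentation: percentages round down to whole accounts (max concurrency never below one), and in the default STRICT_FAILURE_TOLERANCE mode the initial concurrency is the lower of the max-concurrent count and the failure-tolerance count plus one, while SOFT_FAILURE_TOLERANCE uses the max-concurrent count as-is. Treat these rules as assumptions to verify against the current docs; the function names are mine.

```python
def percent_to_count(percentage: int, accounts: int, minimum: int = 0) -> int:
    """Convert a percentage of target accounts into a whole count (rounded down)."""
    return max(minimum, accounts * percentage // 100)


def effective_concurrency(accounts: int,
                          max_concurrent_pct: int,
                          failure_tolerance_pct: int,
                          mode: str = "STRICT_FAILURE_TOLERANCE") -> dict:
    """Estimate how many accounts a StackSet operation touches at once."""
    max_concurrent = percent_to_count(max_concurrent_pct, accounts, minimum=1)
    tolerance = percent_to_count(failure_tolerance_pct, accounts)
    if mode == "STRICT_FAILURE_TOLERANCE":
        # Strict mode keeps failures from overshooting the tolerance.
        initial = min(max_concurrent, tolerance + 1)
    elif mode == "SOFT_FAILURE_TOLERANCE":
        initial = max_concurrent
    else:
        raise ValueError(f"unknown concurrency mode: {mode!r}")
    return {"max_concurrent": max_concurrent,
            "failure_tolerance": tolerance,
            "initial_concurrency": initial}


# A 200-account organization with the initial-deploy settings above.
print(effective_concurrency(200, 25, 10))
# {'max_concurrent': 50, 'failure_tolerance': 20, 'initial_concurrency': 21}
```

Run the same numbers with the update settings recommended later (10% concurrency, 0% tolerance) and strict mode works out to one account at a time: effective_concurrency(200, 10, 0) gives an initial concurrency of 1.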

The CloudFormation Template

This is the template that CloudFormation deploys to every member account. It creates a single IAM role with a trust policy that allows the management account (or any account you specify) to assume it.

AWSTemplateFormatVersion: "2010-09-09"
Description: >
  Cross-account IAM role deployed via StackSet.
  Grants read-only security audit access to a trusted account.

Parameters:
  TrustedAccountId:
    Type: String
    Description: AWS account ID allowed to assume this role
  RoleName:
    Type: String
    Default: OrganizationSecurityAuditRole
    Description: Name of the IAM role to create
  ExternalId:
    Type: String
    Description: External ID for additional assume-role security

Resources:
  CrossAccountRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Ref RoleName
      MaxSessionDuration: 3600
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              AWS: !Sub "arn:aws:iam::${TrustedAccountId}:root"
            Action: "sts:AssumeRole"
            Condition:
              StringEquals:
                "sts:ExternalId": !Ref ExternalId
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/SecurityAudit"
        - "arn:aws:iam::aws:policy/ReadOnlyAccess"
      Tags:
        - Key: ManagedBy
          Value: StackSet
        - Key: Purpose
          Value: CrossAccountSecurityAudit

Outputs:
  RoleArn:
    Description: ARN of the created cross-account role
    Value: !GetAtt CrossAccountRole.Arn

Trust Policy Design

The trust policy is the most security-sensitive part of this configuration. Three settings matter, two in the trust policy and one on the role itself:

| Element | Purpose | Recommendation |
| --- | --- | --- |
| Principal | Who can assume this role | Specify the exact account ID; never use * |
| Condition: sts:ExternalId | Prevents confused deputy attacks | Always include for cross-account roles |
| MaxSessionDuration | How long assumed sessions last | 3600 seconds (1 hour) for automated tooling |

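The sts:ExternalId condition is optional but strongly recommended. With the account root as the principal, any principal in the trusted account whose own IAM policies allow sts:AssumeRole on this role can assume it. The condition adds a second check: the caller must also pass the matching external ID. Its main job is preventing confused deputy scenarios, where an attacker tricks a service that assumes roles on behalf of many parties into assuming your role for them. The external ID is not a secret, though, so it does little against an attacker who already controls a principal in the trusted account. For that, narrow the Principal from the account root to the specific role ARN your automation runs as.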
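To make the condition concrete, the Python sketch below builds the same trust policy document the template renders and applies a deliberately simplified check of its trust side. It is a toy model, not IAM's policy evaluation: it ignores the caller's own identity policies, SCPs, and session policies, and the account IDs and external ID value are made up.

```python
import json
from typing import Optional


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Mirror the AssumeRolePolicyDocument in the CloudFormation template."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


def trust_side_allows(policy: dict, caller_account: str,
                      supplied_external_id: Optional[str]) -> bool:
    """Toy check: does any Allow statement admit this caller and external ID?"""
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow" or statement["Action"] != "sts:AssumeRole":
            continue
        # An account-root principal admits any principal in that account.
        if statement["Principal"]["AWS"] != f"arn:aws:iam::{caller_account}:root":
            continue
        expected = (statement.get("Condition", {})
                    .get("StringEquals", {})
                    .get("sts:ExternalId"))
        if expected is None or supplied_external_id == expected:
            return True
    return False


policy = build_trust_policy("111122223333", "example-external-id")
print(json.dumps(policy, indent=2))
print(trust_side_allows(policy, "111122223333", "example-external-id"))  # True
print(trust_side_allows(policy, "111122223333", None))                   # False
```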

Choosing Managed Policies

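For a security audit role, SecurityAudit and ReadOnlyAccess cover most use cases. SecurityAudit grants read access to security configuration metadata across AWS services, which is what most audit and posture-scanning tooling needs. ReadOnlyAccess adds read permissions for services that SecurityAudit misses, but it also includes data reads such as s3:GetObject; if the role should see configuration but not data, consider the job-function ViewOnlyAccess policy instead.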

If you need write access (incident response, automated remediation), create a custom policy with precisely scoped permissions. Never attach AdministratorAccess to a StackSet-deployed role. One compromised credential in the trusted account gives an attacker admin access to every member account in the organization.
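One way to enforce that rule is a pre-apply check in CI over the policy list you feed the role, such as the managed_policy_arns variable in the module below. This Python sketch is an assumption-laden example: beyond AdministratorAccess, the deny list (IAMFullAccess, PowerUserAccess) is my own choice, and the function name is hypothetical.

```python
from typing import List

# Broad AWS managed policies that should never ride along on a role
# deployed to every account. Extend to match your own standards.
FORBIDDEN_POLICY_ARNS = {
    "arn:aws:iam::aws:policy/AdministratorAccess",
    "arn:aws:iam::aws:policy/IAMFullAccess",
    "arn:aws:iam::aws:policy/PowerUserAccess",
}


def check_managed_policies(policy_arns: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the set passes."""
    problems = []
    for arn in policy_arns:
        if arn in FORBIDDEN_POLICY_ARNS:
            problems.append(f"forbidden on StackSet-deployed roles: {arn}")
        elif not arn.startswith("arn:aws:iam::"):
            problems.append(f"not an IAM policy ARN: {arn}")
    return problems


audit_policies = [
    "arn:aws:iam::aws:policy/SecurityAudit",
    "arn:aws:iam::aws:policy/ReadOnlyAccess",
]
print(check_managed_policies(audit_policies))  # []
print(check_managed_policies(["arn:aws:iam::aws:policy/AdministratorAccess"]))
```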

Delegated Administration

Running Terraform against the management account is a security concern. The management account has god-mode permissions over the entire organization. Limiting direct access to it is an IAM best practice. StackSets support delegated administration: you register a member account (typically a dedicated security or infrastructure account) as a delegated administrator for CloudFormation StackSets.

resource "aws_organizations_delegated_administrator" "stacksets" {
  account_id        = var.delegated_admin_account_id
  service_principal = "member.org.stacksets.cloudformation.amazonaws.com"
}

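Once registered (the registration itself must be applied from the management account), run Terraform from the delegated admin account and set call_as = "DELEGATED_ADMIN" on both the StackSet and the stack instances: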

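resource "aws_cloudformation_stack_set" "cross_account_role" {
  name             = "cross-account-security-role"
  permission_model = "SERVICE_MANAGED"
  call_as          = "DELEGATED_ADMIN"

  # ... rest of configuration
}

resource "aws_cloudformation_stack_instances" "all_accounts" {
  stack_set_name = aws_cloudformation_stack_set.cross_account_role.name
  call_as        = "DELEGATED_ADMIN"

  # ... deployment_targets, regions, operation_preferences
}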

This moves StackSet management out of the management account entirely. I recommend this for every organization with more than 10 accounts. The management account should be a locked vault, not a daily driver.
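Pipelines that run from more than one account need to pick the right call_as value. This Python sketch decides it from the caller's account ID and the JSON printed by `aws organizations list-delegated-administrators --service-principal member.org.stacksets.cloudformation.amazonaws.com`, assuming that output's `DelegatedAdministrators` list with an `Id` per account. The function name and sample IDs are hypothetical.

```python
import json


def choose_call_as(caller_account: str, management_account: str,
                   delegated_admins_json: str) -> str:
    """Return "SELF" or "DELEGATED_ADMIN" for StackSet API calls."""
    if caller_account == management_account:
        return "SELF"
    admins = {
        admin["Id"]
        for admin in json.loads(delegated_admins_json).get("DelegatedAdministrators", [])
    }
    if caller_account in admins:
        return "DELEGATED_ADMIN"
    raise PermissionError(
        f"{caller_account} is neither the management account nor a "
        "registered StackSets delegated administrator")


# Illustrative payload for a single registered delegated administrator.
admins_json = json.dumps({"DelegatedAdministrators": [{"Id": "222233334444"}]})
print(choose_call_as("222233334444", "111122223333", admins_json))  # DELEGATED_ADMIN
```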

Operational Considerations

Drift Detection

StackSets support drift detection across all stack instances. If someone manually modifies the IAM role in a member account (adds a policy, changes the trust policy), drift detection catches it.

aws cloudformation detect-stack-set-drift \
  --stack-set-name cross-account-security-role \
  --call-as DELEGATED_ADMIN

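Run this on a schedule. I trigger it weekly with an Amazon EventBridge (formerly CloudWatch Events) scheduled rule. The command only starts an asynchronous operation and returns its ID; per-account results appear on the stack instances once it finishes. Drift in IAM roles is a security finding: someone bypassed the managed deployment to make a manual change. Investigate every instance.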
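Once the drift operation finishes, a scheduled job could turn the per-account results into findings with something like this Python sketch. It assumes the JSON from `aws cloudformation list-stack-instances --stack-set-name cross-account-security-role --call-as DELEGATED_ADMIN`, where each entry in `Summaries` carries `Account`, `Region`, and `DriftStatus`. The sample payload is invented.

```python
import json
from typing import List, Tuple


def drifted_instances(list_stack_instances_json: str) -> List[Tuple[str, str]]:
    """Return sorted (account, region) pairs whose DriftStatus is DRIFTED."""
    summaries = json.loads(list_stack_instances_json).get("Summaries", [])
    return sorted(
        (s["Account"], s["Region"])
        for s in summaries
        if s.get("DriftStatus") == "DRIFTED"
    )


sample = json.dumps({"Summaries": [
    {"Account": "222233334444", "Region": "us-east-1", "DriftStatus": "IN_SYNC"},
    {"Account": "333344445555", "Region": "us-east-1", "DriftStatus": "DRIFTED"},
    {"Account": "444455556666", "Region": "us-east-1", "DriftStatus": "NOT_CHECKED"},
]})
print(drifted_instances(sample))  # [('333344445555', 'us-east-1')]
```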

Updates and Rollbacks

When you update the CloudFormation template (change the managed policies, modify the trust policy), the StackSet rolls the update across all accounts. The operation_preferences control the rollout speed and failure tolerance.

For IAM changes, I recommend:

  • Set failure_tolerance_percentage = 0 for updates. IAM changes should succeed everywhere or nowhere.
  • Set max_concurrent_percentage = 10 for updates. Roll slowly. If something is wrong with the new template, you want to catch it early.
  • Test the template change in a sandbox account first by deploying a standalone stack, before pushing through the StackSet.

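CloudFormation handles rollbacks per account. If a stack instance fails to update, it rolls back to the previous template. The StackSet operation keeps going to other accounts only while failures stay within the failure tolerance; with failure_tolerance_percentage = 0, the first failure stops it. Failed instances, and any the operation never reached, show OUTDATED status, meaning they are still running the old template.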
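A CI/CD step can refuse to call a rollout finished until every instance reports CURRENT. This Python sketch counts instance statuses from the `list-stack-instances` JSON (assuming its `Status` field takes values such as CURRENT, OUTDATED, and INOPERABLE) and lists everything not yet current along with its `StatusReason`. The function name and sample data are hypothetical.

```python
import json
from collections import Counter
from typing import List, Tuple


def rollout_report(list_stack_instances_json: str) -> Tuple[Counter, List[Tuple[str, str, str]]]:
    """Return (status counts, [(account, region, reason)] for non-CURRENT instances)."""
    summaries = json.loads(list_stack_instances_json).get("Summaries", [])
    counts = Counter(s["Status"] for s in summaries)
    lagging = sorted(
        (s["Account"], s["Region"], s.get("StatusReason", ""))
        for s in summaries
        if s["Status"] != "CURRENT"
    )
    return counts, lagging


sample = json.dumps({"Summaries": [
    {"Account": "222233334444", "Region": "us-east-1", "Status": "CURRENT"},
    {"Account": "333344445555", "Region": "us-east-1", "Status": "OUTDATED",
     "StatusReason": "example failure reason"},
]})
counts, lagging = rollout_report(sample)
print(dict(counts))  # {'CURRENT': 1, 'OUTDATED': 1}
print(lagging)       # [('333344445555', 'us-east-1', 'example failure reason')]
```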

Account Removal Behavior

When retain_stacks_on_account_removal = true, removing an account from a targeted OU leaves the IAM role in place. The stack instance is removed from the StackSet, but the IAM role persists in the account. This is the correct behavior for security roles.

When retain_stacks_on_account_removal = false, CloudFormation deletes the stack (and the IAM role) when the account leaves the OU. Use this for non-critical resources where cleanup matters more than continuity.

Limits and Quotas

StackSets have specific limits that matter at scale:

| Quota | Default Value | Adjustable |
| --- | --- | --- |
| Stack sets per administrator account | 100 | Yes |
| Stack instances per stack set | 100,000 | Yes |
| Concurrent operations per Region per administrator | 10,000 | Yes |
| Stack sets per delegated administrator | 100 | Yes |
| Maximum MaxConcurrentCount | Varies by org size | No |
| Concurrent StackSet operations | 1 per StackSet | No |
| Drift detection operations | 1 per StackSet at a time | No |

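The single concurrent operation limit per StackSet is the most operationally constraining. If a StackSet update is in progress and you try to run drift detection (or another update), the second operation fails with an OperationInProgressException. Sequence your operations carefully, especially in CI/CD pipelines where multiple Terraform applies might target the same StackSet. Enabling managed execution (managed_execution { active = true } on aws_cloudformation_stack_set) tells CloudFormation to run non-conflicting operations concurrently and queue conflicting ones instead of rejecting them.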
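Where a pipeline cannot avoid overlapping with another operation, a step can wait and retry instead of failing outright. The Python sketch below wraps any StackSet call in exponential backoff with jitter. `OperationInProgress` is a hypothetical stand-in exception: with boto3 you would instead catch `botocore.exceptions.ClientError` whose error code is `OperationInProgressException`. The delays, attempt counts, and fake operation ID are illustrative, not tuned values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class OperationInProgress(Exception):
    """Hypothetical stand-in for CloudFormation's OperationInProgressException."""


def run_when_idle(operation: Callable[[], T],
                  attempts: int = 5,
                  base_delay: float = 30.0,
                  sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, retrying while another StackSet operation is running."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OperationInProgress:
            if attempt == attempts - 1:
                raise  # out of retries: surface the conflict to the pipeline
            # Exponential backoff with jitter so parallel pipelines spread out.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")


# Usage with a fake operation that is blocked twice, then succeeds.
calls = []


def fake_update() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise OperationInProgress()
    return "operation-id-123"


print(run_when_idle(fake_update, base_delay=1.0, sleep=lambda s: None))  # operation-id-123
```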

Common Failure Modes

After deploying this pattern across multiple organizations, these are the failures I have seen most frequently:

| Failure | Cause | Fix |
| --- | --- | --- |
| CAPABILITY_NAMED_IAM error | Template creates a named IAM resource but the capability is not declared | Add capabilities = ["CAPABILITY_NAMED_IAM"] to the StackSet resource |
| Trust policy rejection | Trusted account ID is wrong or the external ID does not match | Verify the parameters passed to the CloudFormation template |
| StackSetNotFoundException | The StackSet name or region is wrong, or the call omits the call_as value used at creation (for example, a delegated-admin StackSet queried without --call-as DELEGATED_ADMIN) | Verify the StackSet name and region, and pass the same call_as value used at creation |
| Perpetual Terraform diff on administration_role_arn | Service-managed StackSets auto-populate this field; Terraform tries to reset it | Add lifecycle { ignore_changes = [administration_role_arn] } |
| Deployment skips management account | By design: service-managed StackSets never deploy to the management account | Create the role separately in the management account with aws_iam_role |
| New account does not get the role | Account was added to an OU not targeted by the StackSet, or auto-deployment is disabled | Verify auto_deployment.enabled = true and that the account's OU (or a parent OU) is in deployment_targets; check OU IDs with aws organizations list-organizational-units-for-parent |
| Drift detection times out | Large number of accounts with many resources per stack | Raise concurrency for the drift operation with --operation-preferences; run during off-peak hours |

The lifecycle.ignore_changes on administration_role_arn deserves emphasis. This is a known issue in the Terraform AWS provider. Service-managed StackSets set this field automatically. Without the lifecycle block, every terraform plan shows a diff, and every terraform apply triggers an unnecessary update. I have seen this cause CI/CD pipelines to run StackSet updates on every commit.

Putting It All Together

Here is the complete Terraform module structure:

modules/stackset-iam-role/
  main.tf              # StackSet and instance resources
  variables.tf         # Input variables
  templates/
    cross-account-role.yaml  # CloudFormation template

The variables file:

variable "role_name" {
  type        = string
  default     = "OrganizationSecurityAuditRole"
  description = "Name of the IAM role to create in each member account"
}

variable "external_id" {
  type        = string
  description = "External ID for assume-role condition"
  sensitive   = true
}

variable "target_regions" {
  type        = list(string)
  default     = ["us-east-1"]
  description = "Regions to deploy stack instances"
}

variable "target_ou_ids" {
  type        = list(string)
  default     = []
  description = "OU IDs to target. Empty targets the organization root."
}

variable "managed_policy_arns" {
  type        = list(string)
  default     = [
    "arn:aws:iam::aws:policy/SecurityAudit",
    "arn:aws:iam::aws:policy/ReadOnlyAccess"
  ]
  description = "Managed policy ARNs to attach to the role"
}

This module deploys to six organizations I manage. The CloudFormation template has not changed in over a year. New accounts get the role within five minutes of creation. Zero tickets. Zero manual steps.
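The external_id variable is worth validating before it reaches every account, since STS only accepts external IDs of 2 to 1,224 characters drawn from letters, digits, underscore, and `+=,.@:/-`. The Python sketch below checks that rule and generates a random compliant value with the standard `secrets` module. The function names are hypothetical, and the same rule could equally live in a Terraform validation block on the variable.

```python
import re
import secrets

# STS ExternalId constraints: 2-1,224 characters from [\w+=,.@:/-].
# re.ASCII keeps \w to ASCII letters, digits, and underscore.
_EXTERNAL_ID = re.compile(r"[\w+=,.@:/-]{2,1224}", re.ASCII)


def valid_external_id(value: str) -> bool:
    """Return True if `value` satisfies the STS ExternalId constraints."""
    return _EXTERNAL_ID.fullmatch(value) is not None


def new_external_id() -> str:
    """Generate a random external ID; token_urlsafe emits only A-Z, a-z, 0-9, '-', '_'."""
    return secrets.token_urlsafe(32)


print(valid_external_id("security-audit:prod"))  # True
print(valid_external_id("has spaces"))           # False
```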

Cross-reference this with the AWS IAM: An Architecture Deep-Dive for deeper coverage of IAM policy evaluation, and the Infrastructure as Code: CloudFormation, CDK, Terraform, and Pulumi Compared for a broader comparison of IaC tools.

Key Patterns

  1. Use service-managed permissions. Self-managed requires pre-creating roles in every target account. Service-managed handles it through Organizations integration. Always service-managed for organizational deployments.
  2. Enable auto-deployment. The whole point of StackSets is removing manual steps. Auto-deploy ensures new accounts get the role without any human intervention.
  3. Retain stacks on account removal. For security roles, losing access when an account moves between OUs is the wrong behavior. Retain the stacks.
  4. Use the lifecycle.ignore_changes block. The administration_role_arn diff is a known provider issue. Prevent unnecessary StackSet updates.
  5. Always include an external ID. It costs nothing and closes off confused deputy scenarios. Pair it with a Principal scoped to the specific role your automation uses.
  6. Move to delegated administration. Keep the management account locked down. Run StackSet operations from a delegated admin account.
  7. Run drift detection on a schedule. Manual IAM changes in member accounts are a security risk. Catch them early.
  8. Set failure tolerance to zero for updates. IAM changes should be all-or-nothing. A partial rollout where half your accounts have different permissions is worse than a failed deployment.

