AWS IAM: An Architecture Deep-Dive

About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.

Every AWS architecture decision I make runs through IAM eventually. Network topology, compute strategy, data pipeline design: none of it matters if the permissions are wrong. And wrong is the default. IAM starts from a position of deny-everything, and the gap between "nothing works" and "everything works but is wide open" is exactly where most teams live. I have spent more time debugging IAM policy evaluation failures than any other category of AWS issue. Not because IAM is broken. Because it is precise, and precision punishes sloppy thinking. This is the reference for how IAM actually evaluates permissions, how the policy layers interact, and where production deployments routinely break. If you already know what an IAM role is, keep reading. If you do not, start with the AWS fundamentals documentation and come back.

A vast fortress with multiple concentric layers of walls and checkpoints, each made of different materials, with a figure presenting credentials at the outermost gate

The IAM Policy Evaluation Algorithm

Every API call to AWS passes through the same evaluation engine. The algorithm is deterministic, follows a fixed order, and produces one of two outcomes: Allow or Deny. Understanding this sequence is the single most useful thing you can know about IAM.

The Evaluation Sequence

The enforcement code processes policies in a specific order. An explicit Deny at any step terminates evaluation immediately.

[Figure: flowchart. An API request passes through these decision nodes in order: Explicit Deny in any policy? → Organizations SCP allows? → Resource Control Policy allows? → Resource-based policy grants access? (Yes, to IAM entity: ALLOW) → Identity-based policy allows? → Permissions boundary allows? → Session policy allows? Each path ends in ALLOW or DENY.]
IAM policy evaluation sequence for same-account requests
| Step | Policy Type | Effect | Scope |
| --- | --- | --- | --- |
| 1 | Explicit Deny (any policy) | Hard stop. Request denied. | All policies checked |
| 2 | Service Control Policy (SCP) | Must allow. Absence of allow = deny. | Organization-wide |
| 3 | Resource Control Policy (RCP) | Must allow. Absence of allow = deny. | Organization-wide |
| 4 | Resource-based policy | Can grant access directly (same account) | Per-resource |
| 5 | Identity-based policy | Must allow for the principal | Per-identity |
| 6 | Permissions boundary | Must allow. Caps identity-based grants. | Per-identity |
| 7 | Session policy | Must allow. Caps session permissions. | Per-session |

The Default Deny

Every request starts as denied. This is the implicit deny. It requires zero configuration. If no policy explicitly allows an action, the request fails. This is the correct default. The alternative (default allow) would make every new AWS account a security incident waiting to happen.

Explicit Deny Wins Everything

An explicit Deny statement in any policy type overrides every Allow in every other policy type. If an SCP allows s3:* and an identity-based policy allows s3:DeleteBucket, but a resource-based policy on the bucket has an explicit Deny for s3:DeleteBucket, the request is denied. No exceptions. No precedence tricks. This property is what makes IAM's security model composable: you can layer restrictions without worrying about some other policy overriding your guardrail.

The Resource-Based Policy Shortcut

Resource-based policies (S3 bucket policies, KMS key policies, SQS queue policies) have a special property in same-account scenarios: they can grant access directly to a principal without requiring a matching identity-based policy. If a bucket policy contains a statement with "Effect": "Allow", "Principal": {"AWS": "arn:aws:iam::123456789012:role/MyRole"}, and "Action": "s3:GetObject", the role can read objects in that bucket even if its identity-based policy says nothing about S3. This shortcut does not apply cross-account. Cross-account access requires both the resource-based policy in the resource's account and an identity-based policy in the caller's account to allow the action.
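The sequence described above can be sketched in Python. This is a deliberately reduced model, not AWS's implementation: it follows the step order of the table in this article, treats each policy layer as a list of action patterns, and ignores principals, resources, and conditions. The `Layer` class and `evaluate` function are hypothetical names, and the matching here is case-sensitive, unlike real IAM action matching.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Layer:
    """One policy layer reduced to action patterns (a simplification)."""
    allow: list = field(default_factory=list)
    deny: list = field(default_factory=list)

    def allows(self, action):
        return any(fnmatchcase(action, pattern) for pattern in self.allow)

    def denies(self, action):
        return any(fnmatchcase(action, pattern) for pattern in self.deny)


def evaluate(action, scp, rcp, resource, identity, boundary=None, session=None):
    """Return "Allow" or "Deny" for a same-account request."""
    layers = [scp, rcp, resource, identity, boundary, session]
    # Step 1: an explicit Deny in any attached layer ends evaluation.
    if any(layer is not None and layer.denies(action) for layer in layers):
        return "Deny"
    # Steps 2-3: both organization guardrails must allow the action.
    if not scp.allows(action) or not rcp.allows(action):
        return "Deny"
    # Step 4: a same-account resource-based grant is sufficient on its own.
    if resource.allows(action):
        return "Allow"
    # Step 5: without an identity-based Allow, the implicit deny stands.
    if not identity.allows(action):
        return "Deny"
    # Steps 6-7: boundary and session policy cap the grant when present.
    for cap in (boundary, session):
        if cap is not None and not cap.allows(action):
            return "Deny"
    return "Allow"


# The bucket policy alone grants access; the role's own policy is empty.
anything = Layer(allow=["*"])
print(evaluate("s3:GetObject", anything, anything,
               resource=Layer(allow=["s3:GetObject"]), identity=Layer()))  # Allow
```

Reducing each layer to allow and deny pattern lists keeps the ordering visible: the explicit-deny scan runs across every layer before any allow is considered, which is why a guardrail cannot be overridden by a later grant.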

Policy Types and How They Layer

AWS has six distinct policy types. They serve different purposes, attach at different levels, and interact through intersection and union operations.

| Policy Type | Attached To | Grants Permissions? | Sets Maximum? | Managed By |
| --- | --- | --- | --- | --- |
| Identity-based | Users, groups, roles | Yes | No | Account admin |
| Resource-based | S3 buckets, KMS keys, SQS queues, etc. | Yes | No | Resource owner |
| Permissions boundary | Users, roles | No (ceiling only) | Yes | Account admin |
| Service Control Policy (SCP) | Organization OUs, accounts | No (ceiling only) | Yes | Org admin |
| Resource Control Policy (RCP) | Organization OUs, accounts | No (ceiling only) | Yes | Org admin |
| Session policy | Temporary credentials | No (ceiling only) | Yes | Caller of AssumeRole/GetFederationToken |

Identity-Based Policies

The workhorses. These are the policies you attach to IAM users, groups, and roles. They come in two forms:

Managed policies are standalone JSON documents with their own ARN. AWS provides over 1,000 AWS-managed policies. You can also create customer-managed policies. A single managed policy can be attached to multiple identities.

Inline policies are embedded directly in a single user, group, or role. They have no ARN and cannot be reused. Use inline policies only when you need a strict 1:1 relationship between a policy and an identity, typically for exceptions that should not be accidentally attached to another identity.

Permissions Boundaries

A permissions boundary is an advanced feature that sets the maximum permissions an identity-based policy can grant. The effective permissions are the intersection of the identity-based policy and the permissions boundary. If the identity-based policy allows s3:* but the permissions boundary only allows s3:GetObject and s3:PutObject, the effective permissions are s3:GetObject and s3:PutObject.

The primary use case: delegating IAM administration. You want developers to create their own IAM roles for Lambda functions, but you do not want them creating roles with AdministratorAccess. Attach a permissions boundary to the roles they create. The boundary caps what those roles can do regardless of what identity-based policies the developers attach.
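One common way to enforce this delegation is to let developers create and configure roles only when the boundary is attached, using the `iam:PermissionsBoundary` condition key. The sketch below builds such a policy as a Python dict. The function name, the account ID, the `dev/` role path, and the boundary policy name are all hypothetical, and a real delegation policy would likely need more statements than these two.

```python
import json


def delegated_role_admin_policy(boundary_arn, role_arn_pattern):
    """Allow role creation and policy attachment only under the given boundary."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageRolesOnlyWithBoundary",
                "Effect": "Allow",
                "Action": [
                    "iam:CreateRole",
                    "iam:AttachRolePolicy",
                    "iam:PutRolePolicy",
                ],
                "Resource": role_arn_pattern,
                # The request is allowed only if the role carries this boundary.
                "Condition": {
                    "StringEquals": {"iam:PermissionsBoundary": boundary_arn}
                },
            },
            {
                "Sid": "NeverRemoveTheBoundary",
                "Effect": "Deny",
                "Action": "iam:DeleteRolePermissionsBoundary",
                "Resource": role_arn_pattern,
            },
        ],
    }


# Hypothetical ARNs for illustration only.
policy = delegated_role_admin_policy(
    boundary_arn="arn:aws:iam::123456789012:policy/DeveloperBoundary",
    role_arn_pattern="arn:aws:iam::123456789012:role/dev/*",
)
print(json.dumps(policy, indent=2))
```

The explicit Deny on removing the boundary matters as much as the conditional Allow: without it, a developer who can edit roles could strip the cap after creation.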

Service Control Policies (SCPs)

SCPs apply to entire AWS accounts or organizational units (OUs) within AWS Organizations. They do not grant permissions. They define the maximum permissions available to principals in the account. Even the root user of a member account is constrained by SCPs (the management account is exempt).

Common SCP patterns:

| Pattern | SCP Statement | Purpose |
| --- | --- | --- |
| Region restriction | Deny all actions where aws:RequestedRegion is not in allowed list | Prevent resource creation in unapproved regions |
| Service restriction | Deny specific service actions across the OU | Block services that are not approved for use |
| Protect security tooling | Deny iam:Delete*, cloudtrail:StopLogging, config:StopConfigurationRecorder | Prevent disabling of audit/compliance infrastructure |
| Require encryption | Deny s3:PutObject without s3:x-amz-server-side-encryption | Enforce encryption at rest |

Session Policies

When you call AssumeRole, AssumeRoleWithSAML, AssumeRoleWithWebIdentity, or GetFederationToken, you can pass an optional session policy. This policy further restricts the temporary credentials beyond what the role's identity-based policies allow. Session policies cannot grant more permissions than the role already has. They are a ceiling on a ceiling.

I use session policies for tenant isolation in multi-tenant systems. A shared IAM role serves all tenants, but each AssumeRole call passes a session policy scoped to that tenant's S3 prefix or DynamoDB partition key. The role itself has broad permissions; the session narrows them to exactly what the tenant should access.
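A minimal sketch of that tenant-scoping pattern, assuming an S3 layout of `tenants/<tenant_id>/` inside a shared bucket. The layout, the function names, and the tenant ID format are assumptions made for illustration. The `sts_client` argument is expected to behave like a boto3 STS client, whose `assume_role` call accepts `RoleArn`, `RoleSessionName`, and an inline `Policy` string.

```python
import json
import re

# Assumed tenant ID format; rejecting anything else keeps wildcards and
# path separators out of the generated resource ARNs.
TENANT_ID = re.compile(r"[A-Za-z0-9-]{1,32}")


def tenant_session_policy(bucket, tenant_id):
    """Build a session policy limited to one tenant's S3 prefix."""
    if not TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    prefix = f"tenants/{tenant_id}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}*"}},
            },
        ],
    }


def assume_for_tenant(sts_client, role_arn, bucket, tenant_id):
    """Assume the shared role with credentials narrowed to one tenant."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=f"tenant-{tenant_id}",
        Policy=json.dumps(tenant_session_policy(bucket, tenant_id)),
    )
    return response["Credentials"]
```

Because the session policy can only narrow the role, a bug in this function can make a tenant's credentials too restrictive but never broader than the shared role itself.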

Cross-Account Access Architecture

Multi-account AWS architectures depend on cross-account role assumption. The pattern involves a trust relationship between two accounts and a set of permissions granted to the assumed role.

[Figure: sequence diagram with four participants: Principal (Account A), AWS STS, Role (Account B), Resource (Account B). Messages in order: AssumeRole(role ARN in Account B); check trust policy; trust policy allows Account A; check identity-based policy; sts:AssumeRole allowed; check SCP (Account A); check SCP (Account B); temporary credentials (access key + secret + session token); API call with temporary credentials; evaluate role permissions + resource policy + SCPs (Account B); response.]
Cross-account role assumption flow

Trust Policies

Every IAM role has a trust policy (also called an assume role policy) that specifies which principals can assume it. For cross-account access, the trust policy must explicitly name the source account or specific principals.

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
    "Action": "sts:AssumeRole",
    "Condition": {
      "StringEquals": {"sts:ExternalId": "unique-external-id-here"}
    }
  }]
}

The External ID Problem

When a third party assumes a role in your account, the confused deputy problem arises. The third party's service (Account C) assumes roles in many customers' accounts, including yours (Account B). Without an external ID, a malicious customer could register your role's ARN with the third party as if it were their own, tricking Account C into accessing Account B's resources on the attacker's behalf.

The sts:ExternalId condition key solves this. The external ID is a unique identifier agreed between you and the third party, typically generated by the third party for each customer. AWS does not treat it as a secret: anyone who can view the role can read it. The protection comes from the binding. The trust policy requires the ID, and the third party always sends the ID tied to the customer it is acting for. An attacker who registers your role ARN under their own customer record triggers an AssumeRole call with the wrong external ID, and the call fails.
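From the third party's side, the safeguard depends on never taking the external ID from the request being served. A sketch under stated assumptions: `register_customer` and `assume_customer_role` are hypothetical names, the in-memory `store` dict stands in for the vendor's database, and `sts_client` is expected to behave like a boto3 STS client (`assume_role` accepts an `ExternalId` parameter).

```python
import uuid


def register_customer(store, customer_id, role_arn):
    """Onboarding: the vendor generates the external ID and records the binding."""
    external_id = str(uuid.uuid4())
    store[customer_id] = {"role_arn": role_arn, "external_id": external_id}
    # The customer puts this value in the role's trust policy condition.
    return external_id


def assume_customer_role(sts_client, store, customer_id):
    """Runtime: always send the external ID bound to this customer record."""
    record = store[customer_id]
    response = sts_client.assume_role(
        RoleArn=record["role_arn"],
        RoleSessionName=f"vendor-{customer_id}",
        ExternalId=record["external_id"],
    )
    return response["Credentials"]
```

If an attacker registers someone else's role ARN, the vendor still sends the attacker's own external ID, which the victim's trust policy does not accept.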

Cross-Account Evaluation Differences

Cross-account access has a critical difference from same-account access: both accounts must allow the action. An identity-based policy in the source account must allow sts:AssumeRole on the target role. The trust policy in the target account must allow the source principal. And the role's permissions in the target account must allow the requested action. The resource-based policy shortcut that works same-account does not apply cross-account for most services. Both sides must explicitly permit the access.

ABAC vs. RBAC

AWS supports two authorization models. Most production environments use both.

Role-Based Access Control (RBAC)

RBAC assigns permissions based on job function. You create IAM roles with names like DataEngineerRole, ApplicationDeployRole, ReadOnlyAuditorRole and attach policies that grant the permissions each function needs. When a new service or resource appears, you update the relevant role policies.

Attribute-Based Access Control (ABAC)

ABAC assigns permissions based on attributes (tags) attached to the principal and the resource. Instead of listing specific resource ARNs in policies, you write condition expressions that compare tags.

{
  "Effect": "Allow",
  "Action": ["ec2:StartInstances", "ec2:StopInstances"],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "aws:ResourceTag/Project": "${aws:PrincipalTag/Project}"
    }
  }
}

This policy allows starting and stopping EC2 instances only when the instance's Project tag matches the principal's Project tag. No resource ARNs. No account-specific references. The same policy works across every account in the organization without modification.
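To make the matching rule concrete, here is a small Python model of how a `StringEquals` block like the one above behaves, including the cases where a tag is absent. It is a sketch of the semantics, not AWS code: `string_equals_matches` is a hypothetical helper that only understands `aws:ResourceTag/<name>` keys and `${aws:PrincipalTag/<name>}` variables.

```python
import re

# Matches a value that consists solely of a principal-tag policy variable.
PRINCIPAL_TAG_VARIABLE = re.compile(r"\$\{aws:PrincipalTag/([^}]+)\}")


def string_equals_matches(condition, principal_tags, resource_tags):
    """Evaluate a StringEquals block keyed by aws:ResourceTag/<name>."""
    for key, expected in condition.items():
        tag_name = key.split("/", 1)[1]
        actual = resource_tags.get(tag_name)
        if actual is None:
            return False  # untagged resource: the condition cannot match
        variable = PRINCIPAL_TAG_VARIABLE.fullmatch(expected)
        if variable:
            expected = principal_tags.get(variable.group(1))
            if expected is None:
                return False  # untagged principal: the variable is unresolved
        if actual != expected:  # StringEquals is case-sensitive
            return False
    return True


condition = {"aws:ResourceTag/Project": "${aws:PrincipalTag/Project}"}
print(string_equals_matches(condition, {"Project": "atlas"}, {"Project": "atlas"}))  # True
print(string_equals_matches(condition, {"Project": "atlas"}, {}))  # False
```

The second call shows the quiet failure mode: a missing tag is indistinguishable from a mismatched one, and the statement simply does not apply.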

| Criterion | RBAC | ABAC |
| --- | --- | --- |
| Policy updates for new resources | Required | Not required (tags handle it) |
| Number of policies | Grows with resources/functions | Stays small |
| Works across accounts | Requires per-account ARNs | Works with tags (account-agnostic) |
| Complexity | Simple to understand | Requires disciplined tagging |
| Auditability | Clear role-to-permission mapping | Requires tag inventory |
| Service support | Universal | Varies by service (check condition key support) |

When to Use ABAC

ABAC pays off at scale. If you have 50 AWS accounts and 200 microservices, maintaining RBAC policies with specific ARNs for each resource in each account is a full-time job. ABAC policies that match on Project, Environment, and Team tags work identically everywhere. The tradeoff: ABAC depends entirely on consistent tagging. One untagged resource or mistagged principal and the access silently fails (or silently succeeds, which is worse).

IAM Access Analyzer and Least Privilege

IAM Access Analyzer provides two categories of findings that directly support least-privilege architecture.

External Access Findings

Access Analyzer identifies resources shared with principals outside your account or organization. If an S3 bucket policy grants access to an external account, Access Analyzer flags it. Same for KMS keys, IAM roles with external trust policies, Lambda function resource policies, SQS queues, and Secrets Manager secrets.

Unused Access Findings

The more valuable capability. Access Analyzer tracks which permissions each IAM role and user has actually exercised over a configurable lookback period. It identifies:

| Finding Type | What It Detects |
| --- | --- |
| Unused roles | Roles that have not been assumed in the lookback period |
| Unused access keys | Access keys with no API activity |
| Unused passwords | Console passwords with no sign-in activity |
| Unused services | Services listed in policies but never called |
| Unused actions | Specific actions allowed by policy but never invoked |

Policy Generation

Access Analyzer can generate a least-privilege policy based on CloudTrail activity. Feed it a role and a lookback window, and it produces a policy containing only the actions the role actually used. I treat these generated policies as starting points, not final products. They reflect past behavior, not future requirements. A role that has not yet processed a monthly batch job will not have those permissions in the generated policy. Always review and adjust before applying.
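The idea behind activity-based generation can be illustrated with a toy version in Python. This is not how Access Analyzer works internally. It assumes CloudTrail-style records with `eventSource` and `eventName` fields, and it assumes the IAM service prefix equals the first label of `eventSource` and the action name equals `eventName`, neither of which holds for every service.

```python
def policy_from_events(events):
    """Collect the actions seen in CloudTrail-style records into one policy."""
    actions = set()
    for event in events:
        if event.get("errorCode"):
            continue  # assumption: failed or denied calls are not evidence of need
        service = event["eventSource"].split(".", 1)[0]
        actions.add(f"{service}:{event['eventName']}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                # Starting point only: scope resources down during review.
                "Resource": "*",
            }
        ],
    }


# Hypothetical records for illustration.
events = [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "dynamodb.amazonaws.com", "eventName": "GetItem"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "s3.amazonaws.com", "eventName": "DeleteObject", "errorCode": "AccessDenied"},
]
print(policy_from_events(events)["Statement"][0]["Action"])
```

The sketch makes the limitation visible: an action that never appears in the lookback window, such as a monthly batch job's writes, is simply absent from the output.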

Integration Pattern

The production pattern I use: enable Access Analyzer with unused access findings across the organization, pipe findings to Security Hub, create EventBridge rules that notify the team when new external access or overly broad unused permissions appear, and run quarterly reviews where generated policies replace hand-crafted ones. This creates a continuous tightening cycle rather than a one-time audit.

Quotas and Limits That Bite

IAM has limits that surprise teams when they hit production scale.

| Resource | Default Limit | Maximum (after increase) | Notes |
| --- | --- | --- | --- |
| Managed policies per role | 10 | 20 | Request increase via Service Quotas |
| Managed policy size | 6,144 characters | Not increasable | Whitespace is not counted |
| Inline policy aggregate (roles) | 10,240 characters | Not increasable | Sum of all inline policies on the role |
| Inline policy aggregate (users) | 2,048 characters | Not increasable | Very tight; avoid inline on users |
| Roles per account | 1,000 | 5,000 | Request increase |
| Instance profiles per account | 1,000 | 5,000 | 1:1 with roles in most cases |
| SAML providers per account | 100 | Not increasable | Relevant for large SSO deployments |
| Policy versions | 5 per managed policy | Not increasable | Delete old versions before creating new ones |
| Trust policy size | 2,048 characters | 4,096 (with increase) | Cross-account patterns eat this fast |
| Groups per account | 300 | 500 | Often forgotten until it blocks |

The managed policy size limit (6,144 characters) is the one that bites most often. A policy with fine-grained permissions for a complex application can hit this limit easily. Strategies: split into multiple managed policies (up to 10 per role by default, 20 after an increase), use wildcards strategically on resource ARNs, switch to ABAC to reduce policy size, or use inline policies to access the separate 10,240-character pool.
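The splitting strategy can be sketched as a greedy packer in Python. The helper names are hypothetical, and the size measure is the length of the compact JSON serialization, which is an approximation of how the quota is applied. A single statement that exceeds the limit by itself still ends up in its own oversized chunk and has to be rewritten by hand.

```python
import json

MANAGED_POLICY_LIMIT = 6144  # characters, per the table above


def policy_size(policy):
    """Length of the policy serialized without optional whitespace."""
    return len(json.dumps(policy, separators=(",", ":")))


def split_statements(policy, limit=MANAGED_POLICY_LIMIT):
    """Greedily pack statements into policies that each fit under the limit."""
    version = policy["Version"]
    chunks, current = [], []
    for statement in policy["Statement"]:
        candidate = {"Version": version, "Statement": current + [statement]}
        if current and policy_size(candidate) > limit:
            # Close the current chunk and start a new one with this statement.
            chunks.append({"Version": version, "Statement": current})
            current = [statement]
        else:
            current = current + [statement]
    if current:
        chunks.append({"Version": version, "Statement": current})
    return chunks
```

Running a check like `policy_size` in CI before deployment turns a failed policy update in production into a failed build, and the chunk count shows how close a role is to its attachment quota.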

The trust policy size limit (2,048 characters) catches teams with many cross-account trust relationships. Each trusted role ARN consumes ~60 characters. At around 30 trusted principals, you hit the limit. The fix: stop listing principals individually. Use a wildcard principal and restrict it with the aws:PrincipalOrgID condition key (or aws:PrincipalOrgPaths for a specific OU), so that only principals from your organization can assume the role.
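The arithmetic is easy to check in Python. The sketch below compares a trust policy that lists one role ARN per account with one that uses a wildcard principal plus an `aws:PrincipalOrgID` condition. The role name, account numbering, and organization ID are made up for illustration, and size is measured on the compact JSON serialization.

```python
import json

TRUST_POLICY_LIMIT = 2048  # default quota, in characters


def size(policy):
    return len(json.dumps(policy, separators=(",", ":")))


def per_principal_trust_policy(role_arns):
    """Trust policy that names every trusted role explicitly."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": list(role_arns)},
            "Action": "sts:AssumeRole",
        }],
    }


def org_trust_policy(org_id):
    """Trust policy that admits any principal from one organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
        }],
    }


def trusted_roles(count):
    # Hypothetical 55-character role ARNs, one per account.
    return [f"arn:aws:iam::{n:012d}:role/deployment-pipeline-role"
            for n in range(1, count + 1)]


for count in (30, 40):
    policy = per_principal_trust_policy(trusted_roles(count))
    print(count, size(policy), size(policy) <= TRUST_POLICY_LIMIT)
print("org-wide", size(org_trust_policy("o-exampleorgid")))
```

The organization-scoped policy stays the same size no matter how many accounts join, which is the property that makes it maintainable.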

Common Failure Modes

IAM failures in production follow predictable patterns. I have encountered every one of these at least twice.

Privilege Escalation via PassRole

The iam:PassRole permission is the most dangerous permission in AWS that nobody talks about. It allows a principal to assign an IAM role to an AWS service. If a developer has iam:PassRole with Resource: "*" and can create Lambda functions, they can create a Lambda function with an administrator role attached, then invoke it to perform any action. The Lambda function runs with the passed role's permissions, not the developer's.

[Figure: diagram. A Developer (limited permissions) (1) holds iam:PassRole on *, (2) calls lambda:CreateFunction with AdminRole, and (3) the resulting Lambda function, running as AdminRole, has full admin permissions against any AWS service.]
Privilege escalation via iam:PassRole

Fix: Always scope iam:PassRole to specific role ARNs. Never use Resource: "*" for PassRole.
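A check like this is easy to automate. The following Python sketch flags Allow statements that grant `iam:PassRole` (directly or through a wildcard action) on `"Resource": "*"`. The function name is hypothetical, and the check is intentionally narrow: it ignores `NotAction`, conditions, and broad-but-not-bare patterns such as `arn:aws:iam::*:role/*`.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def unscoped_passrole_statements(policy):
    """Return Allow statements that grant iam:PassRole on every resource."""
    findings = []
    for statement in _as_list(policy["Statement"]):
        if statement.get("Effect") != "Allow":
            continue
        # Lowercase the patterns because IAM action names are case-insensitive.
        actions = [a.lower() for a in _as_list(statement.get("Action", []))]
        grants_passrole = any(fnmatchcase("iam:passrole", a) for a in actions)
        resources = _as_list(statement.get("Resource", []))
        if grants_passrole and "*" in resources:
            findings.append(statement)
    return findings
```

Matching on patterns rather than the literal string matters because `iam:*` and `iam:Pass*` grant the same escalation path as `iam:PassRole` itself.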

SCP Blocking Valid Operations

SCPs apply to all principals in member accounts, including the root user. A region-restriction SCP that denies all actions outside an approved list will break global services whose requests resolve to us-east-1 (IAM itself, CloudFront, Route 53, AWS Organizations) whenever us-east-1 is missing from that list. Always include global service exceptions in region-restriction SCPs.

{
  "Condition": {
    "StringNotEquals": {
      "aws:RequestedRegion": ["us-east-1", "us-west-2"]
    },
    "ArnNotLike": {
      "aws:PrincipalARN": "arn:aws:iam::*:role/aws-service-role/*"
    }
  }
}
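The condition fragment above is only part of a statement. One way to assemble a complete region-deny SCP is to exempt global services through `NotAction`, as in this Python sketch. The function name is hypothetical, and the list of global service actions is an illustrative subset, not a complete or authoritative list. The `ArnNotLike` clause is carried over from the fragment above.

```python
import json


def region_deny_scp(allowed_regions, global_service_actions):
    """Deny everything outside the allowed regions, except global services."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOutsideAllowedRegions",
            "Effect": "Deny",
            # Actions listed here are never denied by this statement.
            "NotAction": sorted(global_service_actions),
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": sorted(allowed_regions)
                },
                "ArnNotLike": {
                    "aws:PrincipalARN": "arn:aws:iam::*:role/aws-service-role/*"
                },
            },
        }],
    }


scp = region_deny_scp(
    allowed_regions={"us-east-1", "us-west-2"},
    # Illustrative subset only; review against the services you actually use.
    global_service_actions=["iam:*", "cloudfront:*", "route53:*", "organizations:*"],
)
print(json.dumps(scp, indent=2))
```

Generating the document from two explicit lists keeps the exemptions reviewable in one place, which helps when the approved regions change.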

The Wildcard Trap

"Resource": "*" in an identity-based policy is sometimes necessary (for iam:ListUsers, s3:ListAllMyBuckets, and other list/describe operations that do not support resource-level restrictions). The failure mode is using * as a lazy default for actions that do support resource-level restrictions. s3:PutObject with Resource: "*" grants write access to every bucket in the account. I have seen production incidents where a misconfigured Lambda function overwrote data in the wrong bucket because its role had overly broad S3 permissions.

Tag-Condition Failures

ABAC policies fail silently when tags are missing. If a policy condition requires aws:ResourceTag/Environment to equal production and the resource has no Environment tag, the condition evaluates to false. The request is denied with a generic AccessDenied error. CloudTrail does not tell you which condition failed. Debugging requires reading the policy, checking every tag on the resource and the principal, and working through the evaluation logic manually. Mandate tags via SCPs or AWS Config rules to prevent this class of failure.

Session Token Expiration

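Temporary credentials from AssumeRole default to one hour. Long-running batch jobs, data pipelines, or CI/CD processes that exceed this window fail mid-execution. The maximum session duration is configurable per role, up to 12 hours (GetFederationToken and GetSessionToken credentials for IAM users can last up to 36 hours), and the caller must still request the longer lifetime through DurationSeconds. Role chaining caps the session at one hour regardless of the role's setting. Set the duration explicitly rather than relying on the default. For processes longer than 12 hours, implement credential refresh logic using the STS client.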
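A minimal sketch of such refresh logic in Python. The class name is hypothetical. It assumes `sts_client` behaves like a boto3 STS client, whose `assume_role` response carries a `Credentials` dict with a timezone-aware `Expiration` datetime. The clock is injectable so the behaviour can be tested without waiting.

```python
from datetime import datetime, timedelta, timezone


class RefreshingRoleCredentials:
    """Cache AssumeRole credentials and renew them shortly before expiry."""

    def __init__(self, sts_client, role_arn, session_name,
                 duration_seconds=3600,
                 refresh_margin=timedelta(minutes=5),
                 now=lambda: datetime.now(timezone.utc)):
        self._sts = sts_client
        self._role_arn = role_arn
        self._session_name = session_name
        self._duration = duration_seconds
        self._margin = refresh_margin
        self._now = now
        self._credentials = None

    def _expiring(self):
        if self._credentials is None:
            return True
        return self._now() >= self._credentials["Expiration"] - self._margin

    def get(self):
        """Return valid credentials, calling STS only when needed."""
        if self._expiring():
            response = self._sts.assume_role(
                RoleArn=self._role_arn,
                RoleSessionName=self._session_name,
                DurationSeconds=self._duration,
            )
            self._credentials = response["Credentials"]
        return self._credentials
```

A long-running job would call `get()` before each batch of API calls and build its service clients from the returned keys. The refresh margin keeps a request that starts just before expiry from failing halfway through.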

Key Patterns

Start with deny-everything SCPs, then allow. The allow-list SCP model (deny everything except explicitly permitted services) is more secure than the deny-list model (allow everything except explicitly denied services). It requires more upfront work but prevents entire categories of misuse.

Scope PassRole to specific role ARNs. This is the single highest-impact IAM hardening step you can take. Never grant iam:PassRole with Resource: "*".

Use ABAC for multi-account, RBAC for single-account. ABAC's value scales with the number of accounts and resources. For a single-account workload, RBAC is simpler and more auditable.

Trust the organization, not individual accounts. For cross-account trust policies, a wildcard principal ("Principal": {"AWS": "*"}) constrained by an aws:PrincipalOrgID condition is more maintainable than listing 50 account IDs. The Principal element does not accept a wildcard in the account-ID position, so arn:aws:iam::*:root is not a valid principal. See AWS Cognito User Authentication: An Architecture Deep-Dive for how Cognito integrates with these IAM patterns.

Run Access Analyzer continuously. Enable unused access findings. Pipe to Security Hub. Review quarterly. Replace hand-written policies with generated ones after verification.

Set explicit session durations. Do not rely on the one-hour default. Match the session duration to the workload's actual runtime. Implement refresh logic for anything that runs longer.

Test IAM changes in a sandbox account first. IAM policy changes take effect immediately. There is no rollback. A bad SCP deployed to the wrong OU can lock out every account in the organization. I have seen it happen. The recovery involves the management account root user and a very bad afternoon.
