AWS IAM: An Architecture Deep-Dive

Every AWS architecture decision I make runs through IAM eventually. Network topology, compute strategy, data pipeline design: none of it matters if the permissions are wrong. And wrong is the default. IAM starts from a position of deny-everything, and the gap between "nothing works" and "everything works but is wide open" is exactly where most teams live. I have spent more time debugging IAM policy evaluation failures than any other category of AWS issue. Not because IAM is broken. Because it is precise, and precision punishes sloppy thinking. This is the reference for how IAM actually evaluates permissions, how the policy layers interact, and where production deployments routinely break. If you already know what an IAM role is, keep reading. If you do not, start with the AWS fundamentals documentation and come back.

The IAM Policy Evaluation Algorithm

Every API call to AWS passes through the same evaluation engine. The algorithm is deterministic, follows a fixed order, and produces one of two outcomes: Allow or Deny. Understanding this sequence is the single most useful thing you can know about IAM.

The Evaluation Sequence

The enforcement code processes policies in a specific order. An explicit Deny at any step terminates evaluation immediately.

IAM policy evaluation sequence for same-account requests

Step	Policy Type	Effect	Scope
1	Explicit Deny (any policy)	Hard stop. Request denied.	All policies checked
2	Service Control Policy (SCP)	Must allow. Absence of allow = deny.	Organization-wide
3	Resource Control Policy (RCP)	Must allow. Absence of allow = deny.	Organization-wide
4	Resource-based policy	Can grant access directly (same account)	Per-resource
5	Identity-based policy	Must allow for the principal	Per-identity
6	Permissions boundary	Must allow. Caps identity-based grants.	Per-identity
7	Session policy	Must allow. Caps session permissions.	Per-session

The Default Deny

Every request starts as denied. This is the implicit deny. It requires zero configuration. If no policy explicitly allows an action, the request fails. This is the correct default. The alternative (default allow) would make every new AWS account a security incident waiting to happen.

Explicit Deny Wins Everything

An explicit Deny statement in any policy type overrides every Allow in every other policy type. If an SCP allows s3:* and an identity-based policy allows s3:DeleteBucket, but a resource-based policy on the bucket has an explicit Deny for s3:DeleteBucket, the request is denied. No exceptions. No precedence tricks. This property is what makes IAM's security model composable: you can layer restrictions without worrying about some other policy overriding your guardrail.

The Resource-Based Policy Shortcut

Resource-based policies (S3 bucket policies, KMS key policies, SQS queue policies) have a special property in same-account scenarios: they can grant access directly to a principal without requiring a matching identity-based policy. If a bucket policy says "Principal": {"AWS": "arn:aws:iam::123456789012:role/MyRole"}, "Effect": "Allow", "Action": "s3:GetObject", the role can access that bucket even if its identity-based policy says nothing about S3. This shortcut does not apply cross-account. Cross-account access requires both the resource-based policy and the identity-based policy (or role trust policy) to allow the action.
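The sequence and the shortcut can be sketched as a toy evaluator. This is a simplified model under stated assumptions, not AWS's enforcement code: each policy layer is reduced to a list of (effect, action pattern) pairs, resources, principals, and conditions are ignored, and the names `Layers` and `evaluate` are hypothetical. Real IAM also has nuances for how boundaries and session policies interact with resource-based grants to roles versus role sessions, which this sketch skips.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

Statements = List[Tuple[str, str]]  # (effect, action pattern), e.g. ("Allow", "s3:*")


def _matches(statements: Statements, effect: str, action: str) -> bool:
    """True if any statement with this effect covers the action (case-insensitive)."""
    return any(
        e == effect and fnmatchcase(action.lower(), pattern.lower())
        for e, pattern in statements
    )


@dataclass
class Layers:
    """Toy stand-in for the policies in play. None means the layer is absent."""
    identity: Statements = field(default_factory=list)
    resource: Statements = field(default_factory=list)
    scp: Optional[Statements] = None
    rcp: Optional[Statements] = None
    boundary: Optional[Statements] = None
    session: Optional[Statements] = None


def evaluate(action: str, layers: Layers, same_account: bool = True) -> str:
    present = [
        layer
        for layer in (layers.identity, layers.resource, layers.scp,
                      layers.rcp, layers.boundary, layers.session)
        if layer is not None
    ]
    # Step 1: an explicit Deny in any layer ends evaluation.
    if any(_matches(layer, "Deny", action) for layer in present):
        return "Deny"
    # Steps 2-3: each organization guardrail that exists must allow the action.
    for guardrail in (layers.scp, layers.rcp):
        if guardrail is not None and not _matches(guardrail, "Allow", action):
            return "Deny"
    identity_ok = _matches(layers.identity, "Allow", action)
    resource_ok = _matches(layers.resource, "Allow", action)
    # Steps 6-7: every ceiling that is present must allow the action.
    ceilings_ok = all(
        _matches(ceiling, "Allow", action)
        for ceiling in (layers.boundary, layers.session)
        if ceiling is not None
    )
    if not same_account:
        # Cross-account: no shortcut, both sides must allow.
        return "Allow" if identity_ok and resource_ok and ceilings_ok else "Deny"
    # Step 4: a same-account resource-based grant is enough on its own (simplified).
    if resource_ok:
        return "Allow"
    # Step 5: otherwise the identity policy must allow; no match is the implicit deny.
    return "Allow" if identity_ok and ceilings_ok else "Deny"


# The example from the text: SCP and identity policy allow, bucket policy denies.
layers = Layers(
    identity=[("Allow", "s3:DeleteBucket")],
    resource=[("Deny", "s3:DeleteBucket")],
    scp=[("Allow", "s3:*")],
)
print(evaluate("s3:DeleteBucket", layers))  # prints Deny
```

Modeling absent layers as `None` rather than an empty list matters: an attached boundary with no matching Allow blocks the request, while no boundary at all does not.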

Policy Types and How They Layer

Six policy types matter for this discussion (AWS also supports access control lists, a legacy mechanism not covered here). They serve different purposes, attach at different levels, and interact through intersection and union operations.

Policy Type	Attached To	Grants Permissions?	Sets Maximum?	Managed By
Identity-based	Users, groups, roles	Yes	No	Account admin
Resource-based	S3 buckets, KMS keys, SQS queues, etc.	Yes	No	Resource owner
Permissions boundary	Users, roles	No (ceiling only)	Yes	Account admin
Service Control Policy (SCP)	Organization OUs, accounts	No (ceiling only)	Yes	Org admin
Resource Control Policy (RCP)	Organization OUs, accounts	No (ceiling only)	Yes	Org admin
Session policy	Temporary credentials	No (ceiling only)	Yes	Caller of AssumeRole/GetFederationToken

Identity-Based Policies

The workhorses. These are the policies you attach to IAM users, groups, and roles. They come in two forms:

Managed policies are standalone JSON documents with their own ARN. AWS provides over 1,000 AWS-managed policies. You can also create customer-managed policies. A single managed policy can be attached to multiple identities.

Inline policies are embedded directly in a single user, group, or role. They have no ARN and cannot be reused. Use inline policies only when you need a strict 1:1 relationship between a policy and an identity, typically for exceptions that should not be accidentally attached to another identity.

Permissions Boundaries

A permissions boundary is an advanced feature that sets the maximum permissions an identity-based policy can grant. The effective permissions are the intersection of the identity-based policy and the permissions boundary. If the identity-based policy allows s3:* but the permissions boundary only allows s3:GetObject and s3:PutObject, the effective permissions are s3:GetObject and s3:PutObject.
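The intersection can be made concrete with a few lines of Python. This is a minimal sketch that only compares action patterns (no resources or conditions); `effective_actions` and the candidate action list are hypothetical names chosen for illustration.

```python
from fnmatch import fnmatchcase


def effective_actions(identity_patterns, boundary_patterns, candidate_actions):
    """Return the candidate actions allowed by BOTH the identity policy and the boundary."""
    def allowed(action, patterns):
        return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)

    return sorted(
        action for action in candidate_actions
        if allowed(action, identity_patterns) and allowed(action, boundary_patterns)
    )


candidates = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:DeleteBucket"]
print(effective_actions(["s3:*"], ["s3:GetObject", "s3:PutObject"], candidates))
# prints ['s3:GetObject', 's3:PutObject']
```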

The primary use case: delegating IAM administration. You want developers to create their own IAM roles for Lambda functions, but you do not want them creating roles with AdministratorAccess. Attach a permissions boundary to the roles they create. The boundary caps what those roles can do regardless of what identity-based policies the developers attach.
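One way to enforce that delegation is to let developers create or modify roles only when the boundary is attached, using the `iam:PermissionsBoundary` condition key. The sketch below builds such a policy as a Python dict; the function name, the Sids, and the example boundary ARN are assumptions for illustration, and a production version would usually also restrict which role paths or names developers may touch.

```python
import json


def delegated_role_admin_policy(boundary_arn: str) -> dict:
    """Identity-based policy for developers who may manage roles, but only bounded ones."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageRolesOnlyWithBoundary",
                "Effect": "Allow",
                "Action": ["iam:CreateRole", "iam:AttachRolePolicy", "iam:PutRolePolicy"],
                "Resource": "*",
                # The request succeeds only if the role carries this exact boundary.
                "Condition": {"StringEquals": {"iam:PermissionsBoundary": boundary_arn}},
            },
            {
                "Sid": "NeverRemoveTheBoundary",
                "Effect": "Deny",
                "Action": ["iam:DeleteRolePermissionsBoundary"],
                "Resource": "*",
            },
        ],
    }


# Hypothetical boundary policy ARN.
policy = delegated_role_admin_policy("arn:aws:iam::123456789012:policy/DeveloperBoundary")
print(json.dumps(policy, indent=2))
```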

Service Control Policies (SCPs)

SCPs apply to entire AWS accounts or organizational units (OUs) within AWS Organizations. They do not grant permissions. They define the maximum permissions available to principals in the account. Even the root user of a member account is constrained by SCPs (the management account is exempt).

Common SCP patterns:

Pattern	SCP Statement	Purpose
Region restriction	Deny all actions where `aws:RequestedRegion` is not in allowed list	Prevent resource creation in unapproved regions
Service restriction	Deny specific service actions across the OU	Block services that are not approved for use
Protect security tooling	Deny `iam:Delete*`, `cloudtrail:StopLogging`, `config:StopConfigurationRecorder`	Prevent disabling of audit/compliance infrastructure
Require encryption	Deny `s3:PutObject` without `s3:x-amz-server-side-encryption`	Enforce encryption at rest

Session Policies

When you call AssumeRole, AssumeRoleWithSAML, AssumeRoleWithWebIdentity, or GetFederationToken, you can pass an optional session policy. This policy further restricts the temporary credentials beyond what the role's identity-based policies allow. Session policies cannot grant more permissions than the role already has. They are a ceiling on a ceiling.

I use session policies for tenant isolation in multi-tenant systems. A shared IAM role serves all tenants, but each AssumeRole call passes a session policy scoped to that tenant's S3 prefix or DynamoDB partition key. The role itself has broad permissions; the session narrows them to exactly what the tenant should access.
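A minimal sketch of the per-tenant session policy, assuming a hypothetical `tenants/<tenant_id>/` prefix layout in a shared bucket. The returned JSON string is what you would pass as the `Policy` parameter of the AssumeRole call; the function name and the tenant ID format are illustrative assumptions.

```python
import json
import re

SESSION_POLICY_LIMIT = 2048  # character limit STS applies to a session policy document


def tenant_session_policy(bucket: str, tenant_id: str) -> str:
    """Build a session policy that narrows a shared role to one tenant's S3 prefix."""
    # Reject anything that could smuggle a wildcard or path separator into the ARN.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket}/tenants/{tenant_id}/*",
        }],
    }
    document = json.dumps(policy, separators=(",", ":"))
    if len(document) > SESSION_POLICY_LIMIT:
        raise ValueError("session policy exceeds the STS size limit")
    return document


print(tenant_session_policy("shared-data", "tenant-42"))
```

Validating the tenant ID before interpolating it is the important design choice here: a tenant ID of `*` would otherwise silently widen the session back to every tenant.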

Cross-Account Access Architecture

Multi-account AWS architectures depend on cross-account role assumption. The pattern involves a trust relationship between two accounts and a set of permissions granted to the assumed role.

[Figure: Cross-account role assumption flow]

Trust Policies

Every IAM role has a trust policy (also called an assume role policy) that specifies which principals can assume it. For cross-account access, the trust policy must explicitly name the source account or specific principals.

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
    "Action": "sts:AssumeRole",
    "Condition": {
      "StringEquals": {"sts:ExternalId": "unique-external-id-here"}
    }
  }]
}

The External ID Problem

When a third party assumes a role in your account, the confused deputy problem arises. The third party's service (Account C) assumes roles in its customers' accounts, including a role in yours (Account B). Without an external ID, a malicious customer could register your Account B role ARN as if it were their own, tricking Account C into accessing Account B's resources on the attacker's behalf.

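The sts:ExternalId condition key solves this. The external ID is a unique identifier that the third party assigns to each customer relationship. The trust policy requires it, and the third party includes the ID it holds for the customer it is acting for in every AssumeRole call. An attacker who registers your role ARN under their own customer record causes the third party to send the attacker's external ID, which your trust policy rejects. AWS does not treat the external ID as a secret, so the protection comes from the third party binding each ID to exactly one customer, not from the value staying confidential.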
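The check can be simulated locally against the trust policy shown earlier. This is a sketch under simplifying assumptions: it handles only account-root principals, a single string `Action`, and the `StringEquals` operator, and `trust_policy_allows` and `vendor_records` are hypothetical names.

```python
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "unique-external-id-here"}},
    }],
}


def trust_policy_allows(trust_policy, caller_account, external_id=None):
    """Return True if the caller account, presenting this external ID, may assume the role."""
    for statement in trust_policy["Statement"]:
        if statement["Effect"] != "Allow" or statement["Action"] != "sts:AssumeRole":
            continue
        if statement["Principal"].get("AWS") != f"arn:aws:iam::{caller_account}:root":
            continue
        required = statement.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId")
        if required is None or required == external_id:
            return True
    return False


# The vendor (account 111111111111) keeps one external ID per customer record.
vendor_records = {
    "victim": "unique-external-id-here",
    "attacker": "attacker-external-id",  # attacker registered the victim's role ARN
}
print(trust_policy_allows(TRUST_POLICY, "111111111111", vendor_records["victim"]))    # prints True
print(trust_policy_allows(TRUST_POLICY, "111111111111", vendor_records["attacker"]))  # prints False
```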

Cross-Account Evaluation Differences

Cross-account access has a critical difference from same-account access: both accounts must allow the action. For role assumption, three things must line up. An identity-based policy in the source account must allow sts:AssumeRole on the target role. The role's trust policy in the target account must allow the source principal. And the role's permissions in the target account must allow the requested action. For direct access to a resource in another account, the resource-based policy shortcut that works same-account does not apply: the caller's identity-based policy and the resource-based policy must both explicitly permit the access.

ABAC vs. RBAC

AWS supports two authorization models. Most production environments use both.

Role-Based Access Control (RBAC)

RBAC assigns permissions based on job function. You create IAM roles with names like DataEngineerRole, ApplicationDeployRole, ReadOnlyAuditorRole and attach policies that grant the permissions each function needs. When a new service or resource appears, you update the relevant role policies.

Attribute-Based Access Control (ABAC)

ABAC assigns permissions based on attributes (tags) attached to the principal and the resource. Instead of listing specific resource ARNs in policies, you write condition expressions that compare tags.

{
  "Effect": "Allow",
  "Action": ["ec2:StartInstances", "ec2:StopInstances"],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "aws:ResourceTag/Project": "${aws:PrincipalTag/Project}"
    }
  }
}

This policy allows starting and stopping EC2 instances only when the instance's Project tag matches the principal's Project tag. No resource ARNs. No account-specific references. The same policy works across every account in the organization without modification.
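The tag comparison can be mimicked in a few lines. This is a sketch of the matching rule only, assuming plain dicts of tags; `abac_allows` is a hypothetical helper, not an AWS API.

```python
def abac_allows(principal_tags, resource_tags, key="Project"):
    """Mimic StringEquals on aws:ResourceTag/<key> against ${aws:PrincipalTag/<key>}."""
    # A missing tag on either side means the condition cannot match.
    if key not in principal_tags or key not in resource_tags:
        return False
    # StringEquals is case-sensitive.
    return principal_tags[key] == resource_tags[key]


print(abac_allows({"Project": "atlas"}, {"Project": "atlas"}))  # prints True
print(abac_allows({"Project": "atlas"}, {"Project": "Atlas"}))  # prints False
print(abac_allows({"Project": "atlas"}, {}))                    # prints False
```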

Criterion	RBAC	ABAC
Policy updates for new resources	Required	Not required (tags handle it)
Number of policies	Grows with resources/functions	Stays small
Works across accounts	Requires per-account ARNs	Works with tags (account-agnostic)
Complexity	Simple to understand	Requires disciplined tagging
Auditability	Clear role-to-permission mapping	Requires tag inventory
Service support	Universal	Varies by service (check condition key support)

When to Use ABAC

ABAC pays off at scale. If you have 50 AWS accounts and 200 microservices, maintaining RBAC policies with specific ARNs for each resource in each account is a full-time job. ABAC policies that match on Project, Environment, and Team tags work identically everywhere. The tradeoff: ABAC depends entirely on consistent tagging. One untagged resource or mistagged principal and the access silently fails (or silently succeeds, which is worse).

IAM Access Analyzer and Least Privilege

IAM Access Analyzer provides two categories of findings that directly support least-privilege architecture.

External Access Findings

Access Analyzer identifies resources shared with principals outside your account or organization. If an S3 bucket policy grants access to an external account, Access Analyzer flags it. Same for KMS keys, IAM roles with external trust policies, Lambda function resource policies, SQS queues, and Secrets Manager secrets.

Unused Access Findings

The more valuable capability. Access Analyzer tracks which permissions each IAM role and user has actually exercised over a configurable lookback period. It identifies:

Finding Type	What It Detects
Unused roles	Roles that have not been assumed in the lookback period
Unused access keys	Access keys with no API activity
Unused passwords	Console passwords with no sign-in activity
Unused services	Services listed in policies but never called
Unused actions	Specific actions allowed by policy but never invoked

Policy Generation

Access Analyzer can generate a least-privilege policy based on CloudTrail activity. Feed it a role and a lookback window, and it produces a policy containing only the actions the role actually used. I treat these generated policies as starting points, not final products. They reflect past behavior, not future requirements. A role that has not yet processed a monthly batch job will not have those permissions in the generated policy. Always review and adjust before applying.
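To show the idea (not Access Analyzer's actual implementation), here is a rough sketch that derives an action list from CloudTrail-style records. It assumes the IAM action is `<service prefix>:<eventName>`, taking the prefix from `eventSource`; that mapping does not hold for every service, and the output leaves `Resource` as `*`, so treat the result as a draft to review by hand.

```python
def policy_from_events(events):
    """Build a draft allow-list policy from successful CloudTrail-style records."""
    actions = sorted({
        f'{event["eventSource"].split(".")[0]}:{event["eventName"]}'
        for event in events
        if not event.get("errorCode")  # ignore failed or denied calls
    })
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }


sample_events = [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "sqs.amazonaws.com", "eventName": "SendMessage"},
    {"eventSource": "s3.amazonaws.com", "eventName": "DeleteBucket", "errorCode": "AccessDenied"},
]
print(policy_from_events(sample_events)["Statement"][0]["Action"])
# prints ['s3:GetObject', 'sqs:SendMessage']
```

Note what is missing from the output: anything the role did not do during the window, which is exactly the monthly-batch-job gap described above.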

Integration Pattern

The production pattern I use: enable Access Analyzer with unused access findings across the organization, pipe findings to Security Hub, create EventBridge rules that notify the team when new external access or overly broad unused permissions appear, and run quarterly reviews where generated policies replace hand-crafted ones. This creates a continuous tightening cycle rather than a one-time audit.

Quotas and Limits That Bite

IAM has limits that surprise teams when they hit production scale.

Resource	Default Limit	Maximum (after increase)	Notes
Managed policies per role	10	20	Request increase via Service Quotas
Managed policy size	6,144 characters	Not increasable	Whitespace is not counted
Inline policy aggregate (roles)	10,240 characters	Not increasable	Sum of all inline policies on the role
Inline policy aggregate (users)	2,048 characters	Not increasable	Very tight; avoid inline on users
Roles per account	1,000	5,000	Request increase
Instance profiles per account	1,000	5,000	1:1 with roles in most cases
SAML providers per account	100	Not increasable	Relevant for large SSO deployments
Policy versions	5 per managed policy	Not increasable	Delete old versions before creating new ones
Trust policy size	2,048 characters	4,096 (with increase)	Cross-account patterns eat this fast
Groups per account	300	500	Often forgotten until it blocks

The managed policy size limit (6,144 characters) is the one that bites most often. A policy with fine-grained permissions for a complex application can hit this limit easily. Strategies: split into multiple managed policies (up to 10 per role by default, 20 after a quota increase), use wildcards strategically on resource ARNs, switch to ABAC to reduce policy size, or use inline policies to access the separate 10,240-character pool.
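A small pre-flight check makes the first strategy mechanical. The sketch below measures the compact JSON serialization (the smallest form of the document you would submit) and greedily packs statements into several policies that each fit. `policy_size` and `split_policy` are hypothetical helpers, and the measurement is an approximation of how IAM counts characters.

```python
import json

MANAGED_POLICY_LIMIT = 6144  # characters per managed policy


def policy_size(policy: dict) -> int:
    """Length of the policy serialized without optional whitespace."""
    return len(json.dumps(policy, separators=(",", ":")))


def fits_managed_policy(policy: dict) -> bool:
    return policy_size(policy) <= MANAGED_POLICY_LIMIT


def split_policy(policy: dict, limit: int = MANAGED_POLICY_LIMIT) -> list:
    """Greedily pack statements into as few policies as fit under the limit.

    A single statement larger than the limit still ends up alone in an
    oversized chunk; that case needs manual restructuring.
    """
    version = policy["Version"]
    chunks, current = [], []
    for statement in policy["Statement"]:
        candidate = {"Version": version, "Statement": current + [statement]}
        if current and policy_size(candidate) > limit:
            chunks.append({"Version": version, "Statement": current})
            current = [statement]
        else:
            current = current + [statement]
    if current:
        chunks.append({"Version": version, "Statement": current})
    return chunks
```

Run a check like this in CI before deployment so the limit surfaces as a failed build rather than a failed rollout. Remember that the number of chunks is itself capped by the managed-policies-per-role quota.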

The trust policy size limit (2,048 characters) catches teams with many cross-account trust relationships. Each trusted principal ARN consumes ~60 characters. At around 30 trusted accounts, you hit the limit. The fix: stop listing accounts individually. Use a wildcard principal and restrict it with the aws:PrincipalOrgID condition key (the whole organization) or aws:PrincipalOrgPaths (a specific OU).
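The difference in growth is easy to see side by side. In this sketch, `org_trust_policy` uses a wildcard principal constrained by `aws:PrincipalOrgID`, while `per_account_trust_policy` lists account roots one by one; both function names and the organization ID `o-exampleorgid` are placeholders for illustration.

```python
import json


def org_trust_policy(org_id: str) -> dict:
    """Trust any principal, but only if it belongs to the given organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "sts:AssumeRole",
            # Without this condition the wildcard principal would trust everyone.
            "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
        }],
    }


def per_account_trust_policy(account_ids) -> dict:
    """Trust each listed account root explicitly; size grows with every account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": [f"arn:aws:iam::{a}:root" for a in account_ids]},
            "Action": "sts:AssumeRole",
        }],
    }


def compact_size(policy: dict) -> int:
    return len(json.dumps(policy, separators=(",", ":")))


accounts = [f"{n:012d}" for n in range(1, 101)]  # 100 made-up account IDs
print(compact_size(org_trust_policy("o-exampleorgid")))
print(compact_size(per_account_trust_policy(accounts)))
```

The organization-scoped policy stays the same size no matter how many accounts join, which is the whole point.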

Common Failure Modes

IAM failures in production follow predictable patterns. I have encountered every one of these at least twice.

Privilege Escalation via PassRole

The iam:PassRole permission is the most dangerous permission in AWS that nobody talks about. It allows a principal to assign an IAM role to an AWS service. If a developer has iam:PassRole with Resource: "*" and can create Lambda functions, they can create a Lambda function with an administrator role attached, then invoke it to perform any action. The Lambda function runs with the passed role's permissions, not the developer's.

[Figure: Privilege escalation via iam:PassRole]

Fix: Always scope iam:PassRole to specific role ARNs. Never use Resource: "*" for PassRole.
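This rule is easy to lint for. The sketch below flags Allow statements that grant iam:PassRole (directly or through a wildcard action) on `Resource: "*"`. `unscoped_passrole_statements` is a hypothetical helper; it does not catch broad-but-not-total patterns such as `arn:aws:iam::*:role/*`, which deserve the same scrutiny.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def unscoped_passrole_statements(policy: dict) -> list:
    """Return indexes of Allow statements granting iam:PassRole on every resource."""
    findings = []
    for index, statement in enumerate(_as_list(policy["Statement"])):
        if statement.get("Effect") != "Allow":
            continue
        patterns = [a.lower() for a in _as_list(statement.get("Action", []))]
        # "iam:*", "iam:Pass*", and "*" all include PassRole.
        grants_passrole = any(fnmatchcase("iam:passrole", p) for p in patterns)
        wildcard_resource = "*" in _as_list(statement.get("Resource", []))
        if grants_passrole and wildcard_resource:
            findings.append(index)
    return findings


risky = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "lambda:CreateFunction", "Resource": "*"},
        {"Effect": "Allow", "Action": ["iam:Get*", "iam:PassRole"], "Resource": "*"},
        {"Effect": "Allow", "Action": "iam:PassRole",
         "Resource": "arn:aws:iam::123456789012:role/app-lambda-exec"},
    ],
}
print(unscoped_passrole_statements(risky))  # prints [1]
```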

SCP Blocking Valid Operations

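SCPs apply to every IAM user, IAM role, and the root user in member accounts. Service-linked roles are the one exception: SCPs never restrict them. A region-restriction SCP whose allowed list omits us-east-1 (for example, one that permits only eu-west-1 and eu-central-1) will break global services whose endpoints live in us-east-1 (IAM itself, CloudFront, Route 53, AWS Organizations). Always exempt global services in region-restriction SCPs, typically with a NotAction list so the Deny skips them. The principal-ARN condition in the statement below is defensive: service-linked roles are already exempt, but the same pattern is how you would exempt a break-glass role.

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyOutsideApprovedRegions",
    "Effect": "Deny",
    "NotAction": ["iam:*", "cloudfront:*", "route53:*", "organizations:*", "sts:*", "support:*"],
    "Resource": "*",
    "Condition": {
      "StringNotEquals": {
        "aws:RequestedRegion": ["eu-west-1", "eu-central-1"]
      },
      "ArnNotLike": {
        "aws:PrincipalARN": "arn:aws:iam::*:role/aws-service-role/*"
      }
    }
  }]
}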

The Wildcard Trap

"Resource": "*" in an identity-based policy is sometimes necessary (for iam:ListUsers, s3:ListAllMyBuckets, and other list/describe operations that do not support resource-level restrictions). The failure mode is using * as a lazy default for actions that do support resource-level restrictions. s3:PutObject with Resource: "*" grants write access to every bucket in the account. I have seen production incidents where a misconfigured Lambda function overwrote data in the wrong bucket because its role had overly broad S3 permissions.

Tag-Condition Failures

ABAC policies fail silently when tags are missing. If a policy condition requires aws:ResourceTag/Environment to equal production and the resource has no Environment tag, the condition evaluates to false. The request is denied with a generic AccessDenied error. CloudTrail does not tell you which condition failed. Debugging requires reading the policy, checking every tag on the resource and the principal, and working through the evaluation logic manually. Mandate tags via SCPs or AWS Config rules to prevent this class of failure.
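Since the error message will not say which tag was the problem, a small local helper can do the manual comparison for you. This is a sketch assuming you have already fetched the resource's tags into a dict; `explain_tag_conditions` is a hypothetical name and it only covers exact-match (StringEquals-style) conditions.

```python
def explain_tag_conditions(expected: dict, resource_tags: dict) -> list:
    """List every expected tag that is missing or has the wrong value."""
    problems = []
    for key, wanted in expected.items():
        if key not in resource_tags:
            problems.append(f"{key}: tag missing")
        elif resource_tags[key] != wanted:
            problems.append(f"{key}: expected {wanted!r}, found {resource_tags[key]!r}")
    return problems


expected = {"Environment": "production", "Project": "atlas"}
print(explain_tag_conditions(expected, {"Project": "Atlas"}))
# prints ["Environment: tag missing", "Project: expected 'atlas', found 'Atlas'"]
```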

Session Token Expiration

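Temporary credentials from AssumeRole default to one hour. Long-running batch jobs, data pipelines, or CI/CD processes that exceed this window fail mid-execution. The role's maximum session duration is configurable from 1 to 12 hours, but it is only a cap: the caller must also request the longer session through the DurationSeconds parameter. Role chaining (assuming a role from an already-assumed role) is limited to one hour regardless. GetFederationToken and GetSessionToken sessions can last up to 36 hours. Set durations explicitly rather than relying on the default. For processes that outlive the session, implement credential refresh logic using the STS client.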
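A minimal sketch of refresh logic, kept independent of any SDK so it can be tested offline. It assumes a `fetch` callable that returns a credentials dict with a timezone-aware `Expiration` datetime (the shape of the `Credentials` element an STS AssumeRole response carries); the class name, the five-minute margin, and the injectable clock are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


class RefreshingCredentials:
    """Cache temporary credentials and re-fetch them shortly before they expire."""

    def __init__(self, fetch, margin=timedelta(minutes=5),
                 now=lambda: datetime.now(timezone.utc)):
        self._fetch = fetch    # callable returning {"Expiration": datetime, ...}
        self._margin = margin  # refresh this long before expiry
        self._now = now        # injectable clock, useful for tests
        self._current = None

    def get(self) -> dict:
        expired = (
            self._current is None
            or self._now() >= self._current["Expiration"] - self._margin
        )
        if expired:
            self._current = self._fetch()
        return self._current


# Demo with a fake clock and a fake fetch standing in for the STS call.
clock = [datetime(2024, 1, 1, tzinfo=timezone.utc)]
fetch_calls = []


def fake_fetch():
    fetch_calls.append(clock[0])
    return {"AccessKeyId": "ASIAEXAMPLE", "Expiration": clock[0] + timedelta(hours=1)}


creds = RefreshingCredentials(fake_fetch, now=lambda: clock[0])
creds.get()
creds.get()                          # still fresh, served from cache
clock[0] += timedelta(minutes=56)    # inside the 5-minute margin now
creds.get()                          # triggers a refresh
print(len(fetch_calls))              # prints 2
```

Call `get()` immediately before each batch of API calls rather than once at startup; holding on to the returned dict for hours defeats the refresh.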

Key Patterns

Start with deny-everything SCPs, then allow. The allow-list SCP model (deny everything except explicitly permitted services) is more secure than the deny-list model (allow everything except explicitly denied services). It requires more upfront work but prevents entire categories of misuse.

Scope PassRole to specific role ARNs. This is the single highest-impact IAM hardening step you can take. Never grant iam:PassRole with Resource: "*".

Use ABAC for multi-account, RBAC for single-account. ABAC's value scales with the number of accounts and resources. For a single-account workload, RBAC is simpler and more auditable.

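Trust the organization, not individual accounts. For cross-account trust policies, a wildcard principal ("AWS": "*") restricted by an aws:PrincipalOrgID condition is more maintainable than listing 50 account IDs. Wildcards are not valid inside a principal ARN, so the organization check has to live in the condition, and the wildcard principal must never ship without it. See AWS Cognito User Authentication: An Architecture Deep-Dive for how Cognito integrates with these IAM patterns.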

Run Access Analyzer continuously. Enable unused access findings. Pipe to Security Hub. Review quarterly. Replace hand-written policies with generated ones after verification.

Set explicit session durations. Do not rely on the one-hour default. Match the session duration to the workload's actual runtime. Implement refresh logic for anything that runs longer.

Test IAM changes in a sandbox account first. IAM policy changes take effect within seconds, and an SCP attachment has no staging step or built-in undo. A bad SCP deployed to the wrong OU can lock out every account in that OU. I have seen it happen. The recovery involves the management account root user and a very bad afternoon.