Cloud Role Segregation

Dec 26, 2021

Role-Based Access Control (RBAC) in Azure, AWS, GCP, and other clouds provides fine-grained access management (authorization) to cloud features and resources. A role defines the set of permissions and sometimes the scope that a user principal is entitled to use. The permissions are the list of actions or operations that can be performed, such as read, write, and delete.
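To make the definition above concrete, here is a minimal Python sketch of a role as a set of permissions plus a scope. The Role class, the action strings, and the resource paths are hypothetical and are not any provider's actual API; real clouds express the same idea in their own policy formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """A role: a named set of allowed actions, valid within one scope."""
    name: str
    permissions: frozenset  # allowed actions, e.g. {"read", "write"}
    scope: str              # hypothetical resource path prefix

    def allows(self, action: str, resource: str) -> bool:
        # The resource must be the scope itself or sit underneath it.
        prefix = self.scope.rstrip("/") + "/"
        in_scope = resource == self.scope or resource.startswith(prefix)
        return in_scope and action in self.permissions


# Hypothetical read-only role limited to a single project.
reader = Role("Reader", frozenset({"read"}), "/projects/demo")

can_read = reader.allows("read", "/projects/demo/vm1")
can_delete = reader.allows("delete", "/projects/demo/vm1")
can_read_elsewhere = reader.allows("read", "/projects/other/vm1")
```

Both conditions must hold: the action has to be in the role's permission list, and the resource has to fall inside the role's scope.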

Principle of Least Privilege

The principle of “least privilege” should be applied using a risk-based approach: enforce it where its absence would pose a significant business or security risk to the company. That is, least privilege should NOT be treated as a law. Rather, it must be balanced against manageability and the additional complexity it introduces.

I have seen organizations define thousands of roles in an attempt to achieve least privilege nirvana. It was an epic failure: there were so many roles that keeping them up to date and properly assigned was impossible.

Custom Roles

To manage least privilege, I typically define 10 to 15 custom cloud roles (or ‘policies’ in AWS) that align with the structure and needs of the organization. You might wonder why so few; I find that these custom roles can be augmented with the cloud provider’s built-in roles and policies, which are often narrowly permissioned.

This also means that users may sometimes be granted more cloud privileges than they strictly need for their job. Generally, though, those permissions are not far outside the scope of their business role. A modicum of trust is required, of course, but employees need to be reasonably trusted not to misuse privileges anyway. Activity auditing is your friend here under “trust but verify”.

The bespoke roles I have typically defined (usually to align with business functional responsibilities as well as to compartmentalize permissions into meaningful groupings) include:

Cloud Admin — Full administrative access, excluding data access.
Reader — Provides read-only access to virtually all cloud configuration and many operating metrics.
Network Admin — Provides administrative access to all networking capabilities and selected other administrative capabilities (network administration is a separate responsibility in most IT organizations).
Firewall Operator — Provides administrative capability to all security groups, cloud network ACLs, and PaaS firewalls so that firewall rules may be managed. This is typically a subset of the Network Admin capabilities.
Security Admin — Provides administrative access to security functions for use by the Information Security team.
Security Operator — Provides limited security operator access for use by the Information Security operations team.
Service Admin — Provides limited administrative access to resources, allowing for starting and stopping of VMs, limited reconfiguration capability, tag management, and more. Useful for operations teams.
Data Access Admin — Provides data access capabilities. It is crucial that access to sensitive data be in a segregated role that must be specifically assumed when direct access to (sensitive) data is needed.
Troubleshooter — Provides full read access to configuration data, service metrics, and most logs. Also enables some very limited operational actions such as VM start/stop, volume snapshotting, and diagnostics configuring. Typically, this is the follow-on role for developers in staging and production environments.
Full Stack Developer — Allows administration of most resources (but not all) including network-related services such as load balancers, firewalls, subnets, etc. Generally excluded are VPC/VNET creation, VPNs, Express Networking, CDNs, and route table management since these are controlled by the networking team. This role is only available in development environments.
Data Scientist — Allows creation and running of queries on Hadoop, MapReduce, and data lakes for the purposes of big data analytics.
Auditor — Provides read-only access to audit and activity logs.

The roles above are intended for people. For automation service principals, I define permissions at a finer granularity to limit the potential for abuse should the service principal be compromised.

Challenges

Maintaining custom role definitions takes work. Custom role/policy definitions often do not explicitly list every granted (or denied) permission. Rather, wildcards (*) are often used in the definitions to convey intent without the tedium of ongoing maintenance.

AWS, for example, uses ‘List’ as a prefix in many of its permissions. I generally grant ‘List*’ rather than listing 10 to 30 individual permissions. For one, AWS policies have a maximum size limit. Second, I don’t want to have to update the definition each time a new List permission emerges. It’s a risk tradeoff I make when defining each role. This does mean, however, that custom role definitions must be reviewed on a regular basis or, preferably, that automation is used to flag when a new permission emerges.
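To see what a wildcard such as ‘List*’ actually grants, it helps to expand it against a list of known permissions. The sketch below does this in Python with the standard fnmatch module. It is an approximation: the catalog is a small illustrative subset of S3 actions, and fnmatch also accepts bracket patterns that IAM does not, so treat it as a review aid rather than a reimplementation of IAM matching.

```python
from fnmatch import fnmatchcase

# Illustrative subset of S3 actions; a real catalog would be far larger.
KNOWN_ACTIONS = [
    "s3:GetObject",
    "s3:ListAllMyBuckets",
    "s3:ListBucket",
    "s3:ListBucketVersions",
    "s3:PutObject",
]


def expand(pattern, catalog):
    """Return the catalog actions matched by a wildcard pattern.

    Both sides are lowercased so the comparison ignores case.
    """
    lowered = pattern.lower()
    return sorted(a for a in catalog if fnmatchcase(a.lower(), lowered))


granted = expand("s3:List*", KNOWN_ACTIONS)
for action in granted:
    print(action)
```

Running the expansion for every wildcard in a role definition gives the explicit permission list that the wildcard stands in for, which is the list a reviewer needs to sign off on.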

The details of custom AWS and Azure roles are outside the scope of this blog.

Listing All Possible Permissions

Another challenge is knowing ALL of the possible permissions. Azure provides a page of “Resource Provider Operations” (here) but Amazon does not have any such equivalent. AWS has over 300 different providers, so crawling the documentation to construct custom role definitions is somewhat tedious.

Personally, I use a spreadsheet when constructing custom roles so that I can clearly and quickly see the implications of any wildcards I may choose to use.

To help with AWS permissions, you can see the list of services and permissions along with a PowerShell script (here) which can be used to retrieve the current set of AWS permissions (on a best effort basis). I have a similar script for Azure (here).

In the near future, I plan to automate the above scripts to detect additions and deletions to permissions so that role administrators can keep abreast of changes.
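One possible shape for that automation, sketched in Python: compare two snapshots of the permission list, then report which newly added permissions are already granted by an existing wildcard. This is an assumption about how such a check could work, not the scripts referenced above; the ‘example:’ action names and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def diff_permissions(previous, current):
    """Return (added, removed) permissions between two snapshots."""
    prev, curr = set(previous), set(current)
    return sorted(curr - prev), sorted(prev - curr)


def auto_granted(added, role_patterns):
    """Map each new permission to the role patterns that already match it."""
    hits = {}
    for action in added:
        matches = [p for p in role_patterns
                   if fnmatchcase(action.lower(), p.lower())]
        if matches:
            hits[action] = matches
    return hits


# Hypothetical snapshots taken on two different days.
last_week = ["example:GetItem", "example:LegacyCall", "example:ListItems"]
today = ["example:DeleteItem", "example:GetItem",
         "example:ListItems", "example:ListTags"]

added, removed = diff_permissions(last_week, today)
flagged = auto_granted(added, ["example:List*", "example:Get*"])
```

Here the new List permission is picked up silently by the existing wildcard, which is exactly the case a role administrator would want flagged for review.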

Attribute-Based Access Control (ABAC)

ABAC is an emerging policy-based model that determines permissions based on attributes instead of roles. In general, there are user attributes, resource attributes, and environmental attributes. Administrators develop a security policy, generally with eXtensible Access Control Markup Language (XACML), that determines permissions by taking into account all related attributes.

Personally, I don’t believe ABAC will supersede RBAC, primarily because ABAC comes close to mapping attributes directly to specific permissions. Rather, roles will be defined using attributes, which in turn map to specific permissions.
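A minimal Python sketch of that hybrid follows: attributes select a role, and the role maps to permissions. The attribute names, the permission strings, and the selection rule are all hypothetical; the rule mirrors the earlier description of Full Stack Developer as a development-only role with Troubleshooter as its follow-on elsewhere.

```python
# Hypothetical role-to-permission mapping (the RBAC part).
ROLE_PERMISSIONS = {
    "Troubleshooter": {"config:read", "logs:read", "vm:restart"},
    "Full Stack Developer": {"config:read", "config:write", "vm:create"},
}


def roles_for(user, environment):
    """Pick roles from user and environment attributes (the ABAC part)."""
    roles = set()
    if user.get("department") == "engineering":
        if environment == "development":
            roles.add("Full Stack Developer")
        else:
            # Staging and production get the narrower follow-on role.
            roles.add("Troubleshooter")
    return roles


def permissions_for(user, environment):
    """Resolve attributes to roles, then roles to permissions."""
    perms = set()
    for role in roles_for(user, environment):
        perms |= ROLE_PERMISSIONS[role]
    return perms


developer = {"department": "engineering"}
dev_perms = permissions_for(developer, "development")
prod_roles = roles_for(developer, "production")
```

The attributes never touch individual permissions directly; they only decide which of a small number of maintained roles applies.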

A good read on RBAC vs. ABAC may be found here.