Efficiently Computing Permissions at Scale—Our Engineering Approach

Eugène Nélou

Eugène is a staff engineer at GitGuardian.
He likes building tools that benefit everyone
and working on improving GitGuardian products' quality.

A few weeks ago, we introduced a new Role-Based Access Control (RBAC) feature in the GitGuardian Internal Monitoring platform. This release was the result of several months of hard work, during which we had to thoroughly review our data model and implement a very resource-efficient mechanism for computing permissions. I thought this was the perfect opportunity to offer a deep dive into the research, problems, and dead ends we encountered on this journey.

Disclaimer: I’ll be using Django in my code examples, but the ideas generalize to other frameworks. A relational database, however, is a firmer requirement.

I. Defining the problem

In a nutshell, the RBAC feature introduces the notion of “Teams”: a perimeter within which each member can see and act upon a restricted set of incidents. In our domain, an incident is a logical unit corresponding to a unique leaked secret. Since a secret can leak in multiple places, we call occurrences the various locations of this secret in one or more repositories. A team is defined by a set of repos, so a user belonging to a team can act on any secret detected at least once in one of these repos.

Since an incident can have two occurrences owned by two different teams, our first conceptual problem was: how to distribute incidents across teams?

💡
Note: Directly attaching repositories to a team is by no means the only possibility. We could decide, for example, to assign an entire GitHub organization to a team so that repositories created later in this organization are automatically added to the team's perimeter. But this implementation is out of the scope of this article, so we’ll assume we have a direct link between teams and repositories.

But that’s not all. We also needed to make it possible to give a user or a team access to one particular incident. Each user has their own perimeter: the union of the perimeters of their teams and the incidents they have been granted access to individually.

Here is a visualization to help you grasp the relationships between these concepts:

Class diagram for our models

Finally, knowing which incidents are part of the user’s perimeter was just half the story; what we wanted, in the end, was to know what the user could do with them. Here are the three permission levels:

  • READ: user can see the incident
  • WRITE: user can act on the incident—ignore, assign, resolve, etc.
  • ADMIN: user can share the incident with other users and teams, adding it to their perimeter.

These permissions can, again, be inherited from the team the user is a member of, or be directly attributed. For a given incident, the user’s permission is therefore the maximum permission level accorded by these two means. And this maximum permission needs to be computed dynamically (on the fly).
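As a minimal sketch of that rule, assume the levels are ordered integers and that `None` means "no grant through this path". The `Level` enum and the `effective_level` helper are hypothetical names used only for illustration, and the integer values are illustrative too.

```python
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    # Hypothetical stand-in for the three permission levels listed above.
    NONE = 0b000
    READ = 0b001
    WRITE = 0b011
    ADMIN = 0b111


def effective_level(inherited: Optional[Level], direct: Optional[Level]) -> Level:
    """Return the highest level granted through teams or directly."""
    return max(inherited or Level.NONE, direct or Level.NONE)


print(effective_level(Level.READ, Level.ADMIN).name)  # ADMIN
print(effective_level(Level.WRITE, None).name)        # WRITE
print(effective_level(None, None).name)               # NONE
```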

Why we didn’t go for the straightforward solution

One straightforward solution would be to have a table persisting per-user permissions. But this would be very difficult to maintain. Why? Let’s imagine a user is removed from a team. The incidents for which they had inherited permissions through the team may no longer be in the user’s perimeter. Therefore, the permissions for all the team’s incidents would have to be recomputed to check whether the user lost access or had their permission reduced on each incident.

Going with a per-user permissions table would necessarily mean an order of magnitude more operations to keep all the user permissions up to date.

As we wanted to keep table operations as synchronous as possible, we added the permission fields on three relations to dispatch the workload:

  • the User-Incident relation
  • the Team-Incident relation
  • the User-Team relation

After doing some research, we decided to compute these permissions in SQL. Not relying on per-user permissions also meant we could not rely on the common Django permissions libraries (including django.contrib.auth), which are all object-based.

In the table below, we map the number of rows impacted by each new event (new incident, new repo added to a team’s perimeter, etc.). We can see that the per-object solution, which stores one row per user and incident, scales linearly with the number of users in a team. But we don’t want our team sizes to be limited:

Event                             | Affected user-incident rows                    | Affected team-incident rows
----------------------------------|------------------------------------------------|----------------------------
New incident                      | # of teams × # of team’s users                 | # of teams
New repository in the team        | # of repository incidents × # of team’s users  | # of repository incidents
New user in the team              | # of team incidents                            | 0
New team incident (direct access) | # of team users                                | 1
Although we discarded early on the User-Incident relation as the ultimate source of truth, we had to use per-object permissions for the Team-Incident relation. That choice was motivated by performance reasons: the read operation through the Repository and Occurrence tables was too slow, and we made the assumption that the number of teams would be smaller than the number of users.

II. How our model works

A simple trick: using binary masks

Once we defined the permissions specs, we needed to determine how to store them in our database. I mentioned three levels of permission, but it was evident that in the future we would need to add many more to allow more granularity in the business domain roles. To avoid having many boolean fields and to simplify the logic of checking authorizations, we preferred to store authorizations in their binary representation. Thanks to the use of binary masks, we can store all the permissions in a single Integer field.

💡 How to check permissions stored as a binary mask
Let’s say we have 2 resources A and B, and the permissions READ and WRITE
We’ll store that in four bits, two per resource. Let’s assume for simplicity that WRITE implies READ,
Cases for A:

  • 0b0011 is the WRITE: A permission
  • 0b0001 is the READ: A permission

Cases for B:

  • 0b1100 is the WRITE: B permission
  • 0b0100 is the READ: B permission

and obviously:

  • 0b0000 is no permission.

With a bitwise OR, we combine permissions: for example, 0b0011 | 0b0100 = 0b0111 encodes the WRITE: A and READ: B permissions. Conversely, to check a permission, all we have to do is a bitwise AND on the permission mask and the binary value of the field.

So to check if a user has permission WRITE: A, we’ll do 0b0011 & the permission of the user. The result will be equal to the mask only if the user has the permissions:

  • 0b1111 & 0b0011 = 0b0011 → OK
  • 0b0111 & 0b0011 = 0b0011 → OK
  • 0b1101 & 0b0011 = 0b0001 → not OK
  • 0b0000 & 0b0011 = 0b0000 → not OK
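The same checks can be reproduced in plain Python. This is only a sketch of the example above: the constant names and the `has_permission` helper are hypothetical. Masks are combined with a bitwise OR and checked with a bitwise AND.

```python
# Masks from the example above: two bits per resource.
READ_A = 0b0001
WRITE_A = 0b0011  # WRITE implies READ
READ_B = 0b0100
WRITE_B = 0b1100


def has_permission(granted: int, mask: int) -> bool:
    """True when every bit of `mask` is set in `granted`."""
    return (granted & mask) == mask


# Combine permissions with a bitwise OR.
granted = WRITE_A | READ_B
print(bin(granted))  # 0b111

print(has_permission(granted, WRITE_A))  # True
print(has_permission(granted, READ_B))   # True
print(has_permission(granted, WRITE_B))  # False
print(has_permission(0b1101, WRITE_A))   # False: only the READ bit of A is set
```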

To implement this in Django we used the IntegerChoices classes, as well as a simple helper to help check permissions in our Python code.

from django.db import models

class Permission(models.IntegerChoices):
    READ = 0b001
    WRITE = 0b011
    ADMIN = 0b111

    @classmethod
    def is_authorized(
        cls, mask: "Permission", scope: "int | Permission"
    ) -> bool:
        """
        GIVEN a mask (the permission required) and a scope (the permission held)
        Return True if the scope contains every bit of the mask
        ex: scope 0b100, mask 0b110: 0b100 & 0b110 = 0b100 != 0b110 → False
        """
        return (scope & mask) == mask
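A quick usage sketch, mostly to show the argument order: `mask` is what the action requires, `scope` is what the user holds. To keep the snippet runnable outside a Django project, it uses a plain `enum.IntEnum` stand-in for the class above. That substitution is an assumption made for illustration.

```python
from enum import IntEnum


class Permission(IntEnum):
    # Plain-Python stand-in for the IntegerChoices class above,
    # so the snippet runs without a Django project.
    READ = 0b001
    WRITE = 0b011
    ADMIN = 0b111

    @classmethod
    def is_authorized(cls, mask: "Permission", scope: int) -> bool:
        return (scope & mask) == mask


# A user holding WRITE can do anything that requires READ...
print(Permission.is_authorized(Permission.READ, Permission.WRITE))   # True
# ...but nothing that requires ADMIN.
print(Permission.is_authorized(Permission.ADMIN, Permission.WRITE))  # False
# 0 means no permission at all.
print(Permission.is_authorized(Permission.READ, 0))                  # False
```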

Django models

Now that we know the relationships between our objects, and where to store the permissions we need, we can implement it with Django models.

Assuming we use the default Django User model, here are our models:

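from django.contrib.auth.models import User
from django.db.models import (
    ForeignKey,
    ManyToManyField,
    PositiveSmallIntegerField,
    TextField,
)

class TeamUser(models.Model):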
    team = ForeignKey("Team", ...)
    user = ForeignKey("User", ...)
    permission = PositiveSmallIntegerField(default=Permission.READ)

class Team(models.Model):
    name = TextField(...)
    users = ManyToManyField("User", through="TeamUser", ...)

class TeamIncident(models.Model):
    team = ForeignKey("Team", ...)
    incident = ForeignKey("Incident", ...)
    permission = PositiveSmallIntegerField(default=Permission.READ)

class UserIncident(models.Model):
    user = ForeignKey("User", ...)
    incident = ForeignKey("Incident", ...)
    permission = PositiveSmallIntegerField(default=Permission.READ)

class Incident(models.Model):
    name = TextField(...)
    teams = ManyToManyField(Team, through="TeamIncident", ...)
    users = ManyToManyField(User, through="UserIncident", ...)

Quite straightforward. Let’s move on to the use cases.

III. Our implementation in practice

Filtering the incidents for a user

First, getting all the incidents of a user, or all the users having access to an incident, is simple: the mere existence of a row in one of the relation tables implies the READ permission, so we don’t have to check permission values. We can do the following:

# list incidents of user
Incident.objects.filter(Q(users=user) | Q(teams__users=user)).distinct()

# list users having access to an incident
User.objects.filter(Q(incidents=incident) | Q(teams__incidents=incident)).distinct()
💡
Distinct is needed because a user can reach the same incident through several rows, for example directly and through one or more teams.

The query could be done through subqueries instead. In practice, we leverage the fact that we already have access to the user’s teams to simplify it.

After checking what incidents to display to a user, we want to know which permissions they have on these incidents to know which actions they are allowed to do.

Let’s stay with three permission levels:

  • 0b001 is READ which allows seeing the incident
  • 0b011 is WRITE (implying READ) which allows acting on the incident
  • 0b111 is ADMIN (implying READ + WRITE) which allows granting access to the incident to other users and teams.

And, of course, 0b000 is no permissions at all.

Let’s write the Django query for this, by constructing the user_permission annotation that will contain the aggregated permission of the user on each incident.

A user’s permission on an incident through a given team is the lower of the team’s permission on the incident and the user’s permission in the team, computed with a bitwise AND:

F("team_incident__permission").bitand(F("team_incident__team__team_user__permission"))

# and filter the relation by
queryset.filter(team_incident__team__team_user__user=user)

And a user’s permission across multiple teams is the highest of these per-team permissions, computed with a bitwise OR aggregate:

BitOr(
    F("team_incident__permission").bitand(F("team_incident__team__team_user__permission")),
    output_field=PositiveSmallIntegerField(),
)
💡
BitOr is PostgreSQL-specific in Django (it lives in django.contrib.postgres.aggregates)

But the user can also get access to incidents individually. We wrap each side in Coalesce(..., 0), which replaces NULL with 0, our “no permission” value, when the user has no access through teams or no individual access. Otherwise, we couldn’t apply our binary operation: in SQL, a bitwise operation with NULL yields NULL.

user_permission_expression = Coalesce(
    BitOr(
        F("team_incident__permission").bitand(F("team_incident__team__team_user__permission")),
        output_field=PositiveSmallIntegerField(),
    ),
    0,
).bitor(Coalesce(F("user_incident__permission"), 0))
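To check the semantics outside the database, here is a plain-Python sketch of the same computation. The function name and its input shape are hypothetical. Reading AND as "lowest" and OR as "highest" works because the levels are nested: each level contains all the bits of the levels below it.

```python
from functools import reduce
from operator import or_
from typing import Iterable, Optional, Tuple

READ, WRITE, ADMIN = 0b001, 0b011, 0b111


def effective_permission(
    team_grants: Iterable[Tuple[int, int]],
    direct_grant: Optional[int],
) -> int:
    """
    team_grants: (team permission on the incident, user permission in the team)
    pairs, one per team through which the user reaches the incident.
    direct_grant: the user's individual permission, or None if there is none.
    """
    # AND caps the grant at the lower of the two levels within one team.
    per_team = (team_perm & member_perm for team_perm, member_perm in team_grants)
    # OR keeps the highest level across teams; the initial 0 plays the role
    # of Coalesce(..., 0) when there is no team grant at all.
    through_teams = reduce(or_, per_team, 0)
    return through_teams | (direct_grant or 0)


# ADMIN team where the user is only READ, plus a WRITE team where the user is WRITE.
print(bin(effective_permission([(ADMIN, READ), (WRITE, WRITE)], None)))  # 0b11
# No team access, individual READ grant.
print(bin(effective_permission([], READ)))  # 0b1
```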

Finally, we filter the queryset for our user:

queryset = Incident.objects.filter(
    Q(user_incident__user=user) | Q(team_incident__team__team_user__user=user)
).annotate(user_permission=user_permission_expression).distinct()
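For readers who prefer to see the idea in SQL, the sketch below is a hand-written approximation of the computation, not the query Django generates. The table and column names are assumptions. It uses SQLite so that it runs anywhere, which means the `bit_or` aggregate (built into PostgreSQL) has to be registered by hand.

```python
import sqlite3


class BitOr:
    """Stand-in for PostgreSQL's bit_or aggregate, which SQLite lacks."""

    def __init__(self):
        self.value = None

    def step(self, item):
        if item is not None:
            self.value = item if self.value is None else self.value | item

    def finalize(self):
        return self.value


conn = sqlite3.connect(":memory:")
conn.create_aggregate("bit_or", 1, BitOr)
conn.executescript(
    """
    CREATE TABLE team_user (team_id INT, user_id INT, permission INT);
    CREATE TABLE team_incident (team_id INT, incident_id INT, permission INT);
    CREATE TABLE user_incident (user_id INT, incident_id INT, permission INT);

    -- User 1 is READ in team 10 and WRITE in team 20
    INSERT INTO team_user VALUES (10, 1, 1), (20, 1, 3);
    -- Team 10 is ADMIN on incident 100, team 20 is WRITE on incidents 100 and 101
    INSERT INTO team_incident VALUES (10, 100, 7), (20, 100, 3), (20, 101, 3);
    -- User 1 was individually granted ADMIN on incident 102
    INSERT INTO user_incident VALUES (1, 102, 7);
    """
)

QUERY = """
WITH through_teams AS (
    -- AND within a team, OR across teams
    SELECT ti.incident_id, bit_or(ti.permission & tu.permission) AS permission
    FROM team_incident AS ti
    JOIN team_user AS tu ON tu.team_id = ti.team_id
    WHERE tu.user_id = :user_id
    GROUP BY ti.incident_id
),
direct AS (
    SELECT incident_id, permission FROM user_incident WHERE user_id = :user_id
),
perimeter AS (
    SELECT incident_id FROM through_teams
    UNION
    SELECT incident_id FROM direct
)
SELECT p.incident_id,
       COALESCE(t.permission, 0) | COALESCE(d.permission, 0) AS user_permission
FROM perimeter AS p
LEFT JOIN through_teams AS t ON t.incident_id = p.incident_id
LEFT JOIN direct AS d ON d.incident_id = p.incident_id
ORDER BY p.incident_id
"""

rows = conn.execute(QUERY, {"user_id": 1}).fetchall()
print(rows)  # [(100, 3), (101, 3), (102, 7)]
```

Incident 100 ends up at WRITE: the ADMIN team only contributes READ because the user is READ in that team, and the WRITE team contributes WRITE.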

Filtering a queryset by permission

We have everything we need, but it’s not yet practical to fetch all the user’s objects for which they have a certain permission level with our binary logic.

We could craft a custom queryset filter, but let’s make something more reusable: let’s define a custom Lookup to implement the Permission.is_authorized method directly in SQL:

from django.db.models import Field, Lookup


class IsAuthorized(Lookup):
    """
    SQL counterpart of Permission.is_authorized.
    True if the left-hand side (the permission held) contains every bit
    of the right-hand side (the permission required).
    ex: 0b100 & 0b110 = 0b100 != 0b110 → not authorized
    """

    lookup_name = "isauthorized"

    def as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        # rhs appears twice in the SQL, so its params are passed twice
        params = (*lhs_params, *rhs_params, *rhs_params)

        # The binary operation happens here
        return "(%s & %s) = %s" % (lhs, rhs, rhs), params


Field.register_lookup(IsAuthorized)

# usage, assuming the of_user queryset method annotates the user_permission
Model.objects.of_user(user).filter(user_permission__isauthorized=Permission.WRITE)
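To see what the generated condition does, here is a small standalone sketch that applies a condition of the same shape in SQLite. The `incident_permission` table is hypothetical and simply stands in for the annotated queryset.

```python
import sqlite3

WRITE = 0b011

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incident_permission (incident_id INT, user_permission INT)")
conn.executemany(
    "INSERT INTO incident_permission VALUES (?, ?)",
    [(100, 0b001), (101, 0b011), (102, 0b111)],  # READ, WRITE, ADMIN
)

# Same shape as the condition produced by the lookup: (<lhs> & <rhs>) = <rhs>.
# The right-hand side is bound twice, just like rhs_params in as_sql.
rows = conn.execute(
    "SELECT incident_id FROM incident_permission "
    "WHERE (user_permission & ?) = ? ORDER BY incident_id",
    (WRITE, WRITE),
).fetchall()
print(rows)  # [(101,), (102,)]
```

Only the incidents where the user holds at least WRITE are returned; the READ-only incident is filtered out.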

It’s important to note that although our incident permissions computation works in all cases, we should not forget about shortcuts.

For example, the Manager role enables access to all the incidents, so it doesn’t make sense to compute the permissions for it. Similarly, the “all-incidents team” provides access to all the organization’s incidents, allowing us to eliminate the perimeter computation.

Also, in paginated endpoints, we just have to compute the permissions on the page we want to return!

We’re done!

Implementing the Teams feature was far from straightforward, and I know that we are not the first engineering team to be confronted with this kind of challenge. It required careful thought about the data models we use and about how to implement the feature with the least possible impact on both performance and the rest of the application. In the end, I think this was a really good exercise and we learned many things that we will be able to apply to other parts of our code.

Time for our next challenge!