← Notes on the descriptor Protocol | Error Starting Gunicorn in Supervisor →

Inside the Django ORM: Aggregates

Hello again, blog. Long time, no see.

I just took my first steps into the internals of the Django ORM. I'm hoping to get a deeper understanding of how Django turns your code into queries for the database.

For my first step, I took a look at the aggregation mechanisms. Although it isn't documented in the page on aggregation, django makes it fairly easy to write your own aggregators.

As an aside, a lot of this is based on slides from a talk called Pro ORM that Alex Gaynor gave at some unknown time. Go read it if you're interested. "Pro ORM" is the link.

Django supports several Aggregators out of the box, including Count, Sum, Avg, Max, and Min. But there are other aggregators out there that Django doesn't support. You can check the documentation of your database for specifics.

I decided to try adding support for a new aggregator, just to see how everything works. Looking at the documentation for aggregates in SQLite, I discovered a function called group_concat, which takes a field and returns the values of that field, concatenated together with commas, or another separator which can be specified. So in SQL, to get the first names of all the users in your auth_user table who share a last name, you could do:

SELECT last_name, group_concat(first_name)
    FROM auth_user
    GROUP BY last_name

This might return:

"Smith", "Alice,Barbara,Carlo"
"Jones", "Willa,Xavier,Yesenia,Zachary"
...

group_concat can also take a second argument which is the separator to be used.

SELECT last_name, group_concat(first_name, ':')
    FROM auth_user
    GROUP BY last_name

would return:

"Smith", "Alice:Barbara:Carlo"
"Jones", "Willa:Xavier:Yesenia,Zachary"

To implement an aggregator in django, you need to create a subclass of the ORM-level Aggregate class (django.db.models.aggregates.Aggregate), which manages the details of the query, and then passes the data back to an appropriate subclass of the SQL-level Aggregate class (django.db.models.sql.aggregates.Aggregate), which converts its arguments to the appropriate SQL snippet ("group_concat(first_name)", in the example above) when the sql gets generated.

Both of these classes are well documented in the doc strings. Go read the source. You can find it here:

ORM Level Aggregate Class

The ORM-level Aggregate class defines three functions: __init__, _default_alias, and add_to_query. You can safely ignore _default_alias.

It's useful to know that __init__ will accept any number of arguments. The first, lookup is the name of a field that the aggregate operates on. As usual, you can traverse relationships with double-underscores. The rest of the arguments must be keyword arguments, and are stored in a dictionary on self.extra.

The last method, add_to_query, is the one you will be overriding. It takes several arguments, all of which are well documented in the source code. Go read it. Seriously. If you want to learn this stuff, you're going to have to get familiar with the source code. Here it is again in case you missed it the first time:

add_to_query basically selects the appropriate SQL level aggregator for the active backend, instantiates that aggregator with all the arguments the user passed in, and then stuffs the instance into an aggregates dict on the query object. The default logic for selecting the appropriate SQL level Aggregate class doesn't work for us, so we'll be doing something different

from django.db.models import Aggregate

class Concat(Aggregate):
    def add_to_query(self, query, alias, col, source, is_summary):
        aggregate = SQLConcat(col, source=source, is_summary=is_summary, **self.extra)
        query.aggregates[alias] = aggregate

Note that we have an undefined class: SQLConcat in there. We'll define that in a moment. If different backends need to do different things to implement this aggregation, we can provide separate SQLiteConcat, PostgresConcat, MySQLConcat classes, and choose appropriately based on the backend as indicated by the query parameter's using attribute.

SQL Level Aggregate

Now we need to create SQLConcat, which is a subclass of the SQL level Aggregate base class. Don't worry about the methods on this class, we really just need to concern ourselves with a few class attributes it has defined.

From the django source (edited) (which you should really go look at):

class Aggregate(object):
    """
    Default SQL Aggregate.
    """
    is_ordinal = False
    is_computed = False
    sql_function = ""
    sql_template = '%(function)s(%(field)s)'

We will need to set sql_function to the name of our function (group_concat), and override sql_template to match the syntax of the sql snippet we're trying to generate. is_ordinal indicates that the output should be an int, while is_computed indicates that it should be a float. Don't ask. Those are just the names they chose. sql_template can interpolate the values function and field (representing the value of sql_function and the resolved field name of the lookup value passed into our ORM-level Aggregate subclass as the first argument. In addition, sql_template can interpolate the names of any other keyword arguments passed in to the __init__ method of our ORM-level Aggregate subclass.

We could use the sql_template as it stands, and simply define our SQLConcat object as follows:

from django.db.models.sql.aggregates import Aggregate as SQLAggregate
# Import renamed so as to disambiguate it from the other Aggregate import,
# in case you want to put both in one class.

class SQLConcat(SQLAggregate):
    sql_function = 'group_concat'

This already works for the simple case. We can import our Concat aggregator and use it something like this:

>>> from django.contrib.auth import models
>>> group = models.Group.objects.all().annotate(Concat('user__first_name'))[0]
>>> group.user__first_name
'Alice,Barbara,Carlo'

But if we try to add a separator argument, it just gets ignored.

>>> from django.contrib.auth import models
>>> group = models.Group.objects.all().annotate(Concat('user__first_name', separator=':'))[0]
>>> group.user__first_name
'Alice,Barbara,Carlo'

Note that we were expecting 'Alice:Barbara:Carlo'.

The sql_template variable will interpolate any of the keys from the extra dict, which were originally populated from the keyword arguments passed to Concat. So we could write our SQLConcat class like this:

from django.db.models.sql.aggregates import Aggregate as SQLAggregate

class SQLConcat(SQLAggregate):
    sql_function = 'group_concat'
    sql_template = '%(function)s(%(field)s), "%(separator)s")'

Now we can instantiate Concat with a separator keyword argument, and get the results we expect. Unfortunately, it breaks if we don't pass in the separator argument. We want to be able to render our SQL with one or two arguments, but the python templating language doesn't provide that kind of flexibility. The solution for this is not something that was coded into Django; it's a little bit of intermediate-level python. We will replace the sql_template attribute with a sql_template property that checks to see if extras['separator'] exists, and returns the appropriate template.

from django.db.models.sql.aggregates import Aggregate as SQLAggregate

class SQLConcat(SQLAggregate):
    sql_function = 'group_concat'

    @property
    def sql_template(self):
        if separator in self.extra and separator is not None:
            return '%(function)s(%(field)s, "%(separator)s")'
        else:
            return '%(function)s(%(field)s);'

This does what we want whether or not we pass in the separator argument. It doesn't check to prevent other arguments from being passed in, but then again, neither do the existing Aggregate subclasses. (Go ahead: try Max(field, parrot='Dead'). I bet it'll work.)

If you want to allow users to pass in the second argument as a positional argument instead of a keyword argument, you'll have to modify the __init__ method on Concat. It can certainly be done, but I'll leave that as an excercise for the reader.

To recap, our code now looks like this:

from django.db.models import Aggregate
from django.db.models.sql.aggregates import Aggregate as SQLAggregate

class Concat(Aggregate):
    def add_to_query(self, query, alias, col, source, is_summary):
        aggregate = SQLConcat(col, source=source, is_summary=is_summary, **self.extra)
        query.aggregates[alias] = aggregate


class SQLConcat(SQLAggregate):
    sql_function = 'group_concat'

    @property
    def sql_template(self):
        if separator in self.extra and separator is not None:
            return '%(function)s(%(field)s, "%(separator)s")'
        else:
            return '%(function)s(%(field)s);'

So with two fairly simple classes, we are able to leverage features of our database engine that Django doesn't support out of the box. Before today, I probably would have dropped down to a raw query if I needed to do this. Now, I can do it cleanly and elegantly, leveraging the power of the Django ORM.

Comments !