
Notes on the Descriptor Protocol

I just figured out how the descriptor protocol works. It is not the most intuitive part of Python, but it allows for some cool constructions. I'm writing down what I've learned so I remember it. Say you have a class that you're trying to fit into an API that expects a different class. The original class has an attribute that is a dict. Your class is backed by a database table, and you want to SELECT or INSERT rows in the database as people get or set items in the dictionary attribute.

A descriptor is an object stored as a class attribute. Its __get__() method gets called whenever the attribute is read, and its __set__() method gets called whenever somebody assigns to the attribute.
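As a minimal sketch of those two hooks, here is a descriptor that records every read and write of one attribute. The names Logged and Shop are made up for illustration and are not part of the solution described in this post.

```python
class Logged(object):
    """Hypothetical descriptor that logs reads and writes of one attribute."""

    def __init__(self, name):
        self.name = name          # key used to store the value on the instance
        self.log = []             # lives on the descriptor; fine for a demo

    def __get__(self, instance, owner):
        if instance is None:      # accessed on the class itself, e.g. Shop.cheese
            return self
        self.log.append(('get', self.name))
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        self.log.append(('set', self.name, value))
        instance.__dict__[self.name] = value


class Shop(object):
    cheese = Logged('cheese')


shop = Shop()
shop.cheese = 'wensleydale'       # calls Logged.__set__(shop, 'wensleydale')
current = shop.cheese             # calls Logged.__get__(shop, Shop)
```

Because Logged defines __set__(), the value stored in the instance's __dict__ does not shadow the descriptor, so every later read still goes through __get__().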

My solution to the API problem was to have the API class handle database access, and to create a descriptor that calls methods on the API class whenever the attribute is read or written like a dictionary, that is, whenever __getitem__() or __setitem__() is called on it.

Note

Remember that reading an item with (for instance) shop['cheese'] calls shop.__getitem__('cheese'), and writing an item with shop['cheese'] = 'wensleydale' calls shop.__setitem__('cheese', 'wensleydale').
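To make that concrete, here is a small sketch of a dict-like class that records which hook each bracket operation triggers. CheeseShop and its attributes are hypothetical names used only for this illustration.

```python
class CheeseShop(object):
    """Hypothetical dict-like class that records which hooks were called."""

    def __init__(self):
        self.stock = {}
        self.calls = []

    def __getitem__(self, key):
        self.calls.append('__getitem__')
        return self.stock[key]

    def __setitem__(self, key, value):
        self.calls.append('__setitem__')
        self.stock[key] = value


shop = CheeseShop()
shop['cheese'] = 'wensleydale'    # same as shop.__setitem__('cheese', 'wensleydale')
cheese = shop['cheese']           # same as shop.__getitem__('cheese')
```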

The descriptor that does such a thing looks like this:

class AttributeDict(object):
    """This is the descriptor."""

    def __init__(self, getter, setter):
        """getter and setter are the names of the methods on the container
        class that will be called whenever the attribute is accessed like
        a dictionary."""

        self.getter = getter
        self.setter = setter

    def __get__(self, container, container_type):
        """When somebody reads this attribute off of the container
        instance, this gets called."""
        return AttributeDictObject(self, container, container_type)


class AttributeDictObject(object):
    def __init__(self, descriptor, container, container_type):
        self.descriptor = descriptor
        self.container = container
        self.container_type = container_type

    def __getitem__(self, key):
        """Call the getter method on the container instance, and return
        the return value of that call."""
        return getattr(self.container, self.descriptor.getter)(key)

    def __setitem__(self, key, value):
        """Call the setter method on the container instance"""
        getattr(self.container, self.descriptor.setter)(key, value)

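When the attribute is accessed, Python calls the descriptor's __get__() with the container instance and container class as its arguments. From there, I return a new AttributeDictObject that holds references to the descriptor and the container, so its __getitem__() and __setitem__() can look up the right methods on the right instance.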

Note that I had to create two classes. Originally, I had __get__() return self, and defined __getitem__() and __setitem__() on the descriptor. That construction was subtly broken, though. A descriptor is instantiated once, on the class that contains it, not on each instance of that class, so every instance of the container class shares the same descriptor instance.

Jeff Bradberry pointed this pattern out to me. Returning a new object on each access, one that holds both the descriptor and the container instance, works around this issue. (I told you it wasn't simple, but once you see the problem, the solution is as elegant as it can be.)
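Here is a sketch of how the single-class version can go wrong. This is a reconstruction for illustration, not the original code: it assumes the descriptor remembers the container inside __get__(), and it uses a plain dict in place of the database. BrokenAttributeDict and Notes are made-up names.

```python
class BrokenAttributeDict(object):
    """Hypothetical single-class version: __get__ returns the descriptor itself."""

    def __init__(self, getter, setter):
        self.getter = getter
        self.setter = setter
        self.container = None

    def __get__(self, container, container_type):
        self.container = container    # overwritten on every attribute access
        return self

    def __getitem__(self, key):
        return getattr(self.container, self.getter)(key)

    def __setitem__(self, key, value):
        getattr(self.container, self.setter)(key, value)


class Notes(object):
    """Stand-in container that uses a plain dict instead of a database."""
    users = BrokenAttributeDict('get_userdata', 'set_userdata')

    def __init__(self):
        self.data = {}

    def get_userdata(self, key):
        return self.data[key]

    def set_userdata(self, key, value):
        self.data[key] = value


first, second = Notes(), Notes()
first_users = first.users     # the descriptor, now pointing at `first`
second_users = second.users   # the same descriptor, now pointing at `second`

first_users['jcd'] = 'Some guy'   # lands in second.data, not first.data
```

Both names refer to the one shared descriptor, so whichever container was accessed last wins. With the two-class version, each access gets its own AttributeDictObject bound to the right container.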

This descriptor is used in the API class like this:

import os
import sqlite3

class UserNotes(object):
    users = AttributeDict('get_userdata', 'set_userdata')
    db_name = 'tmp.db'

    def __init__(self):
        new_db = not os.path.exists(self.db_name)
        self.datastore = sqlite3.connect(self.db_name)
        if new_db:
            cursor = self.datastore.cursor()
            cursor.execute(
                '''CREATE TABLE IF NOT EXISTS users
                   (username CHAR(8) PRIMARY KEY NOT NULL,
                   data VARCHAR);'''
            )
            self.datastore.commit()

    def get_userdata(self, key):
        cursor = self.datastore.cursor()
        cursor.execute('SELECT * FROM users WHERE username = ?', (key,))
        result = cursor.fetchone()
        if result is None:
            # Behave like a dict when the key is missing.
            raise KeyError(key)
        return result[1]

    def set_userdata(self, key, value):
        cursor = self.datastore.cursor()
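        # INSERT OR REPLACE so that assigning to an existing key overwrites it,
        # as it would in a dict, instead of raising sqlite3.IntegrityError.
        cursor.execute('INSERT OR REPLACE INTO users (username, data) VALUES (?,?);', (key, value))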
        self.datastore.commit()

The UserNotes class has a descriptor attribute users, which can be used as a dictionary, but which accesses the get_userdata() and set_userdata() methods (thus getting access to UserNotes's database). It is used as follows:

>>> notes = UserNotes()
>>> notes.users['jcd'] = 'Some guy'
>>> notes.users['gvr'] = 'Kind of a big deal'
>>> print('jcd:', notes.users['jcd'])
jcd: Some guy
>>> print('gvr:', notes.users['gvr'])
gvr: Kind of a big deal

Hope that helps someone else understand descriptors a little bit as well.
