Tuesday, 12 October 2010

Subclassing frozenset

I encountered a strange "feature" of the Python standard library a few days ago. I needed to subclass frozenset (an immutable set type that is part of the Python standard library) to add a validation check on its contents in the constructor. My first approach looked like this:
def validate(iterable):
    print 'validation:', iterable  # complex validation here


class MyFrozenset(frozenset):

    def __init__(self, iterable):
        listed = list(iterable)
        validate(listed)
        frozenset.__init__(self, listed)  # ***

if __name__ == '__main__':
    fs = MyFrozenset([0, 1, 2, 3, 4])
    print fs
Note that I took the precaution of converting the iterable to a list, so that validating it would not consume a generator before it reached frozenset's __init__. The code worked, but I got a strange warning:
validation: [0, 1, 2, 3, 4]
blog.py:17: DeprecationWarning: object.__init__() takes no parameters
frozenset.__init__(self, listed)
MyFrozenset([0, 1, 2, 3, 4])
It seemed as if frozenset didn't override __init__ at all! Sure enough, removing the call to the parent constructor (marked with *** in the code) made the warning go away, and the code still worked!
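A quick way to check this is to look at which special methods frozenset defines in its own class dictionary and which it merely inherits from object. A small Python 3 sketch, assuming CPython; the mutable set type is included only for contrast:

```python
# Which special methods does frozenset define itself (in its own
# class __dict__), and which does it inherit from object?
print('__init__' in frozenset.__dict__)       # False: inherited from object
print(frozenset.__init__ is object.__init__)  # True: the very same slot wrapper
print('__new__' in frozenset.__dict__)        # True: the contents are built here

# For contrast, the mutable set type does define its own __init__.
print('__init__' in set.__dict__)             # True
```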

I started wondering how the set could be filled without the iterable ever being passed to the parent. So I ran another check, this time passing a generator as the argument:
def get_five():
    for i in xrange(5):
        yield i

if __name__ == '__main__':
    fs = MyFrozenset(get_five())
    print fs
Given the earlier warning, the result was not very surprising:
validation: []
MyFrozenset([0, 1, 2, 3, 4])
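The order of events can be traced directly. This is a Python 3 sketch in which TracedFrozenset, the events list and the extra logging line at the end of get_five are hypothetical additions: the generator records the moment it has been drained, and __init__ records when it runs and what is still left in the iterable it was given.

```python
events = []


def get_five():
    for i in range(5):
        yield i
    events.append('generator exhausted')  # runs once the last value is consumed


class TracedFrozenset(frozenset):
    def __init__(self, iterable):
        events.append('__init__ called')
        # Whatever the generator still holds is all __init__ can see.
        events.append(list(iterable))


fs = TracedFrozenset(get_five())
print(events)      # ['generator exhausted', '__init__ called', []]
print(sorted(fs))  # [0, 1, 2, 3, 4]
```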
It seems the generator had already been consumed before my __init__ method was even entered! Then I remembered that, besides __init__, there is another method involved in creating an object: __new__. I quickly checked the Python documentation and found:
__new__() is intended mainly to allow subclasses of immutable types (like int, str, or tuple) to customize instance creation.

This seemed to be exactly what I was looking for, so I quickly replaced __init__ with __new__:
class ProperFrozenset(frozenset):

    def __new__(cls, iterable):
        listed = list(iterable)
        validate(listed)
        return frozenset.__new__(cls, listed)

if __name__ == '__main__':
    fs = ProperFrozenset(get_five())
    print fs
And it worked:
validation: [0, 1, 2, 3, 4]
ProperFrozenset([0, 1, 2, 3, 4])
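For reference, ProperFrozenset might look roughly like this in Python 3 (print function, range instead of xrange, zero-argument super()). The validation rule is a hypothetical stand-in for the "complex validation", and the iterable=() default is an added assumption so that ProperFrozenset() behaves like frozenset():

```python
def validate(items):
    # Hypothetical rule standing in for the real validation:
    # accept only non-negative integers.
    for item in items:
        if not isinstance(item, int) or item < 0:
            raise ValueError('invalid element: %r' % (item,))


class ProperFrozenset(frozenset):
    def __new__(cls, iterable=()):
        listed = list(iterable)  # materialize once, so generators work too
        validate(listed)         # runs before the instance exists
        return super().__new__(cls, listed)


def get_five():
    for i in range(5):
        yield i


fs = ProperFrozenset(get_five())
print(sorted(fs))         # [0, 1, 2, 3, 4]
print(type(fs).__name__)  # ProperFrozenset

try:
    ProperFrozenset([1, -2])
except ValueError as exc:
    print(exc)            # invalid element: -2
```

Because the check happens in __new__, invalid input raises before any instance is created, so a half-validated set never exists.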
I'm not sure why Python does it this way, but I guess it's for consistency: __init__ is called after the object has already been created, and if the object is immutable, no method should be able to modify it. __new__, on the other hand, is called on the class before the instance exists, so the arguments can still be manipulated.
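The two facts behind that guess can be checked directly. In this Python 3 sketch, frozenset simply has no mutating methods that an __init__ could call, whereas a __new__ override can still rewrite the arguments before the instance exists. LowercaseFrozenset and its lowercasing rule are hypothetical, chosen only for illustration:

```python
# A frozenset exposes no mutating methods, so an __init__ running after
# creation has nothing it could call to change the contents.
print(hasattr(frozenset(), 'add'))  # False
print(hasattr(set(), 'add'))        # True


class LowercaseFrozenset(frozenset):
    # Hypothetical example: normalize the arguments in __new__,
    # i.e. before the immutable instance has been built.
    def __new__(cls, iterable=()):
        return super().__new__(cls, (s.lower() for s in iterable))


words = LowercaseFrozenset(['Spam', 'EGGS', 'spam'])
print(sorted(words))  # ['eggs', 'spam']
```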

One mystery remains: if a frozenset is initialized in its __new__ method, why does an overridden __init__ still receive the iterable argument, and why is no warning raised when __init__ is not overridden at all?
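One way to poke at this is a simplified, pure-Python model of what calling a class does. call_class below is a hypothetical approximation of CPython's type.__call__, not its real code: __new__ is called first, and if it returns an instance of the class, __init__ is then called on it with the very same arguments. Judging from CPython's object.__init__ (the one frozenset inherits), extra arguments seem to be rejected only when the class overrides __init__ itself or does not override __new__; on current Python 3 versions forwarding them is a TypeError rather than the DeprecationWarning shown above. Spy, NoInit and ForwardsArgs are hypothetical names.

```python
def call_class(cls, *args, **kwargs):
    """Simplified, hypothetical model of what cls(*args, **kwargs) does.

    An approximation of CPython's type.__call__, not its real code.
    """
    obj = cls.__new__(cls, *args, **kwargs)
    # __init__ runs only if __new__ returned an instance of cls,
    # and it is handed exactly the same arguments as __new__.
    if isinstance(obj, cls):
        type(obj).__init__(obj, *args, **kwargs)
    return obj


class Spy(frozenset):
    seen = []

    def __init__(self, *args):
        Spy.seen.append(args)  # record what __init__ was given


s = call_class(Spy, [3, 1, 2])
print(sorted(s), Spy.seen)  # [1, 2, 3] [([3, 1, 2],)]


class NoInit(frozenset):
    pass  # inherits object.__init__, which ignores the argument here


class ForwardsArgs(frozenset):
    def __init__(self, iterable):
        frozenset.__init__(self, iterable)  # really object.__init__


print(sorted(NoInit([1, 2])))  # [1, 2]
try:
    ForwardsArgs([1, 2])
except TypeError as exc:
    print('TypeError:', exc)
```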
