Python bits: dataclasses — Part 2

Fernando Barbosa

Reminder: This is a learning journal

In my previous post, I began exploring dataclasses applied to domain-driven design. These are the rules we created last time:

Notice that rule_2 had the same rule_id value as rule_1. This was on purpose, to demonstrate equality between these objects.

As we can see, the == operator produces True for the comparison of two objects with the same values. However, are they really the same object?

Cool! The is operator returns False, so we can infer that the two objects do not share the same id:

>>> print(id(rule_1))
140660181370432
>>> print(id(rule_2))
140660181370384
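The comparison described above can be reproduced with a minimal sketch. The field values are an assumption borrowed from the Rule used later in this post, since the Part 1 rules are not repeated here:

```python
from dataclasses import dataclass


@dataclass
class Rule:
    rule_id: str
    criteria: str
    operator: str
    value: int


# Assumed values: two separate objects built from identical field values.
rule_1 = Rule("0001", "years_old", ">", 18)
rule_2 = Rule("0001", "years_old", ">", 18)

# The generated __eq__ compares the fields, so this is a value comparison.
same_values = rule_1 == rule_2
# "is" compares identity, i.e. whether both names point at one object.
same_object = rule_1 is rule_2

print(same_values, same_object)  # prints: True False
```

The actual id values differ from run to run, which is why the sketch compares identity with is instead of printing them.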

So what?

A lot of whats! But let’s focus on the main one: the dataclass decorator provides the frozen argument. From the documentation:

“frozen: If true (the default is False), assigning to fields will generate an exception. This emulates read-only frozen instances. If __setattr__() or __delattr__() is defined in the class, then TypeError is raised.”

>>> from dataclasses import dataclass
>>> @dataclass(frozen=True)
... class Rule:
...     rule_id: str
...     criteria: str
...     operator: str
...     value: int
...
>>> rule_1 = Rule("0001", "years_old", ">", 18)
>>> rule_1.rule_id = '0002'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 4, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'rule_id'

💥Baaammm! Another fantastic feature: it saves time and quite a few lines of code because it enforces immutability for us. Any attempt to assign to a field raises a FrozenInstanceError.
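In a script, as opposed to the REPL, the same behavior might be handled as in the sketch below. It also shows one way to get a changed rule without touching the original, using dataclasses.replace from the standard library. The variable names blocked and rule_2 are only for illustration:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Rule:
    rule_id: str
    criteria: str
    operator: str
    value: int


rule_1 = Rule("0001", "years_old", ">", 18)

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    rule_1.rule_id = "0002"
    blocked = False
except FrozenInstanceError:
    blocked = True

# replace() builds a new instance with the given fields changed;
# rule_1 itself is left untouched.
rule_2 = replace(rule_1, rule_id="0002")

print(blocked, rule_1.rule_id, rule_2.rule_id)  # prints: True 0001 0002
```

The design choice here is to treat a "change" as the creation of a new object, which is usually what we want for values that should never drift after they are created.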

With that in mind, let’s look at what aliasing does when we don’t use frozen=True:

>>> @dataclass
... class Rule:
...     rule_id: str
...     criteria: str
...     operator: str
...     value: int
...
>>> rule_1 = Rule("0001", "years_old", ">", 18)
>>> rule_2 = Rule("0002", "years_old", ">", 18)

Notice that this time we made the rule_id different for the second rule. So, even though criteria, operator, and value are exactly the same in both rules, our comparison returns False:

>>> print(rule_1 == rule_2)
False
>>> print(rule_1 is rule_2)
False

Well, that is to be expected, isn’t it? Yes, but if we change our code only slightly, we notice a completely different behavior. What if, instead of creating a new instance from scratch, we simply “copy” or “assign” rule_1 to rule_2:

>>> rule_2 = rule_1
>>> print(rule_1 == rule_2)
True
>>> print(rule_1 is rule_2)
True
>>> print(id(rule_1) == id(rule_2))
True

Both names now refer to the same object, so a change made through one of them is visible through the other:

>>> rule_2.rule_id = "0002"
>>> print(rule_1.rule_id)
0002
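The session above can be contrasted with an actual copy. The sketch below is one possible illustration, assuming the non-frozen Rule from this post and using copy.copy from the standard library; the names alias and clone are only for illustration:

```python
import copy
from dataclasses import dataclass


@dataclass
class Rule:
    rule_id: str
    criteria: str
    operator: str
    value: int


rule_1 = Rule("0001", "years_old", ">", 18)

alias = rule_1             # a second name for the same object
clone = copy.copy(rule_1)  # a new object with the same field values

# Changing the object through the alias also "changes" rule_1,
# because there is only one object behind both names.
alias.rule_id = "0002"

print(rule_1.rule_id)  # prints: 0002
print(clone.rule_id)   # prints: 0001
```

Only the clone keeps the original rule_id, because it is the only one of the three names that points at a different object.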

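To the point: this type of “copy” is called aliasing, and it can cause a lot of problems if used recklessly. It is not really a copy at all, since both names point at one object. However, we can avoid these problems by using a dataclass with its frozen argument set to True, because then neither name can be used to reassign a field.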

With Rule defined again using frozen=True and rule_1 recreated, there we go:

>>> rule_2 = rule_1
>>> print(rule_1 == rule_2)
True
>>> print(rule_1 is rule_2)
True
>>> rule_2.rule_id = '0003'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 4, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'rule_id'

Now we have a “copy”, or rather a second reference to the same object, without running the risk of accidentally changing it in the future.

Review of Python Basics:

Mutability: when a Python object’s value can be changed, we say the object is mutable. Otherwise, it is immutable (ish) 😅.
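One reason for the “(ish)” might be sketched like this: frozen=True blocks assignment to fields, but it does not freeze a mutable value stored in a field. RuleSet and its fields are hypothetical names made up for this example, not part of the earlier code:

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class RuleSet:  # hypothetical class, for illustration only
    name: str
    rule_ids: list = field(default_factory=list)


rules = RuleSet("adults")

# Mutating the list held by the field is still allowed.
rules.rule_ids.append("0001")

# Rebinding the field to a different list is what frozen=True prevents.
try:
    rules.rule_ids = []
    rebinding_blocked = False
except FrozenInstanceError:
    rebinding_blocked = True

print(rules.rule_ids, rebinding_blocked)  # prints: ['0001'] True
```

Using an immutable type such as a tuple for that field would be one way to close the gap.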

Mutability is tricky and deserves our attention. Check out this amazing post by Ventsislav Yordanov. The cherry on top: he also explains how the == and is operators work.

Fernando Barbosa

Data Scientist with special interests: MLOps, Design of Experiments and Prescriptive Analytics.