5 Must-Know Python Concepts for Experienced Developers


In-depth insights into advanced Python features

Prerequisites: This article assumes an intermediate-level understanding of programming concepts such as OOP, classes, and inheritance.

Developers use the Python programming language in different capacities. Some use it as glue code between separate components, others to automate tasks, and many as their primary programming language. To be considered a serious Python programmer, one needs to know concepts beyond the basic Python constructs.

Below is a compilation of a few such concepts, in no particular order, which can help develop a deeper understanding of the beautiful Python language.

1. Multiple Inheritance and its handling using MRO

Multiple inheritance is an OOP construct in which a child class can have more than one parent class, like:

class Toddler(ParentA, ParentB):  
    pass

Python allows multiple inheritance, but some other high-level languages, such as Java, don’t allow a class to extend more than one class.

Such languages disallow multiple inheritance primarily to avoid the Diamond problem. Below are the parent classes of our Toddler class.

class ParentA:
    def how_to_talk(self):
        print("Hello")

class ParentB:
    def how_to_talk(self):
        print("Namaste")

The Diamond problem is an ambiguity that arises when a child class (Toddler) inherits from more than one parent class (ParentA, ParentB) and does not override (provide its own implementation of) a method (how_to_talk) that the parents implement differently. Effectively, our Toddler is not sure how_to_talk: should it say Hello (as per ParentA) or Namaste (as per ParentB)? The name comes from the shape of the inheritance graph: in Python 3, both parents ultimately inherit from object, so the graph closes into a diamond at the top.

Python resolves the Diamond problem using the Method Resolution Order (MRO). As the name suggests, the MRO defines the order in which parent classes are searched for a method. In simple terms, Python searches from bottom to top and from left to right. So, in our example, the MRO looks at the parent classes from left to right and encounters ParentA first. Our Toddler therefore uses ParentA’s how_to_talk and says Hello.
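A complete, runnable version of the Toddler example (each method takes self so it can be called on an instance) lets us check the lookup order directly. Every class exposes its MRO through the __mro__ attribute and the mro() method.

```python
class ParentA:
    def how_to_talk(self):
        print("Hello")


class ParentB:
    def how_to_talk(self):
        print("Namaste")


class Toddler(ParentA, ParentB):
    pass


# ParentA comes first in the MRO, so its how_to_talk wins
Toddler().how_to_talk()  # prints "Hello"

# The full lookup order, ending with object
print([cls.__name__ for cls in Toddler.__mro__])
# ['Toddler', 'ParentA', 'ParentB', 'object']
```

Swapping the bases to class Toddler(ParentB, ParentA) would make the Toddler say Namaste instead.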

To be precise, Python’s MRO is computed with the C3 linearization algorithm. For any class hierarchy, however complex, C3 either produces a consistent order, one in which every class appears before its parents and the order of bases in each class statement is preserved, or fails with an error if no such order exists.

Note that Python used a different MRO algorithm up to version 2.2 and switched to C3 linearization in version 2.3.

2. Metaclasses and the dual use of type

Note: We will discuss class and metaclass concepts with respect to Python 3 only.

You have probably learned in OOP that a class can be considered a template describing the properties and behaviour of all the objects created from it. In Python, if you wish to know the class of an object, you can use the built-in type as below:

# Method 1
class Car:
    wheel_count = 4

my_car = Car()

>>> type(my_car)
<class '__main__.Car'>

Here, we defined a class Car and created its object my_car. Then we used type to obtain the class of our object. Since Python is a dynamically typed language, type is helpful for finding out, at runtime, the type of the object any variable refers to.

What sets Python apart from many other languages is that a class is itself an object. So in the above snippet, when the class Car: statement ran, an object was created. This object is an instance of a metaclass, a class whose instances are classes.
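A quick way to convince yourself that a class is an ordinary object is to treat Car like any other value: inspect its type, bind it to another name, and pass it to a function. The helper make_instance below is a hypothetical name used only for illustration.

```python
class Car:
    wheel_count = 4


# A class is an instance of its metaclass, which defaults to type
print(type(Car))  # <class 'type'>
print(isinstance(Car, type))  # True

# Classes can be bound to new names and passed around like any other object
Vehicle = Car


def make_instance(cls):
    """Hypothetical helper: build an instance of whatever class it receives."""
    return cls()


my_car = make_instance(Vehicle)
print(type(my_car).__name__, my_car.wheel_count)  # Car 4
```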

When no metaclass is specified, the default metaclass is type, the same built-in type that tells you the type of any object. We can also specify the metaclass explicitly in the class definition:

# Method 2
class Car(object, metaclass=type):
    wheel_count = 4

This definition of Car (Method 2) is equivalent to the earlier one (Method 1), because object is the default base class and, as mentioned, type is the default metaclass of all classes. We can also call type with three arguments to create classes dynamically:

# Method 3
Car = type('Car', (object,), {'wheel_count': 4})

Here the first argument is the class name, the second is a tuple of base classes (note the trailing comma: (object) on its own is just object, not a tuple), and the third is a dictionary of class attributes, the class namespace. Method 3 is not exactly the same as the prior two methods, because the class statement isn’t just syntactic sugar for calling type: it does some extra things, like setting adequate __qualname__ and __doc__ attributes and calling the metaclass’s __prepare__ method.
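Because the third argument to type is just a dictionary, it can hold methods as well as data, and the second argument can name any base class. The sketch below builds Car dynamically with a describe method and then derives a subclass the same way; describe and ElectricCar are hypothetical names added for illustration. The last two lines show one of the differences mentioned above: a nested class statement records its full __qualname__, while type() simply uses the name it is given.

```python
def describe(self):
    """Hypothetical method attached to the dynamically created class."""
    return f"{type(self).__name__} with {self.wheel_count} wheels"


# name, tuple of base classes (note the trailing comma), namespace dict
Car = type("Car", (object,), {"wheel_count": 4, "describe": describe})

# A dynamically created subclass inherits describe from Car
ElectricCar = type("ElectricCar", (Car,), {"battery_kwh": 75})

print(Car().describe())  # Car with 4 wheels
print(ElectricCar().describe())  # ElectricCar with 4 wheels


class Garage:
    class Car:
        pass


print(Garage.Car.__qualname__)  # Garage.Car
print(type("Car", (object,), {}).__qualname__)  # Car
```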

A custom metaclass can be defined by subclassing the default metaclass type. Then it can be used to create a class just as we saw before.

class MyCustomMetaClass(type):  
    will_fly = True

class FlyingCar(metaclass=MyCustomMetaClass):  
    pass
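Two details about the FlyingCar example are worth seeing in code. First, an attribute defined on a metaclass, such as will_fly, is visible on the class but not on its instances, because attribute lookup on an instance only searches the instance, its class, and the class’s bases. Second, the usual reason to write a metaclass is to hook into class creation by overriding __new__; RegistryMeta below, which records every class it creates, is a hypothetical example of that pattern rather than a standard API.

```python
class MyCustomMetaClass(type):
    will_fly = True


class FlyingCar(metaclass=MyCustomMetaClass):
    pass


print(type(FlyingCar).__name__)  # MyCustomMetaClass
print(FlyingCar.will_fly)  # True: found on the metaclass
print(hasattr(FlyingCar(), "will_fly"))  # False: instances do not see it


class RegistryMeta(type):
    """Hypothetical metaclass that records every class it creates."""

    registry = {}

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, namespace, **kwargs)
        mcls.registry[name] = cls
        return cls


class Vehicle(metaclass=RegistryMeta):
    pass


class Boat(Vehicle):  # subclasses inherit the metaclass
    pass


print(sorted(RegistryMeta.registry))  # ['Boat', 'Vehicle']
```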

There is more to metaclasses: what they can do and when they should be used. But that warrants a separate post, and the above covers what an experienced Python developer should know about them.

3. Python program execution and PVM

We are often taught that Python is an interpreted language, in contrast to compiled languages such as Java. Unfortunately, this is too simplistic a view of the actual execution process. If you deconstruct the Python interpreter, you will find a Python Virtual Machine and a compiler (gasp!). So let’s begin from the beginning.

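When we run a Python file with python helloworld.py, the code is first compiled into a lower-level form called bytecode. For modules that are imported, CPython caches this bytecode on disk in .pyc files, such as helloworld.cpython-312.pyc inside a __pycache__ folder (Python 2 wrote helloworld.pyc next to the source, and before Python 3.5 optimized bytecode used a .pyo extension). The top-level script itself is compiled in memory each time and is not cached.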

Bytecode is a fixed set of low-level instructions, defined by the CPython implementation and changing between Python versions, that together represent every kind of operation a program can perform. Each instruction’s opcode fits in a single byte (8 bits), which is where the name ‘byte’ code comes from; since Python 3.6, CPython stores each instruction as two bytes, an opcode followed by a one-byte argument. This bytecode is then interpreted by the Python Virtual Machine.
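The standard dis module shows the bytecode behind any function. The sketch below disassembles a tiny add function (a name chosen for illustration) and inspects the raw bytes. The exact opcode names you see depend on your Python version: for example, addition compiles to BINARY_ADD before 3.11 and to BINARY_OP from 3.11 on.

```python
import dis


def add(a, b):
    return a + b


# Human-readable listing; opcode names vary by Python version
dis.dis(add)

# The raw bytecode is a bytes object on the function's code object
raw = add.__code__.co_code
print(type(raw).__name__, len(raw))

# Every opcode number fits in a single byte
for instr in dis.get_instructions(add):
    print(instr.offset, instr.opname, instr.opcode)
```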

Compiling Python code to bytecode is a multi-step process, with each step handling a different aspect of compilation. For example, the first step parses the source code into a tree, and this is where syntax errors in the code are detected and reported.
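The standard ast module and the built-in compile and exec functions expose these stages at a high level: source text becomes a tree, the tree becomes a code object containing bytecode, and the interpreter runs it. This is a simplified view; CPython’s internal pipeline has more steps (such as building a symbol table) than shown here.

```python
import ast

source = "total = 2 + 3\n"

# Step 1: parse the source text into a tree (syntax errors surface here)
tree = ast.parse(source)
print(ast.dump(tree.body[0]))

# Step 2: compile the tree into a code object holding bytecode
code = compile(tree, "<example>", "exec")

# Step 3: hand the code object to the interpreter to execute
namespace = {}
exec(code, namespace)
print(namespace["total"])  # 5

# A syntax error is reported at the parsing step, before anything runs
syntax_error_seen = False
try:
    ast.parse("def broken(:\n")
except SyntaxError as err:
    syntax_error_seen = True
    print("SyntaxError:", err.msg)
```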

When a module is imported, the interpreter compares the last-modified timestamp and size of the .py file with the values recorded in the header of the cached .pyc file. If they don’t match, it recompiles the source and writes new bytecode. Otherwise, it skips compilation and reuses the existing bytecode.
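The check described above relies on metadata stored in the .pyc header. The sketch below compiles a throwaway helloworld.py with the standard py_compile module and unpacks the 16-byte header, assuming the timestamp-based layout used since Python 3.7 (PEP 552): a magic number identifying the bytecode version, a flags field (0 for timestamp-based files), then the source’s modification time and size.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "helloworld.py")
    with open(src, "w") as f:
        f.write('print("Hello")\n')

    # Explicitly request the timestamp-based format described above
    pyc_path = py_compile.compile(
        src, invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP
    )
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    src_stat = os.stat(src)

# magic (4 bytes), then flags, source mtime, source size as little-endian uint32s
magic, flags, mtime, size = struct.unpack("<4sIII", header)

print(os.path.basename(pyc_path))  # e.g. helloworld.cpython-312.pyc
print(magic == importlib.util.MAGIC_NUMBER, flags)  # True 0
print(mtime == (int(src_stat.st_mtime) & 0xFFFFFFFF), size == src_stat.st_size)  # True True
```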

The interpreter saves these .pyc files in a __pycache__ folder. Since they are generated artifacts, it is good development practice not to commit your __pycache__ folders or .pyc files. To prevent the interpreter from writing bytecode files at all, set the PYTHONDONTWRITEBYTECODE environment variable to a non-empty value, such as 1. This comes in quite handy when writing Dockerfiles for Python applications, as we usually don’t want bytecode files saved in our Docker images.
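Inside the interpreter, PYTHONDONTWRITEBYTECODE shows up as the sys.dont_write_bytecode flag (the -B command-line option has the same effect). The sketch starts a child interpreter with the variable set and asks it for that flag.

```python
import os
import subprocess
import sys

# Copy the current environment and switch off bytecode writing
env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")

result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # True
```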

Unlike compilers such as those for C, which translate a high-level language into machine code the CPU understands, the Python compiler converts Python code into simpler code (bytecode), which is not machine code and hence can’t be executed directly by the CPU.

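Instead, the Python Virtual Machine (PVM) executes this bytecode: it is a loop inside the interpreter that reads the bytecode instructions one at a time and carries out each one by running the interpreter’s own, already compiled, machine code. This is why Python is considered an interpreted language despite having compilation as one of its execution stages.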
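To make the idea of interpreting bytecode concrete, here is a toy stack machine in the same spirit as the PVM: a loop that fetches one instruction at a time and dispatches on its opcode. The instruction names and the run function are hypothetical and far simpler than CPython’s real evaluation loop, which is written in C.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []
    variables = {}
    for opcode, arg in program:          # fetch the next instruction
        if opcode == "LOAD_CONST":       # push a literal value
            stack.append(arg)
        elif opcode == "LOAD_NAME":      # push a variable's value
            stack.append(variables[arg])
        elif opcode == "BINARY_ADD":     # pop two values, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "STORE_NAME":     # pop a value into a variable
            variables[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return variables


# Hand-written program equivalent to: total = 2 + 3; total = total + 10
program = [
    ("LOAD_CONST", 2),
    ("LOAD_CONST", 3),
    ("BINARY_ADD", None),
    ("STORE_NAME", "total"),
    ("LOAD_NAME", "total"),
    ("LOAD_CONST", 10),
    ("BINARY_ADD", None),
    ("STORE_NAME", "total"),
]
print(run(program))  # {'total': 15}
```

The if/elif chain plays the role of the dispatch step; the real PVM does the same job with a C switch over opcodes, so no new machine code is generated while your program runs.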

The PVM makes Python platform-independent: bytecode targets the PVM rather than a particular CPU, so the same bytecode can be executed on any platform running the same Python version (the bytecode format changes between versions). A natural next step is to learn how Python applications and distributions are packaged.

4. Python’s GIL and challenges to Multi-threading

It is fairly common knowledge that Python performs poorly at multi-threading, particularly for CPU-bound work. A root cause lies in how CPython performs garbage collection: it relies on reference counting.

In reference-counting garbage collection, Python keeps track of how many references point to each object, a number that keeps changing as the program runs. When the count of references to an object reaches zero, the object is no longer reachable and its memory is reclaimed.

>>> import sys
>>> my_dict = dict()
>>> another_variable = my_dict
>>> sys.getrefcount(my_dict)
3

In the above example, we create a dictionary object, and two variables, my_dict and another_variable, point to it. We then check the number of references to our object with sys.getrefcount(). Passing the object as an argument temporarily creates one more reference, so the result is three rather than two. (Exact counts are an implementation detail and can differ between CPython versions.) As mentioned earlier, when this reference count reaches zero, the object is reclaimed.
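Absolute counts vary between versions, but the change in the count and the moment of reclamation are easy to observe. The sketch below uses a hypothetical Blob class (plain dicts cannot be weakly referenced) and the standard weakref.finalize to record when the object is actually freed. Immediate reclamation at zero is CPython behaviour; other implementations such as PyPy do not use reference counting.

```python
import sys
import weakref


class Blob:
    """Hypothetical class; plain dicts cannot be weakly referenced."""


obj = Blob()
before = sys.getrefcount(obj)
alias = obj  # a second name for the same object
delta = sys.getrefcount(obj) - before
print(delta)  # 1

freed = []
weakref.finalize(obj, freed.append, "freed")  # callback runs when obj dies

del obj
print(freed)  # []: alias still holds a reference
del alias
print(freed)  # ['freed']: the count hit zero and the object was reclaimed at once
```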

Now suppose two threads update the reference count of the same object at the same time. Their writes can interleave, causing a race condition, which is commonly described as follows:

A race condition is a condition of a program where its behavior depends on relative timing of multiple threads or processes. One or more possible outcomes may be undesirable, resulting in a bug.

Race conditions like this could be avoided by adding a lock to every object, but many separate locks bring their own issues, most notably the risk of deadlock.
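Reference counts live inside the C implementation, but the same read-modify-write hazard is easy to reproduce in pure Python: counter += 1 reads the value, adds one, and writes it back in separate bytecode steps, and the GIL does not make that sequence atomic. The sketch guards a shared counter with a threading.Lock; the function and variable names are illustrative.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(times):
    global counter
    for _ in range(times):
        with counter_lock:  # only one thread may update counter at a time
            counter += 1


threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000, guaranteed because of the lock
```

Without the with counter_lock: line, the final count may come out lower on some runs, although how often that happens depends on the Python version and thread scheduling. And if two threads each held one lock while waiting for a lock the other held, neither could proceed: the deadlock mentioned above.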

To avoid these complexities, CPython has a single super lock on the interpreter itself, called the Global Interpreter Lock (GIL). Only the thread holding the GIL can execute Python bytecode, so when two threads are running, they take turns acquiring it, and while one holds it, the other must wait. This solves the problems of race conditions and deadlocks, but prevents the threads from executing Python code simultaneously, effectively making CPU-bound Python code single-threaded even if it is written with multiple threads.
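The GIL is released while a thread waits on blocking calls such as I/O or time.sleep, which is why threads remain useful for I/O-bound work. The sketch below uses time.sleep as a stand-in for a network or disk wait (an assumption for illustration; fake_io is a hypothetical name): four 0.2-second waits in a thread pool finish in roughly 0.2 seconds rather than 0.8. CPU-bound work in the same pool would not speed up this way on a standard, GIL-enabled build.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(seconds):
    """Stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(seconds)
    return seconds


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, [0.2] * 4))
elapsed = time.perf_counter() - start

print(f"{len(results)} waits finished in {elapsed:.2f}s")  # about 0.2s, not 0.8s

# How often (in seconds) CPython asks the running thread to release the GIL
print(sys.getswitchinterval())  # 0.005 by default
```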

There have been several efforts to remove the GIL, and for a long time none succeeded. Developer Sam Gross proposed a significant change to the GIL to boost multi-threading performance; that proposal became PEP 703, and starting with Python 3.13, CPython offers an optional, experimental free-threaded build in which the GIL is disabled.
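Since Python 3.13, CPython can be built in a free-threaded mode without the GIL (PEP 703). To check which kind of interpreter you are running, the free-threading documentation describes two checks: the Py_GIL_DISABLED build variable from sysconfig and, on 3.13 and later, sys._is_gil_enabled(). The latter is underscore-prefixed, so treat it as subject to change; the fallback to True on older versions is an assumption of this sketch, since those builds always have the GIL.

```python
import sys
import sysconfig

# 1 on free-threaded builds; 0 or None on regular builds
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Even a free-threaded build can re-enable the GIL at runtime (e.g. PYTHON_GIL=1)
if hasattr(sys, "_is_gil_enabled"):
    gil_enabled = sys._is_gil_enabled()
else:
    gil_enabled = True  # assumption: interpreters before 3.13 always use the GIL

print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```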

5. Magic methods and practical use of str and repr

Most Python developers are familiar with magic or dunder methods like __str__, __repr__, and __add__. In Python, magic methods (or dunder methods) are predefined methods whose names start and end with a double underscore ("dunder"). They are primarily used for operator overloading and for giving classes additional features. Let’s see an example of the magic method __add__, used to overload the + operator.

class Handbag:
    def __init__(self, item_count):
        self.item_count = item_count

handbag1 = Handbag(item_count=5)
handbag2 = Handbag(item_count=3)

>>> handbag1 + handbag2
TypeError: unsupported operand type(s) for +: 'Handbag' and 'Handbag'

In the above code, we have defined a class Handbag, which stores the number of items in it. Now we want adding two Handbags to give the total number of items in both.

But, as the example shows, we get an error because + is unsupported for the Handbag class. To achieve our objective, we have to do the following:

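class Handbag:
    def __init__(self, item_count):
        self.item_count = item_count

    def __add__(self, second_handbag):
        # Only Handbags can be added; let Python raise TypeError otherwise
        if not isinstance(second_handbag, Handbag):
            return NotImplemented
        return self.item_count + second_handbag.item_count

handbag1 = Handbag(item_count=5)
handbag2 = Handbag(item_count=3)

>>> handbag1 + handbag2
8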

In the above example, we have added the magic method __add__, which allows us to use the + operator with Handbag objects. __add__ takes two operands, self (the left operand) and second_handbag (the right operand), and returns the total number of items in both bags.
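Returning a plain integer works, but the result is no longer a Handbag, so handbag1 + handbag2 + handbag3 would fail. A common alternative, sketched below as one design choice rather than the only correct one, is to return a new Handbag, return NotImplemented for unsupported operands so Python raises its usual TypeError, and add __radd__ so the built-in sum works (sum starts from 0, so 0 + Handbag must be handled). The __repr__ is added only to make the output readable.

```python
class Handbag:
    def __init__(self, item_count):
        self.item_count = item_count

    def __add__(self, other):
        if not isinstance(other, Handbag):
            return NotImplemented  # let Python raise the usual TypeError
        return Handbag(self.item_count + other.item_count)

    def __radd__(self, other):
        # sum() starts with 0, so support 0 + Handbag
        if other == 0:
            return self
        return NotImplemented

    def __repr__(self):
        return f"Handbag(item_count={self.item_count!r})"


bags = [Handbag(5), Handbag(3), Handbag(2)]
print(bags[0] + bags[1] + bags[2])  # Handbag(item_count=10)
print(sum(bags))  # Handbag(item_count=10)
```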

str and repr magic methods

__str__ and __repr__ are two popular magic methods that are often confused. Understanding them well can go a long way in debugging and documentation.

Ideally, __repr__ returns a string representation from which an equal object can be re-created using Python’s eval, so that the following holds true:

obj == eval(repr(obj))

This is possible for simple objects like lists, but not for complex ones, such as an object wrapping an open file. Let’s see an example with a simple object, a list.

>>> mylist = [1, 2, 3]  
>>> repr(mylist)  
'[1, 2, 3]'

>>> eval(repr(mylist))  
[1, 2, 3]

>>> mylist == eval(repr(mylist))  
True

The other magic method, __str__, provides a string representation of an object for documentation and logging purposes. If a class defines __repr__ but not __str__, its objects use __repr__ for str() as well, but the reverse is not true.
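The fallback is easy to check: a class that defines only __repr__ uses it for str() and print(), while a class that defines only __str__ still gets the default, address-based repr(). The two class names are illustrative.

```python
class OnlyRepr:
    def __repr__(self):
        return "OnlyRepr()"


class OnlyStr:
    def __str__(self):
        return "a friendly description"


print(str(OnlyRepr()))  # OnlyRepr()  (str falls back to __repr__)
print(str(OnlyStr()))  # a friendly description
print(repr(OnlyStr()))  # e.g. <__main__.OnlyStr object at 0x...>  (no fallback)
```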

While writing code, the goal of __repr__ should be unambiguity, and that of __str__ should be readability. __repr__ should provide a representation that uniquely identifies an object; as the list example above shows, it can be built from the object’s constituents, or from an identifier assigned to each object.

On the other hand, __str__ should be easy to read: when an object is logged or printed, a user should be able to understand its description at a glance, even at the cost of uniqueness.
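A small class that follows both guidelines makes the difference visible. In the sketch below, Point is a hypothetical example: __repr__ returns valid Python that rebuilds an equal object, __str__ returns a short human-friendly form, and __eq__ is defined so the eval round trip can be checked with ==. Note that f-strings use __str__ unless you add !r, and containers such as lists always show their items’ repr.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous, and valid Python that rebuilds an equal object
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __str__(self):
        # Short and readable for users and logs
        return f"({self.x}, {self.y})"

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
print(repr(p))  # Point(x=1, y=2)
print(str(p))  # (1, 2)
print(f"{p} / {p!r}")  # (1, 2) / Point(x=1, y=2)
print([p])  # [Point(x=1, y=2)]
print(eval(repr(p)) == p)  # True
```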

Magic methods are quite helpful tools in the Python toolkit and have the ability to add and modify many basic features with ease.

The topics above are by no means exhaustive; I picked them based on my interactions with fellow developers. A few more topics deserve a place here but were left out because of the blog’s length. In the future, I will compile the remaining topics into a supplementary article.

That’s all for this blog. Please follow for upcoming articles, and thank you! If you liked what you read, check out this story as well:

The (near) Perfect Dockerfile for Django Applications