Python quirks

Structural Pattern Matching In Python

Once upon a time, Guido van Rossum, the Python creator, decided to reject a simple feature (PEP 3103): Switch case

The reason for rejecting this PEP is:

A quick poll during my keynote presentation at PyCon 2007 shows this proposal has no popular support. I therefore reject it.

~ Guido van Rossum

Lots of software developers who wanted to use it searched for substitutions. They found lots of guides to mimic the behavior. Even the official documentation suggests using if ... elif ... elif ... else when we desire to use switch case.

Soon, in Python 3.10, we’ll be able to do that big time. The new feature is much more than a simple switch case.

Illustration of me hearing about it for the first time

You see, in contrast to C language, the structural pattern matching feature can use instances, guards, and more when using a case statement.

For example, this is how we use it in C language:

switch(expression) {
   case constant_expression:
      statement();
      break;
   case another_constant_expression:
      another_statement();
      break;
   default:
      default_statement();
}

The equivalent Python code looks like this:

match expression:
    case constant_expression:
        statement()
    case another_constant_expression:
        another_statement()
    case _:
        default_statement()

First of all, it looks cleaner which is great, yet not surprising because most of (all?) Python code looks cleaner than C code.

Whenever we want to match a value, the interpreter tries to match it to different cases from top to bottom and then it automatically breaks the statement. It means we don’t need to use the break statement. If we try to use a break, we get an exception:

SyntaxError: 'break' outside loop

If we want to join cases together, we can use Union.

As I wrote above, This is not a simple switch case. We’re not done yet. We can do something crazy like this (output in comments and explanation afterward):

from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


def match_point(point):
    match point:
        case Point(_, 0):
            print("This point is located on the X axle")
        case Point(0, _):
            print("This point is located on the Y axle")
        case Point(x, y) if x == y:
            print("This point has identical x and y values")
        case Point():
            print("This point is not special")
        case dict():
            print("Why is there a dict?")
        case int() | float():
            print("A number?!")
        case _:
            print(f"This is not a point, but a {type(point)}")


match_point(Point(x = 0, y = 42))  # This point is located on the Y axle
match_point(Point(x = 42, y = 42))  # This point has identical x and y values
match_point(Point(x = 4, y = 2))  # This point is not special
match_point(Point(x = 0, y = 0))  # This point is located on the X axle
match_point({"this is not": "a point at all!"})  # Why is there a dict?
match_point(42)  # A number?!
match_point(4.2)  # A number?!
match_point(None)  # This is not a point, but a <class 'NoneType'>

What do we have here? Main things in this code:

  1. Line #5: Dataclass for holding properties of a Point class: x and y
  2. Line #10: Function that matches points (and some other stuff too)
  3. Line #14: We can create a generic pattern for the Point class. In this case, the code accepts every point on the y axle.
  4. Line #16: We can use a guard inside the case statement. No need to add an else clause.
  5. Line #18: We can create a generic pattern for the Point class. In this case, the code accepts every point.
  6. Line #22: We can put a union of cases together.
  7. Line #24: The generic pattern that receives every value is optional. The default behavior when no pattern is found is doing nothing.

Note that we don’t create an instance of a class when we “instantiate” the class in the case statements. We create a pattern there. The __new__ and __init__ functions aren’t called as you can see:

class A:
	def __new__(cls, *args, **kwargs):
		print("new")
		return super(A, cls).__new__(cls)

	def __init__(self):
		print("init")


a = A()  # output: new\ninit


def foo():
	match a:
		case A():  # no output
			pass


foo()

Performance comparison

How long does it take? Which way is better? The old if / elif statements chain or the new structural pattern matching feature? I added a dictionary implementation to the comparison, even though it can’t hash every element and sometimes we need to do more than returning a value.

A different type of race condition

Scenarios

  1. Comparing a number to one case
  2. Comparing a number to one thousand cases
  3. Comparing an instance with two members to one case
  4. Comparing an instance with two members to one thousand cases
  5. Comparing a list with three items to one case
  6. Comparing a list with three items to one thousand cases

You can see the code here. You’re welcome to fork it, test it, and push your numbers 🙂

Before I ran the tests I tried to guess the results. Since the interpreter matches a value, it knows that each comparison contains the same value. In if / elif statements, the interpreter doesn’t know in advance it’s about to compare the same value each time. Thus, the interpreter should work faster when it has lots of cases to compare to.

Comparing something to one case should be close enough because, in both situations, we compare only once.

Results

I run it using Python 3.10.0a7+ and Intel’s i7 6700 CPU.

Scenariosif / elifmatchdictionary
Comparing a number to one case0.0758170.0765320.081379
Comparing a number to one thousand cases25.03816417.8273200.084813
Comparing an instance with two members to one case0.5689100.4661630.383612
Comparing an instance with two members to one thousand cases611.594758397.5815290.390234
Comparing a list with three items to one case0.1201230.132838unhashable
Comparing a list with three items to one thousand cases82.28608962.894945unhashable
Performance comparison (results in seconds)

As suspected, performance improvement is crucial when we have lots of cases. Obviously, the dictionary with the O(1) efficiency is much better than the others O(n) efficiency

Summary

It’s interesting to see that 15 years after the feature proposal, we get such a strong pattern matching mechanism.

Note that usually we don’t write a thousand cases, but the way it looks compared to if / elif, the better maintenance, and the performance improvement are great things and I can’t wait to start using it when needed.

References

https://www.tutorialspoint.com/cprogramming/switch_statement_in_c.htm

https://www.youtube.com/watch?v=-79HGfWmH_w


If you have any questions feel free to comment or contact me directly.

Thank you for reading this. Please share and visit soon again,

Orian Zinger.

Leave a comment