Python quirks

Python Strings Prefixes

Strings are easy. Kind of.

Sometimes, though, you’ll find a single character right before the opening apostrophe (or quotation mark). In this article, I’ll explain the differences between the prefixes that can go there.

There are four possible prefixes:

  1. u – Unicode
  2. b – bytes
  3. r – raw
  4. f – formatted

Comparing the prefixes

Regular string

string = 'Hello world'

This one is pretty self-explanatory.

Unicode string

string = u'Hello world'

What makes this string special? In Python 2, the u prefix creates a Unicode string, so we can put non-ASCII characters in it too:

string = u'שלום עולם'

Python 3 made all strings Unicode by default, so there is no need to mark them. In fact, in Python 3.0-3.2 the u prefix is a syntax error. Python 3.3 brought it back (PEP 414) to make it easier to write code that runs on both Python 2 and Python 3.
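A quick check of this, assuming Python 3.3 or later (where the u prefix is accepted again): both literals produce the same str type and compare equal.

```python
plain = 'שלום עולם'
prefixed = u'שלום עולם'

print(type(prefixed))      # <class 'str'>, the same type as a plain literal
print(plain == prefixed)   # True: in Python 3 the u prefix changes nothing
```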

Since Python 2 is a dead language (it reached end of life in January 2020), I recommend not using it at all, so I won’t elaborate on this one.

Byte string

not_a_string = b'Hello world'

Don’t call it a string

If it looks like a string, appears in an article about string prefixes, and prints like a string, it must be a string, right?

Not quite. The type of this variable is bytes, not str.

Did you know ducks can help you debug?

What’s the difference? A variable of type bytes is a sequence of bytes, and each byte is an integer in the range 0-255.

A variable of type str is a sequence of characters (Unicode code points). Turning it into bytes requires an encoding, such as UTF-8. Since they’re different types, comparing values that look alike returns False:

>>> 'Hello world' == b'Hello world'
False
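The sketch below makes the difference concrete. Indexing bytes gives back an integer, while indexing a str gives a one-character string, and converting between the two always goes through an encoding (UTF-8 here, chosen for the example).

```python
text = 'Hello world'
data = b'Hello world'

print(data[0])   # 72: an int, the byte value of 'H'
print(text[0])   # H: a one-character str

# Converting between the two requires an encoding
print(text.encode('utf-8') == data)   # True
print(data.decode('utf-8') == text)   # True

# A character isn't always one byte: each Hebrew letter takes two bytes in UTF-8
hebrew = 'שלום'
print(len(hebrew))                   # 4 characters
print(len(hebrew.encode('utf-8')))   # 8 bytes
```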

When is it useful?

There are many use cases, but the one I find most important is sending data from one socket to another:

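import socket
import threading

# Server: create the listening socket before the client tries to connect
serverSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serverSocket.bind(('127.0.0.1', 1234))
serverSocket.listen()

def handleOneClient():
    clientConnection, _ = serverSocket.accept()  # blocks until a client connects
    with clientConnection:  # closes the connection when done
        dataFromClient = clientConnection.recv(1024)
        print('Data type:', type(dataFromClient), 'decoded data:', dataFromClient.decode())
        response = b'Hello Client!'  # Looks better than 'Hello Client!'.encode()
        clientConnection.sendall(response)

# In practice, the server and the client are separate programs. Here the server
# runs in a background thread so both sides fit in one script.
serverThread = threading.Thread(target=handleOneClient)
serverThread.start()

# Client:

clientSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientSocket.connect(('127.0.0.1', 1234))

data = b'Hello Server!'  # Looks better than 'Hello Server!'.encode()
clientSocket.sendall(data)
dataFromServer = clientSocket.recv(1024)
print(dataFromServer.decode())

# Clean up: close the client, wait for the server thread, close the listening socket
clientSocket.close()
serverThread.join()
serverSocket.close()

# output:
# Data type: <class 'bytes'> decoded data: Hello Server!
# Hello Client!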

This program creates a server socket and waits for a connection. A client socket connects to it and sends a message, and the server answers with a message of its own.

Take a look at the line data = b'Hello Server!'. Sockets send and receive bytes, not strings, so we write the message as a bytes literal (the same as encoding the string) before sending it to the server. A string is an abstraction over bytes. The recipient decodes the bytes back into a regular string with decode(), since a string is easier to work with.
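To see that sockets really insist on bytes, here is a small sketch using socket.socketpair(), which returns two already-connected sockets, so no server setup is needed. Passing a str to sendall() raises a TypeError, while the encoded version goes through.

```python
import socket

str_rejected = False

# socketpair() returns two connected sockets, handy for a quick local demo
sender, receiver = socket.socketpair()
with sender, receiver:
    try:
        sender.sendall('Hello Server!')  # a str, not bytes
    except TypeError as error:
        str_rejected = True
        print('Sending a str failed:', error)

    sender.sendall('Hello Server!'.encode())  # the same bytes as b'Hello Server!'
    received = receiver.recv(1024)
    print(received)            # b'Hello Server!'
    print(received.decode())   # Hello Server!
```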

Note: don’t forget to shut down each socket and close it (which releases its file descriptor) when you’re done with it.

Raw string

string = r'Hello\tworld'

In a raw string, a backslash is treated as an ordinary character, so escape sequences are not processed (almost: a raw string still can’t end with a single backslash). Normally, \t inside a string literal is the escape sequence for an ASCII horizontal tab (TAB). With the r prefix, we tell the interpreter to keep the two characters \ and t as-is. Let’s print this string with and without the r prefix and compare the results:

print('Hello\tworld')   # output: Hello	world (the gap is a TAB)
print('Hello\\tworld')  # output: Hello\tworld
print(r'Hello\tworld')  # output: Hello\tworld
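Comparing lengths shows what the interpreter actually stored in each case. This is just a quick check, not anything specific to printing.

```python
cooked = 'Hello\tworld'
raw = r'Hello\tworld'

print(len(cooked))              # 11: '\t' became a single TAB character
print(len(raw))                 # 12: a backslash followed by the letter t
print(raw == 'Hello\\tworld')   # True: r'\t' is another way to write '\\t'
print(repr(raw))                # 'Hello\\tworld'
```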

When is it useful?

It’s useful mostly in these cases:

  1. Regular expressions: regex patterns are often hard to read and full of backslashes. Declaring the pattern as a raw string saves us from doubling every one of them.
  2. File paths in Windows: Windows paths look like C:\path\to\folder\file.extension. With a raw string, we don’t have to write every backslash twice (or hunt down the bug when we forget to, since \t in \to would silently become a TAB). Note that in most cases, building the path with os.path.join (or pathlib) is better than writing it by hand.
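Both cases in a short sketch (the pattern, sample text, and path are made up for the example). The raw and escaped regex patterns are identical strings, and the Windows-style path shows what goes wrong without r: \n and \t turn into a newline and a TAB.

```python
import re

# \d matches a digit and \. a literal dot; both spellings build the same pattern
escaped = re.compile('\\d+\\.\\d+')
raw = re.compile(r'\d+\.\d+')
print(escaped.pattern == raw.pattern)         # True
print(raw.findall('pi is 3.14, e is 2.72'))   # ['3.14', '2.72']

# Without r, '\n' and '\t' in this path become a newline and a TAB
path = r'C:\new\table.txt'
not_a_path = 'C:\new\table.txt'
print(path == 'C:\\new\\table.txt')   # True
print('\n' in not_a_path)             # True
```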

Formatted string

name = 'world'
string = f'Hello {name}'

In Python, there are several ways to format a string:

  1. Using the str.format method: 'Hello {}'.format('world')
  2. Using the % operator: 'Hello %s' % 'world'
  3. Using the f prefix, as in the example above (available since Python 3.6)
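A minimal side-by-side check that the three approaches above produce the same result for a simple substitution:

```python
name = 'world'

with_format = 'Hello {}'.format(name)
with_percent = 'Hello %s' % name
with_f = f'Hello {name}'  # f-strings need Python 3.6 or later

print(with_format == with_percent == with_f)  # True
```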

When is it useful?

When a string has many substitutions, it fills up with {} or %s placeholders, and we need to remember the order (or the names) of the arguments when we pass them. For example:

# without parameter names:
string = 'Name: {}, age: {}, spiritual animal: {}'.format('Orian', 30, 'kitten')
# with parameter names:
string = 'Name: {name}, age: {age}, spiritual animal: {animal}'.format(name='Orian', age=30, animal='kitten')

We can simplify the formatting by declaring the string with the f prefix:

my_name = 'Orian'
my_age = 30
my_spiritual_animal = 'kitten'
string = f'Name: {my_name}, age: {my_age}, spiritual animal: {my_spiritual_animal}'

Now we don’t need to remember the order or the names of the arguments, because the interpreter evaluates the expression inside each {} right where the string is written. This also means an f-string is formatted immediately, at the moment it is declared. If we can’t know the substituted values at declaration time (for example, a template that is defined once and filled in later), we need one of the other formatting methods.
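Here is a sketch of that trade-off. The template is defined once and filled in later with str.format, which an f-string can’t do because it is evaluated on the spot. The describe function is a hypothetical helper made up for this example.

```python
# A template defined up front and filled in later
TEMPLATE = 'Name: {name}, age: {age}'

def describe(name, age):  # hypothetical helper, for illustration only
    return TEMPLATE.format(name=name, age=age)

print(describe('Orian', 30))   # Name: Orian, age: 30

# An f-string is evaluated right where it's written, so age must already exist;
# the {} can hold any expression, not just a variable name
age = 30
print(f'Next year: {age + 1}')   # Next year: 31
```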

Note that in real code, we usually don’t declare the variables right before the string; the {} more often contains function calls or attribute accesses.
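For instance, an f-string can call methods and read attributes directly. The Person class and its data below are hypothetical, invented only to illustrate this.

```python
class Person:  # hypothetical class, for illustration only
    def __init__(self, name, birth_year):
        self.name = name
        self.birth_year = birth_year

    def age_in(self, year):
        return year - self.birth_year

someone = Person('Alice', 1990)  # made-up data
summary = f'Name: {someone.name.upper()}, age in 2020: {someone.age_in(2020)}'
print(summary)   # Name: ALICE, age in 2020: 30
```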

Mixing prefixes

We can also combine prefixes. For example, to create a raw formatted string:

name = 'world'
print(fr'Hello\t{name}')  # output: Hello\tworld

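Note that not all prefixes can be combined. Among the four above, the valid combinations are rb/br (raw bytes) and fr/rf (raw formatted strings), and u can’t be combined with anything. In particular, we can’t combine f and b: there are no bytes f-strings, although bytes objects do support %-formatting since Python 3.5.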
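A short sketch of these combinations. The last part compiles a source snippet with an fb prefix just to show that Python rejects it.

```python
name = 'world'
print(fr'Hello\t{name}')        # Hello\tworld: raw and formatted
print(rb'C:\new\table')         # b'C:\\new\\table': raw bytes
print(b'Hello %s' % b'world')   # b'Hello world': %-formatting works on bytes

# fb is not a valid prefix, so this source code doesn't even compile
try:
    compile("fb'Hello {name}'", '<example>', 'eval')
    fb_accepted = True
except SyntaxError:
    fb_accepted = False
print(fb_accepted)              # False
```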

Case-insensitive

These prefixes are case-insensitive: you can use lowercase letters, uppercase letters, or a mix of both. I don’t recommend mixing cases, though. It looks weird.
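A quick demonstration that the case of the prefix doesn’t change the resulting value:

```python
print(R'Hello\tworld' == r'Hello\tworld')   # True
print(B'Hello' == b'Hello')                 # True
print(F'{1 + 1}' == f'{1 + 1}')             # True
print(Rb'\d' == rB'\d' == br'\d')           # True, but hard on the eyes
```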



If you have any questions, feel free to comment or contact me directly.

Thank you for reading. Please share, and visit again soon,

Orian Zinger.
