Code Structure, Modules, and Namespaces
How to get what you want when you want it
Code Structure
In Python, the structure of your code is determined by whitespace. This is nicely clear, and you’ve probably already figured it out, but we’ll formally spell it out here:
How you indent your code determines how it is structured
block statement:
    some code body
    some more code body
    another block statement:
        code body in
        that block
        end of "another" block statement
    still in the first block
outside of the block statement
The colon that terminates a block statement is also important…
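To make the schematic concrete, here is a minimal sketch in real Python. The function name classify and its sample input are made up for this illustration.

```python
def classify(numbers):
    """Split numbers into 'small' and 'large' lists (hypothetical example)."""
    small, large = [], []
    for n in numbers:          # the colon opens the "for" block
        if n > 4:              # a nested block, indented four more spaces
            large.append(n)
        else:
            small.append(n)
        # back at the "for" level: this position still runs once per number
    return small, large        # dedented: runs once, after the loop ends


result = classify([1, 12, 3, 7])
print(result)  # ([1, 3], [12, 7])
```

Moving the return line four spaces to the right would put it inside the loop, and the function would return after the first number.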
One-liners
You can put a one-liner after the colon:
In [167]: x = 12
In [168]: if x > 4: print(x)
12
But this should only be done if it makes your code more readable. And that is rare.
So you need both the colon and the indentation to start a new block. The end of the indented section is the only indication of the end of the block.
Spaces vs. Tabs
Whitespace is important in Python.
An indent could be:
Any number of spaces
A tab
A mix of tabs and spaces
If you want anyone to take you seriously as a Python developer:
Always use four spaces – really! (PEP 8)
Note
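If you do use tabs (and really, don’t do that!), Python treats each tab as advancing to the next multiple of eight columns. Text editors can display tabs as any number of spaces, and most modern editors default to four, so this can be very confusing! Python 3 also raises a TabError when tabs and spaces are mixed in a way that makes the indentation ambiguous. So again: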
Never mix tabs and spaces in Python code
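A small sketch of what goes wrong, assuming Python 3: the source string below indents one line with a tab and the next with eight spaces. The two lines look aligned in an eight-column editor, but the compiler refuses the mix.

```python
# Line 2 is indented with one tab, line 3 with eight spaces.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<example>", "exec")
except TabError as err:
    outcome = f"TabError: {err.msg}"
else:
    outcome = "compiled without complaint"

print(outcome)
```

On Python 3 this should report a TabError rather than compile.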
Spaces Elsewhere
Other than for indentation, whitespace doesn’t matter, technically.
x = 3*4+12/func(x,y,z)
x = 3*4 + 12 / func (x, y, z)
These will give the exact same results.
But you should strive for proper style. Isn’t this easier to read?
x = (3 * 4) + (12 / func(x, y, z))
Read PEP 8 and install a linter in your editor.
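If you want to convince yourself that the spacing really is cosmetic, one way (a sketch using the standard ast module) is to parse each version and compare the resulting syntax trees. The name func does not have to exist, because the code is only parsed, never run.

```python
import ast

cramped = "x = 3*4+12/func(x,y,z)"
spaced = "x = 3*4 + 12 / func (x, y, z)"
readable = "x = (3 * 4) + (12 / func(x, y, z))"

# ast.dump() leaves out line and column positions by default,
# so equal dumps mean the parser built the same tree.
trees = [ast.dump(ast.parse(src)) for src in (cramped, spaced, readable)]
same_tree = trees[0] == trees[1] == trees[2]
print(same_tree)  # True
```

The extra parentheses in the last version match the normal operator precedence, which is why they do not change the tree either.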
Modules and Packages
Python is all about namespaces – the “dots”
name.another_name
The “dot” indicates that you are looking for a name in the namespace of the given object. It could be:
A name in a module
A module in a package
An attribute of an object
A method of an object
The only way to know is to know what type of object the name refers to. But in all cases, it is looking up a name in the namespace of the object.
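Here is a short sketch of the four cases using the standard library plus one made-up class (Point is hypothetical, defined only for this example).

```python
import math
import urllib.parse


class Point:
    """Hypothetical class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def scaled(self, factor):
        return Point(self.x * factor, self.y * factor)


p = Point(1, 2)

print(math.pi)                    # a name in a module
print(urllib.parse.quote("a b"))  # a module in a package, then a name in it
print(p.x)                        # an attribute of an object
print(p.scaled(3).y)              # a method of an object, then an attribute

# The dot is a name lookup on the object; getattr() does the same thing.
same_lookup = getattr(math, "pi") == math.pi
print(same_lookup)  # True
```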
So what are all these different types of namespaces?
Modules
A module is simply a namespace. But a module more or less maps to a file with python code in it.
It might be a single file, or it could be a collection of files that define a shared API.
But in the common and simplest case, a single file is a single module.
So you can think of the files you write that end in .py as modules.
When a module is imported, the code in that file is run, and any names defined in that file are now available in the module namespace.
For a really simple example, suppose you have the following in the trivial.py file:
1 x = 1
2 y = 2
3
4 def do_nothing(a, b, c):
5     print("do_nothing was called with:", a, b, c)
6
7 print("at the end of the trivial module")
What do you think happens when you import that module? What will get printed?
What names will be defined in that module?
How would you access those names?
Before running this code, think about it a bit. Recall what happens when you import a module:
The code is run in the module, top to bottom.
The names defined in the module (its global namespace) are made available in the module’s namespace.
So: lines 1-2 assign two names, x and y. Lines 4-5 define a function named do_nothing. Line 7 prints something.
So when the module is run, three names are defined and one print() call is executed.
Now try it:
>>> import trivial
at the end of the trivial module
Yes, the print() call was run.
Let’s see if the names are there:
>>> trivial.x
1
>>> trivial.y
2
>>> trivial.do_nothing(3,4,5)
do_nothing was called with: 3 4 5
Yes, they are there, in the trivial namespace.
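If you would rather not create the file by hand, the sketch below writes the same trivial.py into a temporary directory, imports it, and lists the names that ended up in the module namespace. The temporary directory and the sys.path juggling are only scaffolding for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

source = '''\
x = 1
y = 2

def do_nothing(a, b, c):
    print("do_nothing was called with:", a, b, c)

print("at the end of the trivial module")
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "trivial.py").write_text(source)
    sys.path.insert(0, tmp)          # make the temp dir importable
    try:
        trivial = importlib.import_module("trivial")
    finally:
        sys.path.remove(tmp)

# vars() returns the module namespace as a dict; skip the dunder names
# that Python adds to every module.
public_names = sorted(n for n in vars(trivial) if not n.startswith("__"))
print(public_names)  # ['do_nothing', 'x', 'y']
```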
Packages
A package is a module with other modules in it.
On a filesystem, this is represented as a directory that contains one or more .py files, one of which must be called __init__.py. The __init__.py file can be empty (and often is), but it must be there.
When there is a package available, you can import only the package, or any of the modules inside it. When a package is imported, the code in the __init__.py file is run, and any names defined in that file are available in the package namespace.
Here we define about the simplest package possible:
Create a directory (folder) for your package:
mkdir my_package
Save a file in that package, called __init__.py, and put this in it:
name1 = "Fred"
name2 = "Bob"
Save another file in your my_package dir called a_module.py, and put this in it:
name3 = "Mary"
name4 = "Jane"
def a_function():
    print("a_function has been called")
You now have about the simplest package you can have. Make sure your current working dir is the dir that my_package is in, and start python or iPython. Then try this code:
In [1]: import my_package
In [2]: my_package.name1
Out[2]: 'Fred'
In [3]: my_package.name2
Out[3]: 'Bob'
The names you’ve defined are available in the package namespace.
What about the module?
In [4]: my_package.a_module
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-8b9269cdf0e5> in <module>()
----> 1 my_package.a_module
AttributeError: module 'my_package' has no attribute 'a_module'
The a_module name does not exist. It must be imported explicitly:
In [1]: import my_package.a_module
Now the names defined in the a_module.py file are all there:
In [2]: my_package.a_module.name3
Out[2]: 'Mary'
In [3]: my_package.a_module.name4
Out[3]: 'Jane'
In [4]: my_package.a_module.a_function()
a_function has been called
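The whole experiment can also be scripted. This is a sketch that builds the same my_package in a temporary directory (scaffolding that is not part of the package itself) and checks for the a_module attribute before and after the explicit import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "my_package")
    pkg.mkdir()
    (pkg / "__init__.py").write_text('name1 = "Fred"\nname2 = "Bob"\n')
    (pkg / "a_module.py").write_text('name3 = "Mary"\nname4 = "Jane"\n')

    sys.path.insert(0, tmp)          # make the temp dir importable
    try:
        my_package = importlib.import_module("my_package")
        before = hasattr(my_package, "a_module")   # submodule not loaded yet
        importlib.import_module("my_package.a_module")
        after = hasattr(my_package, "a_module")    # now bound on the package
        name3 = my_package.a_module.name3
    finally:
        sys.path.remove(tmp)

print(before, after, name3)  # False True Mary
```

Importing the submodule is what attaches the a_module name to the package; importing the package alone only runs __init__.py.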
Note that you can also put a package inside a package. So you can create an arbitrarily deeply nested hierarchy of packages. This can be helpful for a large, complex collection of related code, such as an entire Web Framework. But from the Zen of Python:
“Flat is better than nested.”
So don’t overdo it: only go as deep as you really need to in order to keep your code organized.
Importing modules
You have probably imported a module or two already:
import sys
import math
But there are a handful of ways to import modules and packages.
import modulename
This is the simplest way: it adds the name of the module to the global namespace, and lets you access the names defined in that module:
modulename.a_name_in_the_module
If you want only a few names in a module, and don’t want to type the module name each time, you can import only the names you want:
from modulename import this, that
This brings only the names specified (this, that) into the global namespace. All the code in the module is run, but the module’s name is not available; only the explicitly imported names are directly available.
Sometimes you want the entire module, but don’t want to type its entire name each time you use it. So you can rename a module when you import it. (You may also want to do this if a module has the same name as a variable you want to use…)
import modulename as a_new_name
This imports the module, and gives it a new name in the global namespace. For example, the numpy package is usually imported as:
import numpy as np
This is because numpy has a LOT of names, some of which may conflict with builtins or other modules, and users want to be able to reference them without too much typing.
You can also import a name within a module and rename it at the same time:
from modulename import this as that
This imports only one name from a module, while also giving it a new name in the global namespace.
Examples
You can play with some of this with the standard library:
In [1]: import math
In [2]: math.sin(1.2)
Out[2]: 0.9320390859672263
In [3]: from math import cos
In [4]: cos(1.2)
Out[4]: 0.3623577544766736
In [5]: import math as m
In [6]: m.sin(1)
Out[6]: 0.8414709848078965
In [7]: from math import cos as cosine
In [8]: cosine(1.2)
Out[8]: 0.3623577544766736
My rules of thumb
If you only need a few names from a module, import only those:
from math import sin, cos, tan
If you need a lot of names from that module, just import the module:
import math
math.cos(2 * math.pi)
Or import it with a nice short name:
import math as m
m.cos(2 * m.pi)
import * ?
Warning:
You can also import all the names in a module with:
from modulename import *
But this leads to name conflicts, and a cluttered namespace. It is NOT recommended practice anymore.
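To see the kind of conflict this causes, here is a sketch using two standard library modules that both define sqrt. The star imports are run with exec() into a scratch dictionary so that the example does not clutter your real namespace; in ordinary code they would simply sit at the top of a file.

```python
import cmath
import math

namespace = {}

exec("from math import *", namespace)
first = namespace["sqrt"]            # math.sqrt

exec("from cmath import *", namespace)
second = namespace["sqrt"]           # silently replaced by cmath.sqrt

print(first is math.sqrt)    # True
print(second is cmath.sqrt)  # True
print(second(-1))            # 1j, where math.sqrt(-1) would raise ValueError
```

Nothing warns you about the replacement: whichever star import runs last wins.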
Importing from packages
Packages can contain modules, which can be nested – ideally not very deeply.
In that case, you can simply add more “dots” and follow the same rules as above.
from packagename.my_funcs import this_func
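The same dotted forms can be tried with a nested package that ships with Python. This sketch uses xml.etree.ElementTree only because it is always available; any nested package works the same way.

```python
import xml.etree.ElementTree                  # full dotted path
import xml.etree.ElementTree as ET            # same module, short alias
from xml.etree import ElementTree             # a module pulled out of its package
from xml.etree.ElementTree import fromstring  # one name from the nested module

root = fromstring("<a><b>hi</b></a>")
print(root.find("b").text)  # hi

# All three spellings refer to one and the same module object.
print(ET is ElementTree is xml.etree.ElementTree)  # True
```

Note that the dots always go in the module path, before the import keyword's target: what follows "import" in a from-import must be a plain name.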
Packages and Packaging goes into more detail about creating (and distributing!) your own package.
What does import actually do?
When you import a module, or a symbol from a module, the Python code is compiled to bytecode.
The result is a .pyc file for the module, which Python 3 caches in a __pycache__ directory next to the source file.
Then after compiling, all the code in the module is run at the module scope, that is, in the namespace of the module.
For this reason, it is good to avoid module-scope statements that have global side-effects. This includes things as simple as a print(): it will only print the first time the module is imported.
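You can poke at some of this machinery from the interpreter. The sketch below uses json purely as a convenient standard library module, and trivial.py is just a file name passed in as text; nothing is read from disk for that call.

```python
import importlib.util
import json
import sys

# Every imported module is registered in sys.modules under its name.
registered = "json" in sys.modules
same_object = sys.modules["json"] is json
print(registered, same_object)  # True True

# Where the compiled bytecode for a given source file would be cached.
# The exact name depends on your Python version.
cache_path = importlib.util.cache_from_source("trivial.py")
print(cache_path)
```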
Re-import
The code in a module is not re-run when it is imported again. This makes it efficient to import the same module in multiple places in a program. But it means that if you change the code in a module after importing it, that change will not be reflected when it is imported again.
If you DO want a change to be reflected, you can explicitly reload a module:
import importlib
importlib.reload(modulename)
This is rarely needed (which is why it’s a bit buried in the importlib module), but is good to keep in mind when you are interactively working on code under development.
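Here is a sketch of that behavior from start to finish. The module name reload_demo is made up, the temporary directory is only scaffolding, and sys.dont_write_bytecode is switched on so that no cached .pyc file can get in the way of the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # demo only: skip writing .pyc cache files

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "reload_demo.py")     # hypothetical module
    path.write_text("value = 1\n")
    sys.path.insert(0, tmp)
    try:
        import reload_demo
        first = reload_demo.value

        path.write_text("value = 2222\n")  # edit the source on disk
        import reload_demo                 # no effect: already imported
        after_reimport = reload_demo.value

        importlib.reload(reload_demo)      # re-runs the module's code
        after_reload = reload_demo.value
    finally:
        sys.path.remove(tmp)

print(first, after_reimport, after_reload)  # 1 1 2222
```

One thing to keep in mind: reload() updates the existing module object, but names you pulled out earlier with "from reload_demo import value" keep pointing at the old objects.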
Import Interactions
Another key point to keep in mind is that all code files in a given python program are sharing the same modules. So if you change a value in a module, that value’s change will be reflected in other parts of the code that have imported that same module.
This can create dangerous side effects and hard-to-find bugs if you change anything in an imported module, but it can also be used as a handy way to store truly global state, like application preferences, for instance.
A rule of thumb for managing global state is to have only one part of your code change the values, and have everything else treat them as read-only. You can’t enforce this, but you can structure your own code that way.
Let’s take a look at an example of this.
Create three modules (python files): mod1.py, mod2.py, mod3.py
mod1.py is very simple, with one name defined:
x = 5
mod2.py is where most of the action happens:
#!/usr/bin/env python3
import mod1
print(f"In mod2: mod1.x = {mod1.x}")
input("pausing (hit enter to continue >")
print("importing mod3")
import mod3
print(f"Still in mod2: mod1.x = {mod1.x}")
print("mod3 changed the value in mod1, and that change shows up in mod2")
Here, we import mod1, and we can now see the names defined in it, and print the value of x. Then the script pauses, waiting for input. After the user hits the <enter> key, it then imports mod3, and again prints the value of x that is in mod1. Let’s now look at mod3.py:
import mod1
print("In mod3 -- changing the value of mod1.x")
mod1.x = 555
Other than the print, all mod3 does is reset the value of x that is in mod1.
Running mod2.py results in:
$ python mod2.py
In mod2: mod1.x = 5
pausing (hit enter to continue >
importing mod3
In mod3 -- changing the value of mod1.x
Still in mod2: mod1.x = 555
mod3 changed the value in mod1, and that change shows up in mod2
You can see that when mod3 changed the value of mod1.x, that changed the value everywhere that mod1 is imported. You want to be very careful about this.
If you are writing mod2.py, and did not write mod3 (or wrote it long enough ago that you don’t remember its details), you might be very surprised that a value in mod1 changes simply because you imported mod3. This is known as a “side effect”, and you generally want to avoid them!
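The same effect can be reproduced in a single file. This is only a sketch: instead of three files on disk, it registers a stand-in module object under a made-up name (shared_state_demo) and lets two functions play the roles of the reading and the writing module.

```python
import sys
import types

# Stand-in for mod1.py: a module object registered under a made-up name.
shared = types.ModuleType("shared_state_demo")
shared.x = 5
sys.modules["shared_state_demo"] = shared


def reader():
    import shared_state_demo      # found in sys.modules, so nothing is re-run
    return shared_state_demo.x


def writer():
    import shared_state_demo
    shared_state_demo.x = 555     # rebinds x on the one shared module object


before = reader()
writer()
after = reader()
print(before, after)  # 5 555
```

Both functions get the very same module object from the import statement, which is why the change made by one is visible to the other.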