RobyB

Wow! What a lesson! I will bookmark it, it’s very interesting, and clarifies a lot of things.

Tony Ngok

So, in algorithms classes when we talk about the ‘time complexity’, are we focusing on the CPU bound computation?

Bartosz Zaczyński RP Team

@tonyn1999 Time complexity is an abstract measure of how often your algorithm performs the dominant operation. Some algorithms have better time complexity than others when they achieve the same goal in fewer steps.

Consider the following example:

```python
def load(path):
    with open(path) as file:
        for line in file:
            process(line)
```

Here, the time complexity depends on the number of lines in the file since the most important operation, `process(line)`, is called once for each line. Depending on the nature of this processing, it could represent either an I/O-bound or a CPU-bound task. For instance, `process(line)` would be I/O-bound if it just writes the line to another file, but CPU-bound if it performs some complicated calculations.

Tony Ngok

Given that:

1. Mathematical operations are CPU-bound
2. Multiprocessing (parallelism) is for CPU-bound processes, while multithreading (concurrency) is for I/O-bound tasks

Then, why does this link say that `numpy.dot()` (the dot product of two matrices) uses multithreading instead of multiprocessing?

Bartosz Zaczyński RP Team

@Tony Ngok The reason why multithreading is used for I/O-bound tasks while multiprocessing for CPU-bound tasks is the global interpreter lock (GIL). However, outside of Python, threads can indeed help in both types of situations.

NumPy is primarily implemented in C with bits of glue code that make it possible to call its functions from Python. As such, it isn’t limited by the GIL, which is why NumPy can leverage threads to run code in parallel on multiple CPU cores.
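As a rough illustration of why the GIL matters for pure-Python code, here's a minimal sketch (the function and variable names are made up for the example) that runs a CPU-bound loop on two threads. Both threads finish correctly, but because only one thread can execute Python bytecode at a time, the work is effectively serialized rather than parallelized:

```python
import threading


def sum_up_to(n, results, index):
    # Pure-Python CPU-bound loop; under the GIL, two such loops
    # on separate threads cannot run in parallel.
    total = 0
    while n > 0:
        total += n
        n -= 1
    results[index] = total


results = [0, 0]
threads = [
    threading.Thread(target=sum_up_to, args=(100_000, results, i))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

C extensions like NumPy can release the GIL around their number crunching, which is what allows their threads to actually run on multiple cores.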

If you’re interested, there are ways to bypass the GIL to achieve thread-based parallelism in Python.

Christopher Trudeau RP Team

Hi @Tony,

Bartosz is of course right, the GIL is what makes Python worse at this stuff, and using low-level libraries gives you ways of working around it.

Fundamentally though, a single CPU(*) can’t really do more than one thing at a time. Threading takes advantage of the fact that often a CPU is waiting on something. I can’t speak to NumPy’s decision specifically, but 1) talking to memory means the CPU has to wait, 2) using a co-processor (possibly even on-chip for floating point operations) means the CPU has to wait. Any time the CPU has to wait for something there is a chance that switching threads can mean the CPU can do something else. Switching threads has a cost though, so each application has to figure out if the “waiting for something” is long enough to gain an advantage by switching contexts.
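That "switch while waiting" idea can be sketched in a few lines. In this hedged example, `time.sleep()` stands in for any I/O wait (a disk read, a network call); while one thread sleeps, it releases the GIL, so the other threads get to run, and five 0.2-second waits overlap instead of adding up to a full second:

```python
import threading
import time


def wait_for_io(seconds):
    # Simulated I/O wait: sleeping releases the GIL, letting
    # other threads run in the meantime.
    time.sleep(seconds)


start = time.perf_counter()
threads = [threading.Thread(target=wait_for_io, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# elapsed is close to 0.2 s, not 5 * 0.2 = 1 s, because the waits overlap
```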

(*) Even this is an oversimplification in modern processors. Most CPUs now have vector operations, meaning you can perform an “add” (or most other math ops) on multiple numbers at a time. NumPy definitely takes advantage of this; in fact, depending on which flags NumPy is compiled with and which processor it is running on, you can see significant differences in its performance. And even that is an oversimplification, as modern CPUs do black-magic things like branch prediction and pipeline loading of instructions – switching contexts means those things might have to be performed again when the code switches back in.

If you’re trying to understand how all this works, you’re asking all the right questions. The short answer is: it is complicated and messy. If you’re trying to optimize actual code, the best thing to do is write code without concurrency first, measure it, then figure out if adding concurrency will give you the speed-up you’re looking for.
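The "measure it first" step can be as simple as timing a sequential baseline with the standard library before touching threads or processes. A minimal sketch, where `work()` is a made-up stand-in for the code you'd actually want to optimize:

```python
import timeit


def work():
    # Hypothetical stand-in for the code you want to optimize
    return sum(i * i for i in range(10_000))


# Time the sequential baseline; only reach for concurrency if this
# number turns out to matter, and re-measure after each change.
baseline = timeit.timeit(work, number=100)
```

Comparing this baseline against the concurrent version tells you whether the context-switching overhead is worth it for your workload.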

