Unless the C++ code was doing something wrong there’s literally no way you can write pure Python that’s 10x faster than it. Something else is going on there. Maybe the C++ code was accidentally O(N^2) or something.
In general Python will be 10-200 times slower than C++. 50x slower is typical.
Unless the C++ code was doing something wrong there’s literally no way you can write pure Python that’s 10x faster than it. Something else is going on there.
Completely agreed, but it can be surprising just how often C++ really is written that inefficiently; I have had multiple successes in my career of rewriting C++ code in Python and making it faster in the process, but never because Python is inherently faster than C++.
Yeah exactly. You made it faster through algorithmic improvement. Like for like Python is far far slower than C++ and it’s impossible to write Python that is as fast as C++.
Nope, if you’re working on large arrays of data you can get significant speed-ups using well-optimised, vectorised BLAS functions (numpy), which beat out simply written C++ operating on each array element in turn. There’s also Numba, which uses LLVM to JIT-compile a subset of Python to get compiled performance, though I didn’t go to that in this case.
You could link the BLAS libraries into C++, but it’s significantly more work than just importing numpy from Python.
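The effect being described is easy to reproduce; here’s a minimal sketch (assuming numpy is installed) comparing a pure-Python element-wise loop against the equivalent vectorised call:

```python
import time

import numpy as np

n = 1_000_000
xs = np.random.rand(n)
ys = np.random.rand(n)

# Pure-Python loop: one interpreted iteration (and one boxed float) per element.
t0 = time.perf_counter()
slow = [a * b for a, b in zip(xs.tolist(), ys.tolist())]
t_loop = time.perf_counter() - t0

# Vectorised: a single call that runs an optimised C loop over both arrays.
t0 = time.perf_counter()
fast = xs * ys
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.4f}s  vectorised: {t_vec:.4f}s")
```

On a typical machine the vectorised version wins by one to two orders of magnitude, which is the gap being argued about here.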
Numba is interesting… But a) it can already do multithreading so this change makes little difference, and b) it’s still not going to be as fast as C++ (obviously we don’t count the GPU backend).
Numpy is written in C.
Python is written in C too, what’s your point? I’ve seen this argument a few times and I find it bizarre that “easily able to incorporate highly optimised Fortran and C numerical routines” is somehow portrayed as a point against Python.
Numpy is a de facto extension to the Python standard that adds first-class support for single-type multi-dimensional arrays and functions for working on them. It is implemented in a mixture of Python and C (about 60% Python, according to GitHub), interfaces with Python’s C API and links in specialist libraries for operations. You could write the same statement for parts of the Python standard library; is that also not Python?
It’s hard to overstate just how much simpler development is in numpy compared to C++: in this example the new Python version was less than 50 lines and was developed in an afternoon, while the C++ version was closing in on 1000 lines over 6 files.
I was responding to your general statement that Python is slow and so there is no point in making it faster. I agree that removing the GIL won’t do much to improve the execution speed of programs making heavy use of numpy or of things calling outside it.
That’s a bit suss too tbh. Did the C++ version use an existing library like Eigen too or did they implement everything from scratch?
It was written entirely from scratch, which is kind of my point: a well written Python program can outperform a naive C implementation and is vastly simpler to create.
If you have the expertise and are willing to put in the effort you likely can squeeze that extra bit of performance out by dropping to a lower-level language, but for certain workloads you can get good performance out of Python if you know what you are doing, so calling it extremely slow and saying you have to move to another language if you care about performance is misleading.
You’re thinking of CPython. PyPy can routinely compete with C and C++
I am indeed thinking of CPython because a) approximately nobody uses PyPy, and b) this article is about CPython!!
In any case, PyPy is only about 4x faster than CPython on average (according to their own benchmarks), so it’s only going to be able to compete with C++ in specific circumstances, not in general.
You’re both at least partly right. The only interpreted language that can compete with compiled ones for execution speed is Java, and it has the downside of being Java.
That being said, you might be surprised at how fast you can make Python code execute, even pre-GIL changes. I certainly was. Using multiprocessing and code architected to run massively parallel, it can be blazingly fast. It would still be blown out of the water by similarly optimized compiled code, but it is worth serious consideration if you want to optimize for iterative development.
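A minimal sketch of that multiprocessing pattern (stdlib only): a CPU-bound job is split into chunks, and each chunk runs in a separate interpreter process with its own GIL:

```python
from multiprocessing import Pool

def count_primes(bounds):
    """Count primes in [lo, hi) by trial division (deliberately CPU-bound)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def parallel_count(limit, workers=8):
    """Split [0, limit) into equal chunks and farm them out to worker processes."""
    step = limit // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(count_primes, chunks))

if __name__ == "__main__":
    print(parallel_count(200_000))
```

Because each worker is a separate process, the work genuinely runs in parallel even under the GIL; the trade-off is the serialisation overhead of shipping arguments and results between processes, so it only pays off for coarse-grained chunks like these.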
My view on such workflows would be:
1. Write iteration of code component in Python.
2. Release.
3. Evaluate if any functional changes are required. If so, goto 1.
4. Port component to compiled language, changing function calls/imports to make use of the compiled binary alongside the other interpreted components.
5. Release.
6. Refactor code to optimize for the compiled language, features that the compiled language enables, and/or security/bug fixes.
7. Release.
8. Evaluate if further refactoring is required at this time; if so, goto 6.
The only interpreted language that can compete with compiled for execution speed is Java
“Interpreted” isn’t especially well defined but it would take a pretty wildly out-there definition to call Java interpreted! Java is JIT compiled or even AoT compiled recently.
it can be blazingly fast
It definitely can’t.
It would still be blown out of the water by similarly optimized compiled code
Well, yes. So not blazingly fast then.
I mean it can be blazingly fast compared to computers from the 90s, or like humans… But “blazingly fast” generally means in the context of what is possible.
Port component to compiled language
My extensive experience is that this step rarely happens because by the time it makes sense to do this you have 100k lines of Python and performance is juuuust about tolerable and we can’t wait 3 months for you to rewrite it we need those new features now now now!
My experience has also shown that writing Python is rarely a faster way to develop even prototypes, especially when you consider all the time you’ll waste on pip and setuptools and venv…
“Interpreted” isn’t especially well defined but it would take a pretty wildly out-there definition to call Java interpreted! Java is JIT compiled or even AoT compiled recently.
Java is absolutely interpreted, assuming AoT compilation isn’t being used. The bytecode must be interpreted by the JVM (an interpreter and JIT compiler) in order to produce binary data that can run on any system, the same as any interpreted language. It is a pretty major stretch, in my mind, to claim that it’s not. The simplest test would be: “Does the program require any additional programs to provide the system with native binaries at runtime?”
It definitely can’t.
Well, yes. So not blazingly fast then.
I mean it can be blazingly fast compared to computers from the 90s, or like humans… But “blazingly fast” generally means in the context of what is possible.
I find that context marginally useful in practice. In my experience it is prone to letting perfect be the enemy of good and premature optimization.
My focus is more in tooling, however, so we might be coming from very different places. In my contexts, things are usually measured against existing processes and tooling, and frequently on a human scale. Do something in 5 seconds that usually takes a human 15 minutes and that’s a factor of 180, more than two orders of magnitude.
My extensive experience is that this step rarely happens because by the time it makes sense to do this you have 100k lines of Python and performance is juuuust about tolerable and we can’t wait 3 months for you to rewrite it we need those new features now now now!
You’re not wrong. I’m actually in the process of making such a push where I’m at, for the first time in my career. It helps a lot if you can architect it so that you can have runner and coordinator components as those, at their basics, are simple to implement in most languages. Then, things can be iteratively ported over time.
My experience has also shown that writing Python is rarely a faster way to develop even prototypes, especially when you consider all the time you’ll waste on pip and setuptools and venv…
That’s… an odd perspective to me. Pip and venv have been tools that I’ve found to greatly accelerate dev setup and application deployment. Installing any third-party dependencies in a venv with pip means that one can pip freeze later and dump directly to a requirements.txt for others (including deployment) to use.
Pip and venv have been tools that I’ve found to greatly accelerate dev setup and application deployment.
I’m not saying pip and venv are worse than not using them. They’re obviously mandatory for Python development. I mean that, compared to other languages, they provide a pretty awful experience and you’ll constantly be fighting them. Here are some examples:
Pip is super slow. I recently discovered uv which is written in Rust and consequently is about 10x faster (57s to 7s in my case).
Pip gives terrible error messages. For example it assumes all version resolution failures are due to requirements conflicts, when actually it can be due to Python version requirements too so you get insane messages like “Requirement foo >= 2.0 conflicts with requirement foo == 2.0”. Yeah really.
You can’t install multiple versions of the same dependency, so you end up in dependency resolution hell (depA depends on foo >= 3 but depB depends on foo <= 2).
No namespace support for package names so you can’t securely use private PyPI repositories.
To make static typing work properly with Pyright and venv and everything you need some insane command like pip install --config-settings editable-mode=compat --editable ./mypackage. How user friendly. Apparently when they changed how editable packages were installed they were warned that it would break all static tooling but did it anyway. Good job guys.
When you install an editable package in a venv it dumps a load of stuff in the package directory, which means you can’t do it twice to two different venvs.
The fact that you have to use venvs in the first place is a pain. Don’t need that with Deno.
There’s so much more but this is just what I can remember off the top of my head. If you haven’t run into these things just be glad your Python usage is simple enough that you’ve been lucky!
I’m actually in the process of making such a push where I’m at, for the first time in my career
Oh my. Yeah. I have seen these before indeed. IMO, that’s also a sure sign that a compiled language needs to come into the picture (port the simplest conflicting component). Especially if, for some reason, multiple dependency versions are needed (I have hit that in particular myself).
I’ve not yet had the pleasure of working with Rust much but that’s the target for the next version that we start, so, will be fun.
you can get significant speed-ups using well-optimised, vectorised BLAS functions (numpy)
Numpy is written in C.
So you get the best of both worlds then: the speed of C and the ease of use of Python.
Sure but that’s not relevant to the current discussion. The point is that removing the GIL doesn’t affect Numpy because Numpy is written in C.
The point is that eliminating the GIL mainly benefits pure Python code. Numpy is already multithreaded.
I think you may have forgotten what we’re talking about.
Like for like Python is far far slower than C++
You’re thinking of CPython. PyPy can routinely compete with C and C++, particularly in allocation-heavy or pointer-heavy scenarios.
And PyPy still has a GIL! Come on dude, think!
that’s the target for the next version that we start, so, will be fun
Good luck!
Rust is a lovely language if you’re okay getting deep into the nuts and bolts.