I know people joke around about this a lot, but I was blown away by a recent switch I made from using cv2 with Python to using cv2 with C++.
I had literally hundreds of thousands of images to analyze for a dataset, and my Python script would have taken 12 hours.
I ported it to C++ and it absolutely destroyed that, finishing in 20 minutes.
I'm sure I was doing something that just wasn't well optimized for Python. I suspect that somewhere in the backend it was using a completely different library with multithreading optimizations. Or maybe TurboJPEG is just garbage in Python. I'm still not sure what the bottleneck was, and I don't know enough to really explain it.
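If anyone wants to narrow down a bottleneck like this before porting, one rough approach is to time each stage separately in plain Python. This is a minimal sketch, assuming the per-image work splits cleanly into a "decode" stage and an "analyze" stage; `profile_stages`, `fake_decode`, and `fake_analyze` are hypothetical names, and in a real script the stages would be something like `cv2.imread` followed by whatever analysis you run.

```python
import time
from collections import defaultdict


def profile_stages(items, stages):
    """Accumulate wall-clock time per named stage across all items.

    `stages` is a list of (name, function) pairs applied in order; each
    function receives the previous stage's output.
    """
    totals = defaultdict(float)
    for item in items:
        value = item
        for name, func in stages:
            start = time.perf_counter()
            value = func(value)
            totals[name] += time.perf_counter() - start
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical stand-ins: swap in cv2.imread and your real analysis.
    def fake_decode(path):
        return bytes(200_000)  # pretend this is a decoded pixel buffer

    def fake_analyze(buf):
        return sum(buf[::1000])  # pretend this is the per-image analysis

    timings = profile_stages(
        range(100), [("decode", fake_decode), ("analyze", fake_analyze)]
    )
    # Print the slowest stage first.
    for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {seconds:.4f}s")
```

If one stage dominates the total, that is where the Python version is losing time; for finer detail, the standard library's `cProfile` can break it down per function call.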
But holy shit. I've never seen that much of a performance difference on such a simple task.
I was very impressed.