My program is too slow. How do I speed it up?
That’s a tough one, in general. There are many tricks to speed up Python code; consider rewriting parts in C as a last resort.
In some cases it’s possible to automatically translate Python to C or x86 assembly language, meaning that you need to modify your code little or not at all to gain increased speed.
Pyrex can compile a slightly modified version of Python code into a C extension, and can be used on many different platforms.
Psyco is a just-in-time compiler that translates Python code into x86 machine code. If you can use it, Psyco can provide dramatic speedups for critical functions.
The rest of this answer discusses various tricks for squeezing a bit more speed out of Python code. Never apply any optimization trick unless you know you need it, that is, until profiling has shown that a particular function is a heavily executed hot spot in the code. Optimizations almost always make the code less clear, and you shouldn’t pay the costs of reduced clarity (increased development time, greater likelihood of bugs) unless the resulting performance benefit is worth it.
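A minimal sketch of locating a hot spot before optimizing anything, using the standard cProfile and pstats modules (cProfile is the C implementation of the profile module's interface, so its overhead is lower). The functions slow_sum_of_squares and program are hypothetical stand-ins for your own code; the report sorts functions by cumulative time so the expensive call chain appears at the top.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Hypothetical hot spot: deliberately naive, appends one element at a time.
    result = []
    for i in range(n):
        result.append(i * i)
    return sum(result)


def program():
    # Hypothetical "whole program" whose hot spot we want to locate.
    total = 0
    for _ in range(20):
        total += slow_sum_of_squares(10_000)
    return total


def profile_report(func, limit=5):
    """Run func under cProfile; return (func's result, text of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by time spent in each function including its callees.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_report(program)
    print(report)
```

Only once the report points at a function like slow_sum_of_squares is it worth trading clarity for speed there; timings in the report will vary from machine to machine.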
There is a page on the wiki devoted to performance tips.
Guido van Rossum has written up an anecdote related to optimization at http://www.python.org/doc/essays/list2str.html.
One thing to notice is that function and (especially) method calls are rather expensive; if you have designed a purely OO interface with lots of tiny functions that don’t do much more than get or set an instance variable or call another method, you might consider a more direct approach, such as accessing instance variables directly. Also see the standard module profile, which makes it possible to find out where your program is spending most of its time (if you have some patience: the profiling itself can slow your program down by an order of magnitude).
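A rough way to measure the cost of tiny accessor methods is to time them against direct attribute access with the standard timeit module. The Point class and its get_x method below are hypothetical; the absolute numbers depend on your machine and Python version, so the sketch prints them rather than asserting which is faster.

```python
import timeit


class Point:
    def __init__(self, x):
        self._x = x

    # A tiny accessor that does nothing except return an instance variable.
    def get_x(self):
        return self._x


def total_via_method(points):
    # One method call (and call frame) per element.
    return sum(p.get_x() for p in points)


def total_via_attribute(points):
    # Direct attribute access: no extra method call per element.
    return sum(p._x for p in points)


if __name__ == "__main__":
    pts = [Point(i) for i in range(10_000)]
    t_method = timeit.timeit(lambda: total_via_method(pts), number=200)
    t_attr = timeit.timeit(lambda: total_via_attribute(pts), number=200)
    print(f"method calls:     {t_method:.3f} s")
    print(f"attribute access: {t_attr:.3f} s")
```

Reaching into `_x` from outside the class is exactly the kind of clarity cost that should only be paid inside a profiled hot spot.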
Remember that many standard optimization heuristics you know from other programming experience may well apply to Python. For example, it may be faster to send output to output devices using larger writes rather than smaller ones, in order to reduce the overhead of kernel system calls. Thus CGI scripts that write all output in “one shot” may be faster than those that write lots of small pieces of output.
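A small sketch contrasting many small writes with a single "one shot" write. CountingWriter is a hypothetical file-like wrapper used only to count write() calls; real Python streams such as sys.stdout may already buffer output, so how many kernel system calls you actually save depends on how the stream is configured, but the per-call overhead in Python is reduced either way.

```python
import io


class CountingWriter:
    """Hypothetical file-like object that records how many times write() is called."""

    def __init__(self):
        self.buffer = io.StringIO()
        self.calls = 0

    def write(self, text):
        self.calls += 1
        return self.buffer.write(text)


def write_piecewise(out, rows):
    # Four write() calls per row: lots of small pieces of output.
    for name, value in rows:
        out.write(name)
        out.write(": ")
        out.write(str(value))
        out.write("\n")


def write_in_one_shot(out, rows):
    # Build the whole output in memory, then issue a single write().
    parts = [f"{name}: {value}\n" for name, value in rows]
    out.write("".join(parts))


if __name__ == "__main__":
    rows = [(f"item{i}", i * i) for i in range(100)]
    piecewise, one_shot = CountingWriter(), CountingWriter()
    write_piecewise(piecewise, rows)
    write_in_one_shot(one_shot, rows)
    # Same text produced, very different number of write() calls.
    assert piecewise.buffer.getvalue() == one_shot.buffer.getvalue()
    print(piecewise.calls, one_shot.calls)  # 400 1
```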
Also, be sure to use Python’s core features where appropriate. For example, slicing allows programs to chop up lists and other sequence objects in a single tick of the interpreter’s mainloop using highly optimized C implementations. Thus to get the same effect as: