I remember hearing somewhere that "large functions might have higher execution times" because of code size, and CPU cache or something like that.
How can I tell if function size is imposing a performance hit for my application? How can I optimize against this? I have a CPU intensive computation that I have split into (as many threads as there are CPU cores). The main thread waits until all of the worker threads are finished before continuing.
I happen to be using C++ on Visual Studio 2010, but I'm not sure that's really important.
I'm running a ray tracer that shoots about 5,000 rays per pixel. I create (cores-1) threads (1 per extra core), split the screen into rows, and give each row to a CPU thread. I run the
trace function on each thread about 5,000 times per pixel.
I'm actually looking for ways to speed this up. Es posible for me to reduce the size of the main tracing function by refactoring, and I want to know if I should expect to see a performance gain.
A lot of people seem to be answering the wrong question here, I'm looking for an answer a este specific question, even if you think I can probably do better by optimizing the contents of the function, I want to know if there is a function size/performance relationship.
preguntado el 12 de junio de 12 a las 16:06
It's not really the size of the function, it's the total size of the code that gets cached when it runs. You aren't going to speed things up by splitting code into a greater number of smaller functions, unless some of those functions aren't called at all in your critical code path, and hence don't need to occupy any cache. Besides, any attempt you make to split code into multiple functions might get reversed by the compiler, if it decides to inline them.
So it's not really possible to say whether your current code is "imposing a performance hit". A hit compared with which of the many, many ways that you could have structured your code differently? And you can't reasonably expect changes of that kind to make any particular difference to performance.
I suppose that what you're looking for is instructions that are rarely executed (your profiler will tell you which they are), but are located in the close vicinity of instructions that are executed a lot (and hence will need to be in cache a lot, and will pull in the cache line around them). If you can cluster the commonly-executed code together, you'll get more out of your instruction cache.
Practically speaking though, this is not a very fruitful line of optimization. It's unlikely you'll make much difference. If nothing else, your commonly-executed code is probably quite small and adjacent already, it'll be some small number of tight loops somewhere (your profiler will tell you where). And cache lines at the lowest levels are typically small (of the order of 32 or 64 bytes), so you'd need some very fine re-arrangement of code. C++ puts a lot between you and the object code, that obstructs careful placement of instructions in memory.
perf can give you information on cache misses - most of those won't be for executable code, but on most systems it really doesn't matter which cache misses you're avoiding: if you can avoid some then you'll speed your code up. Perhaps not by a lot, unless it's a lot of misses, but some.
Anyway, what context did you hear this? The most common one I've heard it come up in, is the idea that function inlining is sometimes counter-productive, because sometimes the overhead of the code bloat is greater than the function call overhead avoided. I'm not sure, but profile-guided optimization might help with that, if your compiler supports it. A fairly plausible profile-guided optimization is to preferentially inline at call sites that are executed a larger number of times, leaving colder code smaller, with less overhead to load and fix up in the first place, and (hopefully) less disruptive to the instruction cache when it is pulled in. Somebody with far more knowledge of compilers than me, will have thought hard about whether that's a bueno profile-guided optimization, and therefore decided whether or not to implement it.
Unless you're going to hand-tune to the assembly level, to include locking specific lines of code in cache, you're not going to see a significant execution difference between one large function and multiple small functions. In both cases, you still have the same amount of work to perform and that's going to be your bottleneck.
Breaking things up into multiple smaller functions will, however, be easier to maintain and easier to read -- especially 6 months later when you've forgotten what you did in the first place.
Function size is unlikely to be a bottleneck in your application. What you do in the function is much more important that it's physical size. There are some things your compiler can do with small function that it cannot do with large functions (namely inlining), but usually this isn't a huge difference anyway.
You can profile the code to see where the real bottleneck is. I suspect calling a large function is not the problem.
You should, however, break up the function into smaller function for code readability reasons.
It's not really about function size, but about what you do in it. Depending on what you do, there is possibly some way to optimize it.