Profiling specific functions in C++

I have looked into gprof, but I don't quite understand how to achieve the following:

I have written a clustering procedure. In each iteration, 4 functions are called repeatedly. There are about 100,000 iterations to be done. I want to find out how much time was spent in each function.
These functions might call other sub-functions and may involve data structures like hash maps, maps, etc., but I don't care about these sub-functions. I just want to know how much total time was spent in each of those parent functions over all the iterations. This will help me optimize my program better.

The problem with gprof is that it analyzes every function, so even the functions of the STL data structures are taken into account.

Currently I am using clock_gettime. For each function, I output the time taken in each iteration, and then I post-process this output file. This requires writing a lot of profiling code, which makes my code look very complex, and I want to avoid that. How is this done in industry?

Is there an easier way to do this?

If you know of any cleaner ways, please let me know.

asked Aug 24 '12 at 07:08

By using Intel VTune Amplifier.

3 Answers

If I understand correctly, you're interested in how much time was spent in the four target functions themselves, but not in any of the child functions they call.

This information is provided in gprof's "flat" profile under "self seconds". Alternatively, if you're looking at the call graph, this timing is in the "self" column.

answered Aug 24 '12 at 08:08

I would take a look at Telemetry. It's mainly targeted at game developers who want to compare per-frame data, but it seems to fit your requirements very well.

answered Aug 24 '12 at 08:08

You want the self-time of those 4 functions, so you can optimize them specifically.

gprof will show you that, as a % of total time. Suppose it is 10%. If so, even if you were able to optimize it to 0%, you would get a speedup factor of 100/90 = 1.11, or a speedup of 11%. If it took 100 seconds, and that was too slow, chances are 90 seconds is also too slow.
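The arithmetic above is an instance of Amdahl's law. As a sketch, with f denoting the fraction of run time spent in the code being optimized and s the factor by which that code alone is sped up (both symbols are introduced here for illustration, they are not from the original answer):

```latex
\[
S(f, s) = \frac{1}{(1 - f) + f/s},
\qquad
\lim_{s \to \infty} S(0.10, s) = \frac{1}{1 - 0.10} = \frac{100}{90} \approx 1.11
\]
```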

However, the inclusive (self plus callees) time taken by those functions is likely to be a much larger percentage, say 80%, to pick a number. If so, you could optimize much more by having them make fewer calls to those callees. Alternatively, you could find that the callees are spending a big percentage of their time doing things that you don't strictly need done, such as testing their arguments for generality's sake, in which case you could replace them with ad-hoc routines.
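If the inclusive time of a handful of functions is all that is needed, one possible way to collect it without scattering clock_gettime calls through the code is a small scope-based timer. The following is only a minimal sketch, not something proposed in the thread: `ScopedTimer`, `PROFILE_FUNCTION`, `assign_points`, `update_centroids` and `run_demo` are hypothetical names made up for illustration, and it uses `std::chrono::steady_clock` rather than clock_gettime.

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical helper (not from the thread): accumulates inclusive
// wall-clock time and call counts per label.
class ScopedTimer {
public:
    using Clock = std::chrono::steady_clock;

    struct Stats {
        double seconds = 0.0;  // total inclusive time across all calls
        long calls = 0;        // number of timed calls
    };

    explicit ScopedTimer(const char* label)
        : label_(label), start_(Clock::now()) {}

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

    // Runs on every exit path of the enclosing scope.
    ~ScopedTimer() {
        // Read the clock first so the map lookup is not billed to the function.
        const double elapsed =
            std::chrono::duration<double>(Clock::now() - start_).count();
        Stats& s = totals()[label_];
        s.seconds += elapsed;
        ++s.calls;
    }

    // Global table of results. Not thread-safe.
    static std::map<std::string, Stats>& totals() {
        static std::map<std::string, Stats> table;
        return table;
    }

    static void report() {
        for (const auto& entry : totals()) {
            std::printf("%-20s %8ld calls %12.6f s\n", entry.first.c_str(),
                        entry.second.calls, entry.second.seconds);
        }
    }

private:
    const char* label_;
    Clock::time_point start_;
};

// One line per function to be measured; __func__ supplies the label.
#define PROFILE_FUNCTION() ScopedTimer profile_timer_(__func__)

// Hypothetical stand-ins for two of the four clustering steps.
double assign_points(int n) {
    PROFILE_FUNCTION();
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += i * 0.5;
    return sum;
}

double update_centroids(int n) {
    PROFILE_FUNCTION();
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += i * 0.25;
    return sum;
}

// Hypothetical driver: runs the iterations, then prints one summary.
double run_demo(int iterations) {
    double checksum = 0.0;
    for (int it = 0; it < iterations; ++it) {
        checksum += assign_points(1000);
        checksum += update_centroids(1000);
    }
    ScopedTimer::report();
    return checksum;
}
```

Each measured function gains a single line, and the totals are printed once at the end instead of once per iteration. Note that the figures are inclusive: time spent in callees (including STL containers) is counted, and if one timed function calls another, that time appears under both labels. The timer itself adds two clock reads and a map lookup per call, which can matter for very short functions.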

In fact, strictly speaking, there is no such thing as self time. Even the simplest instruction where the program counter is found is actually a call to a microcode subroutine.

Here is some discussion of the issues and a constructive recommendation.

answered May 23 '17 at 11:05
