Valid
	XHTML 1.1! Valid CSS!
Created 2009-08-01   Modified 2009-05-29
Chelton Evans

proj gprof home

Intro
Example 1
Example 2
Easy Profiling

Intro

gprof is a profiler that compiles profile testing code in with the -pg compiler switch. It is really good at showing where you spend time in code, and changing bad habits only when its necessary - that is in functions which are critical.

A couple of good programmers said I should use gprof. And since they may use this tool to look at your code too it is interesting. This is a performance enhancing tool but also tells you where algorithms are spending most of their time in evaluation, and I almost never guess correctly, especially for complicated algorithms.

Politics: Apple have not supported gprof. What can I say but why not? Cross platform development is valued by the Unix community, any other tool is additional effort and cost. Could you not have both?

Easy Profiling

On Linux box

make with -pg -g options, in my ide
Run the main program.
$ gprof -p -b ./main | head -n 15

Example 1

In proj/nbody directory

$ make clean
$ mkerrors gprof
$ ./main benchmark=1 numPartices=500
$ gprof -p -b ./main | head -n 15

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 80.25      3.82     3.82 124874750     0.00     0.00  particle::isIntersecting(particle const&) const
 15.34      4.55     0.73     1000     0.00     0.00  integration<particlev0spaced, d2toindex>::bruteforce()
  1.37      4.62     0.07   508548     0.00     0.00  boxcollision::up(particle&, double) const
  1.05      4.67     0.05   510076     0.00     0.00  boxcollision::left(particle&, double) const
  0.74      4.70     0.04   509355     0.00     0.00  boxcollision::right(particle&, double) const
  0.63      4.73     0.03   529296     0.00     0.00  particle::step(double)
  0.42      4.75     0.02   507776     0.00     0.00  boxcollision::down(particle&, double) const
  0.21      4.76     0.01   510076     0.00     0.00  boxcollision::walls(particle&, double) const
  0.00      4.76     0.00     5038     0.00     0.00  particle::collisionDynamics(particle&)
  0.00      4.76     0.00     5038     0.00     0.00  std::sqrt(float)

Example 2

Here is a 5 minute session where I got a 25% performance improvement. Since I was looking at starting timing experiments that require some computer time I decided to do this doc so I do not forget.

Firstly I disabled all debug code except the profiling code, recompiling all object files. The code is also running a Gui. The experiment is done once with 5000 points and the program is exited, but the times are not totally correct. I get the total time myself with calls before and after the experiment.

? seconds total time (I lost the time as the screen scrolled away)
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  8.78      2.68     2.68   839987     0.00     0.01  trianglespace::trianglespace(point2<double>, point2<double>, point2<double>)
  7.52      4.97     2.29  2520236     0.00     0.00  d2halfspace<double>::set(double, double, double, double)
  7.17      7.15     2.19   839987     0.00     0.03  d3tess::move_terminated(bool&, unsigned int&, unsigned int)
  6.16      9.03     1.88  2520236     0.00     0.00  d2halfspace<double>::d2halfspace(point2<double> const&, point2<double> const&)
  5.86     10.81     1.79  2519961     0.00     0.00  d2halfspace<double>::operator=(d2halfspace<double> const&)
  5.27     12.42     1.61  4665618     0.00     0.00  point3<double>::operator point2<double> const() const
  5.02     13.95     1.53  2519961     0.00     0.00  d2halfspace<double>::d2halfspace()
  4.40     15.29     1.34  2624677     0.00     0.00  d2halfspace<double>::eval(double, double) const

I implemented a constructor in trianglespace without using constant references. So objects were being constructed instead of references being passed. Fixing this I moved on.

Total of 11.36 seconds in time
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  8.27      2.33     2.33   839987     0.00     0.01  trianglespace::trianglespace(point2<double> const&, point2<double> const&, point2<double> const&)
  8.25      4.66     2.33   839987     0.00     0.03  d3tess::move_terminated(bool&, unsigned int&, unsigned int)
  7.70      6.83     2.17  2520236     0.00     0.00  d2halfspace<double>::set(double, double, double, double)
  7.65      8.98     2.16  2519961     0.00     0.00  d2halfspace<double>::operator=(d2halfspace<double> const&)
  6.30     10.76     1.78  2520236     0.00     0.00  d2halfspace<double>::d2halfspace(point2<double> const&, point2<double> const&)
  5.54     12.32     1.56  2624677     0.00     0.00  d2halfspace<double>::eval(double, double) const
  5.50     13.87     1.55  2519961     0.00     0.00  d2halfspace<double>::d2halfspace()

The constructor trianglespace was still dominating, so I thought I had better have a look at the code. Here is what it looks like after I again edited the constructor. The assignment operator was costly so I killed it and replaced it with set, which did the same job as the previous half space constructor. I did have to make a new function set up. See the next timing results.

trianglespace::trianglespace(pt2c & p0, pt2c & p1, pt2c & p2)
{
  pi[0] = p0;
  pi[1] = p1;
  pi[2] = p2;

  hi[0].set(pi[2],pi[1]);
  hi[1].set(pi[0],pi[2]);
  hi[2].set(pi[1],pi[0]);

/*
  hi[0] = d2halfspace<double>(pi[2],pi[1]);
  hi[1] = d2halfspace<double>(pi[0],pi[2]);
  hi[2] = d2halfspace<double>(pi[1],pi[0]);
*/
}
Total of 9.36 seconds in time.

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 10.17      2.52     2.52  2519961     0.00     0.00  d2halfspace<double>::set(point2<double> const&, point2<double> const&)
  8.68      4.67     2.15  2624677     0.00     0.00  d2halfspace<double>::eval(double, double) const
  8.28      6.72     2.05   839987     0.00     0.03  d3tess::move_terminated(bool&, unsigned int&, unsigned int)
  7.55      8.59     1.87   839987     0.00     0.01  trianglespace::trianglespace(point2<double> const&, point2<double> const&, point2<double> const&)
  6.66     10.24     1.65  4665618     0.00     0.00  point3<double>::operator point2<double> const() const
  5.47     11.60     1.35  2519961     0.00     0.00  d2halfspace<double>::d2halfspace()

OK I thought that I have been doing a good job, so why not go all the way. Well I should have done more profiling but I need that coffee, so what the heck delete all the object files and recompile with the most aggressive optimization I could quickly find -O3.

Total of 2.04 seconds in time.

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 25.51      2.27     2.27   875340     0.00     0.01  d3tess::move_terminated(bool&, unsigned int&, unsigned int)
 22.70      4.29     2.02  2716241     0.00     0.00  d2halfspace<double>::eval(double, double) const
 16.52      5.76     1.47  2626020     0.00     0.00  d2halfspace<double>::set(point2<double> const&, point2<double> const&)
 15.51      7.14     1.38   875340     0.00     0.00  trianglespace::trianglespace(point2<double> const&, point2<double> const&, point2<double> const&)
  8.76      7.92     0.78   875340     0.00     0.00  trianglespace::eval(point2<double> const&) const
  4.94      8.36     0.44  2626020     0.00     0.00  d2halfspace<double>::~d2halfspace()
  1.80      8.52     0.16    29944     0.01     0.01  d3tessdraw::draw(unsigned int)
  1.12      8.62     0.10     5000     0.02     1.70  d3tess::addpoint(unsigned int)

This is over a 400% improvement.