gprof
Intro
Example 1
Example 2
Easy Profiling
gprof is a profiler that compiles profile testing code
in with the -pg compiler switch. It is really good
at showing where you spend time in code, and changing bad habits
only when its necessary - that is in functions which are critical.
A couple of good programmers said I should use gprof. And since they may use this tool to look at your code too it is interesting. This is a performance enhancing tool but also tells you where algorithms are spending most of their time in evaluation, and I almost never guess correctly, especially for complicated algorithms.
Politics: Apple have not supported gprof. What can I say but why not? Cross platform development is valued by the Unix community, any other tool is additional effort and cost. Could you not have both?
On Linux box
make with -pg -g options, in my ide
Run the main program.
$ gprof -p -b ./main | head -n 15
In proj/nbody directory
$ make clean
$ mkerrors gprof
$ ./main benchmark=1 numPartices=500
$ gprof -p -b ./main | head -n 15
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 80.25 3.82 3.82 124874750 0.00 0.00 particle::isIntersecting(particle const&) const 15.34 4.55 0.73 1000 0.00 0.00 integration<particlev0spaced, d2toindex>::bruteforce() 1.37 4.62 0.07 508548 0.00 0.00 boxcollision::up(particle&, double) const 1.05 4.67 0.05 510076 0.00 0.00 boxcollision::left(particle&, double) const 0.74 4.70 0.04 509355 0.00 0.00 boxcollision::right(particle&, double) const 0.63 4.73 0.03 529296 0.00 0.00 particle::step(double) 0.42 4.75 0.02 507776 0.00 0.00 boxcollision::down(particle&, double) const 0.21 4.76 0.01 510076 0.00 0.00 boxcollision::walls(particle&, double) const 0.00 4.76 0.00 5038 0.00 0.00 particle::collisionDynamics(particle&) 0.00 4.76 0.00 5038 0.00 0.00 std::sqrt(float)
Here is a 5 minute session where I got a 25% performance improvement. Since I was looking at starting timing experiments that require some computer time I decided to do this doc so I do not forget.
Firstly I disabled all debug code except the profiling code, recompiling all object files. The code is also running a Gui. The experiment is done once with 5000 points and the program is exited, but the times are not totally correct. I get the total time myself with calls before and after the experiment.
? seconds total time (I lost the time as the screen scrolled away) Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 8.78 2.68 2.68 839987 0.00 0.01 trianglespace::trianglespace(point2<double>, point2<double>, point2<double>) 7.52 4.97 2.29 2520236 0.00 0.00 d2halfspace<double>::set(double, double, double, double) 7.17 7.15 2.19 839987 0.00 0.03 d3tess::move_terminated(bool&, unsigned int&, unsigned int) 6.16 9.03 1.88 2520236 0.00 0.00 d2halfspace<double>::d2halfspace(point2<double> const&, point2<double> const&) 5.86 10.81 1.79 2519961 0.00 0.00 d2halfspace<double>::operator=(d2halfspace<double> const&) 5.27 12.42 1.61 4665618 0.00 0.00 point3<double>::operator point2<double> const() const 5.02 13.95 1.53 2519961 0.00 0.00 d2halfspace<double>::d2halfspace() 4.40 15.29 1.34 2624677 0.00 0.00 d2halfspace<double>::eval(double, double) const
I implemented a constructor in trianglespace without using constant references. So objects were being constructed instead of references being passed. Fixing this I moved on.
Total of 11.36 seconds in time Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 8.27 2.33 2.33 839987 0.00 0.01 trianglespace::trianglespace(point2<double> const&, point2<double> const&, point2<double> const&) 8.25 4.66 2.33 839987 0.00 0.03 d3tess::move_terminated(bool&, unsigned int&, unsigned int) 7.70 6.83 2.17 2520236 0.00 0.00 d2halfspace<double>::set(double, double, double, double) 7.65 8.98 2.16 2519961 0.00 0.00 d2halfspace<double>::operator=(d2halfspace<double> const&) 6.30 10.76 1.78 2520236 0.00 0.00 d2halfspace<double>::d2halfspace(point2<double> const&, point2<double> const&) 5.54 12.32 1.56 2624677 0.00 0.00 d2halfspace<double>::eval(double, double) const 5.50 13.87 1.55 2519961 0.00 0.00 d2halfspace<double>::d2halfspace()
The constructor trianglespace was still dominating, so I thought I had better have a look at the code. Here is what it looks like after I again edited the constructor. The assignment operator was costly so I killed it and replaced it with set, which did the same job as the previous half space constructor. I did have to make a new function set up. See the next timing results.
trianglespace::trianglespace(pt2c & p0, pt2c & p1, pt2c & p2)
{
pi[0] = p0;
pi[1] = p1;
pi[2] = p2;
hi[0].set(pi[2],pi[1]);
hi[1].set(pi[0],pi[2]);
hi[2].set(pi[1],pi[0]);
/*
hi[0] = d2halfspace<double>(pi[2],pi[1]);
hi[1] = d2halfspace<double>(pi[0],pi[2]);
hi[2] = d2halfspace<double>(pi[1],pi[0]);
*/
}
Total of 9.36 seconds in time. Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 10.17 2.52 2.52 2519961 0.00 0.00 d2halfspace<double>::set(point2<double> const&, point2<double> const&) 8.68 4.67 2.15 2624677 0.00 0.00 d2halfspace<double>::eval(double, double) const 8.28 6.72 2.05 839987 0.00 0.03 d3tess::move_terminated(bool&, unsigned int&, unsigned int) 7.55 8.59 1.87 839987 0.00 0.01 trianglespace::trianglespace(point2<double> const&, point2<double> const&, point2<double> const&) 6.66 10.24 1.65 4665618 0.00 0.00 point3<double>::operator point2<double> const() const 5.47 11.60 1.35 2519961 0.00 0.00 d2halfspace<double>::d2halfspace()
OK I thought that I have been doing a good job, so why not go all the way. Well I should have
done more profiling but I need that coffee, so what the heck delete all the object files
and recompile with the most aggressive optimization I could quickly find -O3.
Total of 2.04 seconds in time. Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 25.51 2.27 2.27 875340 0.00 0.01 d3tess::move_terminated(bool&, unsigned int&, unsigned int) 22.70 4.29 2.02 2716241 0.00 0.00 d2halfspace<double>::eval(double, double) const 16.52 5.76 1.47 2626020 0.00 0.00 d2halfspace<double>::set(point2<double> const&, point2<double> const&) 15.51 7.14 1.38 875340 0.00 0.00 trianglespace::trianglespace(point2<double> const&, point2<double> const&, point2<double> const&) 8.76 7.92 0.78 875340 0.00 0.00 trianglespace::eval(point2<double> const&) const 4.94 8.36 0.44 2626020 0.00 0.00 d2halfspace<double>::~d2halfspace() 1.80 8.52 0.16 29944 0.01 0.01 d3tessdraw::draw(unsigned int) 1.12 8.62 0.10 5000 0.02 1.70 d3tess::addpoint(unsigned int)
This is over a 400% improvement.