Monthly Archives: May 2014

Commentary: GPU vs. CPU comparison done right

In earlier posts I have complained about how some researchers, through unfair comparisons, make GPU computing look more attractive than it really is.

It is thus only appropriate to also commend those who do it right. As part of some ongoing research, I came across a paper published in the Journal of Chemical Information and Modeling:

Anatomy of High-Performance 2D Similarity Calculations

Similarity measures based on the comparison of dense bit vectors of two-dimensional chemical features are a dominant method in chemical informatics. For large-scale problems, including compound selection and machine learning, computing the intersection between two dense bit vectors is the overwhelming bottleneck. We describe efficient implementations of this primitive as well as example applications using features of modern CPUs that allow 20–40× performance increases relative to typical code. Specifically, we describe fast methods for population count on modern x86 processors and cache-efficient matrix traversal and leader clustering algorithms that alleviate memory bandwidth bottlenecks in similarity matrix construction and clustering. The speed of our 2D comparison primitives is within a small factor of that obtained on GPUs and does not require specialized hardware.

Briefly, the authors compare the speed with which fingerprint-based chemical similarity searches can be performed on CPUs and GPUs. In contrast to so many others, the authors went to great lengths to give a fair picture of the relative performance:

  • Instead of using multiple very expensive Nvidia Tesla boards, they used an Nvidia GTX 480. This card cost roughly $500 when released and was the fastest gaming card available at the time.
  • For comparison, they used an Intel i7-920. This CPU cost approximately $300 when released and was a high-end consumer product.
  • They compared the GPU implementation of the algorithm to a highly optimized CPU implementation. The CPU implementation makes use of SSE4.2 instructions available on modern Intel CPUs and is multi-threaded to utilize all CPU cores.
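To make the primitive being benchmarked concrete, here is a minimal sketch of a fingerprint similarity computed from population counts. It assumes the Tanimoto coefficient as the similarity measure (the abstract only speaks of intersections of dense bit vectors) and stores each fingerprint as a Python integer. The names `popcount` and `tanimoto` are illustrative and not taken from the paper, whose implementation relies on the hardware population count instruction rather than anything like this.

```python
def popcount(x: int) -> int:
    """Count the set bits of a non-negative integer (portable, not fast)."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit-vector fingerprints stored as ints.

    Assumption: two empty fingerprints are given similarity 0.0.
    """
    common = popcount(a & b)                    # bits set in both fingerprints
    total = popcount(a) + popcount(b) - common  # bits set in either fingerprint
    return common / total if total else 0.0


# Two toy 8-bit fingerprints; real ones are typically hundreds of bits or more.
fp_a = 0b10110110
fp_b = 0b10011100
print(tanimoto(fp_a, fp_b))  # 3 shared bits out of 6 distinct bits -> 0.5
```

Almost all of the work is in the three population counts, which is why a fast population count and cache-friendly traversal matter so much for this workload.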

The end result was that the GPU implementation gives a respectable but unexceptional 5x speed-up over a pure CPU implementation. If one further takes into account that the GPU is probably 40% of the cost of the whole computer (so a CPU-only machine would cost about 60% as much), this reduces to a 3x improvement in price-performance ratio.
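The price-performance estimate can be checked with a few lines of arithmetic. This is only a sketch of my own back-of-the-envelope reasoning: the 40% cost share is a guess, and the function name `price_performance_gain` is made up for illustration.

```python
def price_performance_gain(speedup: float, gpu_cost_share: float) -> float:
    """Speed-up per unit of money spent, relative to a CPU-only machine.

    gpu_cost_share is the fraction of the whole computer's price that goes
    to the GPU (an assumed figure, not a measured one).
    """
    # The machine with the GPU costs 1 / (1 - share) times the CPU-only one.
    cost_ratio = 1.0 / (1.0 - gpu_cost_share)
    return speedup / cost_ratio


# 5x raw speed-up, GPU assumed to be 40% of the system price: roughly 3x.
print(round(price_performance_gain(5.0, 0.40), 2))
```

With a cheaper GPU relative to the rest of the system, the price-performance gain moves back toward the raw 5x; with a more expensive one, it shrinks further.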

The authors conclude:

In summary: GPU coding requires one to think of the hardware, but high-speed CPU programming is the same; spending time optimizing CPU code at the same level of architectural complexity that would be used on the GPU often allows one to do quite well.

I can only agree wholeheartedly.