WRF Performance Analysis and Optimization

Last updated: Apr 6th 2009

Introduction

We ran the WRF Conus Benchmark on bluefire with different compile-time and runtime options, for two cases:

Conus 12km (low resolution)
Conus 2.5km (high resolution)

Compile-time Options

WRF V3.0.1.1 was compiled using xlf V11.1, with the default WRF options for AIX (including dmpar, i.e. MPI-only) plus the following:

  1. basic) nothing, just the default
  2. auto) the default plus -qarch=auto -qtune=auto -qcache=auto
  3. qhot) the default plus -qarch=auto -qtune=auto -qcache=auto plus -qhot

Note that -qhot is not recommended by the WRF developers for WRF V3.0.1.1 because of reported problems with model results under certain configurations. The README says "Use at your own risk", but we thought it was worth investigating its possible performance benefits.
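For reference, the three build variants can be written down as data. This is only a sketch: the variant names follow the list above, and the flags are the ones quoted there (with -qhot as the high-order transformation option). The dictionary and the helper function are illustrative names, not part of the WRF build system, and where the flags go in the WRF configuration is not shown here.

```python
# Extra XL Fortran flags added on top of the default WRF options for AIX.
# BUILD_VARIANTS and extra_flags are hypothetical names used for illustration.
AUTO_FLAGS = ["-qarch=auto", "-qtune=auto", "-qcache=auto"]

BUILD_VARIANTS = {
    "basic": [],                     # nothing beyond the defaults
    "auto": AUTO_FLAGS,              # let the compiler target the build host
    "qhot": AUTO_FLAGS + ["-qhot"],  # "auto" plus high-order transformations
}


def extra_flags(variant):
    """Return the extra compiler flags for a variant as one string."""
    return " ".join(BUILD_VARIANTS[variant])


for name in BUILD_VARIANTS:
    print(f"{name:5s} -> {extra_flags(name) or '(defaults only)'}")
```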

Runtime Options

We investigated:

  1. large (64KB) pages versus normal (4KB) pages
  2. processor binding
  3. SMT

Processor binding is shown with the letter b in the legend, large pages with an l, and SMT with an s. To keep the legend aligned, we show the letter x when the relevant option is not in use.
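The legend convention can be decoded mechanically. The sketch below assumes that each label has exactly three letters in the order binding, large pages, SMT; that ordering is an assumption (the text above only defines the letters), and decode_legend is a hypothetical helper name.

```python
def decode_legend(label):
    """Decode a three-letter legend label such as 'bls' or 'xxs'.

    Assumed slot order: processor binding, large pages, SMT.
    An 'x' in a slot means that option was not in use.
    """
    slots = [("processor_binding", "b"), ("large_pages", "l"), ("smt", "s")]
    if len(label) != len(slots):
        raise ValueError(f"expected {len(slots)} letters, got {label!r}")
    decoded = {}
    for (name, letter), ch in zip(slots, label):
        if ch not in (letter, "x"):
            raise ValueError(f"unexpected {ch!r} in the slot for {name}")
        decoded[name] = (ch == letter)
    return decoded


print(decode_legend("bxs"))
```

Under that assumed ordering, "bxs" would read as binding on, large pages off, SMT on.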


Conus 2.5km

The results for the high resolution case are below. It is clear why processor binding is mandatory: the performance increase is impressive.

Large pages also help performance a little, but the careful choice of compile-time options and the use of SMT matter more. Where it can be used, -qhot may help a lot (the caveat is that -qhot might alter the program semantics, and thus produce incorrect results).


Conus 12km

The results for the low resolution case are below. They confirm what is shown for the high resolution case, but the differences among the configurations are smaller.

These smaller differences among the options are due to the smaller size of the problem: each node has less data to crunch and proportionally more data to transfer (this case is more communication-bound than the previous one). For the same reason, it also does not scale as well as the high resolution case.
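One common way to put a number on "scaling" is parallel efficiency relative to a baseline run: the measured speedup divided by the ideal speedup. The sketch below shows that calculation. The function name and the timings in the usage line are made up for illustration and are not measurements from these benchmarks.

```python
def parallel_efficiency(base_procs, base_time, procs, time):
    """Efficiency of a run relative to a baseline run of the same case.

    speedup = base_time / time        (how much faster the larger run is)
    ideal   = procs / base_procs      (how much faster it would be if perfect)
    Returns speedup / ideal; 1.0 means perfect scaling.
    """
    speedup = base_time / time
    ideal = procs / base_procs
    return speedup / ideal


# Hypothetical timings, NOT benchmark results: 4x the processors, 2.5x faster.
print(parallel_efficiency(base_procs=64, base_time=100.0, procs=256, time=40.0))
```

A communication-bound case would show this ratio dropping faster as the processor count grows, since a larger share of each step goes to data transfer rather than computation.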