
To share or not to share
(that is the question on the IBM)

Tips for parallel programming on blackforest

by Juli Rew

If you are parallel programming on the IBM SP, a number of issues can arise pertaining to the use of multiple CPUs and multiple nodes. One of these is sharing: you may need to use multiple nodes that communicate across a network. Whether and what you choose to share with other users may affect the performance of your job.

If you are using OpenMP, you may have been advised that it is best not to share a node. If you're programming with MPI, you've probably been advised that you want to "share the network." What does sharing mean in these contexts? Are these guidelines always true? How do you specify them in your jobs on the SP? Does sharing one resource conflict with not sharing another?


Sharing nodes

On blackforest, each node (currently Winterhawk II nodes) has four processors, so an OpenMP job will be most efficient when it is sufficiently parallel to run on all four processors (i.e., by setting the OMP_NUM_THREADS environment variable to 4). However, if your job requires only two threads, the default is to have you share the node with another job that only requires one or two threads.

This can hurt your code's performance if you need all the memory on the node or make heavy use of the memory subsystem. So even if you have fewer than four threads, you may still want the whole node to yourself. You will need to decide which type of node usage works best for your job.

Note that the syntax differs between LoadLeveler (batch) jobs and interactive jobs. Batch jobs use the LoadLeveler keyword node_usage. Interactive jobs use the MP_CPU_USE environment variable to indicate whether you wish to share the node (see Table 1).
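As a minimal sketch of the batch case, assuming a two-thread OpenMP program that still wants the node to itself, a job command file might look like the one below. The executable name my_omp_program and the output file names are hypothetical, and the keywords other than node_usage are typical rather than prescribed here; site-specific keywords such as class are left out and should be taken from local documentation.

```shell
#!/bin/sh
# LoadLeveler reads the "#@" lines as keywords; the shell sees them as comments.
#@job_type = parallel
#@node = 1
#@node_usage = not_shared
#@output = omp_job.out
#@error = omp_job.err
#@queue

# Only two threads, but node_usage above keeps other jobs off the node.
export OMP_NUM_THREADS=2

# my_omp_program is a hypothetical executable; run it only if it is present.
if [ -x ./my_omp_program ]; then
    ./my_omp_program
fi
```

For an interactive run, the equivalent request would be made with the MP_CPU_USE environment variable (unique for a dedicated node) rather than with node_usage.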


Sharing the network

In large MPI jobs, tasks may need to communicate with each other across nodes. Messages that cross a node boundary travel through a communications switch (denoted by the abbreviation csss). When combined with node sharing, sharing the switch means that both the switch and the node can be shared among your tasks and other users' tasks.

If combined with node_usage = not_shared, only your program's tasks have access to the node's CPUs, but other programs' tasks can share the switch. In most cases, sharing the switch is desirable, and it is the default (see footnote 1). If you don't share the network but have allowed node_usage to be shared, the scheduler drains the work on the nodes because it assumes that other users running on those nodes need to share the switch. This drain blocks the queue and effectively takes the nodes assigned to that queue out of the system.

The default communication is IP over Ethernet, so it is usually beneficial to specify us (user space protocol), which is optimized for the SP switch.

Again, the syntax differs between LoadLeveler and interactive jobs. Batch jobs use the LoadLeveler keyword network.MPI. Interactive jobs use the MP_ADAPTER_USE environment variable to indicate whether you wish to share the network, and also set MP_EUIDEVICE=csss and MP_EUILIB=us (see Table 1).
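The interactive settings described above can be sketched as a few shell exports. This is an illustration only: the device value follows the paragraph above, and you should confirm which adapter name applies on your system before relying on it.

```shell
# Interactive session: use the switch in user space mode and let other
# users' tasks share the adapter.
export MP_EUIDEVICE=csss      # switch device, as named in the text above
export MP_EUILIB=us           # user space protocol instead of IP
export MP_ADAPTER_USE=shared  # share the switch with other jobs

# Print the settings so they can be checked before launching the program.
echo "MP_EUIDEVICE=$MP_EUIDEVICE MP_EUILIB=$MP_EUILIB MP_ADAPTER_USE=$MP_ADAPTER_USE"
```

In a batch job, the single keyword line #@network.MPI = csss,shared,us carries the same three choices (device, sharing mode, protocol).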


Sharing the memory

Wait! There's yet a third form of sharing that you can specify on the SP. If you are running an MPI job on one node, you can set the environment variable MP_SHARED_MEMORY=yes so that tasks exchange messages through the node's shared memory instead of going through the switch unnecessarily. The variable is also useful when some tasks do communicate off-node, because it reduces port congestion and so improves the performance of the inter-node communication.
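A minimal sketch of a single-node MPI run with shared memory enabled, assuming four tasks on one four-processor node, might look like this. The executable name my_mpi_program is hypothetical, and the launch is guarded so the fragment does nothing harmful on a machine without poe.

```shell
# Keep on-node MPI traffic in shared memory rather than on the switch.
export MP_SHARED_MEMORY=yes

# Assumed task count: one MPI task per processor on a four-processor node.
export MP_PROCS=4

# my_mpi_program is a hypothetical executable; launch it only if both
# poe and the program are available.
if command -v poe >/dev/null 2>&1 && [ -x ./my_mpi_program ]; then
    poe ./my_mpi_program
fi
```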

Table 1 lists the LoadLeveler keywords and environment variables for sharing nodes, networks, and memory. Note that LoadLeveler keywords override environment variables in batch jobs.

Table 1. Sharing nodes and network

LoadLeveler keyword                 Environment variable
----------------------------------  --------------------------------------------------
#@node_usage = shared|not_shared    MP_CPU_USE unique|multiple (interactive only)
(default is shared)

#@network.MPI = csss,shared,us      Set these three together:
                                    MP_EUIDEVICE=css0
                                    MP_ADAPTER_USE=shared|dedicated (interactive only)
                                    MP_EUILIB=us

(no keyword)                        MP_SHARED_MEMORY=yes


Using Totalview: Share the node and switch

The Totalview debugger is a useful tool for debugging parallel programs on the SP. Because you run Totalview interactively, you must set the MP_ADAPTER_USE environment variable to shared and MP_CPU_USE to multiple.
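The two settings can be sketched as follows before starting the debugger. The program name my_mpi_program is hypothetical, and the launch line assumes the common pattern of running poe under totalview; check your local Totalview documentation for the exact invocation.

```shell
# Totalview runs interactively, so share both the switch and the node.
export MP_ADAPTER_USE=shared
export MP_CPU_USE=multiple

# my_mpi_program is a hypothetical executable; start the debugger only if
# totalview and the program are available.
if command -v totalview >/dev/null 2>&1 && [ -x ./my_mpi_program ]; then
    totalview poe -a ./my_mpi_program
fi
```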


1. Although shared is the default value for network.MPI, you should specify it explicitly, because under some combinations of options, it may be set to not_shared.
