Highlights and issues at CUG '98
Questions on tuning and configuration, IRIX, NQS . . .
James Patton Jones, MRJ Technology Solutions
This year the annual CUG meeting was held in Stuttgart, Germany, a clean and pleasant city with no shortage of rain. In true CUG fashion, interesting papers were presented on useful topics, and many fruitful hallway discussions took place. However, several issues emerged that I'd like to highlight, in hopes that SGI and CUG will quickly take action to resolve them.
O2K memory management
In the area of operating system tuning and configuration, there was much discussion (and confusion) about how to manage the memory subsystem of the Origin2000. There seem to be two primary problems: (1) many folks with solid Cray experience are having trouble mapping that experience onto the Origin2000; and (2) SGI's recommended memory configuration and tuning guidelines were based heavily on its workstation product line.
There are two approaches to running Origin2000 systems: like a large workstation, or like a physical-memory Cray. (There is actually a third, a hybrid of the two.) Depending on which approach you choose, the way you handle the memory subsystem differs considerably. Do you actively use virtual memory, or do you limit application size to physical memory? Do you configure the system to swap, or avoid swapping? Do you oversubscribe memory or not? What exactly do "virtual memory" and "resident set size" (and so forth) mean on the Origin2000? Many similar questions were raised. While some sites have learned the answers by trial and error, doing so has cost valuable time. Many Origin2000 sites (and undoubtedly new Origin customers) would like SGI to address this problem by publishing a new set of memory management guidelines that recognize these differing approaches and spell out the benefits and drawbacks (!) of each. The guidelines could include a section defining memory terms, since the Origin vocabulary differs from that of Cray systems.
Cellular IRIX, NQE, and other issues
Another observation about the Origin2000 concerned the SGI presentation on Cellular IRIX. I was surprised that there were so few comments and questions. Why? Did folks already completely understand how it works, or is there a general lack of interest? Given the magnitude of its impact on future operating systems for SGI/Cray products, if I were SGI, I'd want answers to these questions.
One announcement that took quite a few people by surprise was SGI's statement that it was looking into discontinuing development of NQE. Citing a historical inability to support heterogeneous systems and an ever-growing list of requested features, SGI proposed stabilizing NQE at version 3.2 and partnering with another company to provide batch queuing and resource management software for future SGI systems. The search for a partner is just getting started. Needless to say, this news spurred increased interest in the Portable Batch System (PBS) from NASA Ames (the same folks who developed NQS). With the recent announcement of international availability, a no-fee site license, and a source code distribution, many sites have been downloading PBS from http://pbs.mrj.com.
On a CUG-specific note, one criticism was voiced several times: there needs to be better communication between Special Interest Group and Focus Group chairs. The largest complaint concerned the selection of conference papers and presentations.
Overall, CUG 1998 was a beneficial conference. One of its most important benefits is the collective feedback to SGI on how to improve its products. For that reason, I hope SGI will be open to customer input.