Consulting Services Group  
   
The National Center for Atmospheric Research (NCAR)



 
Consulting Services Projects 2008

CSG Projects

4th Aug 2008

Project Short Description
90 I completed the implementation, documentation and testing of the memory usage tool for blueice, bluevista and bluefire. It lets users analyze the memory footprint of a given interactive, OpenMP, MPI or hybrid program. Because the tool and the user-provided program are tightly coupled, a bad crash in the latter can also crash the former. I'm investigating whether the tool can be made more robust against crashes in user-provided programs.
63 Some users (and user groups) need to archive large amounts of data to the MSS. A single MSS file can be as big as 12G, while user files are usually small, so archiving these files can be problematic for users.
71 CSG-RAL Collaboration on Cluster Parallel Application
72 Support modules utility on coral
61 WRF compiles OK with a high optimization level, but generates unexpected results.
89 This tool measures the bandwidth and latency of a two-node MPI network. It produces raw text data, which are then plotted and analyzed by a Python script. The tool can spawn any even number of processes, in pairs of send/receive tasks. At present all the sending processes are on one node and all the receiving processes are on the other. All of them request MPI resources concurrently. The core of the code is an MPI send/receive that repeatedly sends and receives packets from 16 bytes to 8 Mbytes long. Each packet is sent and received N times (N ranges from 1 to 100k) per process. The tool includes a shell/LSF script that schedules the jobs in the premium queue with exclusive use of the nodes, to avoid possible interference from other jobs. Results for blueice and bluevista are available. Results are ready for bluefire (re-run after bluefire was released to the users), but further investigation is needed to understand the results (which were the same as in the preliminary runs).
48 Mentoring a SOARS student in numerically solving the linear advection equation on the surface of a sphere, and possibly the shallow water equations on the same geometry, time permitting. The strategy is to use the Runge-Kutta discontinuous Galerkin approach, as done by Nair et al. on the cubed sphere; we will extend it to the squared sphere map.
87 A brief introduction to the SVN version control system was given during the HSS Monthly Staff Mtg (Wednesday 13th).
65 Emmons has a special request to run the Chemical Forecast for ARCTAS. CISL originally planned to give them dedicated nodes, but this conflicts with the Convection project. The request needs 2 nodes for 5 hours and then 8 nodes for 3 hours, which makes it harder to configure.
50 Support compilers such as PathScale and Intel on Linux platforms
64 Help Dr. Piri and student Saeed Ovaysi from the University of Wyoming port their code to our supers.
66 WRF is now NCAR's flagship model. CISL wants to know what is happening with WRF by having a WRF liaison.
62 Because the bluefire ATP shares disk with blueice, the disk space available to the ATP suite is limited.
67 Vapor wants to extend its domain to WRF users. CSG will help.
70 User default dotfiles
68 CSG is documenting Frost for NCAR and TeraGrid users.
88 Because the platform is different, the blueice/bluevista tool cannot be used on lightning. Alternatives for analyzing the memory footprint of a given program are under development. A preliminary version that works with MPI programs has been completed.
69 At the Scicomp13 meeting, CSG learned that there is software for monitoring LSF jobs on Power machines. CSG wants to test it on blueice.
49 CSG members are working to make FFTPACK friendlier to F90 compilers.
47 After the upgrade of bluevista to level 5300-07-01-0748, some users reported a slowdown of about 10-12%. A CSG member verified and confirmed this claim using FV CAM.
55 NCAR TeraGrid Team and User Support
84 Following some complaints from users (ExtraView tickets), I'm investigating the issue. The "standard" distribution/namefile works fine; the namefile provided by users is under test.
74 Compiled and installed the MPI/IO-enabled versions of HDF5 and netCDF4 in /contrib. Both required some hacking in their configure scripts, which I also sent upstream. I'm working with the upstream developers on a "clean" and portable solution, and on possibly making the tools available to users (after some tests).
58 DTC has asked CSG for suggestions on transferring data to NCAR from several sites outside the NCAR firewall.
60 When bluefire became operational, CSG installed LLView on it. With this tool, CSG can identify users' jobs using the processor numbers on each node.
73 Contacted the users and advised them. E-mail works better than ExtraView for this purpose.
45 Early this year Nancy Collins and Jeff Anderson described their computational workflow under DART. It was clear that unless they can schedule multiple parallel jobs from a single submit, they will be severely limited in their computational ability.
86 We are participating in the WAG CMS working group (WG) biweekly meetings. The evaluation phases have been completed: requirements have been prioritized and use cases have been identified. The working group selected Drupal 6.3, and WEG set up a test installation, to which they gave me access. A pre-production implementation should be handed to us at the beginning of August. In the meantime I re-installed it on our own machine and configured it for the hands-on demo. More information is available here: https://wiki.ucar.edu/display/wag/Drupal+6.2
76 The MPI implementation of the bzip2 compression algorithm has been compiled on lightning, bluevista and bluefire. Unfortunately, it occasionally fails (subsequent calls with exactly the same arguments usually succeed). I contacted the author and discussed the possible cause with him.
83 mpiP (http://mpip.sourceforge.net/) is a nice MPI profiler. We are evaluating it for our supers. mpiP requires a compatible GNU binutils installation (or libelf or libdwarf) for its source lookup and demangling features. At present we do not have any of those, so the tool is crippled. Jeph installed binutils, but the tool still does not provide source-code demangling.
75 Introduced the user to the NCAR facilities (lightning, extraview, email, cryptocard, mailing lists, etc.). Explained the syntax and meaning of the LSF scripts and arguments (queue, threading, processor number). Helped with data transfer, including a workaround during a DNS outage so that her data could be transferred without waiting for the DNS to be repaired. Suggested using mpibzip2 to quickly compress the data before the transfer (which takes 1.5h when uncompressed). Solved her bug with cron, and helped with unattended job submissions.
40 LSF hybrid code setup
59 CSG is working on ASD projects: one for the Nested Regional Climate Model (NRCM), and one for Mechanisms of Convective-Wave Interactions. CSG has contacted the users and is willing to help them with code porting, compiling issues, tuning, performance, etc.
79 Start of the project. Contacted Michael Shay's team. Helping with data transfer, compiling and linking libraries. I helped them with a problem regarding fftw. I introduced them to the TotalView debugger.
46 A CSG member started participating in the Green Density project
85 Timing, customizability and reporting will be the enhancements. To that end, I installed, configured and evaluated the textest (Python) framework on blueice. I also investigated a Java solution, as well as the current script-based one. I chose the Java-based solution and am currently developing it.
57 WRF developers reported that WRF produces incorrect results with the higher optimization level "-O3 -qhot" on bluefire.
21 Next steps in providing training by SCD
7 Tar on the supers
18 Support system information scripts on all platforms
16 Liaison between CISL and CGD CCSM Software Engineering Group
20 Computer training for SOARS, RESESS, and SIParCS college students
10 Postprocessing tools
11 CISL Resource Accounting update
12 Prepare documentation for users.
15 Assistance with questions sent to Consulting Office
52 CAM runtime errors on bluevista
19 Test and support Totalview usage on all supers
54 User Documentation
2 Porting assistance for bluefire
17 Implement ExtraView trouble ticket system
1 Real time and other special computing projects
14 Facilitate transition to LSF.
22 Creation of new CSG web site and collaboration tool
0 Ranger port of CCSM4
23 Miscellaneous duties
13 Maintenance and documentation of software products
6 Run benchmark tests, assist with local software
3 Software library installation policy
9 Provide user documentation for LSF batch scheduling system.
8 Test benchmark suites for regression and ATPs.
4 CTSS Testing and HelpDesk
5 Professional development
33 Upgrade nco to 3.9.2
32 Experimenting with LSF resource requirements for large memory jobs
26 Need to find a way to change a batch model run from SPMD to MPMD
27 SIParCS preparation
35 Coordinating trip to CCR, Madison for end of April/early May '07.
29 The DART/WRF code is experiencing a high level of memory allocation when running on blueice. This has caused node crashes (TT #27519). The user does not agree with the recommendation to run on 12 PEs per node on blueice. The code does not exhibit the same behavior on bluevista, but has encountered MPI_LAPI errors (TT #27685).
39 Developed an LSF reporting tool that uses the LSF API to provide access to detailed job information, and provided it to SSG. This tool can access more detailed information on LSF job parameters than bjobs or bhist. 1/26/07 - Complete.
30 User encountering MPI_LAPI errors on bluevista. First reported errors of this type for any user since Jan '07.
25 Supporting this ASD project on Bluefire.
28 Blueice code running with very low efficiency
24 Supporting this ASD project on Bluefire.
36 Working towards conversion of mudpack from OpenMP parallelism to MPI parallelism for an HAO user. mudpack has been a bottleneck in scaling TIEGCM since it is restricted to running on a single node under the OpenMP paradigm. Converting to MPI, if it can be done, will allow the application to continue to scale beyond 48 PEs.
38 Converting emoslib for DSS from the 32-bit bluesky environment to the 64-bit bluevista and blueice environment
34 Held initial meeting w/CSS head to discuss CSG role in petascale initiatives.
37 Converting echam4 from the 32-bit bluesky environment to the 64-bit bluevista and blueice environment for a user
44 We want Platform Computing to script for us the integration layer for using LoadLeveler as the backend of LSF.
43 The user asphilli was flagged as an inefficient user by Tom Engel's monitoring programs.
56 Mathematica guide
53 Software licensing
51 Splitting large files for MSS
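
Projects 63 and 51 both concern fitting user data to MSS file sizes. As a minimal sketch, and reading the 12G figure in project 63 as a per-file ceiling (an assumption), many small files could be grouped into bundles before archiving. The function name plan_bundles, the first-fit-decreasing strategy and the example file names are hypothetical; none of them come from a CSG tool.

```python
from typing import Dict, List

# Assumed per-file ceiling, taken from the "12G" figure in project 63.
MAX_BUNDLE_BYTES = 12 * 1024**3


def plan_bundles(sizes: Dict[str, int],
                 limit: int = MAX_BUNDLE_BYTES) -> List[List[str]]:
    """Group files into bundles whose total size stays within `limit`.

    Strategy (an illustrative choice): first-fit decreasing. Files are
    visited from largest to smallest and each goes into the first bundle
    that still has room. A file larger than `limit` ends up alone in its
    own bundle and would still need to be split separately.
    """
    bundles: List[List[str]] = []
    totals: List[int] = []
    # Sort by size descending, then by name so the plan is deterministic.
    for name in sorted(sizes, key=lambda n: (-sizes[n], n)):
        size = sizes[name]
        for i, total in enumerate(totals):
            if total + size <= limit:
                bundles[i].append(name)
                totals[i] += size
                break
        else:
            # No existing bundle has room: open a new one.
            bundles.append([name])
            totals.append(size)
    return bundles


if __name__ == "__main__":
    # Hypothetical file sizes in bytes, with a tiny limit for readability.
    demo = {"a.nc": 5, "b.nc": 4, "c.nc": 3, "d.nc": 2}
    for bundle in plan_bundles(demo, limit=7):
        print(bundle)
```

Each planned bundle could then be written as one archive (for example with tar) and sent to the MSS as a single file.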

The Changing Face of Consulting Services

One of the goals of the CISL Consulting Services Group over the past year has been to design and coordinate the implementation of a greatly enhanced new customer support system. The key to making this project succeed is to expand the consulting team to include other groups within CISL who can provide additional expertise.

In today's challenging computational environment, CISL realizes that excellent customer support is essential to making progress in computational science and scientific research. That support begins the moment a scientist decides to use CISL resources, and the collaboration may continue over months or even years until scientific results are published. It is our goal to provide customers with in-depth support using the full capability of the divisional staff.

Users who want help setting up accounts and beginning to compute will benefit from the assistance of our Enterprise Services Section staff. Our Customer Support Services staff serve as a first point of contact to help users become familiar with the facilities. Our Outreach Group provides detailed, award-winning technical documentation on all aspects of supercomputer and mass storage system usage, while CISL Consulting Services and other CISL programming staff provide assistance with porting, math libraries, parallelization, and debugging of complex atmospheric simulation models. Our Data Support Services staff stand ready to help with obtaining scientific data, and our Data Analysis Services and Visualization groups provide expertise in high-capacity postprocessing.

Telephone customer support is now available around the clock, and consulting support is available by email to consult1@ucar.edu, by walk-in at the Mesa Laboratory room 39, or by appointment. We look forward to working with you to achieve your scientific goals!

Evolution of User Services

CISL's strategic plan for user services is to provide a balanced set of services to enable researchers to easily and effectively utilize community resources.