
SIParCS Student Intern Project Proposals, Summer 2009

The following is a list of proposed SIParCS projects for the summer of 2009:

Visualization Opportunities
SIParCS offers opportunities for student involvement in visualization through the VAPOR project. CISL has worked closely with its university partners and various NCAR divisions to create an integrated environment for data analysis called the Visualization and Analysis Research Platform for Ocean, Atmosphere, and Solar Research (VAPOR). In 2005, a series of development versions of the software was released and put into early use. The VAPOR software has been used to explore a variety of data sets, both by NCAR scientists and by the broader UCAR university community. See www.vapor.ucar.edu/doc/ for further information.

(SIParCS-001) Development of prototype client application for Earth System Grid web services
(SIParCS-002) Adding a robust interactive interface to the NCAR Command Language
(SIParCS-003) Applications of the Discrete Wavelet Transform to the solutions of high resolution Computational Fluid Dynamics simulations
 
Earth System Modeling Infrastructure (ESMI) Opportunities
SIParCS offers several opportunities for students to get involved in production-quality infrastructure development. In particular, the Earth System Modeling Framework (ESMF) core development team, located at NCAR, is building high-performance, flexible software infrastructure to increase ease of use, performance portability, interoperability, and reuse in climate, numerical weather prediction, data assimilation, and other Earth science applications. See www.esmf.ucar.edu/ for more details about ESMF.

(SIParCS-005) Model Metadata for Climate Applications
 
Computer Science Research and Development Opportunities
The focus of the Computer Science Section (CSS) is on the development, deployment and evaluation of new High Performance Computing (HPC) applications and technologies. Within CSS, the Research Systems Evaluation Team (ReSET) works to deploy NCAR’s TeraGrid resources in production, including a 100 TB storage cluster and a 2048-processor IBM Blue Gene/L system with a peak of 5.7 TFLOPS.


(SIParCS-009) High Performance System Benchmarking Automation
(SIParCS-010) Science Gateway Web Development
(SIParCS-011) HOMME Test Suite Redevelopment
(SIParCS-012) Accelerated Computing Model Development
(SIParCS-013) Optimizing Parallel I/O performance on the Lustre file system
 
Applied Mathematics and Statistics Research Opportunities
The High Order Multiscale Modeling Environment (HOMME) is an experimental atmospheric general circulation model dynamical core that provides several interesting research projects involving numerical methods. Statistical projects take advantage of the large effort in climate modeling at NCAR and the presence of a statistical research group as part of our center.


 
These projects have been developed by prospective NCAR mentors in the Computational and Information Systems Laboratory. The catalog is organized thematically. Students interested in applying for the SIParCS program should review this list and select their first and second choices. These choices should be entered in the application form using the Project Proposal Catalog Numbers (e.g. SIParCS-001). Alternatively, students who do not find a good fit may produce their own project proposal and submit a description of it with their application. The SIParCS office will attempt to match high quality student-proposed projects to the needs and priorities of the laboratory and will also attempt to identify a SIParCS mentor.

(SIParCS-001) Development of prototype client application for Earth System Grid web services - Luca Cinquini
The Earth System Grid (ESG [1]) is developing the IT infrastructure for the data management, analysis and visualization of the next generation of climate models, including the distribution of all modeled data for the upcoming 5th Assessment Report of the Intergovernmental Panel on Climate Change (IPCC-AR5). The student in this position will work closely with ESG developers at NCAR to develop a prototype client application that can be installed on a user's personal desktop to interact with a large portfolio of ESG web services, including services for semantic discovery of data collections, for downloading large numbers of files, and for requesting advanced data products, such as regridding, temporal and geographic subsets, special analyses (such as monthly or zonal means and compound variables) and visualizations. The client will be developed first as a command line utility and then front-ended by a GUI built upon Java Swing or the Eclipse Rich Client Platform (RCP). If time allows, the application will be extended to support compound functionality, so that a user will be able to define and request products resulting from a complex scientific workflow. This work involves interaction with a large collaboration of software engineers and climate scientists working at several national laboratories and university research centers distributed across the U.S., as well as international collaborators.

Skills/education: The successful candidate must have strong knowledge of and experience in Java programming. Optional skills and abilities include an understanding of the client/server interaction paradigm, working knowledge of the Spring framework [2], knowledge of semantic technologies and tools like RDF, OWL, Sesame and Protégé, previous programming experience with Swing and/or the Eclipse RCP, and a scientific background in climate or atmospheric sciences.

[1] ESG: http://www.earthsystemgrid.org/
[2] Spring framework: http://www.springsource.org/

(SIParCS-002) Adding a robust interactive interface to the NCAR Command Language - Mary Haley
This project offers students the opportunity to enhance an interactive interface to the NCAR Command Language (NCL), a scripting language for the analysis and visualization of scientific data. 

This would be an ideal project for a software engineering student who wants to extend their skills with practical experience in software design and implementation on a software product that is used by thousands of users worldwide.  Some of the interactive capabilities that could be implemented include:

  • command line completion
  • command line history
  • better debugging capabilities
  • color-coding of functions, comments, keywords, etc.
  • variable lookup
  • online help and search interface
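
As an illustration of the first item, command line completion can be sketched as a prefix match over known names. The sketch below is written in Python for brevity (the project itself targets C), uses the (text, state) calling convention familiar from GNU Readline-style completers, and treats the short vocabulary as a hypothetical sample rather than the NCL function list.

```python
def make_completer(vocabulary):
    """Build a completer using the (text, state) convention of readline-style libraries.

    The completer is called repeatedly with state = 0, 1, 2, ... and returns
    one candidate per call, then None when the candidates are exhausted.
    """
    words = sorted(vocabulary)

    def completer(text, state):
        # Collect every known word that starts with what the user typed.
        matches = [w for w in words if w.startswith(text)]
        return matches[state] if state < len(matches) else None

    return completer


# Hypothetical sample vocabulary, for illustration only.
complete = make_completer(["addfile", "addfiles", "gsn_open_wks"])
print(complete("add", 0), complete("add", 1), complete("add", 2))
```

In Python, a function of this shape could be registered with the standard readline module through readline.set_completer; a C implementation would follow the same idea using the completion hooks of the chosen command line editing library.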

The student would need to be an independent worker who’s willing to delve into existing code to understand the current structure. He/she would work closely with a senior software engineer to understand the initial requirements and to learn the basic code architecture. The student would then write a design document to list the requirements, including contributions of their own. Then the student would follow through on implementation and extensive testing.

Additional information about NCL can be found at http://www.ncl.ucar.edu/

Skills/Education: The successful candidate must have strong knowledge and experience in C/C++ programming (preferably C), and optionally, skills with command line editing libraries like GNU Readline.  Although knowledge of NCL would be helpful, it is not a requirement.

(SIParCS-003) Applications of the Discrete Wavelet Transform to the solutions of high resolution Computational Fluid Dynamics simulations - John Clyne
The Data Analysis Services Group has been investigating Discrete Wavelet Transforms (DWT) and their application in a number of areas related to the analysis of gridded data sets generated by high-resolution numerical models. Examples include facilitating the identification and extraction of coherent structures found in turbulent flows, data reduction, and progressive data access. A number of potential research and software development areas exist such as: evaluating structure identification algorithms applied to wavelet compressed data; exploring separable and non-separable multi-dimensional transforms; testing and implementing efficient, parallel transformation algorithms; exploring error metrics for compressed data; and investigating coefficient threshold selection strategies. Depending on interests and level of experience, a student selected for this project would help explore one or more of these outstanding issues. The student would gain experience in Computational Fluid Dynamics (CFD), and the DWT and some of its applications in CFD data analysis and management.
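
To make the ideas of coefficient thresholding and error metrics concrete, here is a minimal sketch in Python. It assumes a one-level orthonormal Haar transform on a 1D signal of even length, hard thresholding of the detail coefficients, and root-mean-square error as the metric; these choices are illustrative assumptions and not the transforms or metrics used by the group.

```python
import math


def haar_forward(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one level of the orthonormal Haar DWT."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out


def hard_threshold(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


def compress(signal, threshold):
    """Threshold the detail coefficients, reconstruct, and report the RMS error."""
    approx, detail = haar_forward(signal)
    kept = hard_threshold(detail, threshold)
    recon = haar_inverse(approx, kept)
    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(signal, recon)) / len(signal))
    return recon, rmse


recon, rmse = compress([1.0, 1.0, 2.0, 4.0], threshold=2.0)
print(recon, rmse)
```

Raising the threshold discards more coefficients (better data reduction) at the cost of a larger reconstruction error, which is the trade-off that threshold selection strategies try to manage.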

Skills/education: Computer science, physical science, or math major. Familiarity with C++ programming and with high-level analysis tools such as Matlab or IDL is helpful.

(SIParCS-005) Model Metadata for Climate Applications - Sylvia Murphy
The student in this position will work with scientists and software engineers to develop metadata describing a climate model, and to embed that metadata in the model itself so that it is self-describing. The metadata will eventually be part of a workflow that includes configuring and running the model from a web portal, and cataloging the results of the run back to the portal along with the model description. This activity ties together a broad spectrum of computing technologies: high performance computing, grid and web applications, databases, and semantic languages and formats. The student will be working with several large projects that are based at NCAR and have national and international collaborators.

(SIParCS-009) High Performance System Benchmarking Automation - Michael Oberg
The Research Equipment System Evaluation Team (ReSET) frequently executes benchmark suites to characterize the performance and scalability of new computational nodes, networks, and storage systems. New platforms are benchmarked to identify the performance of processors and memory, new network technologies are evaluated within clusters and across the wide-area, and storage systems are benchmarked to identify the performance of disk arrays, storage controllers, interconnects, and file system software.

The student working on this project will work with the ReSET team to automate the execution, data collection, and data analysis of the computational platform or storage benchmark suites. For example, in the area of computational platform benchmarking, the individual benchmarks must usually be tuned and executed manually. The desired automation would standardize the collection of benchmark results to a centralized repository, extract the relevant experimental data, and then store the final results in a database that allows easy extraction, plotting, and comparison. This will allow ReSET to track system performance over time, identify problems with current deployed infrastructure, and easily compare emerging competitive hardware platforms.
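
The collect, extract, and store steps described above could look roughly like the following Python sketch. The output line format, the table layout, and the use of an in-memory SQLite database are all assumptions made for illustration; real benchmark suites each print their own formats, and a production system would more likely use a server database such as PostgreSQL or MySQL.

```python
import re
import sqlite3

# Hypothetical benchmark output line, e.g. "hpl nodes=8 gflops=20.0".
LINE_RE = re.compile(r"^(?P<name>\w+)\s+nodes=(?P<nodes>\d+)\s+gflops=(?P<gflops>[\d.]+)$")


def parse_results(text):
    """Extract (benchmark, nodes, gflops) tuples, skipping lines that do not match."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append((m.group("name"), int(m.group("nodes")), float(m.group("gflops"))))
    return rows


def store_results(conn, system, rows):
    """Append parsed results for one system to the results table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(system TEXT, benchmark TEXT, nodes INTEGER, gflops REAL)"
    )
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?, ?)",
        [(system, name, nodes, gflops) for name, nodes, gflops in rows],
    )
    conn.commit()


def best_per_system(conn, benchmark):
    """Return the best recorded result of one benchmark for each system."""
    cur = conn.execute(
        "SELECT system, MAX(gflops) FROM results "
        "WHERE benchmark = ? GROUP BY system ORDER BY system",
        (benchmark,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
store_results(conn, "sysA", parse_results("hpl nodes=4 gflops=10.5\nhpl nodes=8 gflops=20.0"))
store_results(conn, "sysB", parse_results("hpl nodes=8 gflops=18.25"))
print(best_per_system(conn, "hpl"))
```

Keeping every run in one table is what makes the later comparisons cheap: tracking a system over time or comparing platforms becomes a query rather than a manual collation of log files.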

Skills/education: Programming skills in a high-level scripting language, such as Perl or Python (Python preferred), and experience with using the Linux operating system, are required. Experience with a database system, such as PostgreSQL or MySQL, is beneficial.

(SIParCS-010) Science Gateway Web Development - Matthew Woitaszek
The Asteroseismic Modeling Portal (http://amp.ucar.edu) is a science gateway that allows scientists to design asteroseismology simulations, submit these simulations to supercomputing resources at NCAR and on the TeraGrid, and interactively examine the results. The student intern in this area will work with the existing design team to implement new gateway functionality, integrate a new scientific model into the workflow, and enhance the gateway’s support for a growing user community. Students interested in web site design, ranging from graphic design to web page construction, may focus on Python/Django-based web site construction. Students interested in supercomputing and Grid computing may focus on back-end system engineering, including workflow management and Globus-based remote job execution and data management.

Skills/education: Basic programming skills and familiarity with web site design or programming are required. Experience with Python is highly desired.

(SIParCS-011) HOMME Test Suite Redevelopment - John Dennis/Rory Kelly
The High Order Multiscale Modeling Environment (HOMME) is a global circulation model that runs on systems ranging from desktop workstations to supercomputers. The student intern working on this project will design and implement a new automated testing system based on current testing and Grid-computing frameworks, simplifying the verification of the model’s correct operation and publishing test results to a web site. Depending on student interest and time, the student may also work on the scalability and optimization of the model itself.

Skills/education: Basic programming skills and familiarity with a scripting language, such as Perl or Python, are required.

(SIParCS-012) Accelerated Computing Model Development - Jose Garcia/Rory Kelly
Capturing subgrid-scale cloud dynamics in a global atmospheric model has the potential to improve climate prediction but is computationally expensive. The Cell Broadband Engine (Cell BE) and graphics processing units (GPUs) are two technologies used to accelerate scientific computation by offloading critical computational kernels to high-performance hardware. The objective of this project is to evaluate the use of accelerators for a 2D cloud-scale model embedded in the columns of a large scale 3D global atmosphere model. The student will work on porting an existing cloud model to a GPU and implementing an accelerated version of the model to run on NVIDIA graphics hardware.

Skills/education: Demonstrated programming skills in C or C++ are required, along with deep knowledge of computer architectures, with emphasis on memory management and vectorization (focus on SIMD). Prior experience with acceleration technologies and programming languages, in particular CUDA, Brook+ or the IBM Cell SDK, is highly desired. Knowledge of Pthreads programming is also desirable.

(SIParCS-013) Optimizing Parallel I/O performance on the Lustre file system - John Dennis (NCAR)/Pat Kovatch (NICS)
Successful use of petascale computing will involve the efficient use of disk I/O subsystems. A collaboration between the Computer Science Section and several national laboratories has resulted in the development of the Parallel I/O (PIO) library. The PIO library simplifies the integration of parallel I/O into geoscience applications by providing an easy-to-use interface to I/O methods based on MPI-IO, netCDF, and PnetCDF.

This project is a collaborative effort between the National Institute for Computational Sciences (NICS) and the National Center for Atmospheric Research (NCAR). There are two openings for this project. The students working on this project will be involved in testing and optimizing the PIO library and its implementation within geoscience applications on the Lustre file system in support of large scale computing campaigns at NICS. This project will provide experience in working with large-scale parallel file systems and parallel library development.

Skills/education: Demonstrated programming skills in one or more compiled languages (C, C++, Fortran). Experience with distributed memory parallel programming with MPI. Interest in either the application or operating systems aspects of high performance computing. Experience with scripting languages such as csh or Perl.

(SIParCS-014) Semi-implicit Semi-Lagrangian Time-stepping for Climate - Amik St-Cyr
This project consists of incorporating recent developments in a fully scalable semi-implicit semi-Lagrangian (SI-SL) scheme for high-order methods into the High Order Multiscale Modeling Environment (HOMME) model. The first step consists of merging the existing time-stepping procedure from an AMR branch of the HOMME model. The second part concerns the study of various approaches for SEM-hybrid and discontinuous Galerkin-type discretizations of the shallow water equations on the sphere.

(SIParCS-015) Multiple testing in spatial fields - Steve Sain
Many analyses in the geosciences involve identifying regions of interest in collections of spatial fields. For example, one might be interested in identifying areas at risk for extreme climate change on the basis of an ensemble of climate model output. From a statistical point of view, this problem can be framed as a multiple comparison problem in spatially correlated random fields. This project will examine the current state-of-the-art methodologies for multiple comparisons, such as the commonly used procedure based on controlling the false discovery rate, and their use for examining differences in spatial fields. Using the output from the North American Regional Climate Change Assessment Program (NARCCAP), we will apply this methodology to 1) compare regional models and identify spatial regions where the models agree/disagree, and 2) compare current and future runs to identify spatial regions suggestive of local climate change. It is expected that this project will yield additional functionality for the fields software package for the R statistical programming environment as well as publishable results.
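
For readers unfamiliar with false discovery rate control, the following Python sketch shows the Benjamini-Hochberg step-up procedure, which is one widely used method of this kind; the proposal does not name a specific procedure, so treating it as the intended one is an assumption. Each p-value would correspond to one grid cell, and the function name and the level q are illustrative.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Step-up rule: sort the m p-values, find the largest rank k such that
    p_(k) <= (k / m) * q, and reject the hypotheses with the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Four hypothetical grid-cell p-values.
print(benjamini_hochberg([0.039, 0.001, 0.205, 0.008], q=0.05))
```

The classical guarantee for this procedure assumes independent (or positively dependent) tests, which is exactly why its behavior on spatially correlated fields is a research question rather than a settled matter.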

Skills/education: Students should have a background in applied statistics and familiarity with R or Matlab.

(SIParCS-016) Accuracy in regridding spatial fields - Doug Nychka
The North American Regional Climate Change Assessment Program (NARCCAP) is to date the largest series of regional climate predictions, involving over 50 scientists and costing several million dollars. The value of this effort depends on the comparison among different climate models, and the comparison, in turn, hinges on how the gridded model output has been transformed to a common format. This project is important for quantifying the reliability, and perhaps the limitations, of the current NARCCAP model data. This project also provides an introduction to statistical methods for spatial fields and to the mathematical theory of estimating functions from discrete data.

Often geophysical data, such as surface temperatures, are in the form of a regular grid: equally spaced values in space, usually in a rectangular pattern. For example, nearly all of the numerical models used to project future climate changes have as their output regular spatial and temporal grids of key physical variables that characterize the Earth's climate system. An insidious problem in transforming geophysical fields is the regridding of a surface from one set of grid points to another, or the conversion from one resolution to another. This is usually done to simplify analysis among several different grids or to change from one set of coordinates to another. The error in the regridding process is often not reported, and many regridding algorithms are not well described. This project proposes to use statistical methods of interpolation to determine the accuracy of regridding and to study the uncertainty when a field reported on one set of coordinates is extrapolated to another. The NARCCAP output fields will be used for the test suite, and it is likely that this project will yield publishable results. The project will use documented and well developed tools in the R statistical environment and some UNIX shell programming.
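
A simple way to see why regridding error matters is to regrid a field whose true values are known and measure the discrepancy. The Python sketch below does this in one dimension with plain linear interpolation; the project itself proposes statistical interpolation methods in R on two-dimensional NARCCAP fields, so the interpolation scheme, the test field, and the function names here are simplifying assumptions.

```python
import math
from bisect import bisect_right


def linear_regrid(src_x, src_vals, dst_x):
    """Linearly interpolate values given on src_x (ascending) onto the points dst_x."""
    out = []
    for x in dst_x:
        # Index of the source interval [src_x[j], src_x[j + 1]] containing x.
        j = min(max(bisect_right(src_x, x) - 1, 0), len(src_x) - 2)
        x0, x1 = src_x[j], src_x[j + 1]
        w = (x - x0) / (x1 - x0)
        out.append((1.0 - w) * src_vals[j] + w * src_vals[j + 1])
    return out


def rmse(a, b):
    """Root-mean-square difference between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def field(x):
    """Known test field, so the regridding error can be computed exactly."""
    return x * x


src = [0.0, 1.0, 2.0]   # coarse source grid
dst = [0.5, 1.5]        # target grid offset from the source grid
regridded = linear_regrid(src, [field(x) for x in src], dst)
print(rmse(regridded, [field(x) for x in dst]))
```

With real model output the true field is not available, which is where a statistical model of the field is needed to estimate the error instead of computing it directly.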

Skills/education: Students should have a background in multivariate statistics, familiarity with R or Matlab, and a strong background in linear algebra.

(SIParCS-018) Locally Conservative High-Order Methods - Ram Nair
High-order local methods are gaining prominence in the computational sciences. A major reason for this is the local nature of the resulting algorithms, which is highly desirable for present-day parallel computers. Some of these methods also have computationally attractive features such as spectral convergence and conservation properties. The High-Order Method Modeling Environment (HOMME) developed at NCAR is a framework for developing numerical models in the cubed-sphere geometry for atmospheric modeling applications. HOMME currently employs high-order Galerkin-based approaches such as the spectral element (SE) and discontinuous Galerkin (DG) methods.

The spectral finite-difference (SFD) and spectral finite-volume methods are relatively new local high-order methods that are not based on Galerkin approaches. The SFD method is particularly interesting: it is a locally conservative method and shares many computational features of the SE and DG methods. The solutions of SFD methods are defined at the nodes of a Gaussian quadrature rule, and fluxes are evaluated at the Gauss-Lobatto points. It will be interesting to test an SFD-based transport scheme in the HOMME framework. In the first part of the project, an advection scheme based on SFD will be developed in a simple Cartesian domain; later this will be extended to the HOMME framework to test its parallel efficiency (scalability) and accuracy.

(SIParCS-019) Designing limiters for a new cubed-sphere multi-tracer finite-volume transport/advection scheme - Peter Lauritzen
Global weather and climate models have traditionally been defined on regular latitude-longitude grids. However, atmospheric solvers designed on such grids are unlikely to scale well on massively parallel supercomputers because of the inherently non-local filtering in the polar regions that is needed to stabilize the model dynamics.

Other grids, such as the cubed-sphere and icosahedral grids, that are more isotropic (and therefore do not require non-local filtering for stability) are being considered for next generation modeling systems. Changing the underlying discretization grid has triggered renewed interest in basic numerical algorithms. This project addresses one important aspect: performing transport that is computationally cheap for a large number of tracers, does not generate spurious maxima or minima in the transported fields (non-oscillatory), and is competitive in accuracy and efficiency with existing state-of-the-art schemes.

A new cubed-sphere finite-volume multi-tracer transport scheme has recently been developed. This particular version of the scheme is optimized for the cubed-sphere spherical geometry. It is fully two-dimensional and is based on the incremental remapping idea.  Hence fully two-dimensional sub-grid-scale reconstructions are needed during the remap step of the algorithm. This project will focus on the sub-grid-scale reconstruction method and, in particular, on how to filter/limit the reconstructions to render them non-oscillatory without unnecessarily degrading the accuracy of the scheme.
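
The role of a limiter can be illustrated in one dimension. The Python sketch below builds a piecewise-linear sub-grid reconstruction from cell averages and limits each slope with the classical minmod function, so that reconstructed edge values never overshoot neighboring cell averages. This is a textbook stand-in chosen for illustration; the scheme in this project is fully two-dimensional on the cubed sphere, and its reconstruction and limiters are not specified here.

```python
def minmod(a, b):
    """Return the smaller-magnitude argument if both share a sign, else zero."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def limited_slopes(cell_means):
    """Minmod-limited slope per cell (boundary cells keep a zero slope)."""
    n = len(cell_means)
    slopes = [0.0] * n
    for i in range(1, n - 1):
        left = cell_means[i] - cell_means[i - 1]
        right = cell_means[i + 1] - cell_means[i]
        slopes[i] = minmod(left, right)
    return slopes


def edge_values(cell_means):
    """Reconstructed (left edge, right edge) values of each cell."""
    slopes = limited_slopes(cell_means)
    return [(u - 0.5 * s, u + 0.5 * s) for u, s in zip(cell_means, slopes)]


# A step profile: the limiter flattens the reconstruction next to the jump.
print(edge_values([0.0, 0.0, 1.0, 1.0]))
# A smooth linear profile: the limiter leaves the interior slopes untouched.
print(edge_values([0.0, 1.0, 2.0, 3.0]))
```

The two cases show the tension the project addresses: the limiter must switch off the reconstruction near sharp gradients to avoid new extrema, yet leave it intact in smooth regions so that accuracy is not degraded unnecessarily.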

Skills/education: MS or equivalent in Atmospheric/Oceanographic Sciences and/or Applied Math with expertise in finite-volume methods. Fortran 90 scientific programming background is highly desirable.