The Vector Multiprocessor
by P. N. Swarztrauber,
International Journal of High Speed Computing, 11 (2000).
Abstract
The Vector Multiprocessor brings to the multiprocessor what
vectorization brought to the single processor. In addition to the
usual complement of logic and arithmetic units, each processor
contains a programmable communication unit with registers that
communicate directly with comparable registers in neighboring
processors via an n-dimensional interconnection network.
Interprocessor communication tasks are performed to and from these
registers in the same way that computational tasks are performed on a
vector uniprocessor. Communication is shown to be optimal for a large
class of communication tasks. Elements are transmitted, in parallel,
to their destination processors at an average rate of one per
communication cycle. This result, called O(1) access, is used to
develop a balanced communication system where local and global access
are comparable. It is also used to support the "vector parallel
paradigm" where all arrays are uniformly distributed and the user
interface "looks" like a vector uniprocessor interface. Both coarse-
and fine-grain performance models are provided; they demonstrate the
unexpected result that communication time is asymptotically negligible
compared to computation time. Finally, three performance models are
presented for the spherical harmonic transform, which is the most
communication-intensive part of climate model dynamics.