Converting Cray-style datasets for use on non-Cray computers

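The NCARU library can be used to read and write Cray-style datasets on non-Cray computers. It performs three functions, each described below: Cray dataset I/O, numeric conversion between Cray and IEEE formats, and data packing.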

NCARU is no longer provided for all vendor systems at NCAR. It may be downloaded from http://www.dss.ucar.edu/libraries/. The library is encapsulated into a set of standard I/O-API calls, so the user interface consists of subroutine calls that open, read, write, and close datasets.

The library also includes a number of numeric conversion routines that convert from IEEE format to Cray format and from Cray format to IEEE format. Both 32-bit and 64-bit IEEE formats are supported.

For historical purposes, the "packing" routines are also located in this library. This set of routines allows you to pack two or four floating-point words into a single 64-bit space to reduce the size of a data file. Be aware that packing data causes a loss of precision, and the precision lost is directly related to the range of the data being packed.
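To make the link between range and precision concrete, here is a minimal sketch of a generic linear 16-bit packing (four values per 64-bit word). It is not NCARU's packing algorithm, which this document does not describe; the program name packdemo and the sample values are made up for illustration.

```fortran
      program packdemo
      implicit none
      real :: x(4), y(4), xmin, xmax, step
      integer :: code(4)

      ! Four sample values to squeeze into 16 bits each (assumed data).
      x = (/ 250.0, 273.15, 288.7, 310.0 /)
      xmin = minval(x)
      xmax = maxval(x)

      ! 16 bits give 65536 levels, so the step grows with the range.
      step = (xmax - xmin) / 65535.0

      ! "Pack": replace each value by the index of its nearest level.
      code = nint((x - xmin) / step)

      ! "Unpack": rebuild approximate values from the level indices.
      y = xmin + code * step

      print *, 'step between levels :', step
      print *, 'largest round-trip error :', maxval(abs(y - x))
      end program packdemo
```

In a scheme like this the round-trip error is bounded by roughly half a step, so doubling the range of the data doubles the worst-case error.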

By far the most common usage of the NCARU library is to read Cray datasets on non-Cray machines. The subroutines discussed here are the most-used subroutines in the library. These are CRAYOPEN, CRAYREAD, CRAYWRITE, CRAYCLOSE, and CRAYBACK.

You will need to consult the man pages for these routines; this document describes only their most common uses. Make sure that your MANPATH environment variable is set and contains the path "/usr/local/man".

There are additional routines in the library that you might find helpful. To see what routines are in the library, type "man ncaru". This man page outlines all the routines in the NCARU library.

Understanding binary file structure and data formats

There are two aspects to a binary file created via Fortran. One is the structure of the file, the other is the format of the numeric data within that file structure.

File structure defines records within the binary file. Records are defined by the Fortran standard, especially to support the Fortran BACKSPACE statement. Most vendors have implemented a very simple file structure for a Fortran unformatted (binary) file. This is a 4-byte control word, followed by data bytes, followed by the same 4-byte control word. The control word is merely the integer number of bytes in the data part of the record.
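The record layout described above can be inspected directly. The following is a minimal sketch, assuming a Fortran 2008 compiler that uses 4-byte record markers in native byte order; the program name recmark and the scratch file name recmark.dat are made up for illustration.

```fortran
      program recmark
      use iso_fortran_env, only: int32
      implicit none
      character(len=*), parameter :: fn = 'recmark.dat'
      real :: a(100)
      integer(int32) :: head, tail
      integer :: u, nbytes

      a = 1.0
      nbytes = size(a) * storage_size(a) / 8

      ! Write one ordinary unformatted sequential record.
      open(newunit=u, file=fn, form='unformatted', status='replace')
      write(u) a
      close(u)

      ! Reopen the file as a raw byte stream and read both markers.
      ! Assumes each marker is a 4-byte integer (compiler dependent).
      open(newunit=u, file=fn, form='unformatted', access='stream')
      read(u, pos=1) head
      read(u, pos=5 + nbytes) tail
      close(u, status='delete')

      print *, 'data bytes      :', nbytes
      print *, 'leading marker  :', head
      print *, 'trailing marker :', tail
      end program recmark
```

If the compiler follows the 4-byte convention, all three printed numbers should agree. A Cray-blocked dataset will not pass this kind of check, which is why it cannot be read with an ordinary Fortran READ on a non-Cray machine.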

Cray created a different strategy. A Cray-blocked dataset (or COS-blocked dataset) contains a number of 8-byte control words. After every 512 words there is a "Block-Control" word. Additionally there are "End-of-Record" control words, at least one "End-of-File" control word, and a final "End-of-Dataset" control word. Further, a Cray unformatted binary file can contain multiple "files," hence the term Cray dataset.

The other difference between Cray binary files and other binary files is the format of the data within the records. The following discussion refers to the differences between Cray binary data and IEEE binary data for single-precision Fortran. You've probably heard the term "IEEE format" used to describe the format of binary data on numerous vendors' machines. This refers to a specific format for both integer and floating-point binary data. The IEEE format defines the number and location of bits for the mantissa and exponent for floating-point data, as well as the number of bits for integer data.

Cray defined a different format for both floating-point and integer data. For example, although a Cray integer occupies 8 bytes of memory, only 46 bits are used to describe the value. Cray single-precision floating-point words also occupy 8 bytes of memory, and both the number of bits and the location of the mantissa and the exponent differ from IEEE. The NCARU library understands these differences and can convert between the two formats automatically.
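To illustrate how different the floating-point layout is, here is a small decoding sketch. It assumes the classic Cray single-precision layout (1 sign bit, a 15-bit exponent biased by 16384, and a 48-bit mantissa with no hidden bit); that layout is general background rather than something stated in this document, and the function cray2real is a made-up name, not an NCARU routine. In practice you would let CRAYREAD or CTOSPF do this work.

```fortran
      program craydecode
      use iso_fortran_env, only: int64, real64
      implicit none
      integer(int64) :: w

      ! Bit pattern assumed to represent 1.0 in Cray format.
      w = int(z'4001800000000000', int64)
      print *, cray2real(w)
      ! Same pattern with the sign bit (bit 63) set.
      print *, cray2real(ibset(w, 63))
      ! Bit pattern assumed to represent 2.5 in Cray format.
      w = int(z'4002A00000000000', int64)
      print *, cray2real(w)

      contains

      function cray2real(w) result(x)
      integer(int64), intent(in) :: w
      real(real64) :: x
      integer(int64) :: mant
      integer :: iexp
      ! Low 48 bits: mantissa, read as a fraction in [0, 1).
      mant = ibits(w, 0, 48)
      ! Next 15 bits: exponent with an assumed bias of 16384.
      iexp = int(ibits(w, 48, 15)) - 16384
      if (mant == 0) then
        x = 0
      else
        ! value = mant * 2**(iexp - 48); may overflow to infinity
        ! for Cray exponents outside the IEEE double range.
        x = scale(real(mant, real64), iexp - 48)
      end if
      if (btest(w, 63)) x = -x
      end function cray2real

      end program craydecode
```

The 48-bit mantissa fits exactly in an IEEE double but not in a 32-bit IEEE real, so converting to 32-bit format necessarily rounds.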

NCARU's limitations and features

Before making data conversions, you should understand certain key features of the NCARU library.
  1. The CRAYREAD and CRAYWRITE routines cannot handle mixed-type records. Fortran allows you to write a record that contains different types of data. For example:
    	REAL 	A(100)
    	INTEGER I(50)
    
    	WRITE(10) A, I
    

    The NCARU routines cannot directly handle this type of record. The problem is that the NCARU library doesn't have the same information about the variables that the Fortran compiler would have. You can use the library to read this record correctly; however, it is an involved process. See the section "Strategies for mixed-type Cray records" below to learn how to handle this type of record.

  2. Since you are adding or replacing Fortran I/O statements with local library calls, you might be introducing portability issues.

    The NCARU library is installed on all non-Cray computers operated by SCD.

  3. CRAYREAD is record oriented; that is, a CRAYREAD call always moves the record pointer to the next record in the dataset. If you were using the partial-record read feature of the Cray extension BUFFER IN, you will have to convert your I/O strategy to use full-record processing.

  4. The library includes both Fortran and C bindings. All example code snippets show Fortran usage; please refer to the appropriate man page for C language details.

  5. Although you are "twiddling bits" during numeric conversions, the NCARU library is quite fast. All the routines in the library have been optimized.

  6. The NCARU library cannot handle Cray DOUBLE PRECISION format. This is a 128-bit floating-point number.

  7. For cases where numeric conversion would result in either underflow or overflow, the numeric conversion routines reset the converted value to 0 (zero) or the maximum representative value respectively.
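The policy in item 7 can be pictured with a short sketch. It is only an illustration of the stated behavior, not NCARU source code: an IEEE double stands in for the wider Cray value, the function name narrow is made up, and treating anything below the smallest normal 32-bit value as underflow is an assumption.

```fortran
      program clampdemo
      use iso_fortran_env, only: real32, real64
      implicit none

      print *, narrow(1.0e300_real64)
      print *, narrow(-1.0e300_real64)
      print *, narrow(1.0e-300_real64)
      print *, narrow(2.5_real64)

      contains

      function narrow(x) result(y)
      real(real64), intent(in) :: x
      real(real32) :: y
      if (abs(x) > real(huge(y), real64)) then
        ! Overflow: saturate at the largest representable value.
        y = huge(y)
        if (x < 0) y = -y
      else if (abs(x) < real(tiny(y), real64)) then
        ! Underflow (or exact zero): reset to zero.
        y = 0
      else
        y = real(x, real32)
      end if
      end function narrow

      end program clampdemo
```

A consequence worth remembering is that a converted value equal to zero or to the maximum may be a clamped value rather than real data.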

Using CRAYOPEN

You use the CRAYOPEN routine to open either an existing Cray dataset, or to create one for writing. Here is an example:
	EXTERNAL          	CRAYOPEN
	INTEGER          	CRAYOPEN
	INTEGER			IFC, IFLAG, MODE
	CHARACTER*80		FLNM

	FLNM 	= "/ptmp/data"
	IFLAG 	= 0
	MODE 	= 0

	IFC = CRAYOPEN(FLNM, IFLAG, MODE)

The first argument, "FLNM", is the path to the file. Specify a character string or a character variable. Fortran users do not have to add a zero byte to the last character of the pathname. The "IFLAG" argument defines the type of open; 0 (zero) tells NCARU that you will be reading an existing Cray-blocked dataset. The last argument, "MODE", is used only if you want to create a Cray dataset; this argument specifies the permissions on the newly created file.

CRAYOPEN returns an integer that must be used for all future NCARU calls. In fact, note that CRAYOPEN was declared as an INTEGER in the code above. You can think of this as a Fortran UNIT number to the particular file; however, it is not a Fortran UNIT number. Do not attempt to use this value as a Fortran UNIT number.

During a CRAYOPEN, a library buffer is allocated to help increase I/O performance. By default, a CRAYOPEN will allocate a 1 MB (megabyte) library buffer. You can control the size of this buffer with a call to the CRAYBLOCKS routine; see the man page for details. This buffer is released during a CRAYCLOSE.

Using CRAYREAD

The CRAYREAD routine reads a record from the Cray dataset. This routine automatically converts the record data from Cray format to IEEE, provided that the correct conversion flag is specified.

You can have CRAYREAD convert to 32-bit IEEE format, convert to 64-bit IEEE format, or perform no conversion. For reading CHARACTER data, make sure you specify "no conversion", or 0 (zero), for the conversion flag.

Here is an example:

	EXTERNAL          	CRAYREAD
	INTEGER          	CRAYREAD
	INTEGER			NWDS
	REAL			A(100)
	INTEGER			WDS

	NWDS = 100

	WDS = CRAYREAD(IFC, A, NWDS, 1)

"IFC" is the handle previously obtained from a CRAYOPEN. "A" is the destination for the record. The third argument specifies the number of words you want transferred. The last argument specifies the conversion flag; in this case we want the Cray floating-point words converted into 32-bit IEEE floating-point format. The return value, WDS, is the number of words transferred, or a negative value indicating an EOF (End-Of-File), an EOD (End-Of-Dataset), or an error. See the man page for a detailed list of possible error conditions.

CRAYREAD is record oriented; that is, after this call, the record pointer is positioned at the next record in the file. CRAYREAD transfers at most NWDS words, and fewer if the record is shorter.

This means that if you are unsure of the number of words in the record, you can specify a value for NWDS larger than the number you expect the record to contain, like this:

	EXTERNAL                CRAYREAD
	INTEGER          	CRAYREAD
	REAL			A(10000)
	INTEGER			WDS

	WDS = CRAYREAD(IFC, A, 10000, 1)

	PRINT*, "record contains: ", WDS, " words"
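
Putting the pieces together, the following sketch opens a dataset, reads every record with an oversized buffer, and closes the dataset. It uses only the calls shown in this document, with the same argument order; the program name RDALL and the buffer size MAXWDS are arbitrary choices, and the program must be linked with the NCARU library. The loop simply stops at the first negative return value, because the specific EOF, EOD, and error codes are listed only in the man page.

```fortran
      PROGRAM RDALL
      IMPLICIT NONE
      EXTERNAL CRAYOPEN, CRAYREAD, CRAYCLOSE
      INTEGER  CRAYOPEN, CRAYREAD, CRAYCLOSE
      INTEGER  MAXWDS
      PARAMETER (MAXWDS = 10000)
      INTEGER  IFC, WDS, IER, NREC
      REAL     A(MAXWDS)

      ! Open an existing Cray-blocked dataset for reading.
      IFC = CRAYOPEN('/ptmp/data', 0, 0)
      IF (IFC .LT. 0) THEN
         PRINT *, 'CRAYOPEN failed, return value ', IFC
         STOP
      END IF

      ! Read records until CRAYREAD returns a negative value
      ! (EOF, EOD, or an error; see the man page for the codes).
      NREC = 0
      DO
         WDS = CRAYREAD(IFC, A, MAXWDS, 1)
         IF (WDS .LT. 0) EXIT
         NREC = NREC + 1
         PRINT *, 'record ', NREC, ' contains ', WDS, ' words'
      END DO

      PRINT *, 'stopped after ', NREC, ' records, code ', WDS
      IER = CRAYCLOSE(IFC)
      END
```

If the dataset contains several files, a real program would need to compare WDS against the EOF and EOD codes from the man page and keep reading after an EOF.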

You can also use CRAYREAD to skip a number of records by asking it to read zero words. This will cause the record pointer to move to the next record. Note that I/O may still be performed on the file depending on the size of the records and the size of the I/O buffer. Suppose you want to skip the first 100 records in the dataset. Something like this would work:

	EXTERNAL                CRAYREAD
	INTEGER                 CRAYREAD
	REAL			A(100)
	INTEGER			WDS

	DO I = 1, 100
		WDS = CRAYREAD(IFC, A, 0, 0) 
	END DO

	WDS = CRAYREAD(IFC, A, 100, 1)
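
The skipping loop above does not check the return value, so it would silently run past the end of a short dataset. A possible refinement, sketched here as a helper with the made-up name SKIPREC, stops at the first negative return and reports how many records were actually skipped. It assumes only the CRAYREAD behavior described in this document and must be linked with the NCARU library.

```fortran
      SUBROUTINE SKIPREC(IFC, N, NSKIP)
      IMPLICIT NONE
      EXTERNAL CRAYREAD
      INTEGER  CRAYREAD
      INTEGER  IFC, N, NSKIP, WDS, K
      REAL     DUMMY(1)

      ! Ask for zero words N times; each call advances one record.
      NSKIP = 0
      DO K = 1, N
         WDS = CRAYREAD(IFC, DUMMY, 0, 0)
         ! A negative value means EOF, EOD, or an error: stop early.
         IF (WDS .LT. 0) RETURN
         NSKIP = NSKIP + 1
      END DO
      END
```

A caller would use CALL SKIPREC(IFC, 100, NSKIP) and then compare NSKIP with 100 before reading the next record.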

For other situations, please contact the SCD Consulting Office at 303-497-1278.

Using CRAYWRITE

The CRAYWRITE routine writes a record to a Cray blocked or unblocked dataset. The routine automatically converts the native (IEEE) data to Cray 64-bit format. Here is an example:
	EXTERNAL                CRAYWRITE
	INTEGER                 CRAYWRITE
	REAL			A(100)
	INTEGER			WDS

	WDS = CRAYWRITE(IFC, A, 100, 1)

In this case, we will write 100 words from "A", and automatically convert the 32-bit (4-byte) IEEE words to Cray floating-point format.

Using CRAYCLOSE

The CRAYCLOSE routine flushes (if necessary) the library buffer, deallocates any resources used by the routines, and closes the Cray dataset. Here is an example:
	EXTERNAL                CRAYCLOSE
	INTEGER                 CRAYCLOSE

	IER = CRAYCLOSE(IFC)

It is absolutely critical that you call this routine if you are writing a Cray dataset; failure to do so will result in a corrupted Cray dataset. Failure to call CRAYCLOSE when reading will only result in memory growth of your process, but will not corrupt any data.

Using CRAYBACK

The CRAYBACK routine is used just like a Fortran BACKSPACE statement. This routine moves the record pointer in the Cray dataset backward one record. An example:
	EXTERNAL                CRAYBACK
	INTEGER                 CRAYBACK
	INTEGER			RC

	RC = CRAYBACK(IFC)

Strategies for mixed-type Cray records

You can use the NCARU library to read mixed-type records; however, it requires you to do some coding. You must also know the exact format of the Cray record.

There are three basic strategies you can use:

  1. You can read the record with no conversion into an INTEGER array, then perform the numeric conversion yourself. See example 1 below.

    Note: If you use the automatic promotion option on the compiler, you will have to adjust the examples below appropriately.

  2. You can read the same record twice, with the conversion flag set appropriately for the data type. In this case you would also use the NCARU routine CRAYBACK to backspace after reading the record the first time. A side effect here is that the second read would result in the converted data being offset into the specified array. Example 2 illustrates this effect.

  3. You can read the Cray mixed-type record, unconverted, into a "raw" character buffer on the non-Cray Unix system, and then decode the buffer piece by piece into IEEE-format real, integer, and character variables. Example 3 demonstrates this, using a small program that first writes such a record in Cray-blocked format and Cray number representation.

Example 1:
Suppose you have a Cray record that contains 50 INTEGERs followed by 50 REALs. Something like the following would allow you to read this mixed-type record:

	EXTERNAL                CRAYREAD
	INTEGER                 CRAYREAD
	INTEGER			WDS
	INTEGER			RAW(200)
	INTEGER			I(50)
	REAL			F(50)

	WDS = CRAYREAD(IFC, RAW, 100, 0)

	CALL CTOSPI(RAW, I, 50)

	CALL CTOSPF(RAW(101), F, 50)

First we declare an INTEGER array of 2 times the number of Cray words; this is necessary because the INTEGER word is 4 bytes long and the Cray word is 8 bytes long. In this case the CRAYREAD conversion flag is set to 0, which means that CRAYREAD will merely transfer the Cray words into the INTEGER array as is. We tell CRAYREAD to read 100 Cray words, the length of the Cray record. Next we call the conversion routines directly; the first one converts the data stored in RAW to IEEE integers and stores the result in the array I.

The last subroutine call above converts the raw data into IEEE single-precision floating point. Note here that we start the conversion at word 101 in the RAW array. This is where the Cray floating-point data starts in the mixed-type record.
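Because subscript mistakes are the usual source of trouble here, it can help to compute the starting subscripts rather than count them by hand. The sketch below is plain arithmetic and needs no NCARU calls; the function names i4sub and chsub are made up. It assumes, as in the examples of this document, that the raw buffer is either a 4-byte INTEGER array or a CHARACTER*1 array and that every Cray word is 8 bytes.

```fortran
      program offsets
      implicit none

      ! Example 1: the REALs start at Cray word 51 of the record.
      print *, 'INTEGER*4 buffer, word 51 ->', i4sub(51)
      ! Example 3: integers start at word 11, characters at word 21.
      print *, 'CHARACTER*1 buffer, word 11 ->', chsub(11)
      print *, 'CHARACTER*1 buffer, word 21 ->', chsub(21)

      contains

      ! Subscript of Cray word k in a 4-byte INTEGER buffer
      ! (two buffer elements per 8-byte Cray word).
      integer function i4sub(k)
      integer, intent(in) :: k
      i4sub = 2*(k - 1) + 1
      end function i4sub

      ! Subscript of Cray word k in a CHARACTER*1 buffer
      ! (eight buffer elements per 8-byte Cray word).
      integer function chsub(k)
      integer, intent(in) :: k
      chsub = 8*(k - 1) + 1
      end function chsub

      end program offsets
```

The three results are 101, 81, and 161, which match the subscripts RAW(101) used above and raw(81) and raw(161) used in Example 3.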

Example 2:
Here is an example of the second strategy. In this case we read the same record twice, converting separately during each read:

	EXTERNAL                CRAYREAD, CRAYBACK
	INTEGER                 CRAYREAD, CRAYBACK
	INTEGER			WDS, IER
	INTEGER			I(50)
	REAL			F(100)

	WDS = CRAYREAD(IFC, I, 50, 3)

	IER = CRAYBACK(IFC)

	WDS = CRAYREAD(IFC, F, 100, 1)

Here the first CRAYREAD reads and converts only the first 50 words of the data record, storing them in I. The conversion flag specifies conversion into 32-bit IEEE integers. After CRAYBACK repositions the dataset at the start of the same record, the second CRAYREAD reads the entire record into F and assumes that all the Cray words are floating-point words. Since we already know that the first 50 words are actually INTEGER words, we ignore F(1) through F(50) and use only the true REAL words starting at F(51).

The numeric conversion routines automatically detect out-of-range values and reset them to either zero (underflow) or the maximum value for the specified type (overflow). The numeric conversion routines will not abort the program due to conversion issues.

Example 3:
This example explains how to read a Cray mixed-type record on a non-Cray Unix system. Two programs are used. The first program writes a record in Cray-blocked format and Cray-format number representation. Then the second program reads the Cray record into a "raw" character buffer and decodes the information into IEEE format as real, integer, and character variables.

This is the first program. It produces, on the Cray system, 10 reals, 10 integers, and 8 characters in one binary record of a file called "fort.1":

  program writecw
  real buf(10)
  integer ibuf(10)
  character*8 char
  char='thatsall'
  do i=1,10
    buf(i)=float(i)
    ibuf(i)=i
  enddo
  write(1)buf,ibuf,char
  stop
  end

This is the second program. It reads the fort.1 file on a non-Cray Unix system and prints out the contents.

Note: The most convenient way to read a mixed-type record is to read the unconverted bytes into a buffer. (In this case we use a character buffer.) Then with careful use of ctospf and ctospi, you can work your way through the buffer and convert and place the numbers in appropriate arrays. Be careful with the subscripts! Also, due to a design characteristic of NCARU, you must handle the characters as 8-byte Cray words. If you don't, then you will have to do a formatted buffer-read on them to pack or unpack them into the format you want.

  program test
  external CRAYOPEN,CRAYREAD
  integer CRAYOPEN,CRAYREAD
  integer IFC,IWDS
  character*1 raw(168)
  real reals(10)
  integer integers(10)
  IFC = CRAYOPEN("fort.1",0,0)
  if (IFC .LT. 0) then
    print *,'Error opening Cray file ./fort.1'
    stop
  else
    print *,'Opened Cray file ./fort.1 IFC =',IFC
  endif

C  Cray-deblocking read of the bytes in the record into a 
C  character input buffer. The count is given in 8-byte Cray
C  words: the record holds 10 + 10 + 1 = 21 words (168 bytes).
C
  IWDS = CRAYREAD(IFC,raw,21,0)
  print*," raw wds=",IWDS
C
C  Conversion of the first 80 bytes in the input record into 10 
C  32-bit reals:
C
C  NOTE: The conversion to 32-bit reals is implicitly determined
C  by the number representation specified by the output array.
C
  call ctospf(raw,reals,10)
  print*,reals
C
C  Conversion of the second 80 bytes in the input record into 10 
C  32-bit integers:
C
C  NOTE: The conversion to 32-bit integers is implicitly determined
C  by the number representation specified by the output array.
C
  call ctospi(raw(81),integers,10)
  print*,integers
C
C  The 8-byte character variable is copied from the input buffer
C  to a character*8
C
C  NOTE: Regardless of how the character variables are defined on 
C  the Cray, the record will contain 8-byte character words, so
C  you may need to use gbytes() or a similar function to extract
C  the characters and re-assemble them into the correct strings.
C
  print*,(raw(i+160),i=1,8)
  stop
  end
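
The program above prints the eight characters one at a time. If you want them in a single CHARACTER*8 variable instead, a copy loop such as the one sketched below is one option. The sketch is self-contained: it fills a stand-in buffer itself instead of calling CRAYREAD, and the names charword, src, and label are made up for illustration.

```fortran
      program charword
      implicit none
      character(len=8), parameter :: src = 'thatsall'
      character(len=1) :: raw(168)
      character(len=8) :: label
      integer :: i

      ! Stand-in for the buffer filled by CRAYREAD: blanks, then
      ! the text in the last 8-byte word (bytes 161 to 168).
      raw = ' '
      do i = 1, 8
        raw(160 + i) = src(i:i)
      end do

      ! Re-assemble the eight single characters into one string.
      do i = 1, 8
        label(i:i) = raw(160 + i)
      end do

      print *, label
      end program charword
```

Character variables longer than 8 bytes would span several Cray words, so the same loop would simply run over more bytes of the buffer.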


If you have questions about this document, please contact SCD Customer Support. You can also reach us by telephone 24 hours a day, seven days a week at 303-497-1278. Additional contact methods: consult1@ucar.edu and during business hours in NCAR Mesa Lab Suite 39.

Last update: 03/09/2006

© Copyright 1999-2004. University Corporation for Atmospheric Research (UCAR). All Rights Reserved.

Address of this page: http://www.scd.ucar.edu/docs/conversion.tools/ncaru.html