Data File Format Guide

Many of our data sets can be ordered in different file formats.

Use this guide to help you choose the right format for your data.



ASCII Formats

There are two types of ASCII data formats: ASCII Tabular and Arc ASCII Grid. Examples of each are provided below.

ASCII Tabular

Comma-delimited ASCII text files with descriptive header lines. The layout of the file varies depending on the data set.

ASCII Tabular: Non-County Data

This ASCII format contains one line (record) for each point within the selected region. Each record contains the longitude and latitude of the center of the grid cell or pixel, followed by the date-specific data, with each time period in a separate comma-delimited column.

This format contains a header of 12 lines.

Example 1: A request for monthly data between 1-Jan-1990 and 31-Dec-1991 would result in each record containing longitude, latitude, and twenty-four columns of monthly data starting in January 1990 and ending in December 1991:

Center Lon, Center Lat, 1990-01, 1990-02, 1990-03, 1990-04, 1990-05, ... 1991-12

-122.750000, 48.750000, 124, 118, 56, 67, 40, ... 77
-122.250000, 48.750000, 230, 218, 97, 83, 50, ... 166

:

Example 2: A request for monthly data from a climatological data set would result in each record containing longitude, latitude, and twelve columns of monthly data starting in January and ending in December. Climatological data are those that have been averaged over several years:

Center Lon,Center Lat,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec

-122.250000,48.750000,6.9,8.8,11.2,14.3,15.7,19.8,21.8,22.2,19.6,15.5,10.4,8.1
-121.750000,48.750000,5.9,8.3,11.5,15.4,18.6,21.3,23.6,24.1,20.8,16.2,9.9,7.1
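The record layout described above can be read with a few lines of Python. This is only a sketch under stated assumptions: the function name read_ascii_tabular is hypothetical, the 12-line header is skipped without being interpreted, and any non-numeric line after the header (such as a column-name line) is ignored. The sample text is synthetic and stands in for a real order.

```python
import csv
import io

def read_ascii_tabular(stream, header_lines=12):
    """Return a list of (lon, lat, values) tuples from an ASCII Tabular file.

    header_lines is an assumption supplied by the caller (12 for non-county data).
    """
    # Skip the descriptive header without interpreting it.
    for _ in range(header_lines):
        next(stream)
    records = []
    for row in csv.reader(stream):
        cells = [c.strip() for c in row if c.strip()]
        if len(cells) < 3:
            continue  # blank or incomplete line
        try:
            numbers = [float(c) for c in cells]
        except ValueError:
            continue  # non-numeric line, e.g. column names
        # First two columns are center longitude and latitude; the rest are data.
        records.append((numbers[0], numbers[1], numbers[2:]))
    return records

# Synthetic stand-in for an order: 12 placeholder header lines plus two records.
sample = "".join(f"header line {i}\n" for i in range(1, 13))
sample += "-122.250000,48.750000,6.9,8.8,11.2\n"
sample += "-121.750000,48.750000,5.9,8.3,11.5\n"
records = read_ascii_tabular(io.StringIO(sample))
print(records[0])
```

For county data, where the number of header lines varies, the same approach would need the header length determined per file.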

 

ASCII Tabular: County Data Only

This ASCII format contains one line for each county for each year within the selected region. For U.S. data, each file will contain the FIPS ID, state name, county name, and then the date-specific data. For China data, each file will contain the CIESIN ID, province name, county name, and then the relevant data values.

County-based data collections will have a variable number of header lines.

Example: A request for clay soils data for the Beijing and Fujian provinces in China would result in a file with 83 records, one for each county in the two provinces. This comma-delimited file will contain 4 columns: CIESIN ID, province name, county name, and the data values for the selected soils variable.

 

Arc ASCII Grid

This ASCII format can be imported directly into ESRI Arc GIS software packages. Files contain header lines followed by a block of data values. The header contains information needed to read the data file. Header lines contain information about the dimensions of the data block and the no-data value. Data are written row-wise, so that the first data record in a block contains values for the northernmost grid-cells moving from west to east. The last data record in a block contains values for the southernmost grid-cells moving from west to east.

If there are multiple layers within the data file, the next complete layer is written as a block in the same way, until all layers have been written to the file. For example, a request for monthly data between 1-Jan-1990 and 31-Dec-1991 would result in twenty-four blocks of monthly data starting in January 1990 and ending in December 1991. (Note that Arc GIS does not support multiple layers. To use multiple-layer files in Arc GIS, cut and paste the header and one block of data into separate files.)

Example of an Arc ASCII Grid header from the VEMAP2 Transient Dynamics Data collection:

ncols 115
nrows 48
xllcorner -124.500000
yllcorner 25.000000
cellsize 0.500000
nodata_value -9999

Note: VEMAP2 Transient Dynamics data are gridded at 0.5 x 0.5 degree spatial resolution. Xllcorner, yllcorner and cellsize are all reported in decimal degrees. For projected data, these parameters are reported in the units of the projection.
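The row ordering and multi-layer layout described above can be illustrated with a short Python sketch. It assumes a six-line header like the one shown, a single header at the top of the file, and layers that follow one another as consecutive blocks of nrows x ncols values. The function names read_arc_ascii_grid and cell_center are hypothetical, and the tiny grid is synthetic.

```python
def read_arc_ascii_grid(text, header_lines=6):
    """Parse header and data blocks; returns (header dict, list of layers)."""
    lines = text.splitlines()
    header = {}
    for line in lines[:header_lines]:
        key, value = line.split()
        header[key.lower()] = float(value)
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    # Flatten all remaining values, then cut them into layers of nrows x ncols.
    values = [float(tok) for line in lines[header_lines:] for tok in line.split()]
    per_layer = ncols * nrows
    layers = []
    for start in range(0, len(values), per_layer):
        block = values[start:start + per_layer]
        layers.append([block[r * ncols:(r + 1) * ncols] for r in range(nrows)])
    return header, layers

def cell_center(header, row, col):
    """Center coordinates of a cell; row 0 is the northernmost row."""
    size = header["cellsize"]
    nrows = int(header["nrows"])
    x = header["xllcorner"] + (col + 0.5) * size
    y = header["yllcorner"] + (nrows - row - 0.5) * size
    return x, y

# Synthetic 2-row by 3-column grid with two layers.
sample = (
    "ncols 3\nnrows 2\nxllcorner -124.500000\nyllcorner 25.000000\n"
    "cellsize 0.500000\nnodata_value -9999\n"
    "1 2 3\n4 5 -9999\n"
    "7 8 9\n10 11 12\n"
)
header, layers = read_arc_ascii_grid(sample)
print(len(layers), layers[0][0], cell_center(header, 0, 0))
```

Cells equal to the nodata_value in the header should be masked before any arithmetic is done on a layer.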

 

 

BINARY Formats

BSQ Option

This BINARY raster data format (Band SeQuential format) includes two files: an ASCII header file that describes the data (orderfilename.hdr.txt), and a "flat binary data file" that contains the data (orderfilename.bsq). This format is recommended for large raster data and satellite images.

EOS-WEBSTER offers several header files for using BSQ files with shareware or commercial software. If you are not certain about which header to use, we recommend you order your data with the generic header option. The generic header should contain all information you will need to understand the georeferencing of the data contained in your BSQ file.

Header files contain the information needed to read the data file into a software package such as IDL/ENVI or Erdas/Imagine, or into one of the many freeware image processing programs: the dimensions of the image, the data type and format, and other pertinent information, including image projection parameters.

In the data file, each line is followed immediately by the next line in the same band, until the entire band has been written. If there are multiple bands within the data file, the next complete band is written in the same way, until all bands have been written to the file. Bands may be spectral bands (as in satellite images), data quality or ancillary information, or, in the case of time-series data, slices in time.
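Because a BSQ file is written band by band and line by line, the byte position of any sample follows directly from the image dimensions in the header. The Python sketch below shows that arithmetic. The function names bsq_offset and read_bsq_sample are hypothetical, the struct format code must be chosen to match the data type and byte order stated in your header, and the data here are synthetic.

```python
import struct

def bsq_offset(band, row, col, rows, cols, bytes_per_sample, header_offset=0):
    """Byte offset of one sample in a band-sequential file (all indices zero-based)."""
    return header_offset + ((band * rows + row) * cols + col) * bytes_per_sample

def read_bsq_sample(data, band, row, col, rows, cols, fmt="<f"):
    """Read one sample from BSQ bytes; fmt is a struct code such as 'B' or '<f'."""
    size = struct.calcsize(fmt)
    offset = bsq_offset(band, row, col, rows, cols, size)
    return struct.unpack_from(fmt, data, offset)[0]

# Synthetic image: 2 bands, 2 rows, 3 columns of unsigned 8-bit values 0..11.
rows, cols, bands = 2, 3, 2
data = bytes(range(rows * cols * bands))
value = read_bsq_sample(data, band=1, row=0, col=2, rows=rows, cols=cols, fmt="B")
print(value)
```

For large files, the same offset can be passed to a file object's seek method so that only the needed samples are read.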

Header Examples:

Generic header example

BSQ Arc Info header example

GrADS header example

 

GeoTiff

GeoTiff is a platform-independent standard that supports geographic spatial data such as satellite imagery, aerial photos, digital elevation models, and scanned maps. The GeoTiff format is an extension of TIFF (Tagged-Image File Format) that provides geographic information embedded as tags within the TIFF file. The geographic data can then be used to position the image in the correct location and geometry on the screen of a geographic information display.

A GeoTiff file can be imported into any software that also can read a standard TIFF file. If the software is written to take advantage of the extra geographical information in the GeoTiff file, then the image should be properly registered with other datasets you may have available. See your particular software documentation for further details on how it handles GeoTiff files.

 

 

GrADS Format

The Grid Analysis and Display System (GrADS) is an interactive desktop tool that is currently in use worldwide for the analysis and display of earth science data. GrADS is implemented on all commonly available UNIX workstations, Apple Macintosh, and DOS or Linux based PCs, and is freely distributed over the Internet. Documentation on use of GrADS is available on-line.

A sample GrADS header is given under "GrADS header example" at the end of this guide.

Supported Data Types:

Data type               Type abbreviation   Type length (bits)
byte                    uint                8
integer                 int                 8, 16, 32
unsigned long integer   ulong               32
floating point          float               32
double precision        float               64

Please Note: PC users may need to swap bytes for 16-, 32-, and 64-bit data types.
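Byte swapping is needed when the byte order of the file differs from that of the machine reading it. The following Python sketch shows two common ways to handle this with the standard library; the big-endian sample values are synthetic and the variable names are illustrative only.

```python
import array
import struct
import sys

# Three 16-bit integers written in big-endian byte order (synthetic sample).
big_endian_bytes = struct.pack(">3h", 1, 256, -2)

# Option 1: state the file's byte order explicitly when unpacking.
values = struct.unpack(">3h", big_endian_bytes)

# Option 2: load the bytes in native order, then swap only if the machine
# is little-endian (as most PCs are).
native = array.array("h")
native.frombytes(big_endian_bytes)
if sys.byteorder == "little":
    native.byteswap()

print(values, list(native))
```

Option 1 is usually the safer choice because the result does not depend on the machine running the code.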

 

HDF-EOS

HDF-EOS is an extension of NCSA (National Center for Supercomputing Applications) HDF (Hierarchical Data Format) and is a self-describing, binary format. Most commercial software packages that handle spatial data can import HDF-EOS. Data can also be accessed through library calls, which are available for the FORTRAN and C programming languages. An HDF-EOS file contains all of the metadata needed to extract and understand the data in the file, which are stored as grid, swath, or point data. HDF is the scientific data format standard selected by NASA as the baseline standard for the Earth Observing System (EOS). Many satellite data products produced by NASA EOS are distributed in the HDF-EOS format.

 

(ERDAS) Imagine

This is a file format created and used by ERDAS Imagine. Imagine files end in ".img". It is a complex file format that holds thematic or continuous image data layers. An Imagine file can be read by ERDAS Imagine software, MultiSpec Image Processing software, and others. It can also be imported into other popular software, such as ENVI. An .img file contains not only the data file values but also statistics about those data, lookup tables, map coordinates, and map projection. A single file can hold multiple bands, or layers, of data, but all the layers must be of the same resolution. Imagine files are convenient because they are self-describing (i.e., a separate header file is not necessary in order to use the data) and many common image processing programs can either use them directly or import them. If you are unsure whether your application can use an Imagine file, request your data in the generic BSQ format instead: almost all programs and computer languages can access BSQ data, although it requires more work on your part.



NetCDF

The Network Common Data Form (netCDF) was developed by UNIDATA and is a self-describing, binary format. Most commercial software packages that handle spatial data can import netCDF. Data can also be accessed through library calls, which are available for the FORTRAN and C programming languages. A netCDF file contains all of the metadata needed to extract and understand the data in the file.

For detailed information and tools for using and viewing netCDF files, please visit:

http://www.unidata.ucar.edu/packages/netcdf/index.html

and

http://www.unidata.ucar.edu/packages/netcdf/software.html

 

BSQ Header Examples

 

Sample of a BSQ generic header file created by EOS-WEBSTER:

 

Data File Name: tm-12-30-10.11.99-therm.bsq
Product Name: US Landsat

Data Format: generic binary
Interleave: BSQ
Compression/Packing: none
WRS Path: 012
WRS Row: 030
Rows/Lines: 3686
Columns/Samples: 4071
Bands: 2
Header Offset: 0
Data Type: Unsigned 8-bit

Temporal Extent: 11 October 1999
Georeferencing: Yes

Location of Coord. In Pixel: center
UL Lon/Lat = 0722509.8826W, 440830.1664N
UR Lon/Lat = 0692159.9076W, 441132.3809N
LR Lon/Lat = 0692117.6782W, 421205.0793N
LL Lon/Lat = 0721836.9175W, 420915.0353N

Where:

Lon. = dddmmss.ssss(W/E)
Lat. = ddmmss.ssss(N/S)
d - degrees
m - minutes
s - seconds
W - west longitude; E - east longitude
N - north latitude; S - south latitude

Projection Information:
GCTP Projection ID:
Projection Name: UTM
Projection Units: Meters
Pixel Size (X,Y): 60 x 60
Spheroid/Ellipsoid Name: WGS-84
UTM Zone#: 19

Location of Coord. In Pixel: center
UL X/Y = 226500.000, 4893270.000
UR X/Y = 470700.000, 4893270.000
LR X/Y = 470700.000, 4672170.000
LL X/Y = 226500.000, 4672170.000

Band Information:
Band 1: Emitted Thermal Low Gain
Band 2: Emitted Thermal High Gain

The radiance values reported in each band files are quantized to 8-bit DN, reflecting 256 levels of radiance. The following Radiometric Record (as taken from original data source) contains the coefficients needed to convert the image values into at-satellite spectral radiance for each particular band.

Equation for Determining Radiance:

At-Sensor-Radiance (W m-2 sr-1 micrometer-1) = Gain*DN + Biases

Biases and Gains by Band (bias, gain):

Band 1: 0.000000000000000, 0.066823533002068
Band 2: 3.200000047683716, 0.037058821846457

Data Processing From Origin:
1) Data Imported to ERDAS Imagine from Source
2) Data Exported to Generic Binary/BSQ
Processed at EOS-WEBSTER By: Shannon Spencer
Date of EOS-WEBSTER Processing: 08 November 2000

Data Origin:
Source of Data: USGS-EDC Scene ID: L71012030_03019991011
Source Format: Fast-L7A Format
Source Processing: L1G - systematic geocoding
Original Resampling Method: Cubic Convolution
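Two pieces of the generic header above lend themselves to a short worked example: the packed corner coordinates (dddmmss.ssss followed by a hemisphere letter) and the radiance equation. The Python sketch below is illustrative only. The function names dms_to_decimal and dn_to_radiance are hypothetical, and the inputs are copied from the sample header.

```python
def dms_to_decimal(text):
    """Convert 'dddmmss.ssss(W/E)' or 'ddmmss.ssss(N/S)' to signed decimal degrees."""
    hemisphere = text[-1].upper()
    whole, _, frac = text[:-1].partition(".")
    # Read from the right: two digits of seconds, two of minutes, rest is degrees.
    seconds = float(whole[-2:] + "." + (frac or "0"))
    minutes = int(whole[-4:-2])
    degrees = int(whole[:-4])
    value = degrees + minutes / 60 + seconds / 3600
    # West longitudes and south latitudes are negative.
    return -value if hemisphere in ("W", "S") else value

def dn_to_radiance(dn, gain, bias):
    """At-sensor radiance (W m-2 sr-1 micrometer-1) = gain * DN + bias."""
    return gain * dn + bias

# Upper-left corner from the sample header.
ul_lon = dms_to_decimal("0722509.8826W")
ul_lat = dms_to_decimal("440830.1664N")

# Band 2 (bias, gain) from the sample header, applied to the largest 8-bit DN.
max_radiance_band2 = dn_to_radiance(255, gain=0.037058821846457, bias=3.200000047683716)
print(round(ul_lon, 6), round(ul_lat, 6), round(max_radiance_band2, 4))
```

Note that the header lists each band's coefficients in the order (bias, gain), so the two values must not be swapped when they are passed to the equation.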

 

 

 

BSQ Arc Info header example

 

This binary data format can be imported into ESRI Arc GIS software packages. In addition to the BSQ data file, there are two header files (for example, orderfilename.hdr and orderfilename.bsw):

Example of orderfilename.hdr:

BYTEORDER I
LAYOUT BSQ
NROWS 30
NCOLS 55
NBANDS 1
NBITS 32

Example of orderfilename.bsw:

1.000000
0.000000
0.000000
-1.000000
-84.500000
14.500000
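The six numbers in the .bsw file can be used to convert a pixel position to map coordinates. The Python sketch below assumes the file follows the conventional ESRI world-file ordering (x pixel size, two rotation terms, negative y pixel size, then the x and y coordinates of the center of the upper-left pixel). This guide does not state the meaning of each line, so treat that ordering as an assumption. The function names are hypothetical.

```python
def parse_world_file(text):
    """Return the six world-file coefficients in file order (A, D, B, E, C, F)."""
    a, d, b, e, c, f = (float(token) for token in text.split())
    return a, d, b, e, c, f

def pixel_to_map(row, col, world):
    """Map coordinates of a pixel center; row and col are zero-based."""
    a, d, b, e, c, f = world
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y

# Contents of the example orderfilename.bsw shown above.
bsw_text = "1.000000\n0.000000\n0.000000\n-1.000000\n-84.500000\n14.500000\n"
world = parse_world_file(bsw_text)

# Upper-left pixel, and the lower-right pixel of a 30-row by 55-column image
# (NROWS and NCOLS from the example orderfilename.hdr).
upper_left = pixel_to_map(0, 0, world)
lower_right = pixel_to_map(29, 54, world)
print(upper_left, lower_right)
```

Under this reading, the example describes a 1-degree grid whose upper-left cell is centered at 84.5 degrees west, 14.5 degrees north.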

 

 

 

 

GrADS header example (orderfilename_ctl.txt)

A sample GrADS header from an order for the EOS-WEBSTER NOAA/NASA Pathfinder collection (data are float32):

DSET 711Amazon_NDVI.bsq
TITLE Monthly composite Normalized Difference Vegetation Index (NDVI) for Amazonia, calculated from AVHRR data. The grid size is 0.1 degree, data are monthly and cover the period of August, 1981 to July, 1994. Data are part of the NOAA/NASA Pathfinder NDVI Variable in the LBA / South American Data Collection. See citation in the auxiliary file provided when these data are ordered.
OPTIONS yrev little_endian
UNDEF -90
XDEF 441 LINEAR -80.000000 0.100000
YDEF 301 LINEAR -20.000000 0.100000
ZDEF 1 LINEAR 1.000000 0.000000
TDEF 156 LINEAR 01Aug1981 1mo
VARS 1
ndvi 0, 99 NOAA/NASA Pathfinder NDVI
ENDVARS
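The XDEF, YDEF, and TDEF lines in the header above define the grid, which makes it possible to rebuild the coordinate axes and to check the size of the data file before reading it. The Python sketch below handles only LINEAR axis definitions. The function name linear_axis is hypothetical, and the 4 bytes per value reflects the float32 data type stated for this sample.

```python
def linear_axis(definition):
    """Expand an 'XDEF n LINEAR start step' style line into a list of coordinates."""
    _, count, mapping, start, step = definition.split()
    if mapping.upper() != "LINEAR":
        raise ValueError("only LINEAR axes are handled in this sketch")
    return [float(start) + i * float(step) for i in range(int(count))]

# Axis definitions copied from the sample header.
lons = linear_axis("XDEF 441 LINEAR -80.000000 0.100000")
lats = linear_axis("YDEF 301 LINEAR -20.000000 0.100000")

# TDEF gives 156 monthly time steps; one variable with one level of float32 data.
time_steps = 156
bytes_per_value = 4
expected_size = len(lons) * len(lats) * time_steps * bytes_per_value
print(lons[0], round(lons[-1], 1), lats[0], round(lats[-1], 1), expected_size)
```

If the size of the .bsq file on disk does not match the expected size, the header and data file probably do not belong together or the download is incomplete.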