File-naming convention for BADC instrumental and other datafiles


Introduction

The BADC holds a wide range of instrumental and model datasets of interest to the scientific community. From the point of view of data access, it is highly desirable to adhere to common file formats and file-naming conventions for all the data produced under the various projects. A well-thought-out and organised file-naming convention allows quick data access and spares the user from having to read a file in order to find out what it contains. Using this convention will save time and resources when setting up data management for each individual project. It will also allow greater analysis and manipulation of the data by software at the BADC (and beyond).

File-naming convention

The file-naming convention for instrumental (and other) datasets uses long file names since these indicate significant information about the contents of the file without having to read the file or refer to the directory structure. Important attributes in a file name include INSTRUMENT, LOCATION and TIME.
Please note that FAAM file names extend the convention below by allowing three [_extra] fields, two of which are mandatory for data collected on board the FAAM aircraft (for details, please refer to the FAAM Filename Convention). Participants in FAAM campaigns may generalise this rule to all data collected during FAAM campaigns, and use up to three extra fields separated by underscores, if they wish to do so.

The chosen convention is as follows:

instrument_location_YYYYMMDD[hh][mm][ss][_extra].ext

Where:

instrument - is the instrument name (full or shortened) or model name. When the same instrument is used by a number of groups, the instrument name should be prefixed with the institute name/code and a hyphen, for example uea-ptrms and york-ptrms.

location - is the location name (full or shortened). This refers to the location of the observation and not the institute or location of the participating scientist/group. This field could be used for a range of items such as a site, a station, a platform, an institute or a university.

YYYYMMDD - is the date on which measurements were taken. If a data file spans more than one day then this field should represent the first day during which data was recorded. The year is given as four digits with month and day as two digits each.

[hh][mm][ss] - is the time of day (optional). Hours, minutes and seconds are represented as two digits each. Hours can be used alone, hours and minutes can be used together, or all three fields can be included. However, minutes or seconds cannot be used without the preceding time unit (i.e. a minute field is not allowed without the hour field, and a second field is not allowed without the minute field).

[_extra] - this section allows an additional code to define such things as different range resolutions. It could also be used for version numbers etc.

.ext - will normally be .nc (NetCDF) or .na (NASA Ames), although occasionally other formats will be used, in particular .png and .gif for image files.

Filenames should contain only the characters [-_.a-z0-9]. Spaces are forbidden and upper case characters should be avoided. The underscore "_" character should only be used as a separator between fields.
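
The rules above can be checked mechanically. The following is a minimal sketch in Python, not an official BADC tool. It makes several assumptions that go beyond the text: the function name parse_filename is hypothetical, each field is restricted to the characters [a-z0-9-] (so that "_" and "." act only as separators), only a single [_extra] field is accepted (the FAAM extension with up to three is not handled), and the location "weybourne" is purely illustrative.

```python
import re
from datetime import datetime

# instrument_location_YYYYMMDD[hh][mm][ss][_extra].ext
# Assumption: fields use only lower-case letters, digits and hyphens.
FILENAME_RE = re.compile(
    r"(?P<instrument>[a-z0-9-]+)"
    r"_(?P<location>[a-z0-9-]+)"
    r"_(?P<date>\d{8})"
    # hh, hhmm or hhmmss: a unit may only follow its preceding unit.
    r"(?P<time>\d{2}(?:\d{2}(?:\d{2})?)?)?"
    r"(?:_(?P<extra>[a-z0-9-]+))?"
    r"\.(?P<ext>[a-z0-9]+)"
)


def parse_filename(name):
    """Split a file name into its fields, or raise ValueError."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"does not follow the convention: {name!r}")
    date = match.group("date")
    time = match.group("time") or ""
    # Missing time units default to zero.
    hms = [int(time[i:i + 2]) for i in range(0, len(time), 2)]
    hms += [0] * (3 - len(hms))
    # datetime() raises ValueError for impossible dates or times.
    start = datetime(int(date[:4]), int(date[4:6]), int(date[6:8]), *hms)
    return {
        "instrument": match.group("instrument"),
        "location": match.group("location"),
        "start": start,
        "extra": match.group("extra"),
        "ext": match.group("ext"),
    }


if __name__ == "__main__":
    print(parse_filename("uea-ptrms_weybourne_2004061512_v1.na"))
```

Because the whole name must match, a file name containing spaces, upper case characters, or a dangling time digit is rejected rather than partially parsed.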

File-naming for non-standard data (e.g. model, trajectory data)

Some projects will also generate model data, flight data, data recorded at sea (stationary and in transit), trajectories and other non-standard data types. It is suggested that the above format be adapted in the following ways:
  1. Data recorded on board moving craft
    When data is recorded on a moving craft the varying spatial location should not be recorded in the filename. Instead, the location field in the filename should include a name (or code) for the vessel and optionally the flight/voyage code/number.

  2. Trajectory data
    Calculated trajectory data is similar to data recorded on a moving craft. The varying spatial location should not be recorded in the filename. Instead, the location field in the filename should include a relevant code for the trajectory type/model/number.

  3. Model data
    In the case of model data, the instrument field in the filename should instead be used for a model code (indicating the type, version etc. of the model). For box models running at one location only, the location field should give that location. However, models that output data over a grid can use appropriate codes to represent the grid.

  4. Use of the [_extra] additional information field
    The [_extra] field is unlikely to be used in most cases but is provided as an option for exceptional cases where the data producer wishes to include some additional information not otherwise catered for. Data producers are cautioned against overloading this field. One such use might be in forecast files, where the date and time provide the start time whilst the [_extra] field provides the time of the actual forecast.

  5. Use of the [hh][mm][ss] time options
    The [hh][mm][ss] options are included for occasions where data is produced at such a high frequency that storing it in multiple files per day, hour or minute becomes appropriate. This is unlikely to be commonplace but is available for special cases.

  6. Image files
    Text files (.txt) may be included to describe image data. Apart from the file name extension (last field), files containing images and their associated metadata should have the same name. When data exist both in the form of NASA Ames formatted fields and images, the files should also have the same name, except for the file name extension.
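
Items 4 to 6 above can be illustrated with a short Python sketch. It is only one possible way of assembling names, under stated assumptions: the helper names build_filename and companion_name, the time_fields parameter, and the example codes "mymodel", "grid1" and "t024" are all hypothetical and not part of the convention. The sketch does not check that the individual fields contain only permitted characters.

```python
from datetime import datetime
from pathlib import PurePosixPath


def build_filename(instrument, location, start, ext, time_fields=0, extra=None):
    """Assemble instrument_location_YYYYMMDD[hh][mm][ss][_extra].ext.

    time_fields selects how many time units to append: 0 (date only),
    1 (hh), 2 (hhmm) or 3 (hhmmss).
    """
    if time_fields not in (0, 1, 2, 3):
        raise ValueError("time_fields must be 0, 1, 2 or 3")
    # Units are appended in order, so minutes never appear without hours.
    stamp = start.strftime("%Y%m%d" + "%H%M%S"[: 2 * time_fields])
    name = f"{instrument}_{location}_{stamp}"
    if extra:
        name += f"_{extra}"
    # Lower-casing follows the advice to avoid upper case characters.
    return f"{name}.{ext.lstrip('.')}".lower()


def companion_name(name, ext):
    """Return the same file name with a different extension."""
    return PurePosixPath(name).with_suffix("." + ext.lstrip(".")).name


if __name__ == "__main__":
    # Forecast file: the date and hour give the start time, and the
    # hypothetical extra code "t024" stands for the forecast time.
    forecast = build_filename(
        "mymodel", "grid1", datetime(2005, 3, 1, 0), "nc",
        time_fields=1, extra="t024",
    )
    print(forecast)  # mymodel_grid1_2005030100_t024.nc
    # An image and its describing text file differ only in extension.
    print(companion_name("uea-ptrms_site_20040615.png", "txt"))
```

Deriving companion names from a single base name, rather than building each one separately, keeps image, metadata and NASA Ames files aligned as item 6 requires.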

Standardising common names in the naming convention

In order to standardise the names used within the file-naming convention the BADC will need to collate those currently used by the community and publish them via our website. This can be regularly extended to include new locations, instruments, models, etc. Interaction with instrument scientists and modellers will be essential to achieve this aim successfully.