Previous: Upgrading Madrigal   Up: Madrigal admin guide   Next: Madrigal data organization

The Madrigal data model and metadata files

Understanding the Madrigal data model is an important step in understanding how Madrigal works. There is a correspondence between each level of the data model and the metadata files that are found in the MADROOT/metadata directory. In this section we describe each level of the Madrigal data model and the corresponding metadata file.

Madrigal site - siteTab.txt

The highest level of Madrigal is a Madrigal site. A Madrigal site is a single web site, controlled by a single group, that holds all of that group's own data. At the moment, there are Madrigal sites at Millstone Hill, USA; EISCAT, Sweden; Arecibo, Puerto Rico; SRI International, USA; Cornell University, USA; Jicamarca, Peru; the Institute of Solar-Terrestrial Physics, Russia; and the Wuhan Ionospheric Observatory, Chinese Academy of Sciences. While each Madrigal site stores its own data locally, it also shares metadata with all the other sites. This makes it possible for users to search for data at all Madrigal sites at once, no matter which site they visit, and then simply follow links to the site that has the data they are interested in.

Metadata about all sites is stored in MADROOT/metadata/siteTab.txt. When new Madrigal sites are added, this table is updated and all Madrigal sites are notified so they can update their copy of this file. If a site is running Madrigal 2.5 or higher, this file is updated automatically unless the administrator has modified it manually in a way not reported to the OpenMadrigal administrator. This file contains the following comma-separated fields:
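
All of the comma-separated metadata files described in this section, starting with siteTab.txt, can be read in the same way. The following is a minimal Python sketch using only the standard csv module. The function name read_metadata_rows is hypothetical. It makes no assumption about what each field means, and the sample rows are made up rather than taken from a real siteTab.txt.

```python
import csv
import io
from typing import List, TextIO


def read_metadata_rows(stream: TextIO) -> List[List[str]]:
    """Return each non-blank line of a comma-separated metadata file as a list of fields.

    Field meanings are not interpreted here; use the field list for the
    specific table (siteTab.txt, instTab.txt, ...) to decide what each column is.
    """
    rows = []
    for row in csv.reader(stream):
        # csv.reader yields [] for an empty line; also skip lines of empty fields
        if not row or all(not value.strip() for value in row):
            continue
        rows.append(row)
    return rows


# Usage with an in-memory sample (made-up values, not real siteTab.txt content):
sample = io.StringIO("1,Example Site,www.example.org\n\n2,Another Site,www.example.net\n")
print(read_metadata_rows(sample))
```

In practice you would pass an open file object, e.g. `open(path, newline="")`, as the csv module documentation recommends.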

Instrument - instTab.txt

The next layer of the Madrigal data model is the instrument. All data in Madrigal is associated with one and only one instrument. Any given Madrigal site will hold data from one or more instruments. Since Madrigal focuses on ground-based instruments, most instruments have a particular location associated with them. However, some Madrigal data is based on measurements from multiple instruments, and so has no particular location. Examples include "EISCAT Scientific Association IS Radars", which combines data from the multiple EISCAT radars, and "World-wide GPS Receiver Network", which consists of over a thousand individual GPS receivers distributed around the globe.

Metadata about all instruments is stored in MADROOT/metadata/instTab.txt. When new Madrigal instruments are added, this table is updated and all Madrigal sites are notified so they can update their copy of this file. If a site is running Madrigal 2.5 or higher, this file is updated automatically unless the administrator has modified it manually in a way not reported to the OpenMadrigal administrator. This instrument code list is usually consistent with the Cedar instrument list. This file contains the following comma-separated fields:

 

Instrument type - instType.txt

The instrument type table lists categories of instruments so that users can search for instruments more easily. If a site is running Madrigal 2.5 or higher, this file is updated automatically unless the administrator has modified it manually in a way not reported to the OpenMadrigal administrator. This file contains the following comma-separated fields:

 

Experiment - expTab.txt

All the data from a given instrument is organized into experiments. An experiment consists of data from a single instrument covering a limited period of time, and, as a rule, is meant to address a particular scientific goal. Madrigal makes the assumption that instruments may be run in different modes, and so the data generated may vary from one experiment to another. By organizing one instrument's data into experiments, the purpose and limitations of each experiment can be made clearer. As a Madrigal administrator, you can also provide users with supplemental plots and documentation about that experiment, in addition to the standard Madrigal data files. See the section on creating and updating Madrigal experiments for more information.

Madrigal has a number of security codes for different types of experiments. Most experiments are public, which means everyone can access them. An experiment can also be made private, which means that only users within set ranges of IP addresses can access it. (See Set private versus public access.) Private experiments are never shared with other Madrigal sites. There is also a hidden experiment state that completely removes an experiment from access through Madrigal (for example, if the data is discovered to be corrupt but might be fixed in the future).

There are also two experiment states for archiving experiments: public archive and private archive. These states are meant to support the archiving of Madrigal data at a central Madrigal site, such as the one at NCAR. These archived experiments are duplicates of experiments found at other Madrigal sites. In general, archived experiments are ignored by any part of the user interface that searches all sites, because the user will only want to find the main data source. However, when the user interface is only accessing local data, these archived experiments will appear. A private archived experiment is subject to the same restrictions as a regular private experiment.
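
The visibility rules for the five experiment states can be summarized in a small Python sketch. The numeric codes, the function name is_visible and the use of CIDR strings to describe the allowed IP ranges are all assumptions made for illustration. Check your installation's actual security codes and access configuration before relying on anything like this.

```python
import ipaddress
from typing import Iterable

# Hypothetical numeric codes for the five experiment states described above;
# verify the real values for your Madrigal installation.
PUBLIC, PRIVATE, HIDDEN, PUBLIC_ARCHIVE, PRIVATE_ARCHIVE = 0, 1, -1, 2, 3


def is_visible(security: int, client_ip: str, private_networks: Iterable[str],
               searching_all_sites: bool) -> bool:
    """Decide whether an experiment should be shown to a given client."""
    if security == HIDDEN:
        return False  # hidden experiments are never accessible
    if searching_all_sites and security in (PUBLIC_ARCHIVE, PRIVATE_ARCHIVE):
        return False  # archived duplicates are skipped when searching all sites
    if security in (PRIVATE, PRIVATE_ARCHIVE):
        # Private data is only available to clients inside an allowed IP range.
        addr = ipaddress.ip_address(client_ip)
        return any(addr in ipaddress.ip_network(net) for net in private_networks)
    return security in (PUBLIC, PUBLIC_ARCHIVE)
```

Treating the archive states as a filter applied before the private/public check mirrors the order of the rules in the text. Archived copies disappear from all-site searches regardless of who is asking.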

Metadata about all experiments is stored in MADROOT/metadata/expTab.txt. This file is automatically generated from individual expTab.txt files located in each experiment directory, as will be described in the next section on experiment organization. This file contains the following comma-separated fields:

There is also a file called MADROOT/metadata/expTabAll.txt, which is also automatically generated. It differs from expTab.txt in that it contains experiment metadata from all Madrigal sites, not just the local one. Any remote experiment with a non-zero security code is excluded.
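
The rule for expTabAll.txt can be expressed as a simple filter. In this sketch, each experiment is a dict with a hypothetical "security" key, and build_all_sites_table is an illustrative name, not part of Madrigal.

```python
from typing import Dict, List


def build_all_sites_table(local: List[Dict], remote: List[Dict]) -> List[Dict]:
    """Combine local experiments with shareable remote ones.

    Local experiments are kept as they are; a remote experiment is kept
    only if its security code is zero.
    """
    return list(local) + [exp for exp in remote if exp["security"] == 0]
```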

Experiment Files - fileTab.txt

The data from a given experiment is stored in one or more experiment files. There are a number of reasons there may be more than one file for a given experiment. The first is that different kinds of data may be stored in different files. Also, the experimental data may be analyzed in more than one way, leading to files with different sets of measured parameters. For these two cases, each file should have its own kindat code (see below). Another reason for multiple files is that older, historical files can be kept on-line for reference purposes.

The format of these files can be any of the allowed variants of the Cedar database format. Each file may contain only one kindat. The category field is used to distinguish files that are of historical interest only, e.g. a file that has been superseded by a file with an improved electron density calibration. In some cases there may be more than one up-to-date variant of a file, e.g. when different analysis options have been chosen. In this case one of these files is designated the default, and the others are designated as variants.
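
This sketch shows how a program might use the category field to separate up-to-date files from historical ones and to pick out the default file. The category codes and the ExperimentFile class are hypothetical. Consult the fileTab.txt field description for the real values.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical category codes for this sketch only.
DEFAULT, VARIANT, HISTORY = 1, 2, 3


@dataclass
class ExperimentFile:
    name: str
    kindat: int
    category: int


def up_to_date(files: List[ExperimentFile]) -> List[ExperimentFile]:
    """Drop files kept only for historical reference."""
    return [f for f in files if f.category != HISTORY]


def default_file(files: List[ExperimentFile]) -> Optional[ExperimentFile]:
    """Return the file designated as the default, or None if there is none."""
    defaults = [f for f in files if f.category == DEFAULT]
    return defaults[0] if defaults else None
```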

Metadata about all experiment files is stored in MADROOT/metadata/fileTab.txt. This file is automatically generated from individual fileTab.txt files located in each experiment directory, as will be described in the next section on experiment organization. This file contains the following comma-separated fields:

There is also a file called MADROOT/metadata/fileTabAll.txt, which is also automatically generated. It differs from fileTab.txt in that it contains experiment file metadata from all Madrigal sites, not just the local one.

 

Data parameters - parcods.tab

Any given file is made up of a series of records holding measured parameters. Note that, based on which parameters are in the file, Madrigal will automatically derive a large number of other parameters, such as Kp and magnetic field strength, that are not in the file itself. In the web browser, measured parameters are shown in bold and derived parameters in normal font.

The metadata file parcods.tab contains information about which Madrigal or Cedar parameters are supported. Madrigal parameters are a superset of Cedar parameters. If a Madrigal parameter has a parameter code of 0, it cannot be stored in a Cedar file and is meant to be a derived value only. All Madrigal mnemonics must be unique. All non-zero parameter codes must be unique, and are set by the Cedar standard. Any new parameter should be added in coordination with Barbara Emery at NCAR (emery@ucar.edu).
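
The uniqueness rules for parcods.tab can be checked mechanically. The sketch below works on already-parsed (code, mnemonic) pairs, leaving parsing of the fixed-width layout out of scope. The function name and the sample entries are illustrative only.

```python
from collections import Counter
from typing import List, Tuple


def check_parameter_table(entries: List[Tuple[int, str]]) -> List[str]:
    """Check (code, mnemonic) pairs against the uniqueness rules above.

    Returns a list of problems; an empty list means the rules are satisfied.
    Code 0 marks a derived-only parameter, so it may appear many times.
    """
    problems = []
    mnemonic_counts = Counter(mnem for _, mnem in entries)
    code_counts = Counter(code for code, _ in entries if code != 0)
    for mnem, n in sorted(mnemonic_counts.items()):
        if n > 1:
            problems.append(f"duplicate mnemonic {mnem!r}")
    for code, n in sorted(code_counts.items()):
        if n > 1:
            problems.append(f"duplicate parameter code {code}")
    return problems
```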

The file parcods.tab is not comma-delimited; instead, it uses a fixed-width format.

Since data in the Cedar format is presently stored as 16-bit integers, parameters have only a limited dynamic range, and care must be taken in selecting units and scale factors. To increase dynamic range, an additional, finer-scale parameter, called an additional increment parameter, is sometimes added to the Cedar parameter list. See, for example, parameters 120 and 121 in the Cedar format, whose descriptions are "Range" and "Additional increment to range". Because 16-bit integers have a dynamic range of 2^16 (65,536), the scale factor of the additional increment parameter is typically a factor of 10^4 lower than that of the main parameter.
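
The arithmetic behind an additional increment parameter can be sketched as follows. The value is split into a coarse integer at the main scale factor and a fine integer at the increment scale factor, and the two are summed on read. The function names and the example scale factors (1.0 and 10^-4) are assumptions. The sketch also ignores the special values Cedar reserves for missing or flagged data.

```python
from typing import Tuple

INT16_MIN, INT16_MAX = -32768, 32767


def encode_with_increment(value: float, scale: float, inc_scale: float) -> Tuple[int, int]:
    """Split value into a main 16-bit integer and an additional-increment integer.

    scale is the main parameter's scale factor; inc_scale is the (smaller)
    scale factor of the additional increment parameter.
    """
    main = int(value / scale)          # coarse part, truncated toward zero
    remainder = value - main * scale   # what the main parameter cannot represent
    inc = round(remainder / inc_scale)
    for n in (main, inc):
        if not INT16_MIN <= n <= INT16_MAX:
            raise OverflowError(f"{n} does not fit in a 16-bit integer")
    return main, inc


def decode_with_increment(main: int, inc: int, scale: float, inc_scale: float) -> float:
    """Recombine the two stored integers into the physical value."""
    return main * scale + inc * inc_scale


main, inc = encode_with_increment(1234.5678, 1.0, 1e-4)  # -> (1234, 5678)
value = decode_with_increment(main, inc, 1.0, 1e-4)
```

The remainder left by the main parameter is less than one main-scale unit. A 10^4 ratio between the two scale factors therefore keeps the increment no larger than 10,000 in magnitude, comfortably inside the 16-bit range. A 10^5 ratio could produce increments up to 100,000, which would overflow.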

Madrigal is designed to automatically use additional increment parameters found in parcods.tab. To add a new parameter with an accompanying additional increment parameter, the additional increment parameter must follow these rules:

  1. Its code must be the code of the main parameter plus 1.
  2. Its description must begin with "Additional increment" (case sensitive).
  3. It must use the same units as the main parameter.
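
The three rules can be checked with a short function. The Parameter class, the function name and the "km" units in the example are illustrative assumptions. The descriptions follow the Range example given above.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    code: int
    description: str
    units: str


def is_valid_increment(main: Parameter, increment: Parameter) -> bool:
    """Apply the three rules for an additional increment parameter."""
    return (
        increment.code == main.code + 1                                 # rule 1
        and increment.description.startswith("Additional increment")  # rule 2 (case sensitive)
        and increment.units == main.units                               # rule 3
    )


rng = Parameter(120, "Range", "km")
inc_rng = Parameter(121, "Additional increment to range", "km")
print(is_valid_increment(rng, inc_rng))
```
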

Parameter explanations

For parameters that cannot be fully described in the 38 characters allowed in the parcods.tab file, additional explanation about the parameter or its corresponding error parameter can be added to the file MADROOT/doc/parmDesc.html. Simply create a new named anchor in that file, where the anchor name is the parameter mnemonic in all capitals. Following that, give a description of arbitrary length in HTML. Then change one of the last two columns for that parameter in parcods.tab from 0 to 1 to let Madrigal know that this explanation exists. In general, the parameter order in parmDesc.html matches that of parcods.tab, but that is not a functional requirement.
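
Generating the named anchor described above is simple enough to script. The helper parm_desc_anchor is hypothetical and passes the description through as raw HTML. It does not touch parcods.tab, so the flag column still has to be changed by hand.

```python
def parm_desc_anchor(mnemonic: str, description_html: str) -> str:
    """Return an HTML fragment to paste into parmDesc.html for one parameter."""
    # The anchor name must be the parameter mnemonic in all capitals.
    return f'<a name="{mnemonic.upper()}"></a>\n{description_html}\n'
```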

Parameter categories - madCatTab.txt

The Madrigal category metadata file (madCatTab.txt) contains information about which categories Madrigal parameters belong to. The categories are similar to the Cedar categories, but do not follow them exactly. This file does not change. This file contains the following comma-separated fields:

Data type (kindat) table - typeTab.txt

The Madrigal data type (also called kind of data, or kindat) metadata file (typeTab.txt) contains a list of all data types in the database. The purpose of kindat is to uniquely identify the data processing algorithm used to compute the parameters in the associated Madrigal file. For now this metadata is only used locally. However, Madrigal sites that develop new data processing algorithms should update this table and then forward the revised typeTab.txt metadata file to the OpenMadrigal administrator. Although Madrigal does not require it, this keeps the table consistent with the CEDAR Database list of kindat codes.

Instrument parameter table - instParmTab.txt

The instrument parameter metadata file (instParmTab.txt) contains information about which measured parameters are found in the data for any given instrument. This data is used to support the global database query web page, and is rebuilt by updateMaster. This file contains the following comma-separated fields:

Instrument kindat table - instKindatTab.txt

The instrument kindat metadata file (instKindatTab.txt) contains information about which kindat codes are used with any given instrument. This data is used to support the global database query web page, and is rebuilt by updateMaster. This file contains the following comma-separated fields:
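
Conceptually, both instParmTab.txt and instKindatTab.txt summarize which values occur with each instrument. The sketch below groups (instrument code, kindat) pairs into a per-instrument set. It only illustrates the kind of summary involved and is not how updateMaster is implemented. The function name and the codes in the usage example are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def kindats_by_instrument(pairs: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Group (instrument code, kindat) pairs into instrument -> set of kindats."""
    table: Dict[int, Set[int]] = defaultdict(set)
    for inst, kindat in pairs:
        table[inst].add(kindat)  # duplicates collapse because values are stored in a set
    return dict(table)


summary = kindats_by_instrument([(30, 3410), (30, 3410), (30, 3411), (31, 100)])
```

The same grouping applies to instParmTab.txt if the pairs are (instrument code, parameter) instead.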

File data

The bottom level of the Madrigal data model is of course the data itself. A Madrigal file is made up of a series of records, each with a start and stop time, representing the integration period of measurement (Madrigal tries to enforce the idea that all measurements take a finite time, but sometimes the start time equals the stop time). To get data from a file, simply specify the parameters you want (and optionally, any filters to apply to the data). More details are given later in this tutorial.

Each Madrigal record has two parts: scalar parameters and vector parameters. For historical reasons these two parts are sometimes called one-dimensional and two-dimensional parameters. Scalar parameters are easy to explain: each scalar parameter has one measurement per record. An example might be the azimuth of a radar making a measurement. Vector parameters have multiple values in a given record. The Cedar file format specifies that all vector parameters must have the same number of measurements. One or more of the vector parameters represent the independent spatial variable(s). For radars this variable is typically range, but latitude, longitude, and altitude could just as easily be used as the three independent spatial variables. The dependent vector variables must all have the same length as the independent variable(s). The independent parameter should never represent time, since the Cedar format specifies that each record covers a single period of time.

For example, a radar might store azimuth and elevation as scalar parameters, and range as the independent vector variable. If the electron density and ion temperature are dependent vector variables, and there are ten range measurements, then there must be ten measurements of electron density, and ten measurements of ion temperature. If at certain ranges it is impossible to determine the ion temperature, the Cedar format defines a special value to represent missing data to fill the gap.
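
The structural rules for a record can be expressed as a small Python data class. Everything here is an in-memory illustration rather than the Cedar binary layout. The Record class, the parameter names, the use of floating-point times and the use of NaN as a stand-in for Cedar's missing-data value are all assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

MISSING = math.nan  # hypothetical in-memory stand-in for Cedar's missing-data value


@dataclass
class Record:
    start_time: float  # integration start; units are an assumption of this sketch
    stop_time: float   # integration stop
    scalars: Dict[str, float] = field(default_factory=dict)
    vectors: Dict[str, List[float]] = field(default_factory=dict)

    def validate(self) -> None:
        """Enforce the structural rules described above."""
        if self.stop_time < self.start_time:
            raise ValueError("stop time precedes start time")
        # Every vector parameter (independent and dependent) must have the same length.
        lengths = {len(values) for values in self.vectors.values()}
        if len(lengths) > 1:
            raise ValueError(f"vector parameters have unequal lengths: {sorted(lengths)}")


# The radar example from the text: two scalars, one independent and two dependent vectors.
record = Record(
    start_time=0.0, stop_time=60.0,
    scalars={"azm": 90.0, "elm": 45.0},
    vectors={"range": [100.0, 200.0, 300.0],
             "ne": [1.0e11, 2.0e11, 1.5e11],
             "ti": [800.0, MISSING, 1000.0]},  # ion temperature undetermined at the second range
)
record.validate()  # passes: all three vector parameters have three values
```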

The Cedar file format defines the physical meaning of almost every parameter to be found in a Cedar file. The only exceptions are parameters defined by individual groups. Any parameter found in a Cedar file that is not defined in the Cedar file format should be fully defined in the header record of the file. See the experiment page for a description of how to view a Cedar file's header record.

Each Cedar parameter can also have an associated error value. This error value can have the special values "missing", "assumed", or "known bad". If an error value is "assumed", the associated parameter value was assumed rather than measured. If the error value is "known bad", the measured data is known to have a problem.
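
Client code might use an enumeration to carry the three special error values through to a user. The ErrorFlag enum, the interpret function and the reading of a "missing" error as "no error estimate available" are assumptions for illustration. They do not reflect how Madrigal or the Cedar format actually encode these values.

```python
from enum import Enum
from typing import Union


class ErrorFlag(Enum):
    """The three special error values named above (representation is hypothetical)."""
    MISSING = "missing"
    ASSUMED = "assumed"
    KNOWN_BAD = "known bad"


def interpret(value: float, error: Union[float, ErrorFlag]) -> str:
    """Explain what a (value, error) pair tells a user about the measurement."""
    if error is ErrorFlag.ASSUMED:
        return f"{value} was assumed, not measured"
    if error is ErrorFlag.KNOWN_BAD:
        return f"{value} is known to have a problem"
    if error is ErrorFlag.MISSING:
        # Assumption: a missing error means no error estimate is available.
        return f"{value} has no error estimate"
    return f"{value} +/- {error}"
```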
