Why did we design the Naiad system?

Working with satellite data

Working with satellite data, especially when they are stored in their native full-resolution swath format, has long been difficult for users, which has often prevented wider use of these data:

  • Volume: the storage capacity required to archive a satellite product over a full mission lifetime can be huge. Processing long time periods, especially from orbit files, is therefore costly in resources and time, and that cost is not always justified: applications focusing on small geographic areas or on particular events (such as storms) use only a small subset of an archive, yet with existing systems it is usually impossible to extract only what is really needed. Instead, users have to download and store a massive collection of orbit files, which must be scanned one by one to extract the relevant subset.
  • Access: satellite data are scattered and heterogeneous. They are available from different sites through different access methods (FTP, OPeNDAP, ...). The lack of a central catalog, a single access point or a standardized way of accessing data makes collecting them a time-consuming task.
  • Format: the data come in a large variety of formats, and users are unlikely to master all of them.
  • Combination: satellite data come with different patterns and resolutions, and combining them often relies on time and space collocation thresholds beyond which the available data are no longer relevant. Selecting the data that match these thresholds is a hard and very time-consuming task.

This makes the use of satellite data not only a technical issue (storage capacity, processing time and available resources) but also a programming issue (extraction, remapping, ...), for instance when working with swath data (orbit files).
Data providers have often addressed this problem by generating specific gridded products, either at global scale with degraded resolution or at high resolution over specific regions of general interest. Such products are indeed much more convenient to use and partly solve the storage issue. However, this approach no longer works as soon as one needs the full resolution, especially over marginal areas, since the predefined products ignore most of the satellite coverage. The user then has no choice but to go back to the full-resolution swath data, when available, and to cope with all the issues raised above.

Advanced applications

The basic data access issues described in the section above are not the only limits of currently available data retrieval systems. Advanced usage of satellite data requires more powerful data search and selection tools, as well as exploration tools enabling real data mining applications that take full advantage of the massive data archives available. The following subsections give examples of advanced functionalities a data archive should offer to users.

Data collocation

A major issue is combining data coming from different sensors and different satellites. Data combination covers many different needs:

  • having different geophysical or biological parameters over a specific area and at a specific time, for instance to study some phenomenon or event:
    • algae blooms study may require chlorophyll concentration as well as sea surface temperature, surface solar irradiance and wind conditions
    • oil spill monitoring may require SAR images as well as wind conditions, sea surface temperature and ocean color in order to detect and remove false alerts
    • etc...
  • having the same parameters from different sources in order either to intercompare them (sensor calibration/validation) or to complement them :
    • the study of oceanic patterns (such as fronts, eddies, jets, ...) benefits from having, for instance, different sources of sea surface temperature measurements, especially in cloudy areas
  • retrieving some geophysical parameters, which can only be done by combining different parameters that may come from different sensors:
    • sea surface salinity can only be obtained from sea surface brightness temperature by removing the signature of other parameters such as sea surface temperature, wind conditions, surface roughness, ...
    • sea surface wind conditions are better retrieved when the sea state (waves, currents, ...) is taken into account

The main issues to address in order to meet these needs are:

  • providing an integrated access to all the data sources : single access point, single request, single result
  • performing complex requests across all the selected datasets based on specific time and space match-up criteria. These requests must execute in a reasonable time.
  • preparing the data combination task by delivering the data in a homogeneous form with respect to format, output pattern (grid, resolution, ...), merging, ...
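The time and space match-up criteria mentioned above can be illustrated with a minimal sketch. It is not how Naiad performs collocation: the `Observation` record, the `find_matchups` function and the threshold values are all hypothetical, and the search is a brute-force comparison of every pair, which a real archive would replace with an index.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


@dataclass(frozen=True)
class Observation:
    """One measurement footprint (hypothetical record layout)."""
    source: str
    time: datetime
    lat: float  # degrees north
    lon: float  # degrees east


def great_circle_km(a, b):
    """Haversine distance between two observations, in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def find_matchups(first, second, max_km, max_dt):
    """Return the pairs (a, b) that satisfy both the time and space thresholds.

    Brute force: every observation of `first` is compared with every
    observation of `second`.
    """
    pairs = []
    for a in first:
        for b in second:
            close_in_time = abs(a.time - b.time) <= max_dt
            if close_in_time and great_circle_km(a, b) <= max_km:
                pairs.append((a, b))
    return pairs


# Illustrative values only: two SST pixels and two wind cells.
sst = [
    Observation("sst", datetime(2010, 6, 1, 12, 0), 48.0, -5.0),
    Observation("sst", datetime(2010, 6, 1, 12, 0), 10.0, 60.0),
]
wind = [
    Observation("wind", datetime(2010, 6, 1, 13, 0), 48.1, -5.1),
    Observation("wind", datetime(2010, 6, 2, 13, 0), 48.0, -5.0),
]
matchups = find_matchups(sst, wind, max_km=25.0, max_dt=timedelta(hours=3))
```

With these sample values only the first SST pixel and the first wind cell are paired: the second wind cell is at the same location but more than three hours away, and the second SST pixel is far from both wind cells.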

Data exploration

Searching data by coverage alone does not address the issue of selecting the data that are statistically relevant for long-term studies or suitable for observing some specific phenomena. Users interested in sea surface temperature do not need cloud-contaminated data, and users focusing on extreme phenomena do not need segments of orbit files filled with low wind speeds only. They may instead be interested in selecting data over areas exhibiting particular patterns, strong values or high variability of a given parameter, a high vorticity index, ...
Powerful systems should therefore aim at providing advanced and flexible search and subsetting mechanisms based on criteria related to the actual data content and properties rather than (or in addition to) its coverage.
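As an illustration of content-based selection, the sketch below filters pieces of swath files on properties of the values they contain rather than on where or when they were acquired. The `Segment` record, the `select_segments` function and its criteria (fraction of valid pixels, maximum value, standard deviation) are assumptions made for this example, not the Naiad interface.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass(frozen=True)
class Segment:
    """Content of one piece of a swath file (hypothetical layout)."""
    file_name: str
    values: tuple  # geophysical values, None where the pixel is flagged invalid

    @property
    def valid(self):
        """Values of the pixels that are not flagged invalid."""
        return [v for v in self.values if v is not None]

    @property
    def valid_fraction(self):
        """Share of valid pixels, between 0.0 and 1.0."""
        return len(self.valid) / len(self.values) if self.values else 0.0


def select_segments(segments, min_valid_fraction=0.0,
                    min_max_value=None, min_stddev=None):
    """Keep the segments whose content satisfies every requested criterion."""
    selected = []
    for seg in segments:
        valid = seg.valid
        # Discard segments with no usable data or too many invalid pixels.
        if not valid or seg.valid_fraction < min_valid_fraction:
            continue
        # Discard segments that never reach the requested strong value.
        if min_max_value is not None and max(valid) < min_max_value:
            continue
        # Discard segments whose values vary too little.
        if min_stddev is not None and pstdev(valid) < min_stddev:
            continue
        selected.append(seg)
    return selected


# Illustrative wind speeds in m/s, not real measurements.
segments = [
    Segment("orbit_001_a", (3.0, 4.5, 5.0, 4.0)),      # low wind only
    Segment("orbit_001_b", (12.0, 27.5, 31.0, 18.0)),  # strong wind event
    Segment("orbit_002_a", (None, None, None, 29.0)),  # mostly invalid
]
storms = select_segments(segments, min_valid_fraction=0.5, min_max_value=25.0)
```

Here only `orbit_001_b` is kept: the first segment never reaches 25 m/s and the third has too few valid pixels. In a real archive such statistics would be computed once at ingestion time and stored alongside the coverage information, so that a search does not have to reopen the files.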