Retrieval

major modules

mwr_l12l2.retrieval.retrieval

This is the main module orchestrating the retrieval including pre- and post-processing

class mwr_l12l2.retrieval.retrieval.Retrieval(conf, selected_instrument=None, node=0)[source]

Bases: object

Class for gathering and preparing all necessary information to run the retrieval

Parameters:
  • conf – configuration file or dictionary

  • node – identifier for different parallel TROPoe runs. Defaults to 0.

choose_model_files()[source]

choose the most recent model forecast run that contains the time range of the MWR data, together with the corresponding zg file
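The selection rule described above can be illustrated with a minimal sketch. It does not reproduce the package's implementation: the ForecastRun record, its fields, the function name and the file names are hypothetical stand-ins, and only the rule itself (newest run that covers the observation period) is taken from the description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class ForecastRun:
    """Hypothetical stand-in for one model forecast file."""
    path: str
    init_time: datetime    # time at which the forecast run was initialised
    valid_start: datetime  # first time step contained in the file
    valid_end: datetime    # last time step contained in the file


def choose_latest_covering_run(runs: Sequence[ForecastRun],
                               obs_start: datetime,
                               obs_end: datetime) -> Optional[ForecastRun]:
    """Return the newest run spanning [obs_start, obs_end], or None if none does."""
    covering = [r for r in runs
                if r.valid_start <= obs_start and r.valid_end >= obs_end]
    # max() on init_time picks the most recent run among those that fit
    return max(covering, key=lambda r: r.init_time, default=None)


runs = [
    ForecastRun("fc_00.grb", datetime(2024, 1, 1, 0),
                datetime(2024, 1, 1, 0), datetime(2024, 1, 2, 0)),
    ForecastRun("fc_12.grb", datetime(2024, 1, 1, 12),
                datetime(2024, 1, 1, 12), datetime(2024, 1, 2, 12)),
]
chosen = choose_latest_covering_run(
    runs, datetime(2024, 1, 1, 13), datetime(2024, 1, 1, 15))
```

Both runs cover the example period, so the later one is chosen. An observation period that starts before 12:00 is only covered by the first run.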

do_retrieval()[source]

run the retrieval using the TROPoe container

list_obs_files()[source]

get file lists for the selected station

Note

This method lists all (MWR) files, not just the ones matching the time settings. That way, old (obsolete) files are also removed when prepare_obs() is run with delete_mwr_in=True.

postprocess_tropoe()[source]

post-process the outputs of TROPoe and write to NetCDF file matching the E-PROFILE format

prepare_model()[source]

extract reference profile and uncertainties as well as surface data from ECMWF to files readable by TROPoe

prepare_obs(start_time=None, end_time=None, delete_mwr_in=False)[source]

Function to prepare E-PROFILE MWR and ALC inputs.

Parameters:
  • start_time (datetime64) – The start time for selecting the data.

  • end_time (datetime64) – The end time for selecting the data.

  • delete_mwr_in (bool) – Flag indicating whether to delete the MWR files after processing.

Raises:
  • MissingDataError – If none of the MWR files contain data between the required time limits.

  • MissingDataError – If there is not enough data to run the retrieval.

Finally, it sets the necessary attributes for further processing.
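The two error conditions listed above can be pictured with the following sketch. It is an illustration under assumptions, not the package code: MissingDataError is redefined locally as a stand-in, min_samples is a hypothetical threshold, and plain datetime objects are used instead of datetime64 to keep the example free of third-party dependencies.

```python
from datetime import datetime


class MissingDataError(Exception):
    """Local stand-in for the package's exception of the same name."""


def select_in_window(times, start_time=None, end_time=None, min_samples=1):
    """Keep timestamps inside [start_time, end_time]; None means unbounded."""
    selected = [t for t in times
                if (start_time is None or t >= start_time)
                and (end_time is None or t <= end_time)]
    if not selected:
        raise MissingDataError("no MWR data between the required time limits")
    if len(selected) < min_samples:  # min_samples is a hypothetical threshold
        raise MissingDataError("not enough data to run the retrieval")
    return selected


times = [datetime(2024, 1, 1, hour) for hour in range(6)]
window = select_in_window(times, datetime(2024, 1, 1, 2), datetime(2024, 1, 1, 4))
```

Here window holds the three timestamps at 02:00, 03:00 and 04:00. A window that lies entirely outside the data raises MissingDataError.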

prepare_paths(datestamp='', netcdf_ext='.nc')[source]

prepare input and output paths and filenames from config

prepare_tropoe_dir()[source]

set up an empty tropoe tmp file directory for the current node (remove old one if existing)

prepare_vip()[source]

prepare the vip configuration file for running the TROPoe container

run(start_time=None, end_time=None)[source]

run the entire retrieval chain

Parameters:
  • start_time (optional) – earliest time from which to consider data. If not specified, all data newer than ‘max_age’ specified in the retrieval config will be used or, if ‘max_age’ is None, the age of the data is unlimited.

  • end_time (optional) – latest time up to which to consider data. If not specified, all data received by now is processed.
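The defaulting rules for the two time limits could be resolved as in the sketch below. The function name and the now argument are hypothetical, and max_age is assumed to be a timedelta here; the unit actually used in the retrieval config is not stated in this documentation.

```python
from datetime import datetime, timedelta, timezone


def resolve_time_limits(start_time=None, end_time=None, max_age=None, now=None):
    """Fill in missing limits; a start_time of None means unlimited age."""
    now = now or datetime.now(timezone.utc)
    if end_time is None:
        end_time = now  # process everything received so far
    if start_time is None and max_age is not None:
        start_time = now - max_age  # max_age assumed to be a timedelta
    return start_time, end_time


now = datetime(2024, 1, 2, 12, tzinfo=timezone.utc)
limits = resolve_time_limits(max_age=timedelta(hours=6), now=now)
```

With a max_age of six hours, limits spans from 06:00 to 12:00 on the example day. Without max_age the returned start time stays None, which downstream code can treat as "no lower bound".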

select_instrument()[source]

Selects the instrument which has the oldest MWR file in the input directory (based on its filename).

Note that this method is not called in operational processing or when the WIGOS number is already set up.

This method finds the oldest file in the folder and extracts the station ID from the filename or file content. It then retrieves the instrument configuration based on the station ID and instrument ID. Finally, it sets the necessary attributes for further processing.

Raises:

MissingDataError – If no MWR data is found in the specified directory.
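Picking the oldest file from its filename can be sketched as follows. The naming scheme encoded in the regular expression is an assumption made for illustration, as are the function name and the example file names; MissingDataError is again a local stand-in for the package's exception.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed naming scheme: <prefix>_<station_id>_<instrument_id>_<yyyymmddHHMM>.nc
FILENAME_PATTERN = re.compile(
    r"^(?P<prefix>.+)_(?P<station>[^_]+)_(?P<inst>[^_]+)_(?P<stamp>\d{12})\.nc$")


class MissingDataError(Exception):
    """Local stand-in for the package's exception of the same name."""


def oldest_file(filenames):
    """Return (filename, station_id, instrument_id) of the oldest matching file."""
    candidates = []
    for name in filenames:
        match = FILENAME_PATTERN.match(Path(name).name)
        if match:  # silently skip files that do not follow the assumed scheme
            stamp = datetime.strptime(match["stamp"], "%Y%m%d%H%M")
            candidates.append((stamp, match["station"], match["inst"], name))
    if not candidates:
        raise MissingDataError("no MWR data found in the specified directory")
    _, station, inst, name = min(candidates)  # tuples sort by timestamp first
    return name, station, inst


names = [
    "MWR_1C_0-20000-0-06610_A_202304250000.nc",
    "MWR_1C_0-20000-0-10393_A_202304240600.nc",
    "README.txt",
]
selected = oldest_file(names)
```

In this example the second file carries the earlier timestamp, so its station and instrument identifiers are returned.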

helper modules

mwr_l12l2.retrieval.tropoe_helpers

This module contains utilities for running the TROPoe container

mwr_l12l2.retrieval.tropoe_helpers.add_flags(data)[source]

Add dummy quality flags to the given data. TODO: define the quality flags.

Parameters: data (xarray.Dataset): The input data.

Returns: xarray.Dataset: The data with quality flags added.

mwr_l12l2.retrieval.tropoe_helpers.add_variables_attrs(data, derived_product_list)[source]

Add variables attributes linked to the retrieval_type, retrieval_elevation_angles and retrieval_frequency

mwr_l12l2.retrieval.tropoe_helpers.extract_attrs(data)[source]

Extracts some attributes from the TROPoe outputs and renames them.

Parameters:

data (xr.Dataset) – The input data containing the variables to extract the attributes from.

Returns:

The input data with the added attributes

Return type:

data (xr.Dataset)

mwr_l12l2.retrieval.tropoe_helpers.extract_avk(data, tropoe_out_config)[source]

Extracts prior information from the given data based on the TROPoe output configuration. #TODO: this function could be made more generic, e.g. by passing in a list of variables to extract prior information from.

Parameters:
  • data (xr.Dataset) – The input data containing the variables to extract prior information from.

  • tropoe_out_config (dict) – The TROPoe output configuration dictionary.

Returns:

The input data with the prior information variables added.

Return type:

xr.Dataset

Raises:

FileExistsError – If the tropoe_out_config argument is not a dictionary.

mwr_l12l2.retrieval.tropoe_helpers.extract_prior(data, tropoe_out_config)[source]

Extracts prior information from the given data based on the TROPoe output configuration. #TODO: this function could be made more generic, e.g. by passing in a list of variables to extract prior information from.

Parameters:
  • data (xr.Dataset) – The input data containing the variables to extract prior information from.

  • tropoe_out_config (dict) – The TROPoe output configuration dictionary.

Returns:

The input data with the prior information variables added.

Return type:

xr.Dataset

Raises:

FileExistsError – If the tropoe_out_config argument is not a dictionary.

mwr_l12l2.retrieval.tropoe_helpers.height_to_altitude(data, station_altitude)[source]

transform height above ground level to altitude above mean sea level and add as dataarray and coordinate

Parameters:
  • data – xarray.Dataset in which to change height to altitude

  • station_altitude – single value or array defining station altitude (in an array the first entry is considered)

Returns:

updated dataset with altitude variable added (height also kept) and coordinate swapped from height to altitude
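The arithmetic behind this transformation is simple enough to show without xarray. The sketch below uses plain Python lists and hypothetical function names; it assumes that heights and station altitude are given in the same unit, and it does not cover the coordinate swap that the documented function performs on the dataset.

```python
from numbers import Real


def first_entry(value):
    """Return a scalar as float, or the first entry of a sequence."""
    if isinstance(value, Real):
        return float(value)
    return float(next(iter(value)))


def heights_to_altitudes(heights_agl, station_altitude):
    """Add the station altitude to each height above ground level."""
    offset = first_entry(station_altitude)  # only the first entry is considered
    return [height + offset for height in heights_agl]


altitudes = heights_to_altitudes([0.0, 100.0, 500.0], [491.0, 491.0])
```

Passing a single value such as 491.0 instead of the list gives the same result.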

mwr_l12l2.retrieval.tropoe_helpers.model_to_tropoe(model, station_altitude)[source]

extract reference profile and uncertainties as well as surface data from ECMWF to files readable by TROPoe

Parameters:

model – instance of mwr_l12l2.model.ecmwf.interpret_ecmwf.ModelInterpreter on which run() has already been executed

Returns:
  • prof_data – xarray.Dataset containing model profile data in a form writable to an input nc for TROPoe

  • sfc_data – xarray.Dataset containing model surface data in a form writable to an input nc for TROPoe

mwr_l12l2.retrieval.tropoe_helpers.run_tropoe(data_path, date, start_hour, end_hour, vip_file, apriori_file, data_mountpoint='/data', tropoe_img='davidturner53/tropoe', tmp_path='mwr_l12l2/retrieval/tmp', verbosity=1)[source]

Run TROPoe container using podman for one specific retrieval

Parameters:
  • data_path – path that will be mounted to /data inside the container. Absolute path or relative to project dir

  • date – date for which the retrieval shall be executed. For now, retrievals cannot encompass more than one day. It must be a datetime.datetime or a string in the format ‘yyyymmdd’. Alternatively, you can pass 0 or ‘0’ to let TROPoe print back the vip-file parameter options.

  • start_hour – hour of the day defining the start time of the retrieval period. Can be a float, int or string.

  • end_hour – hour of the day defining the end time of the retrieval period. Can be a float, int or string.

  • vip_file – path to vip file relative to data_path or packaged inside container if matching ‘prior.*’

  • apriori_file – path to a-priori file relative to data_path

  • data_mountpoint (optional) – where the data path will be mounted

  • tropoe_img (optional) – reference of TROPoe container image to use. Will take latest available by default

  • tmp_path (optional) – tmp path that will be mounted to /tmp inside the container. Uses a dummy folder by default

  • verbosity (optional) – verbosity level of TROPoe. Defaults to 1
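To show how these parameters might translate into a container call, the sketch below normalises the date argument and assembles a podman argument list without executing it. It is not the package's implementation: the function names are hypothetical, the environment variable names (DATE, START_HOUR and so on) are placeholders rather than the TROPoe container's actual interface, relative paths are resolved against the current working directory, and files are assumed to be addressed by their path under the mountpoint.

```python
import os
import posixpath
from datetime import datetime


def normalise_date(date):
    """Return 'yyyymmdd', or '0' to request the vip-parameter listing."""
    if isinstance(date, datetime):
        return date.strftime("%Y%m%d")
    text = str(date)
    if text == "0":
        return text
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected 'yyyymmdd', got {text!r}")
    datetime.strptime(text, "%Y%m%d")  # raises ValueError for impossible dates
    return text


def build_tropoe_command(data_path, date, start_hour, end_hour, vip_file,
                         apriori_file, data_mountpoint="/data",
                         tropoe_img="davidturner53/tropoe",
                         tmp_path="mwr_l12l2/retrieval/tmp", verbosity=1):
    """Assemble (but do not run) a podman command line as a list of arguments."""
    # Placeholder variable names, NOT the confirmed TROPoe container interface
    env = {
        "DATE": normalise_date(date),
        "START_HOUR": str(float(start_hour)),  # accepts float, int or string
        "END_HOUR": str(float(end_hour)),
        "VIP_FILE": posixpath.join(data_mountpoint, vip_file),
        "PRIOR_FILE": posixpath.join(data_mountpoint, apriori_file),
        "VERBOSITY": str(verbosity),
    }
    cmd = ["podman", "run", "--rm",
           "-v", f"{os.path.abspath(data_path)}:{data_mountpoint}",
           "-v", f"{os.path.abspath(tmp_path)}:/tmp"]
    for name, value in env.items():
        cmd += ["-e", f"{name}={value}"]
    cmd.append(tropoe_img)
    return cmd


cmd = build_tropoe_command("data", datetime(2024, 1, 1), 0, "6",
                           "vip/mwr.vip", "prior/apriori.nc")
```

Building the command as a list keeps it suitable for subprocess.run without shell quoting. Whether the real helper passes its settings through environment variables or positional arguments should be checked against the source.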

mwr_l12l2.retrieval.tropoe_helpers.transform_units(data)[source]

Transform all units of TROPoe output file to match units in E-PROFILE output files