Performing a full reconstruction¶
This section gives a brief walkthrough of the steps needed to run a full LMR reconstruction, including configuration, setting up proxies and PSMs, and using pre-calculated observations.
Building the proxy database¶
Note
The provided databases in the downloaded sample data should be sufficient for most reconstruction purposes. Skip over this step unless updates were made to the available proxy data.
The proxy files used by the LMR code are stored as Pandas DataFrames,
which serve as a database-like structure. One file contains the list of
proxies with unique identifiers and associated metadata
(xxxx_Metadata.df.pckl), while the other contains the proxy measurements
over time (xxxx_Proxy.df.pckl).
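As a minimal sketch of how the two files relate, the snippet below builds a tiny stand-in for each DataFrame, pickles both, and reads them back with pandas. The column names ('Proxy ID', 'Archive type', 'Lat', 'Lon'), the proxy identifiers, and the 'demo' file prefix are assumptions made for illustration only; inspect the real files to see their actual layout.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-ins for the two database files; real column names may differ.
metadata = pd.DataFrame(
    {
        "Proxy ID": ["site_A", "site_B"],
        "Archive type": ["Tree Rings", "Corals"],
        "Lat": [45.0, -10.5],
        "Lon": [250.0, 120.0],
    }
)
# One column of measurements per proxy, indexed by year (None marks a gap).
measurements = pd.DataFrame(
    {"site_A": [0.1, 0.3, -0.2], "site_B": [1.2, None, 0.9]},
    index=pd.Index([1900, 1901, 1902], name="Year"),
)

with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "demo_Metadata.df.pckl"
    data_path = Path(tmp) / "demo_Proxy.df.pckl"
    metadata.to_pickle(meta_path)
    measurements.to_pickle(data_path)

    # Reading each file back is a single call.
    meta_loaded = pd.read_pickle(meta_path)
    data_loaded = pd.read_pickle(data_path)

# Look up one proxy by its identifier and drop the years without data.
site_b_series = data_loaded["site_B"].dropna()
site_b_info = meta_loaded.loc[meta_loaded["Proxy ID"] == "site_B"].iloc[0]
print(site_b_info["Archive type"], list(site_b_series.index))
```

The shared identifier is what ties a metadata row to its measurement column, which is why the two files have to be rebuilt together.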
Whenever proxies are added or updated, the database files
need to be rebuilt using LMR_proxy_preprocess.py. Additional
options are described in the comments of the main() function, but
we discuss the important parameters here.
The main choice is which proxy database to build. There are two choices of
proxy_data_source, ‘LMRdb’ and ‘PAGES2Kv1’. ‘LMRdb’ is a compilation of
NCDC, PAGES2K Phase 2, and other collected proxy records, while ‘PAGES2Kv1’
contains data from only the PAGES2K Phase 1 project. For most purposes, ‘LMRdb’
should be the database of choice. Comment out whichever source is not in use,
e.g.:
#proxy_data_source = 'PAGES2Kv1' # proxies from PAGES2k phase 1 (2013)
# --- *** --- *** --- *** --- *** --- *** --- *** --- *** --- *** --- *** ---
proxy_data_source = 'LMRdb' # proxies from PAGES2k phase 2 (2017) +
# "in-house" collection in NCDC-templated files
The other important parameters set the locations of the proxy input data and
the database output. If you downloaded sample data from us, the input data are
located under the proxies directory (e.g.,
/home/disk/foo/LMR_data/data/proxies/). It is fine to use that same
directory as the output target.
Note
If rebuilding ‘LMRdb’, make sure the files under
.../LMR_data/proxies/LMRdb/ are untarred first. cd to that
directory and untar using $ tar -xvf ToPandas_v0.4.0_files.tar.gz.
When you have finished editing the options that define the database and file locations, create the databases using:
(lmr_py3) $ python LMR_proxy_preprocess.py
(Remember to activate the correct python environment if you’re using Anaconda!)
Create pre-calibrated statistical PSMs¶
Note
The provided sample data includes many pre-calibration combinations for seasonal/annual PSMs calibrated against GISTEMP, NOAA, Berkeley Earth, and GPCC. You will likely be able to skip this step if no changes have been made to the underlying proxy database. If a needed pre-calibration file does not exist, the code alerts you that it was not found and exits. At that point, follow the instructions in this section.
The proxy system models (PSMs) are essential for translating our reconstructed fields (in climate model space) to something comparable to the proxy data (observation space). We implement a few different statistical regressions that fit proxies against instrumental data to form a PSM. These models are fit using annual or seasonal averages and a univariate or bivariate fit to moisture and temperature variables.
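To make the idea of a statistical PSM concrete, here is a minimal sketch of a univariate linear fit on synthetic data. It is not the LMR implementation: the function name fit_linear_psm, the returned keys, and the synthetic series are all assumptions chosen for illustration.

```python
import numpy as np


def fit_linear_psm(proxy, temperature):
    """Least-squares fit of proxy = slope * temperature + intercept (illustrative helper)."""
    slope, intercept = np.polyfit(temperature, proxy, deg=1)
    residuals = proxy - (slope * temperature + intercept)
    return {
        "slope": slope,
        "intercept": intercept,
        # Correlation between the proxy and the calibration variable.
        "corr": np.corrcoef(temperature, proxy)[0, 1],
        # Residual variance, a natural estimate of the observation error.
        "error_var": np.var(residuals, ddof=2),
    }


# Synthetic "instrumental" temperatures and a proxy that responds linearly to them.
rng = np.random.default_rng(0)
temperature = rng.normal(size=100)
proxy = 0.8 * temperature + 0.2 + rng.normal(scale=0.1, size=100)

psm = fit_linear_psm(proxy, temperature)

# Applying the fitted model maps a temperature in model space to proxy units.
estimated_observation = psm["slope"] * 0.5 + psm["intercept"]
print(psm["slope"], psm["intercept"], estimated_observation)
```

Once fitted, the same slope and intercept can be applied to any prior temperature value, which is what makes the model usable as a translation from climate model space to observation space.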
The LMR_PSMbuild.py script creates the pickle files found in the
LMR_data/PSM/ directory. This file still uses a legacy
configuration style, so at first glance it’s a bit denser than the other
configuration files. The user parameters are delimited by the
markers:
##** BEGIN User Parameters **##
...
##** END User Parameters **##
Below, user parameters are described for each configuration section of
LMR_PSMbuild.py. After setting the relevant parameters for the desired PSM
calibration, create the files using the command:
(lmr_py3) $ python LMR_PSMbuild.py
class v_core¶
- lmr_path: Path to LMR input data folders (e.g., /home/disk/foo/LMR_data/)
- psm_type: Setting to use ‘linear’ or ‘bilinear’ statistical PSM
- anom_reference_period: The time period over which the average is taken to use as the reference value for all proxy PSMs
- calib_period: Years over which proxy and instrumental data are used to calibrate the PSM
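The roles of anom_reference_period and calib_period can be sketched as two steps: subtract the mean over the reference period, then keep only the calibration years. The helper names, the year ranges, and the treatment of both periods as edge-inclusive are assumptions for this sketch, not behavior taken from LMR_PSMbuild.py.

```python
import numpy as np


def to_anomaly(years, values, reference_period):
    """Subtract the mean over the (assumed inclusive) reference period."""
    start, end = reference_period
    in_reference = (years >= start) & (years <= end)
    return values - values[in_reference].mean()


def select_period(years, values, period):
    """Keep only the years inside the (assumed inclusive) period."""
    start, end = period
    keep = (years >= start) & (years <= end)
    return years[keep], values[keep]


# Hypothetical annual series with a small linear trend.
years = np.arange(1850, 2001)
values = 14.0 + 0.01 * (years - 1850)

anom_reference_period = (1951, 1980)  # illustrative values only
calib_period = (1900, 2000)

anomalies = to_anomaly(years, values, anom_reference_period)
calib_years, calib_anomalies = select_period(years, anomalies, calib_period)
print(calib_years[0], calib_years[-1], len(calib_years))
```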
class v_proxies¶
- use_from: Which proxy database to use for calibration ([‘PAGES2kv1’] or [‘LMRdb’])
class v_psm¶
- avgPeriod: Whether to use annual or seasonal averages to calibrate the PSM (‘annual’ or ‘season’)
- test_proxy_seasonality: If True, the calibration tests a pre-defined set of seasons and keeps the one that gives the best fit. The seasons tested are defined for each database and proxy type. (Starting on Line #264 for PAGES2kv1 or Line #506 for LMRdb)
class _linear¶
- datadir_calib: Directory for instrumental calibration data. ‘None’ defaults to files in the designated lmr_path directory.
- datatag_calib and datafile_calib: Instrumental target for calibration. Uncomment the pair for the desired data, and make sure all others are commented out.
- psm_r_crit: Correlation threshold to consider for PSM calibration. If a fit is below this threshold the PSM is not created for that proxy.
class _bilinear¶
- datadir_calib: Directory for instrumental calibration data. ‘None’ defaults to files in the designated lmr_path directory.
- datatag_calib_T and datafile_calib_T: Instrumental target for temperature-sensitive calibration. Uncomment the pair for the desired data, and make sure all others are commented out.
- datatag_calib_P and datafile_calib_P: Instrumental target for moisture-sensitive calibration. Uncomment the pair for the desired data, and make sure all others are commented out.
- psm_r_crit: Correlation threshold to consider for PSM calibration. If a fit is below this threshold the PSM is not created for that proxy.
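The following sketch shows one way a bivariate fit and a correlation threshold could interact. It assumes the correlation is measured between the fitted values and the proxy, which may differ from how LMR defines it; the function name fit_bilinear_psm and the synthetic data are hypothetical.

```python
import numpy as np


def fit_bilinear_psm(proxy, temperature, moisture, r_crit=0.0):
    """Fit proxy = a*T + b*P + c; return None when the fit falls below r_crit."""
    design = np.column_stack([temperature, moisture, np.ones_like(temperature)])
    coeffs, *_ = np.linalg.lstsq(design, proxy, rcond=None)
    fitted = design @ coeffs
    # Assumption: fit quality is the correlation of fitted values with the proxy.
    corr = np.corrcoef(fitted, proxy)[0, 1]
    if abs(corr) < r_crit:
        return None  # no PSM is created for this proxy
    return {"a_T": coeffs[0], "b_P": coeffs[1], "c": coeffs[2], "corr": corr}


rng = np.random.default_rng(1)
temperature = rng.normal(size=200)
moisture = rng.normal(size=200)

# A proxy that responds to both variables, and one that is pure noise.
responsive = 0.5 * temperature - 0.3 * moisture + 1.0 + rng.normal(scale=0.1, size=200)
unresponsive = rng.normal(size=200)

good_psm = fit_bilinear_psm(responsive, temperature, moisture, r_crit=0.2)
rejected_psm = fit_bilinear_psm(unresponsive, temperature, moisture, r_crit=0.9)
print(good_psm, rejected_psm)
```

Raising the threshold trades proxy count for fit quality: fewer records pass, but those that do are better constrained by the calibration data.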
Configuring the LMR reconstruction¶
To start off, the configuration files need to be copied into the main source
code directory for LMR. Wherever you cloned/downloaded the source code
(we’ll use the path /home/disk/foo/LMR_src for our code directory) there should
be a config_templs/ folder which holds configuration templates.
From the LMR_src directory, there are two files you need to copy to
run an experiment:
$ cp config_templs/config_template.yml ./config.yml
$ cp config_templs/LMR_config_template.py ./LMR_config.py
The file config.yml contains all the necessary knobs to fine-tune the
reconstruction. For an explicit description of each option, please see
LMR configuration.
Important options for a reconstruction¶
- core
- nexp: Experiment name
- lmr_path: Path to LMR_data directory
- datadir_output: Working directory to temporarily store LMR output files
- archive_dir: Archive directory to store final post-processed LMR output
- recon_period: Range of years (edge inclusive) to reconstruct
- nens: Number of prior ensemble members (should generally be above 50)
- save_archive: Ensemble detail of field output. ‘ens_variance’ and ‘ens_percentiles’ are more economical reductions, while ‘ens_subsample’ and ‘ens_full’ store full-field ensemble members and can use large amounts of disk space
- seed: Sets the RNG seed to ensure reproducibility for the ensemble sample and proxy record sample. WARNING: overwritten by wrapper.multi_seed and should not be used when running multiple iterations of a reconstruction.
- proxies
- use_from: Which proxy database to use for the reconstruction. [LMRdb] or [PAGES2kv1]
- proxy_frac: Fraction of available proxy records to use. Useful for independent verification on withheld proxies
- proxy_order (Database specific): Order of assimilation for proxy records. Commenting out proxy groups here will omit them from use in the reconstruction
- proxy_psm_type (Database specific): Specifies which PSM type to be used for which proxy groups. E.g., Tree ring_Width: bilinear
- (database specific means there are separate configuration settings for each proxy database)
- psm
- calib_period: Instrumental time period over which the PSMs are calibrated
- avgPeriod: Whether to use annual or seasonal averages to calibrate PSMs
- season_source (Only used for seasonal PSMs): Use season defined by the proxy metadata or an objectively-derived best season
- datatag_calib (PSM dependent): Which instrumental data source to use
for calibration. Options are defined in the all_calib_sources parameter
- prior
- prior_source: Experiment tag to use as source data for the prior ensemble. Should match the tag defined in datasets.yml
- state_variables: Which state variables to reconstruct and output. If not using pre-calculated Ye-values (estimated observations) the PSM-required-variables must be listed (i.e., temperature and/or moisture fields). The associated value after each field can be either ‘anom’ or ‘full’. ‘anom’ uses anomaly values for the prior. ‘full’ uses original non-centered data for the prior and is not guaranteed to work in all cases.
- regrid_method: Specification for regridding data that is loaded in for the prior. esmpy is generally recommended and can handle masked/non-regular grids.
- regrid_resolution (simple or spherical harmonics only): Resolution of the regridded field. (Number is a reference to the spherical harmonics truncation. E.g., 42 is a 44x66 grid.)
- esmpy_interp_method (esmpy only): Which interpolation method to use (‘bilinear’ or ‘patch’)
- esmpy_regrid_to (esmpy only): Target regrid definition tag defined
in grid_def.yml
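Before launching a long run it can help to sanity-check a few of the options listed above. The sketch below mirrors some of the option names in a plain Python dict and applies simple checks. The check_config helper, the example values, and the state variable name temperature_field are hypothetical; LMR reads config.yml itself and performs its own validation.

```python
import copy


def check_config(cfg):
    """Return a list of problems found in a config-like dict (illustrative helper)."""
    problems = []
    start, end = cfg["core"]["recon_period"]
    if start > end:
        problems.append("core.recon_period: start year is after end year")
    if cfg["core"]["nens"] <= 50:
        problems.append("core.nens: ensemble size should generally be above 50")
    if not 0.0 < cfg["proxies"]["proxy_frac"] <= 1.0:
        problems.append("proxies.proxy_frac: expected a fraction in (0, 1]")
    if cfg["psm"]["avgPeriod"] not in ("annual", "season"):
        problems.append("psm.avgPeriod: expected 'annual' or 'season'")
    for name, kind in cfg["prior"]["state_variables"].items():
        if kind not in ("anom", "full"):
            problems.append(f"prior.state_variables.{name}: expected 'anom' or 'full'")
    return problems


# Example values are placeholders, not recommended settings.
config = {
    "core": {
        "nexp": "demo_recon",
        "lmr_path": "/home/disk/foo/LMR_data",
        "recon_period": [1800, 2000],
        "nens": 100,
    },
    "proxies": {"use_from": ["LMRdb"], "proxy_frac": 0.75},
    "psm": {"avgPeriod": "annual"},
    "prior": {"state_variables": {"temperature_field": "anom"}},
}

small_ensemble = copy.deepcopy(config)
small_ensemble["core"]["nens"] = 10

print(check_config(config))
print(check_config(small_ensemble))
```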
Important options for a Monte-Carlo (MC) iteration¶
Advantages of the LMR framework include the capacity to run many realizations of a reconstruction by sampling from the input data. This generates uncertainty bounds on reconstructed output and is an essential product for determining the robustness of reconstructed signals. There are a few options in the configuration important for MC operations.
- wrapper
- iter_range: Number range to perform iterations over. [0, 5] will output reconstructions to 6 different directories named r0 through r5. One can easily distribute runs on a cluster by farming out different iteration ranges. E.g., set the range as [0, 5] for a reconstruction on one machine and [6, 10] on another.
- multi_seed: Seeds for creating reproducible iterations. The list must be
long enough that indexing it with any number in iter_range is not out of bounds.
- core
- nens: Number of prior ensemble members (should generally be above 50). This is resampled for each iteration.
- proxies
- proxy_frac: Fraction of available proxy records to use. Useful for independent verification on withheld proxies. Resampled for each iteration.
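How these options combine can be sketched as a small planning function: each iteration in the inclusive range picks its own seed, then draws a prior ensemble and a proxy subset from it. This is an illustration under stated assumptions, not the LMR sampling code; the function plan_iterations, the use of NumPy's random generator, and the proxy identifiers are hypothetical.

```python
import numpy as np


def plan_iterations(iter_range, multi_seed, n_prior_total, nens, proxy_ids, proxy_frac):
    """Draw a reproducible ensemble and proxy sample for each iteration (sketch)."""
    first, last = iter_range
    if last >= len(multi_seed):
        raise IndexError("multi_seed is too short for the requested iter_range")
    n_keep = int(round(proxy_frac * len(proxy_ids)))
    plans = {}
    for iteration in range(first, last + 1):  # both ends included
        rng = np.random.default_rng(multi_seed[iteration])
        members = np.sort(rng.choice(n_prior_total, size=nens, replace=False))
        kept_idx = set(rng.choice(len(proxy_ids), size=n_keep, replace=False).tolist())
        plans[f"r{iteration}"] = {
            "ensemble_members": members.tolist(),
            "assimilated": [p for i, p in enumerate(proxy_ids) if i in kept_idx],
            # Withheld records remain available for independent verification.
            "withheld": [p for i, p in enumerate(proxy_ids) if i not in kept_idx],
        }
    return plans


proxy_ids = [f"site_{i}" for i in range(8)]  # hypothetical identifiers
seeds = [11, 22, 33, 44, 55, 66]

plans = plan_iterations([0, 5], seeds, n_prior_total=1000, nens=100,
                        proxy_ids=proxy_ids, proxy_frac=0.75)
print(list(plans))
```

Because every iteration is tied to its own seed, a run split across machines by iteration range reproduces the same samples as a single run over the full range.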
Pre-calculating estimated observations (Ye values)¶
For offline reconstructions, estimated observations from the prior sample are
re-used each year. If we are not interested in outputting a field required for
the PSM, the Ye values (estimated observations) can be calculated ahead of time
and the field omitted. This saves memory and disk space and allows individual fields
to be reconstructed separately when using RNG seeding (i.e., multi_seed for
MC reconstructions).
To enable this, we first need to create the pre-calculated Ye file. After
setting up the config.yml, cd into the misc/ directory and run
the command:
(lmr_py3) $ python build_ye_file.py <path/to/desired/config.yml>
If no configuration file is provided as a command-line argument, the code
uses config.yml in the source code directory. This script builds the Ye file
based on the chosen proxy database, PSMs and averaging period, and the prior
source. Numpy zip files containing the calculated Ye values are output in the
lmr_path directory under ye_precalc_files.
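Conceptually, pre-calculation applies each PSM to the prior ensemble once and stores the result, so the field itself is no longer needed at run time. The sketch below illustrates that idea with NumPy's savez and load. The array names inside the file, the linear PSM parameters, and the file name are assumptions and do not describe the layout that build_ye_file.py writes.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(42)
n_ens, n_grid = 20, 50
# Prior ensemble of a single field: one row per member, one column per grid cell.
prior_temperature = rng.normal(size=(n_ens, n_grid))

# Hypothetical linear PSMs: each proxy responds to one grid cell.
psms = {
    "site_A": {"grid_index": 3, "slope": 0.8, "intercept": 0.1},
    "site_B": {"grid_index": 17, "slope": -0.4, "intercept": 1.5},
}

proxy_ids = sorted(psms)
# Estimated observations: shape (n_proxies, n_ens).
ye_values = np.stack(
    [
        psms[pid]["slope"] * prior_temperature[:, psms[pid]["grid_index"]]
        + psms[pid]["intercept"]
        for pid in proxy_ids
    ]
)

with tempfile.TemporaryDirectory() as tmp:
    ye_path = Path(tmp) / "demo_ye.npz"
    np.savez(ye_path, proxy_ids=np.array(proxy_ids), ye_values=ye_values)

    # Later runs can load the stored values instead of the full field.
    with np.load(ye_path) as stored:
        loaded_ids = stored["proxy_ids"].tolist()
        loaded_ye = stored["ye_values"]

print(loaded_ids, loaded_ye.shape)
```

The stored array scales with the number of proxies rather than the number of grid cells, which is where the memory and disk savings come from.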
To use the file of pre-calculated Ye values, set use_precalc_ye to True
under the core section of config.yml.
Running your LMR reconstruction¶
With all the files created and the configuration set, run the reconstruction using:
(lmr_py3) $ python LMR_wrapper.py
If any files are missing or the configuration is set up incorrectly, the code will exit with an error printout explaining what action should be taken.
After the reconstruction finishes, there is a printout of total time elapsed, and the code issues a move command to process the output and place it in the designated archive directory.