Automatic Cell Measures
If you have worked with CMIP data before, you know that cell measure information
like areacella is needed to take proper area-weighted means/summations. Yet
many times, model centers have not uploaded this information uniformly in all
submissions. This can be frustrating for the user.
In intake-esgf, when you call to_dataset_dict(), we perform a search for
each dataset being placed in the dataset dictionary, progressively dropping
facets to find, if possible, the cell measures that are closest to the dataset
being downloaded. Sometimes they are simply in another variant_label, but
other times they could be in a different activity_id. No matter where they
are, we find them for you and add them to your dataset by default (disable with
add_measures=False).
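The facet-dropping strategy can be illustrated with a small, self-contained sketch. This is not the intake-esgf implementation: the in-memory CATALOG, the search helper, and the find_measure function are hypothetical names, and the drop order (variant_label, then experiment_id, then activity_id) is an assumption chosen for illustration.

```python
# Hypothetical in-memory stand-in for an ESGF index; each record is a dict of facets.
CATALOG = [
    {
        "variable_id": "areacella",
        "source_id": "UKESM1-0-LL",
        "activity_id": "CMIP",
        "experiment_id": "piControl",
        "variant_label": "r1i1p1f2",
    },
]


def search(**facets):
    """Return the catalog records that match every given facet."""
    return [
        rec
        for rec in CATALOG
        if all(rec.get(key) == value for key, value in facets.items())
    ]


def find_measure(
    measure,
    facets,
    drop_order=("variant_label", "experiment_id", "activity_id"),  # assumed order
):
    """Search for `measure`, dropping one facet at a time until something matches.

    Returns the matching records and the list of facets that were dropped.
    """
    query = dict(facets, variable_id=measure)
    dropped = []
    results = search(**query)
    for facet in drop_order:
        if results:
            break
        if query.pop(facet, None) is not None:
            dropped.append(facet)
            results = search(**query)
    return results, dropped


results, dropped = find_measure(
    "areacella",
    {
        "source_id": "UKESM1-0-LL",
        "activity_id": "CMIP",
        "experiment_id": "historical",
        "variant_label": "r2i1p1f2",
    },
)
print(dropped)  # facets that had to be relaxed before a match was found
```

Dropping the most specific facets first keeps the match as close as possible to the requested dataset; only when that fails does the query widen further.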
Consider the following search for data from UKESM1-0-LL. We are looking for the land variable gpp, the gross primary productivity.
from intake_esgf import ESGFCatalog
cat = ESGFCatalog().search(
    variable_id="gpp",
    source_id="UKESM1-0-LL",
    variant_label="r2i1p1f2",
    frequency="mon",
    experiment_id="historical",
)
dsd = cat.to_dataset_dict()
The progress bar will let you know that we are searching for cell measure
information. We determine which measures need to be downloaded by looking in the
dataset attributes. Since gpp is a land variable, we see that its
cell_measures = 'area: areacella', which indicates that areacella should also be
downloaded. However, you will also find 'where land' in its cell_methods,
meaning that we also need sftlf, the land fractions. If you look at the
resulting dataset, you will find that both have been associated.
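To make the attribute check concrete, here is a minimal sketch of how the required measures could be derived from a variable's attributes. The function name required_measures and the regular expression are assumptions for illustration, not the library's code, and the attribute values are modeled on what a land variable such as gpp typically carries.

```python
import re


def required_measures(attrs):
    """List the measure variables implied by a variable's CF attributes."""
    needed = []
    # cell_measures holds 'measure: variable' pairs, e.g. 'area: areacella'.
    for _, name in re.findall(r"(\w+):\s*(\w+)", attrs.get("cell_measures", "")):
        if name not in needed:
            needed.append(name)
    # 'where land' in cell_methods means the statistic covers only the land
    # portion of each cell, so the land fraction is needed as well.
    if "where land" in attrs.get("cell_methods", ""):
        needed.append("sftlf")
    return needed


# Attribute values assumed for illustration.
gpp_attrs = {
    "cell_measures": "area: areacella",
    "cell_methods": "area: mean where land time: mean",
}
print(required_measures(gpp_attrs))
```

The dataset listing that follows shows both variables attached.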
dsd["gpp"]
<xarray.Dataset> Size: 230MB
Dimensions:    (time: 1980, bnds: 2, lat: 144, lon: 192)
Coordinates:
  * time       (time) object 16kB 1850-01-16 00:00:00 ... 2014-12-16 00:00:00
  * lat        (lat) float64 1kB -89.38 -88.12 -86.88 ... 86.88 88.12 89.38
  * lon        (lon) float64 2kB 0.9375 2.812 4.688 6.562 ... 355.3 357.2 359.1
    type       |S4 4B ...
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object 32kB dask.array<chunksize=(1, 2), meta=np.ndarray>
    lat_bnds   (time, lat, bnds) float64 5MB dask.array<chunksize=(1200, 144, 2), meta=np.ndarray>
    lon_bnds   (time, lon, bnds) float64 6MB dask.array<chunksize=(1200, 192, 2), meta=np.ndarray>
    gpp        (time, lat, lon) float32 219MB dask.array<chunksize=(1, 144, 192), meta=np.ndarray>
    sftlf      (lat, lon) float32 111kB ...
    areacella  (lat, lon) float32 111kB ...
Attributes: (12/46)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    branch_time_in_parent:  113400.0
    creation_date:          2019-07-04T10:57:56Z
    ...                     ...
    title:                  UKESM1-0-LL output prepared for CMIP6
    variable_id:            gpp
    variant_label:          r2i1p1f2
    license:                CMIP6 model data produced by the Met Office Hadle...
    cmor_version:           3.4.0
    tracking_id:            hdl:21.14100/8a19464e-4fff-4ccd-b45c-6c0c79f7e70a
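With gpp, areacella, and sftlf in one dataset, a land-area-weighted mean becomes straightforward. The sketch below uses a tiny synthetic dataset in place of the downloaded one, so the numbers are illustrative only. It assumes that sftlf is stored as a percentage, as is conventional in CMIP6, and relies on xarray's weighted API.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for dsd["gpp"]: 2 time steps on a 2x2 grid.
ds = xr.Dataset(
    {
        "gpp": (
            ("time", "lat", "lon"),
            np.array([[[1.0, 2.0], [3.0, 4.0]], [[2.0, 4.0], [6.0, 8.0]]]),
        ),
        # Cell area [m2]
        "areacella": (("lat", "lon"), np.array([[1.0, 1.0], [2.0, 2.0]])),
        # Land fraction [%] (assumed to be a percentage)
        "sftlf": (("lat", "lon"), np.array([[100.0, 0.0], [50.0, 100.0]])),
    }
)

# Land area per cell: total cell area scaled by the land percentage.
land_area = ds["areacella"] * ds["sftlf"] / 100.0

# Weighted mean over the horizontal dimensions, one value per time step.
gpp_mean = ds["gpp"].weighted(land_area).mean(dim=["lat", "lon"])
print(gpp_mean.values)
```

Weighting by areacella alone would treat ocean-covered parts of coastal cells as if they were land, which is why the land fraction enters the weights.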
What makes this particular example difficult is that the cell measures for this model are only found in the piControl experiment, for the r1i1p1f2 variant. Our method finds the right measures, which you can see by printing out the session log and looking for which areacella files are downloaded or accessed.
print(cat.session_log())
2024-05-02 17:59:43 search begin variable_id=['gpp'], source_id=['UKESM1-0-LL'], variant_label=['r2i1p1f2'], frequency=['mon'], experiment_id=['historical'], type=['Dataset'], project=['CMIP6'], latest=[True], retracted=[False]
2024-05-02 17:59:44 combine_time=0.01
2024-05-02 17:59:44 search end total_time=0.94
2024-05-02 17:59:44 file info begin
2024-05-02 17:59:45 file info end total_time=1.26
2024-05-02 17:59:45 begin move_data
2024-05-02 17:59:49 transfer_time=3.53 [s] at 6.71 [Mb s-1] http://esgf-node.ornl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_195001-201412.nc
2024-05-02 17:59:53 transfer_time=7.80 [s] at 5.22 [Mb s-1] http://esgf-node.ornl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_185001-194912.nc
2024-05-02 17:59:53 end move_data
2024-05-02 17:59:53 search begin variant_label=['r2i1p1f2'], source_id=['UKESM1-0-LL'], mip_era=['CMIP6'], activity_id=['CMIP'], experiment_id=['historical'], grid_label=['gn'], table_id=['fx', 'Ofx'], variable_id=['sftlf'], type=['Dataset'], project=['CMIP6'], latest=[True], retracted=[False]
2024-05-02 17:59:54 search end no results
2024-05-02 17:59:54 search begin source_id=['UKESM1-0-LL'], mip_era=['CMIP6'], activity_id=['CMIP'], experiment_id=['historical'], grid_label=['gn'], table_id=['fx', 'Ofx'], variable_id=['sftlf'], type=['Dataset'], project=['CMIP6'], latest=[True], retracted=[False]
2024-05-02 17:59:56 search end no results
2024-05-02 17:59:56 search begin source_id=['UKESM1-0-LL'], mip_era=['CMIP6'], activity_id=['CMIP'], grid_label=['gn'], table_id=['fx', 'Ofx'], variable_id=['sftlf'], type=['Dataset'], project=['CMIP6'], latest=[True], retracted=[False]
2024-05-02 17:59:57 combine_time=0.00
2024-05-02 17:59:57 search end total_time=1.23
2024-05-02 17:59:57 file info begin
2024-05-02 17:59:58 file info end total_time=1.03
2024-05-02 17:59:58 begin move_data
2024-05-02 17:59:58 transfer_time=0.07 [s] at 1.23 [Mb s-1] http://esgf-node.ornl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/MOHC/UKESM1-0-LL/piControl/r1i1p1f2/fx/sftlf/gn/v20190705/sftlf_fx_UKESM1-0-LL_piControl_r1i1p1f2_gn.nc
2024-05-02 17:59:58 end move_data
2024-05-02 17:59:58 search begin variant_label=['r2i1p1f2'], source_id=['UKESM1-0-LL'], mip_era=['CMIP6'], activity_id=['CMIP'], experiment_id=['historical'], grid_label=['gn'], table_id=['fx', 'Ofx'], variable_id=['areacella'], type=['Dataset'], project=['CMIP6'], latest=[True], retracted=[False]
2024-05-02 18:00:00 search end no results
2024-05-02 18:00:00 search begin source_id=['UKESM1-0-LL'], mip_era=['CMIP6'], activity_id=['CMIP'], experiment_id=['historical'], grid_label=['gn'], table_id=['fx', 'Ofx'], variable_id=['areacella'], type=['Dataset'], project=['CMIP6'], latest=[True], retracted=[False]
2024-05-02 18:00:02 search end no results
2024-05-02 18:00:02 search begin source_id=['UKESM1-0-LL'], mip_era=['CMIP6'], activity_id=['CMIP'], grid_label=['gn'], table_id=['fx', 'Ofx'], variable_id=['areacella'], type=['Dataset'], project=['CMIP6'], latest=[True], retracted=[False]
2024-05-02 18:00:05 combine_time=0.00
2024-05-02 18:00:05 search end total_time=2.73
2024-05-02 18:00:05 file info begin
2024-05-02 18:00:06 file info end total_time=1.12
2024-05-02 18:00:06 begin move_data
2024-05-02 18:00:06 transfer_time=0.06 [s] at 1.01 [Mb s-1] http://esgf-node.ornl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/MOHC/UKESM1-0-LL/piControl/r1i1p1f2/fx/areacella/gn/v20190705/areacella_fx_UKESM1-0-LL_piControl_r1i1p1f2_gn.nc
2024-05-02 18:00:06 end move_data
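Scanning a long log by eye is error-prone, so the lookup can also be scripted. The sketch below filters log text for transferred files belonging to a given variable. The helper transferred_files is a hypothetical name, and the approach assumes that session_log() returns a string in the format shown above, with the URL as the last item on each transfer line; files that were already available locally may be logged differently.

```python
# Two lines copied from the session log above; in practice one would use
# log = cat.session_log() instead of this literal.
log = """\
2024-05-02 18:00:05 search end total_time=2.73
2024-05-02 18:00:06 transfer_time=0.06 [s] at 1.01 [Mb s-1] http://esgf-node.ornl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/MOHC/UKESM1-0-LL/piControl/r1i1p1f2/fx/areacella/gn/v20190705/areacella_fx_UKESM1-0-LL_piControl_r1i1p1f2_gn.nc
"""


def transferred_files(log, variable_id):
    """Return the URLs of transferred files whose name starts with `variable_id`."""
    urls = []
    for line in log.splitlines():
        if "transfer_time" not in line:
            continue
        url = line.split()[-1]  # the URL is the last whitespace-separated token
        filename = url.rsplit("/", 1)[-1]
        if filename.startswith(f"{variable_id}_"):
            urls.append(url)
    return urls


for url in transferred_files(log, "areacella"):
    print(url)
```

The experiment and variant that supplied the measure can then be read directly from the path of each URL.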