If you have worked with CMIP data before, you know that cell measure information
such as areacella is needed to compute proper area-weighted means and sums. Yet
model centers often have not uploaded this information consistently across all of
their submissions, which can be frustrating for users.
In intake-esgf, when you call to_dataset_dict(), we perform a search for
each dataset being placed in the dataset dictionary, progressively dropping
facets to find, if possible, the cell measures that are closest to the dataset
being downloaded. Sometimes they are simply in another variant_label, but
other times they could be in a different activity_id. No matter where they
are, we find them for you and add them to your dataset by default (disable with
add_measures=False).
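The idea of progressively dropping facets can be illustrated with a small, self-contained sketch. This is not the intake-esgf implementation: the function `find_closest_measure`, the in-memory `catalog` list, the `always` and `relax_order` arguments, and the `SOME-OTHER-MODEL` entry are all hypothetical and exist only to show the search order, assuming facets are relaxed one at a time until a match appears.

```python
# Hypothetical sketch of a facet-relaxation search; NOT intake-esgf's code.
# Each catalog entry is a plain dict of facets describing one dataset.

def find_closest_measure(target, catalog, measure, always, relax_order):
    """Find `measure` for `target`, dropping facets in `relax_order` as needed.

    `always` lists facets that must match in every attempt.
    Returns (entry, dropped_facets), or (None, relax_order) if nothing matches.
    """
    candidates = [e for e in catalog if e["variable_id"] == measure]
    for n_dropped in range(len(relax_order) + 1):
        # Require the always-on facets plus the facets not yet dropped.
        required = list(always) + list(relax_order[n_dropped:])
        for entry in candidates:
            if all(entry.get(f) == target.get(f) for f in required):
                return entry, list(relax_order[:n_dropped])
    return None, list(relax_order)


# The dataset we want measures for (facets taken from the search in this guide).
target = {
    "source_id": "UKESM1-0-LL",
    "activity_id": "CMIP",
    "experiment_id": "historical",
    "variant_label": "r2i1p1f2",
}

# A toy catalog; the second entry is a made-up distractor from another model.
catalog = [
    {"variable_id": "areacella", "source_id": "UKESM1-0-LL",
     "activity_id": "CMIP", "experiment_id": "piControl",
     "variant_label": "r1i1p1f2"},
    {"variable_id": "areacella", "source_id": "SOME-OTHER-MODEL",
     "activity_id": "CMIP", "experiment_id": "historical",
     "variant_label": "r2i1p1f2"},
]

entry, dropped = find_closest_measure(
    target, catalog, "areacella",
    always=["source_id"],
    relax_order=["variant_label", "experiment_id", "activity_id"],
)
print(entry["experiment_id"], entry["variant_label"], dropped)
```

Keeping `source_id` in `always` reflects the assumption that measures from a different model are never acceptable, so the distractor entry is skipped even though its other facets match exactly.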
Consider the following search for data from UKESM1-0-LL. We are looking for the land variable gpp, the gross primary productivity.
from intake_esgf import ESGFCatalog

cat = ESGFCatalog().search(
    variable_id="gpp",
    source_id="UKESM1-0-LL",
    variant_label="r2i1p1f2",
    frequency="mon",
    experiment_id="historical",
)
dsd = cat.to_dataset_dict()

The progress bar (not shown) will let you know that we are searching for cell
measure information. We determine which measures need to be downloaded by looking
in the dataset attributes. Since gpp is a land variable, its cell_measures
attribute is 'area: areacella', which indicates that areacella should also be
downloaded. You will also find 'where land' in the cell_methods attribute,
meaning that we also need sftlf, the land fractions. If you look at the
resulting dataset, you will find that both have been added to it.
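As a rough illustration of how the needed measures could be read from these attributes, here is a minimal sketch. The helper `required_measures` is hypothetical and is not part of intake-esgf, and the `cell_methods` string used below is an illustrative value assumed to contain the `where land` phrase described above.

```python
import re

# Hypothetical helper; NOT part of intake-esgf.
def required_measures(attrs):
    """List the measure variables implied by a variable's attributes."""
    needed = []
    # cell_measures holds "kind: variable" pairs, e.g. "area: areacella".
    for _kind, name in re.findall(r"(\w+):\s*(\w+)", attrs.get("cell_measures", "")):
        needed.append(name)
    # A "where land" clause in cell_methods implies land fractions are needed.
    if "where land" in attrs.get("cell_methods", ""):
        needed.append("sftlf")
    return needed


# Illustrative attributes for a land variable such as gpp (assumed values).
gpp_attrs = {
    "cell_measures": "area: areacella",
    "cell_methods": "area: mean where land time: mean",
}
measures = required_measures(gpp_attrs)
print(measures)
```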
What makes this particular example difficult is that the cell measures for this
model are only found in the piControl experiment, for the r1i1p1f2 variant.
Our method finds the right measures, which you can see by printing out the
session log and looking for which areacella files are downloaded or accessed.
print(cat.session_log())
2026-05-26 19:04:37 search begin variable_id=['gpp'], source_id=['UKESM1-0-LL'], variant_label=['r2i1p1f2'], frequency=['mon'], experiment_id=['historical'], type=['Dataset'], project=['CMIP6'], latest=[True], retracted=[False]
2026-05-26 19:04:37 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=2 response_time=0.16
2026-05-26 19:04:37 combine_time=0.01
2026-05-26 19:04:37 search end total_time=0.18
2026-05-26 19:04:37 file info begin
2026-05-26 19:04:37 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=4 response_time=0.09
2026-05-26 19:04:37 combine_time=0.00
2026-05-26 19:04:37 file info end total_time=0.09
2026-05-26 19:04:38 transfer_time=0.33 [s] at 71.49 [Mb s-1] https://g-52ba3.fd635.8443.data.globus.org/css03_data/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_195001-201412.nc
2026-05-26 19:04:38 transfer_time=0.42 [s] at 97.82 [Mb s-1] https://g-52ba3.fd635.8443.data.globus.org/css03_data/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_185001-194912.nc
2026-05-26 19:04:38 accessed /home/docs/.esgf/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_185001-194912.nc
2026-05-26 19:04:38 accessed /home/docs/.esgf/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_195001-201412.nc
2026-05-26 19:04:39 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=0 response_time=0.07
2026-05-26 19:04:39 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=0 response_time=0.07
2026-05-26 19:04:39 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=3 response_time=0.08
2026-05-26 19:04:39 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=3 response_time=0.07
2026-05-26 19:04:49 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=0 response_time=0.07
2026-05-26 19:04:49 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=0 response_time=0.07
2026-05-26 19:04:49 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=3 response_time=0.07
2026-05-26 19:04:50 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=3 response_time=0.07
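Scanning a long session log by eye can be tedious, so a small filter may help. The sketch below is only a convenience written for this guide: `lines_mentioning` is a hypothetical helper, and `sample_log` reuses a few lines from the log output shown here so that the example runs on its own.

```python
# Hypothetical helper for filtering log text; NOT part of intake-esgf.
def lines_mentioning(log_text, *needles):
    """Return the log lines that contain every string in `needles`."""
    return [
        line for line in log_text.splitlines()
        if all(needle in line for needle in needles)
    ]


# A few lines copied from the session log above, used as stand-in input.
sample_log = """\
2026-05-26 19:04:37 search end total_time=0.18
2026-05-26 19:04:38 accessed /home/docs/.esgf/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_185001-194912.nc
2026-05-26 19:04:38 accessed /home/docs/.esgf/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_195001-201412.nc
"""

accessed_gpp = lines_mentioning(sample_log, "accessed", "/gpp/")
for line in accessed_gpp:
    print(line)
```

In a real session you could pass the string returned by `cat.session_log()` as `log_text` and search for `"areacella"` instead of `"/gpp/"`.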