
Automatic Cell Measures

If you have worked with CMIP data before, you know that cell measure information like areacella is needed to compute proper area-weighted means and sums. However, modeling centers often have not uploaded this information consistently across all of their submissions, which can be frustrating for the user.
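To see why the cell areas matter, here is a minimal sketch with made-up numbers (purely illustrative, not CMIP output). On a latitude-longitude grid, cells near the poles are much smaller than cells near the equator, so an unweighted mean over-counts them. The area-weighted mean is sum(x_i * a_i) / sum(a_i), where a_i plays the role of areacella. The `area_weighted_mean` helper is hypothetical and not part of intake-esgf.

```python
def area_weighted_mean(values, areas):
    """Return sum(v * a) / sum(a), the area-weighted mean of `values`."""
    if len(values) != len(areas):
        raise ValueError("values and areas must have the same length")
    total_area = sum(areas)
    return sum(v * a for v, a in zip(values, areas)) / total_area


# Illustrative data only: one large equatorial cell and two small polar cells.
values = [30.0, -20.0, -20.0]  # e.g. a temperature-like field
areas = [8.0, 1.0, 1.0]        # hypothetical cell areas, any consistent unit

unweighted = sum(values) / len(values)
weighted = area_weighted_mean(values, areas)
print(f"unweighted={unweighted:.2f} weighted={weighted:.2f}")
```

Ignoring the areas pulls the mean toward the small polar cells; weighting by area lets the large equatorial cell dominate, as it should.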

In intake-esgf, when you call to_dataset_dict(), we perform a search for each dataset being placed in the dataset dictionary, progressively dropping facets to find, if possible, the cell measures that are closest to the dataset being downloaded. Sometimes they are simply in another variant_label; other times they may be in a different activity_id. Wherever they are, we try to find them for you and, by default, add them to your dataset (disable this with add_measures=False).
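The idea of progressively relaxing the search can be sketched in a few lines of Python. This is not intake-esgf's actual implementation: the in-memory `RECORDS` list stands in for an ESGF index, and the `search` and `find_measure` helpers, as well as the order in which facets are dropped (`DROP_ORDER`), are hypothetical choices made for illustration. The toy records mimic the UKESM1-0-LL case discussed in this section.

```python
from typing import Optional

# Hypothetical stand-in for an ESGF index: each record is one dataset's facets.
RECORDS = [
    {"source_id": "UKESM1-0-LL", "experiment_id": "piControl", "activity_id": "CMIP",
     "variant_label": "r1i1p1f2", "variable_id": "areacella"},
    {"source_id": "UKESM1-0-LL", "experiment_id": "historical", "activity_id": "CMIP",
     "variant_label": "r2i1p1f2", "variable_id": "gpp"},
]

# Hypothetical order in which facets are relaxed when nothing matches.
DROP_ORDER = ["variant_label", "experiment_id", "activity_id"]


def search(facets: dict) -> list:
    """Return every record whose facets match all of the given key/value pairs."""
    return [r for r in RECORDS if all(r.get(k) == v for k, v in facets.items())]


def find_measure(dataset_facets: dict, measure: str) -> Optional[dict]:
    """Look for `measure` using the dataset's facets, dropping one facet at a time."""
    facets = dict(dataset_facets, variable_id=measure)
    for facet in [None] + DROP_ORDER:
        if facet is not None:
            facets.pop(facet, None)  # relax the search by one more facet
        hits = search(facets)
        if hits:
            return hits[0]  # the first hit is the closest match found so far
    return None  # the measure is not available anywhere for this model


gpp_facets = {"source_id": "UKESM1-0-LL", "experiment_id": "historical",
              "activity_id": "CMIP", "variant_label": "r2i1p1f2"}
match = find_measure(gpp_facets, "areacella")
print(match["experiment_id"], match["variant_label"])
```

Dropping variant_label alone is not enough here; only after experiment_id is also dropped does the search reach the piControl r1i1p1f2 record. Because the loop stops at the first level that returns anything, the match it reports is the one that shares the most facets with the original dataset under this drop order.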

Consider the following search for UKESM1-0-LL data. We are looking for the land variable gpp, gross primary productivity.

from intake_esgf import ESGFCatalog
cat = ESGFCatalog().search(
    variable_id="gpp",
    source_id="UKESM1-0-LL",
    variant_label="r2i1p1f2",
    frequency="mon",
    experiment_id="historical",
)
dsd = cat.to_dataset_dict()

The progress bar (not shown) will let you know that we are searching for cell measure information. We determine which measures need to be downloaded by looking at the dataset attributes. Since gpp is a land variable, its cell_measures attribute is 'area: areacella', which indicates that areacella should also be downloaded. You will also find 'where land' in its cell_methods attribute, meaning that we also need sftlf, the land area fraction. If you look at the resulting dataset, you will find that both have been associated with it.
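The attribute inspection described above can be sketched as follows. The `measures_needed` helper is hypothetical and not part of intake-esgf; it assumes the CF convention that cell_measures holds space-separated `<measure>: <variable>` pairs, and it simply checks cell_methods for the phrase `where land`. The attribute strings in the example are illustrative.

```python
def measures_needed(attrs: dict) -> list:
    """Return the names of cell-measure variables a dataset's attributes call for."""
    needed = []
    # CF-style cell_measures is a list of "<measure>: <variable>" pairs,
    # e.g. "area: areacella" or "area: areacello volume: volcello".
    tokens = attrs.get("cell_measures", "").split()
    for key, name in zip(tokens[::2], tokens[1::2]):
        if key.endswith(":"):
            needed.append(name)
    # A "where land" clause in cell_methods means the land fraction is needed too.
    if "where land" in attrs.get("cell_methods", ""):
        needed.append("sftlf")
    return needed


gpp_attrs = {
    "cell_measures": "area: areacella",
    "cell_methods": "area: mean where land time: mean",
}
print(measures_needed(gpp_attrs))  # prints ['areacella', 'sftlf']
```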


What makes this particular example difficult is that the cell measures for this model are found only in the piControl experiment, for the r1i1p1f2 variant. Our method finds the right measures, which you can verify by printing the session log and looking for which areacella files are downloaded or accessed.

print(cat.session_log())
2026-05-26 19:04:37 search begin variable_id=['gpp'], source_id=['UKESM1-0-LL'], variant_label=['r2i1p1f2'], frequency=['mon'], experiment_id=['historical'], type=['Dataset'], project=['CMIP6'], latest=[True], retracted=[False]
2026-05-26 19:04:37 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=2 response_time=0.16
2026-05-26 19:04:37 combine_time=0.01
2026-05-26 19:04:37 search end total_time=0.18
2026-05-26 19:04:37 file info begin
2026-05-26 19:04:37 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=4 response_time=0.09
2026-05-26 19:04:37 combine_time=0.00
2026-05-26 19:04:37 file info end total_time=0.09
2026-05-26 19:04:38 transfer_time=0.33 [s] at 71.49 [Mb s-1] https://g-52ba3.fd635.8443.data.globus.org/css03_data/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_195001-201412.nc
2026-05-26 19:04:38 transfer_time=0.42 [s] at 97.82 [Mb s-1] https://g-52ba3.fd635.8443.data.globus.org/css03_data/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_185001-194912.nc
2026-05-26 19:04:38 accessed /home/docs/.esgf/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_185001-194912.nc
2026-05-26 19:04:38 accessed /home/docs/.esgf/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_195001-201412.nc
2026-05-26 19:04:39 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=0 response_time=0.07
2026-05-26 19:04:39 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=0 response_time=0.07
2026-05-26 19:04:39 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=3 response_time=0.08
2026-05-26 19:04:39 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=3 response_time=0.07
2026-05-26 19:04:49 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=0 response_time=0.07
2026-05-26 19:04:49 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=0 response_time=0.07
2026-05-26 19:04:49 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=3 response_time=0.07
2026-05-26 19:04:50 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=3 response_time=0.07
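Rather than reading the whole log by eye, you can filter it for the variable of interest. This is a minimal sketch that assumes only that `cat.session_log()` returns the log as a string (as the `print` call above suggests) and that file lines have the layout shown above, with the path or URL as the last token. The `files_for` helper is hypothetical, and the two sample lines are copied from the log above.

```python
def files_for(log: str, variable: str, action: str = "accessed") -> list:
    """Return paths/URLs from log lines of the given action whose file name starts with `variable`."""
    paths = []
    for line in log.splitlines():
        parts = line.split()
        # File lines look like "<date> <time> accessed <path>" or
        # "<date> <time> transfer_time=... <url>"; the path is the last token.
        if len(parts) >= 4 and action in parts[2] and f"/{variable}_" in parts[-1]:
            paths.append(parts[-1])
    return paths


sample = """\
2026-05-26 19:04:38 accessed /home/docs/.esgf/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r2i1p1f2/Lmon/gpp/gn/v20190708/gpp_Lmon_UKESM1-0-LL_historical_r2i1p1f2_gn_185001-194912.nc
2026-05-26 19:04:39 └─GlobusESGFIndex('ESGF2-US-1.5-Catalog') results=3 response_time=0.08
"""
for path in files_for(sample, "gpp"):
    print(path)
```

With the catalog from above, calling `files_for(cat.session_log(), "areacella")` would list any areacella files recorded as accessed, and passing `action="transfer_time"` would select the download lines instead.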