Reproducibility
If you use ESGF data in a published analysis, the journal to which you are submitting may require that you provide data citations or a data availability statement. While we are working on improving support for this in ESGF, we also want to highlight the current functionality. Consider the following query, which we assume was used in some analysis. For later comparison, we print the underlying dataframe to show the results of the search.
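from intake_esgf import ESGFCatalog

# Search for monthly historical output of three variables from one model run
cat = ESGFCatalog().search(
    experiment_id="historical",
    source_id="CanESM5",
    variable_id=["gpp", "tas", "nbp"],
    variant_label=["r1i1p1f1"],
    frequency="mon",
)
cat.df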
| | table_id | experiment_id | institution_id | variable_id | datetime_start | activity_drs | version | member_id | source_id | datetime_stop | grid_label | mip_era | project | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Amon | historical | CCCma | tas | 1850-01-16T12:00:00Z | CMIP | 20190429 | r1i1p1f1 | CanESM5 | 2014-12-16T12:00:00Z | gn | CMIP6 | CMIP6 | [CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.... |
| 1 | Lmon | historical | CCCma | nbp | 1850-01-16T12:00:00Z | CMIP | 20190429 | r1i1p1f1 | CanESM5 | 2014-12-16T12:00:00Z | gn | CMIP6 | CMIP6 | [CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.... |
| 2 | Lmon | historical | CCCma | gpp | 1850-01-16T12:00:00Z | CMIP | 20190429 | r1i1p1f1 | CanESM5 | 2014-12-16T12:00:00Z | gn | CMIP6 | CMIP6 | [CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.... |
In the course of the analysis, you would download the datasets into a dictionary.
dsd = cat.to_dataset_dict(add_measures=False)
Then you may loop through the datasets and pull out the tracking_id from the global attributes of each dataset.
# Collect the tracking_id global attribute from every downloaded dataset
tracking_ids = [ds.tracking_id for ds in dsd.values()]
for tracking_id in tracking_ids:
    print(tracking_id)
hdl:21.14100/387658c8-f085-4ab8-995c-def848e7d856
hdl:21.14100/872062df-acae-499b-aa0f-9eaca7681abc
hdl:21.14100/52656bcc-3758-463b-964f-ef8863a6424a
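To keep these identifiers alongside a manuscript or hand them to a collaborator, one option is to store them in a plain text file, one per line. The following is a minimal sketch using only the Python standard library. The helper names `save_tracking_ids` and `load_tracking_ids` and the file name are hypothetical and are not part of intake-esgf.

```python
import tempfile
from pathlib import Path


def save_tracking_ids(tracking_ids, path):
    """Write one tracking_id per line, dropping duplicates but keeping order.

    Hypothetical helper for illustration; not part of intake-esgf.
    """
    unique_ids = list(dict.fromkeys(tracking_ids))
    Path(path).write_text("\n".join(unique_ids) + "\n", encoding="utf-8")
    return unique_ids


def load_tracking_ids(path):
    """Read tracking_ids back from a text file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# The identifiers printed above, used here as sample input.
tracking_ids = [
    "hdl:21.14100/387658c8-f085-4ab8-995c-def848e7d856",
    "hdl:21.14100/872062df-acae-499b-aa0f-9eaca7681abc",
    "hdl:21.14100/52656bcc-3758-463b-964f-ef8863a6424a",
]

# A temporary directory keeps the example self-contained; in practice you
# would choose a path next to your manuscript or analysis code.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tracking_ids.txt"
    save_tracking_ids(tracking_ids, path)
    restored = load_tracking_ids(path)
```

The restored value is a plain list of strings, so it can be used anywhere the original `tracking_ids` list is.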
The tracking_id is similar to a digital object identifier (DOI) and can be listed in your paper or supplemental material to state precisely which ESGF data you used. If you have a list of tracking_ids, then you can pass them into from_tracking_ids() to reproduce the catalog.
new_cat = ESGFCatalog().from_tracking_ids(tracking_ids)
new_cat.df
| | table_id | experiment_id | institution_id | variable_id | activity_drs | version | member_id | source_id | grid_label | mip_era | project | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Lmon | historical | CCCma | gpp | CMIP | 1 | r1i1p1f1 | CanESM5 | gn | CMIP6 | CMIP6 | [CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.... |
| 1 | Amon | historical | CCCma | tas | CMIP | 1 | r1i1p1f1 | CanESM5 | gn | CMIP6 | CMIP6 | [CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.... |
| 2 | Lmon | historical | CCCma | nbp | CMIP | 1 | r1i1p1f1 | CanESM5 | gn | CMIP6 | CMIP6 | [CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.... |
If you visually compare cat with new_cat, you will see that they contain the same datasets, although the rows appear in a different order. From here you may interact with the new catalog and recover the data you used if needed. This can also be used to quickly communicate to colleagues which data should be used.
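A visual check is easy for three rows. For larger catalogs, an order-insensitive comparison may be more practical. The sketch below works on plain records, which for a pandas dataframe such as cat.df could be obtained with `DataFrame.to_dict("records")`. The helper names `dataset_keys` and `same_datasets` are hypothetical, and the choice of identifying columns is an assumption made for illustration. The sample records are copied from the two tables shown earlier.

```python
def dataset_keys(records, columns):
    """Return the set of tuples built from `columns` for each record."""
    return {tuple(record[col] for col in columns) for record in records}


def same_datasets(records_a, records_b, columns):
    """True when both record lists describe the same datasets, in any order.

    Hypothetical helper for illustration; not part of intake-esgf.
    """
    return dataset_keys(records_a, columns) == dataset_keys(records_b, columns)


# Columns assumed to identify a dataset in this example.
columns = ["table_id", "variable_id", "member_id", "source_id", "experiment_id"]

# With real catalogs, records could come from something like
# cat.df[columns].to_dict("records"); here they are typed in by hand.
common = {"member_id": "r1i1p1f1", "source_id": "CanESM5", "experiment_id": "historical"}
original = [
    {"table_id": "Amon", "variable_id": "tas", **common},
    {"table_id": "Lmon", "variable_id": "nbp", **common},
    {"table_id": "Lmon", "variable_id": "gpp", **common},
]
reproduced = [
    {"table_id": "Lmon", "variable_id": "gpp", **common},
    {"table_id": "Amon", "variable_id": "tas", **common},
    {"table_id": "Lmon", "variable_id": "nbp", **common},
]

match = same_datasets(original, reproduced, columns)
print(match)
```

Comparing sets of tuples ignores row order and any columns left out of `columns`, so which columns to include is a judgment call that depends on what you consider to be the same dataset.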