If you use ESGF data in a published analysis, the journal you submit to may require data citations or a data availability statement. While we are working to improve this in ESGF, we also want to highlight the current functionality. Consider the following query, which stands in for the search behind some analysis. For later comparison, we print the underlying dataframe to show the results of the search.
from intake_esgf import ESGFCatalog

cat = ESGFCatalog().search(
    experiment_id="historical",
    source_id="CanESM5",
    variable_id=["gpp", "tas"],
    variant_label=["r1i1p1f1"],
    frequency="mon",
)
cat.df

In the course of the analysis, you would download the datasets into a dictionary.
dsd = cat.to_dataset_dict(add_measures=False)

Then you may loop through the datasets and pull out the tracking_id from the
global attributes of each dataset.
tracking_ids = [ds.tracking_id for ds in dsd.values()]
for tracking_id in tracking_ids:
    print(tracking_id)

hdl:21.14100/872062df-acae-499b-aa0f-9eaca7681abc
hdl:21.14100/387658c8-f085-4ab8-995c-def848e7d856
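Each identifier printed above has the form hdl:<prefix>/<UUID>. Before pasting identifiers into a paper, it can help to check that each one is well formed. The following is a minimal sketch, not part of intake-esgf: the helper name split_tracking_ids is hypothetical, and it assumes that a single tracking_id attribute may hold several identifiers separated by whitespace, which may not match how your datasets store them.

```python
import uuid


def split_tracking_ids(raw):
    """Split a tracking_id attribute string into individual handle strings.

    Hypothetical helper. Assumes multiple identifiers, if present, are
    separated by whitespace (spaces or newlines).
    """
    ids = []
    for token in raw.split():
        prefix, _, suffix = token.partition("/")
        if not prefix.startswith("hdl:"):
            raise ValueError(f"not a handle: {token!r}")
        # uuid.UUID raises ValueError if the suffix is not a valid UUID.
        uuid.UUID(suffix)
        ids.append(token)
    return ids


# The two identifiers printed above, joined as if stored in one attribute.
example = (
    "hdl:21.14100/872062df-acae-499b-aa0f-9eaca7681abc\n"
    "hdl:21.14100/387658c8-f085-4ab8-995c-def848e7d856"
)
print(split_tracking_ids(example))
```

A malformed entry raises ValueError, so a typo is caught before the identifier reaches a manuscript.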
The tracking_id is similar to a digital object identifier (DOI) and can be
provided in some form in your paper or supplemental material to be precise about
what ESGF data you used. If you have a list of tracking_ids, then you can pass
them into from_tracking_ids() to reproduce the catalog.
new_cat = ESGFCatalog().from_tracking_ids(tracking_ids)

If you visually compare cat with new_cat, you will see that they are the
same. From here you may interact with the new catalog and recover the data you
used if needed. This can also be used to quickly communicate to colleagues
which data should be used.
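One way to share the identifiers with colleagues, or to attach them as supplemental material, is a plain text file with one tracking_id per line. The sketch below uses only the Python standard library. The function names and the file name tracking_ids.txt are illustrative assumptions, not part of intake-esgf.

```python
import tempfile
from pathlib import Path


def write_tracking_ids(tracking_ids, path):
    """Write one tracking_id per line, dropping duplicates but keeping order."""
    unique = list(dict.fromkeys(tracking_ids))
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique


def read_tracking_ids(path):
    """Read tracking_ids back from a file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# The identifiers from the example above.
tracking_ids = [
    "hdl:21.14100/872062df-acae-499b-aa0f-9eaca7681abc",
    "hdl:21.14100/387658c8-f085-4ab8-995c-def848e7d856",
]

# A temporary directory keeps the example self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tracking_ids.txt"
    write_tracking_ids(tracking_ids, path)
    restored = read_tracking_ids(path)

print(restored)
```

The restored list can then be passed to from_tracking_ids() as shown earlier to rebuild the catalog.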