shoyer
Member

@shoyer shoyer commented Aug 20, 2025

This PR adds three features to the DataTree.from_dict constructor:

  1. It supports DataArray objects and anything that can be coerced into a DataArray via the Dataset constructor.
  2. It adds a coords argument for explicitly specifying coordinates.
  3. It adds support for nested dictionary values, which are automatically unflattened.

Fixes #9539, #9486

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

@github-actions github-actions bot added the topic-DataTree Related to the implementation of a DataTree class label Aug 20, 2025
@eni-awowale
Collaborator

Thanks for adding this, @shoyer! Do you think it would be worth adding another round-trip test for DataArray.from_dict and DataTree.from_dict?

Here is what I did, based on DataArray.from_dict:

d = {
    "coords": {
        "t": {"dims": "t", "data": [0, 1, 2], "attrs": {"units": "s"}}
    },
    "attrs": {"title": "air temperature"},
    "dims": "t",
    "data": [10, 20, 30],
    "name": "a",
}
da = xr.DataArray.from_dict(d)

dt = xr.DataTree.from_dict({'/a': da})
xr.testing.assert_identical(dt.a, da)


Or equivalently from a dict of values coercible to DataArray objects:

>>> dt2 = DataTree.from_dict({"/a": 1, "/b/c": 2, "/b/d": 3}, coords={"/x": 0})
Collaborator

Is this supposed to create a group with the name "/a"? On my end I am just seeing the groups ('/', '/b') and a data variable with the name "a".

Member Author

This example was supposed to show creating a variable "a". I've replaced this with something a bit more concrete to illustrate the intended usage.

@shoyer
Member Author

shoyer commented Aug 20, 2025

Thanks for adding this, @shoyer! Do you think it would be worth adding another round-trip test for DataArray.from_dict and DataTree.from_dict?

So unfortunately this doesn't work. The issue is that DataTree.from_dict works fundamentally differently from Dataset.from_dict and DataArray.from_dict: #9074

  • Dataset.from_dict and DataArray.from_dict parse "pure" Python dictionaries in the form you show above (e.g., {"coords": ..., "data": ..., "dims": ...})
  • In contrast, DataTree.from_dict expects xarray data structures as values.

I'm not sure it makes sense to combine both in a single function. In particular, there is some ambiguity about whether dict values should be flattened (which I've added here). For example, does {"foo": {"data": 1}} mean a DataArray at /foo/data or at /foo?
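To make the ambiguity concrete, here is a minimal pure-Python sketch of the "treat every nested dict as a group" reading. The helper name flatten_paths is hypothetical and this is not xarray's implementation; it only illustrates how a dict shaped like a "pure" DataArray dictionary ends up at /foo/data rather than /foo under that reading.

```python
from collections.abc import Mapping


def flatten_paths(nested, prefix=""):
    """Hypothetical helper: flatten nested dicts into {"/path": leaf} pairs.

    Every Mapping value is treated as a group, so its keys become
    further path components. Non-Mapping values are kept as leaves.
    """
    flat = {}
    for key, value in nested.items():
        # Tolerate keys written with or without surrounding slashes.
        path = f"{prefix}/{key.strip('/')}"
        if isinstance(value, Mapping):
            flat.update(flatten_paths(value, path))
        else:
            flat[path] = value
    return flat


# Under this reading, "data" is a path component, not a payload key.
ambiguous = {"foo": {"data": 1}}
flat = flatten_paths(ambiguous)
print(flat)  # {'/foo/data': 1}
```

A parser for "pure" dictionaries would instead read the "data" key as the contents of foo, which is why the two interpretations are hard to merge into one function.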

Instead, I think we should have dedicated methods to_pure_dict()/from_pure_dict(), or add a pure keyword argument to control the format. Ideally we would make this consistent across DataTree/Dataset/DataArray, too. Given that pure dictionaries are relatively niche compared to this alternate constructor, I would lean towards renaming the Dataset/DataArray methods.

@shoyer shoyer changed the title Support DataArray objects in DataTree.from_dict Support DataArray objects and nested dicts in DataTree.from_dict Aug 20, 2025
Labels
topic-DataTree Related to the implementation of a DataTree class topic-typing
Development

Successfully merging this pull request may close these issues.

Creating DataTree from DataArrays
2 participants