Python: Automatically convert Pandas types to valid Delta Lake types in write_deltalake() #686

@wjones127

Description

Many Pandas types don't map to valid Delta Lake types when a DataFrame is converted into an Arrow table. For example, Pandas Timestamps become nanosecond-precision timestamps by default, but Delta Lake only supports microsecond precision. This makes write_deltalake() difficult to use for Pandas users.

We should write a test that validates all Pandas types can be written with write_deltalake() without manual conversion.

I'm not sure yet how to configure the conversion here:

if _has_pandas and isinstance(data, pd.DataFrame):
    data = pa.Table.from_pandas(data)

It's possible that we can pass in an adjusted schema to the schema parameter of pyarrow.Table.from_pandas() and that will make the correct conversion.

Use Case

Related Issue(s)

Based on #685
