Description
The deltalake package (delta-rs), which is used to import data from Glue catalogs, can also import data from Unity Catalog.
Combined with, for example, a "DATABRICKS_TOKEN" that has access to the data, you can obtain a SAS token through the databricks.sdk package, and then get the file list from the DeltaTable class with table_uri and storage_options set. Adding the "adlfs" package gives support for the "abfs://" protocol, but not for the UC-native "abfss://". Replacing that scheme in the file paths passed to dd.read_parquet(file_paths, **kwargs) lets you import Delta parquet files from Unity Catalog.
The code could look something like this:
```python
import os

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import TableOperation
from deltalake import DeltaTable

w = WorkspaceClient(
    host=os.environ["DATABRICKS_HOST"],
    token=os.environ["DATABRICKS_TOKEN"],
)

# catalog_name, database_name and table_name identify the UC table to read
uc_full_url = f"{catalog_name}.{database_name}.{table_name}"
table = w.tables.get(uc_full_url)

# Ask Unity Catalog for short-lived, read-only credentials for this table
temp_credentials = (
    w.temporary_table_credentials.generate_temporary_table_credentials(
        operation=TableOperation.READ,
        table_id=table.table_id,
    )
)

storage_options = {
    "sas_token": temp_credentials.azure_user_delegation_sas.sas_token
}

delta_table = DeltaTable(
    table_uri=table.storage_location,
    storage_options=storage_options,
)

# adlfs registers "abfs://" but not the UC-native "abfss://", so rewrite it
file_paths = [
    file_path.replace("abfss://", "abfs://")
    for file_path in delta_table.file_uris()
]
```
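If the blind substring replace feels too fragile, the rewrite could be confined to the URI scheme prefix. This is just a sketch; `to_abfs` is a hypothetical helper name, not part of deltalake or adlfs:

```python
def to_abfs(uri: str) -> str:
    # Rewrite only a leading "abfss://" scheme to the "abfs://" scheme
    # that adlfs registers; leave every other URI untouched.
    if uri.startswith("abfss://"):
        return "abfs://" + uri[len("abfss://"):]
    return uri

print(to_abfs("abfss://container@account.dfs.core.windows.net/part-0.parquet"))
# → abfs://container@account.dfs.core.windows.net/part-0.parquet
```

This avoids accidentally rewriting an "abfss://" substring that appears later in a path, although in practice the prefix is the only place it occurs.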
I was planning on making a PR with this, but wanted to discuss it first since the
file_path.replace("abfss://", "abfs://")
is a bit of a hack.
Please let me know what you think.