databricks-unity-catalog
$
npx mdskill add databricks/databricks-agent-skills/databricks-unity-catalogQuery Unity Catalog system tables and manage volume files.
- Access audit logs, lineage, billing, and compute metrics.
- Integrates with Unity Catalog system tables and volume APIs.
- Executes queries based on user intent and data requirements.
- Returns structured results for system tables and file lists.
SKILL.md
.github/skills/databricks-unity-catalogView on GitHub ↗
--- name: databricks-unity-catalog description: "Unity Catalog system tables and volumes. Use when querying system tables (audit, lineage, billing) or working with volume file operations (upload, download, list files in /Volumes/)." --- # Unity Catalog Guidance for Unity Catalog system tables, volumes, and governance. ## When to Use This Skill Use this skill when: - Working with **volumes** (upload, download, list files in `/Volumes/`) - Querying **lineage** (table dependencies, column-level lineage) - Analyzing **audit logs** (who accessed what, permission changes) - Monitoring **billing and usage** (DBU consumption, cost analysis) - Tracking **compute resources** (cluster usage, warehouse metrics) - Reviewing **job execution** (run history, success rates, failures) - Analyzing **query performance** (slow queries, warehouse utilization) - Profiling **data quality** (data profiling, drift detection, metric tables) ## Reference Files | Topic | File | Description | |-------|------|-------------| | System Tables | [references/5-system-tables.md](references/5-system-tables.md) | Lineage, audit, billing, compute, jobs, query history | | Volumes | [references/6-volumes.md](references/6-volumes.md) | Volume file operations, permissions, best practices | | Data Profiling | [references/7-data-profiling.md](references/7-data-profiling.md) | Data profiling, drift detection, profile metrics | ## Quick Start ### Create Unity Catalog Objects (CLI) **IMPORTANT**: Use `--json` for creating UC objects. Positional args vary by command and version. ```bash # Create a catalog databricks catalogs create my_catalog # Create a schema (args: NAME CATALOG_NAME — positional, name first) databricks schemas create my_schema my_catalog # Create a volume (args: CATALOG_NAME SCHEMA_NAME NAME VOLUME_TYPE — catalog first) databricks volumes create my_catalog my_schema my_volume MANAGED # List catalogs, schemas, volumes databricks catalogs list databricks schemas list my_catalog databricks volumes list my_catalog.my_schema ``` ### Volume File Operations (CLI) `databricks fs` requires the `dbfs:` scheme prefix even for UC Volume paths — without it the CLI treats the path as local filesystem and errors with `no such directory`. ```bash # List files in a volume databricks fs ls dbfs:/Volumes/catalog/schema/volume/path/ # Upload a directory's contents to a volume (-r copies contents, not the directory itself) databricks fs cp -r --overwrite /tmp/data dbfs:/Volumes/catalog/schema/volume/dest # Download a file from a volume databricks fs cp dbfs:/Volumes/catalog/schema/volume/file.csv /tmp/file.csv # Create a directory in a volume databricks fs mkdirs dbfs:/Volumes/catalog/schema/volume/new_folder ``` ### Enable System Tables Access ```sql -- Grant access to system tables GRANT USE CATALOG ON CATALOG system TO `data_engineers`; GRANT USE SCHEMA ON SCHEMA system.access TO `data_engineers`; GRANT SELECT ON SCHEMA system.access TO `data_engineers`; ``` ### Common Queries ```sql -- Table lineage: What tables feed into this table? SELECT source_table_full_name, source_column_name FROM system.access.table_lineage WHERE target_table_full_name = 'catalog.schema.table' AND event_date >= current_date() - 7; -- Audit: Recent permission changes SELECT event_time, user_identity.email, action_name, request_params FROM system.access.audit WHERE action_name LIKE '%GRANT%' OR action_name LIKE '%REVOKE%' ORDER BY event_time DESC LIMIT 100; -- Billing: DBU usage by workspace SELECT workspace_id, sku_name, SUM(usage_quantity) AS total_dbus FROM system.billing.usage WHERE usage_date >= current_date() - 30 GROUP BY workspace_id, sku_name; ``` ## SQL Queries via CLI Use `databricks experimental aitools tools query` for system table queries: ```bash # Query lineage via CLI databricks experimental aitools tools query --warehouse WAREHOUSE_ID " SELECT source_table_full_name, target_table_full_name FROM system.access.table_lineage WHERE event_date >= current_date() - 7 " ``` ## Best Practices 1. **Filter by date** - System tables can be large; always use date filters 2. **Use appropriate retention** - Check your workspace's retention settings 3. **Grant minimal access** - System tables contain sensitive metadata 4. **Schedule reports** - Create scheduled queries for regular monitoring ## Related Skills - **databricks-pipelines** - for pipelines that write to Unity Catalog tables - **databricks-jobs** - for job execution data visible in system tables - **[databricks-synthetic-data-gen](../databricks-synthetic-data-gen/SKILL.md)** - for generating data stored in Unity Catalog Volumes - **[databricks-aibi-dashboards](../databricks-aibi-dashboards/SKILL.md)** - for building dashboards on top of Unity Catalog data ## Resources - [Unity Catalog System Tables](https://docs.databricks.com/administration-guide/system-tables/) - [Audit Log Reference](https://docs.databricks.com/administration-guide/account-settings/audit-logs.html)