
Databricks Security Basics: Tokens, Users & Groups

You’re back at ShopWave, our fictional retail company.
You’ve set up your first notebooks, clusters, and dashboards. Everything seems perfect, until your manager asks:

“How do we make sure only the right people can access sensitive data?”

Welcome to the world of Databricks Security.


πŸ›‘οΈ Why Security Matters in Databricks​

Databricks houses valuable business data, including:

  • Customer PII
  • Sales transactions
  • Payment info
  • ML models
  • Inventory forecasts

Without proper security:

  • Analysts might accidentally access restricted tables
  • Notebooks could be shared outside the team
  • Jobs and pipelines could be modified by unauthorized users

Security in Databricks ensures access is controlled, data is protected, and compliance is maintained.


👤 Users: Who Can Log In?

A user is anyone with a Databricks account.
Each user has:

  • Login credentials (email/password, SSO)
  • Assigned roles
  • Permissions to access workspace resources

At ShopWave:

  • Alice is a data engineer
  • Bob is a data scientist
  • Carol is a business analyst

Each has different privileges according to their role.


👥 Groups: Organize Users Efficiently

Instead of assigning permissions individually, Databricks uses groups:

  • Engineers group → full cluster and notebook access
  • Analysts group → read access to dashboards and tables
  • Data scientists group → access to ML features and Delta tables

Benefits:

  • Easier management for large teams
  • Consistent access policies
  • Quick onboarding of new employees

ShopWave creates groups for each department to simplify security management.
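
To make the idea concrete, here is a minimal Python sketch of how group membership might resolve to permissions. It is a toy model, not the Databricks API: the permission strings, the `GROUP_PERMISSIONS` and `USER_GROUPS` dictionaries, and both helper functions are hypothetical names chosen for illustration.

```python
# Toy model of group-based access. NOT the Databricks API:
# all names and permission strings below are hypothetical.
GROUP_PERMISSIONS = {
    "engineers": {"clusters:manage", "notebooks:edit"},
    "analysts": {"dashboards:read", "tables:read"},
    "data_scientists": {"ml:use", "tables:read"},
}

USER_GROUPS = {
    "alice": ["engineers"],
    "bob": ["data_scientists"],
    "carol": ["analysts"],
}


def effective_permissions(user):
    """Return the union of permissions from every group the user belongs to."""
    perms = set()
    for group in USER_GROUPS.get(user, []):
        perms |= GROUP_PERMISSIONS.get(group, set())
    return perms


def is_allowed(user, permission):
    """True if any of the user's groups carries the permission."""
    return permission in effective_permissions(user)
```

In this model, onboarding a new analyst is a single change (`USER_GROUPS["dave"] = ["analysts"]`), which is the practical payoff of managing access through groups rather than per user.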


🔑 Personal Access Tokens: Programmatic Access

Sometimes, scripts or notebooks need to access Databricks without a password.

Enter personal access tokens:

  • Used for API access
  • Can be time-limited
  • Can be revoked at any time

Example use cases at ShopWave:

  • CI/CD pipelines fetching notebooks
  • Automated ETL jobs reading Delta tables
  • External apps running queries via the Databricks REST API
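
As a rough sketch of what token-based API access can look like, the snippet below builds a REST request that carries the token in an `Authorization: Bearer` header and lists clusters. It assumes the workspace URL and token are supplied by the caller; the helper names `build_request` and `list_clusters` are made up for this example, and only the standard library is used.

```python
import json
import urllib.request


def build_request(host, token, path):
    """Build a GET request for a Databricks REST API path, authenticated with a token."""
    url = host.rstrip("/") + path
    req = urllib.request.Request(url, method="GET")
    # The personal access token is sent as a bearer token.
    req.add_header("Authorization", f"Bearer {token}")
    return req


def list_clusters(host, token):
    """Send GET /api/2.0/clusters/list and return the parsed JSON response."""
    req = build_request(host, token, "/api/2.0/clusters/list")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

A script would typically read the host and token from environment variables or a secret store (for example `os.environ["DATABRICKS_HOST"]` and `os.environ["DATABRICKS_TOKEN"]`) rather than hard-coding them, so that revoking or rotating a token never requires a code change.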

πŸ›οΈ Access Control Levels​

Databricks provides layered access control:

| Level | Description |
|---|---|
| Workspace | Who can see notebooks, folders, repos |
| Cluster | Who can start, edit, or terminate clusters |
| Data / Tables | Who can read, write, or manage Delta tables |
| Jobs | Who can create, schedule, or run jobs |
| Account-level | Admins controlling global workspace settings |

ShopWave enforces the principle of least privilege: each user gets only the access needed for their job.
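
For the data layer, least privilege usually comes down to a handful of `GRANT` statements. The sketch below generates them in Python, assuming Unity Catalog is enabled; the catalog, schema, table, and group names (`shopwave`, `sales`, `transactions`, `analysts`) are hypothetical, and `grant_statement` is an illustrative helper rather than a Databricks function.

```python
def grant_statement(privilege, securable_type, securable_name, principal):
    """Return a SQL GRANT statement; the principal name is back-quoted."""
    return f"GRANT {privilege} ON {securable_type} {securable_name} TO `{principal}`"


# Hypothetical names: read-only access for the analysts group.
# Reading a table also requires usage rights on its catalog and schema.
ANALYST_GRANTS = [
    grant_statement("USE CATALOG", "CATALOG", "shopwave", "analysts"),
    grant_statement("USE SCHEMA", "SCHEMA", "shopwave.sales", "analysts"),
    grant_statement("SELECT", "TABLE", "shopwave.sales.transactions", "analysts"),
]
```

In a notebook, an admin could run each statement with `spark.sql(stmt)`. Analysts get `SELECT` only, so they can query the table but cannot modify it.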


πŸ” Security Best Practices​

  1. Use groups instead of individual permissions
  2. Enable SSO (Single Sign-On) for authentication
  3. Rotate personal access tokens regularly
  4. Audit workspace activity using Databricks audit logs (for example, the system.access.audit system table available with Unity Catalog)
  5. Enforce multi-factor authentication (MFA)
  6. Apply table-level and row-level security for sensitive data

Following these practices prevents accidental leaks and ensures compliance.
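
Token rotation (practice 3) is easy to automate. The sketch below flags tokens that are older than a chosen age or that never expire. It assumes each token record is a dictionary with `token_id`, `creation_time`, and `expiry_time` fields in epoch milliseconds, with `-1` meaning "no expiry"; that shape is an assumption modeled on a typical token-listing response, and the 90-day default is an arbitrary example policy.

```python
from datetime import datetime, timedelta, timezone


def tokens_to_rotate(token_infos, now, max_age_days=90):
    """Return IDs of tokens older than max_age_days or with no expiry set.

    Assumed record shape (hypothetical): token_id, creation_time and
    expiry_time in epoch milliseconds, expiry_time == -1 for "never".
    """
    cutoff_ms = int((now - timedelta(days=max_age_days)).timestamp() * 1000)
    stale = []
    for info in token_infos:
        too_old = info["creation_time"] < cutoff_ms
        never_expires = info.get("expiry_time", -1) == -1
        if too_old or never_expires:
            stale.append(info["token_id"])
    return stale
```

Calling it with `datetime.now(timezone.utc)` as `now` on a schedule gives a list of token IDs to revoke and reissue.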


🧠 Story Recap: ShopWave Security in Action

  1. Alice (Engineer) runs ETL jobs on clusters → belongs to Engineers Group
  2. Bob (Data Scientist) trains ML models → belongs to Data Scientists Group
  3. Carol (Analyst) queries dashboards → belongs to Analysts Group
  4. Tokens are issued for API automation → securely revoked when done
  5. Admin monitors access → ensures everyone follows least privilege

Result: ShopWave keeps data safe, while teams remain productive.


🏁 Quick Summary

  • Users are individual Databricks accounts; Groups manage access collectively.
  • Personal Access Tokens allow secure programmatic access.
  • Access control layers include workspace, clusters, tables, and jobs.
  • Security best practices: SSO, MFA, auditing, least privilege, and token rotation.
  • Proper security ensures data protection, compliance, and team productivity.

🚀 Coming Next

👉 Databricks DBFS: Internal File System Explained
