
If you're working with Azure Data Lake Storage Gen2 (ADLS Gen2) and Databricks, you might want to access your data without using mounts, for added flexibility and security. In this blog, I'll walk you through how to connect Databricks to ADLS Gen2 using access keys or Azure Active Directory (AAD) authentication in a simple and secure way.
Why Avoid Mounts?
Mounting ADLS Gen2 to Databricks is convenient but comes with some limitations:
- Mounts are scoped to a single workspace and cannot be shared across workspaces; within a workspace, any user can read a mounted path, regardless of their own permissions on the storage account.
- You need high-level access to the storage account to create a mount, which may not align with your organization's security policies.
Instead, directly accessing ADLS Gen2 provides better control and flexibility.
Step-by-Step Guide
Step 1: Prerequisites
Before starting, make sure you have:
- An ADLS Gen2 storage account (a storage account with the hierarchical namespace enabled).
- A Databricks workspace.
- Access credentials (either Storage Account Access Keys or AAD authentication details).
Step 2: Set Up Access in Databricks
Option 1: Using Access Keys
- Get the Access Key
- In the Azure Portal, navigate to your storage account.
- Under Security + Networking, select Access keys.
- Copy one of the access keys.
- Configure Access in Databricks
- Open your Databricks notebook and use the following code to set up your configurations:
python
spark.conf.set("fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net", "<ACCESS_KEY>")
Replace <STORAGE_ACCOUNT_NAME> with the name of your storage account and <ACCESS_KEY> with your copied key.
- Access Data
Use the Spark API to access files in your storage account.
python
df = spark.read.csv("abfss://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<FILE_PATH>")
df.show()
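Before reading a full dataset, you can confirm the connection works by listing the container's contents with dbutils.fs.ls. This is a minimal sketch; the container and folder names are placeholders for your own values:
python
# List the files at the root of the container to verify access
# (replace the placeholders with your container and account names)
files = dbutils.fs.ls("abfss://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/")
for f in files:
    print(f.path, f.size)
If the access key is wrong or the account name has a typo, this call fails immediately with an authentication or "account not found" error, which is easier to debug than a failed read later in your pipeline.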
Option 2: Using Azure Active Directory (AAD) Authentication
- Create an Azure Service Principal
- In Azure Active Directory, register a new app.
- Note the Application (client) ID, Directory (tenant) ID, and generate a client secret.
- Grant Storage Permissions
- Navigate to your storage account in Azure.
- Under Access control (IAM), assign the Storage Blob Data Contributor role to your service principal.
- Configure Access in Databricks
- Set up the following configurations in your Databricks notebook:
python
spark.conf.set("fs.azure.account.auth.type.<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net", "<CLIENT_ID>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net", "<CLIENT_SECRET>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net",
               "https://login.microsoftonline.com/<TENANT_ID>/oauth2/token")
Replace <STORAGE_ACCOUNT_NAME>, <CLIENT_ID>, <CLIENT_SECRET>, and <TENANT_ID> with your values.
- Access Data
Now, you can read or write files just as before:
python
df = spark.read.csv("abfss://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<FILE_PATH>")
df.show()
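Writing works the same way. Here is a minimal sketch of saving a DataFrame back to the lake as Parquet; <OUTPUT_PATH> is a placeholder for a folder of your choosing:
python
# Write the DataFrame back to ADLS Gen2 in Parquet format
# (<OUTPUT_PATH> is a placeholder for a folder of your choosing)
df.write.mode("overwrite").parquet(
    "abfss://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<OUTPUT_PATH>"
)
Note that writes require the Storage Blob Data Contributor role assigned earlier; the Reader role is not enough.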
Step 3: Secure Your Credentials
Never hardcode sensitive credentials in your notebooks. Instead:
- Use Databricks Secrets to store sensitive data.
- Access secrets programmatically in your notebooks:
python
client_id = dbutils.secrets.get(scope="my_scope", key="client_id")
client_secret = dbutils.secrets.get(scope="my_scope", key="client_secret")
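Putting it all together, here is a minimal sketch of the AAD setup with every sensitive value pulled from a Databricks secret scope. The scope name my_scope and the key names are assumptions; substitute the ones you created with the Databricks CLI:
python
# Pull all sensitive values from the secret scope instead of hardcoding them
# (scope and key names below are examples; use your own)
tenant_id = dbutils.secrets.get(scope="my_scope", key="tenant_id")
client_id = dbutils.secrets.get(scope="my_scope", key="client_id")
client_secret = dbutils.secrets.get(scope="my_scope", key="client_secret")

account = "<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{account}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
As a bonus, values fetched through dbutils.secrets.get are redacted in notebook output, so printing client_secret shows [REDACTED] rather than the actual value.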
Conclusion
By using these methods, you can connect Databricks to Azure Data Lake Storage Gen2 securely without mounts. Whether you choose access keys or AAD authentication, you'll get a more scalable and secure way to access your data.