# Add a Databricks connector

{% hint style="warning" %}
This connector supports ‘on-demand’ clusters, i.e. self-stopping clusters. Make sure to tick the `ON DEMAND` parameter on the connector’s configuration form so that queries against a stopped cluster are handled.

Live datasets might not work properly with a self-stopped cluster.
{% endhint %}

{% hint style="warning" %}
The relevant **driver** must be installed and configured on your Toucan Toco workspace
{% endhint %}

## Connector features

You can use the Toucan Databricks connector to connect to your Databricks account with a Personal Access token and access `tables` or `views` with a SQL query.

With this connection, you can fetch data from your Databricks workspace to fill your charts and dashboards.

## Configuring a Databricks connection in Toucan

{% hint style="info" %}
Retrieve ODBC connection information from Databricks as described [here](https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html)
{% endhint %}

Follow the steps described in [Setting up a connector](https://docs-v3.toucantoco.com/data-management-in-datahub/datasources-in-toucan/managing-connectors/setting-up-a-connector "mention"), choose `Databricks`, and fill out the form with the following information:

<table><thead><tr><th width="172.0390625">Field</th><th width="145.6015625">Format / Type</th><th>Description</th><th>Example</th></tr></thead><tbody><tr><td>Name (mandatory)</td><td>String</td><td>Used to identify your connection</td><td><em>MyDatabricksConnection</em></td></tr><tr><td>Host (mandatory)</td><td>String</td><td>Hostname of your Databricks cluster; it can be found in the cluster configuration</td><td><em>my-databricks-cluster.cloud.databricks.com</em></td></tr><tr><td>Port (mandatory)</td><td>Integer</td><td>The listening port of your Databricks cluster</td><td><em>443</em> (default)</td></tr><tr><td>Http Path (mandatory)</td><td>String</td><td>Databricks compute resource URL; it can be retrieved from the cluster’s configuration in the Databricks UI, in the ‘ODBC’ section</td><td><em>sql/protocol/v1/o/xxx/yyy</em></td></tr><tr><td>User (mandatory)</td><td>String</td><td><code>token</code> if you use a Personal Access Token (PAT),<br><br>or a username if you connect with username/password (deprecated since July 2024)</td><td><em>databricks_user</em></td></tr><tr><td>Password (mandatory)</td><td>String</td><td>Access token generated from the Databricks UI in user settings (stored as a secret)</td><td><em>dapixxxxxx</em></td></tr><tr><td>ANSI</td><td>Boolean</td><td>Enforce compliance with the ANSI SQL standard for SQL operations and behaviors</td><td></td></tr><tr><td>On Demand</td><td>Boolean</td><td><strong>If your cluster is self-stopping, make sure to tick this option.</strong> With it enabled, the connector will try to start the cluster before running any query if it is stopped</td><td></td></tr><tr><td>Retry Policy (optional)</td><td>Boolean</td><td><p>Allows you to configure a retry policy if the connection is flaky:</p><ul><li>max_attempts: maximum number of retries before giving up</li><li>max_delay: in seconds; above this, the connection is dropped</li><li>wait_time: time in seconds between each retry</li></ul></td><td></td></tr><tr><td>Slow Queries' Cache Expiration Time</td><td>Integer</td><td>Expiration time of the cache for slow queries</td><td></td></tr></tbody></table>
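As an illustration, a filled-out configuration could look like the sketch below. All values are hypothetical, and the field names mirror the form above rather than any actual file format:

```yaml
# Hypothetical example values for the Databricks connector form
name: MyDatabricksConnection
host: my-databricks-cluster.cloud.databricks.com   # from the cluster's ODBC details
port: 443
http_path: sql/protocol/v1/o/1234567890/0123-456789-abcdefgh
user: token            # the literal string "token" when authenticating with a PAT
password: dapixxxxxx   # your Personal Access Token, stored as a secret
on_demand: true        # tick this if the cluster is self-stopping
```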

Click on the `TEST CONNECTION` button, then `SAVE` the connection.

{% hint style="success" %}
After successfully configuring the connector, you will be able to find it in the Connector section of the DataHub "Datasource" tab
{% endhint %}

{% hint style="warning" %}
If the cluster is stopped, the connection test might fail, but you can `SAVE` the configuration anyway
{% endhint %}

## Create a dataset from a Databricks connection

{% hint style="warning" %}
Please note that with a stopped cluster, the query preview and live queries might be broken in the current state of the implementation. In such situations, the connector tries to start the cluster and waits for it to be up. If you plan to use the connector in an ‘on-demand’ fashion (i.e. with self-stopping clusters), use it only with stored datasets.
{% endhint %}

{% hint style="info" %}
This data connector is only supported in [code/SQL mode](https://docs-v3.toucantoco.com/data-management-in-datahub/datasources-in-toucan/managing-connectors/create-a-dataset-from-a-connector/code-mode-and-single-mode)
{% endhint %}

To create a dataset from Databricks, click on the "create" icon. You will then be able to fill in:

* `QUERY`: the SQL query you want to run
* `PARAMETERS` (optional): a dict that allows you to parameterize the query
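For example, a parameterized query might look like the following. The catalog, table, and parameter names are hypothetical, and the `{{ ... }}` templating syntax is an assumption — check the parameter syntax supported by your Toucan workspace:

```sql
-- Hypothetical example: fetch recent orders, filtered by a `country` parameter
SELECT order_id, customer_id, amount
FROM my_catalog.sales.orders
WHERE country = {{ country }}
  AND order_date >= '2024-01-01'
```

Here, `country` would be supplied through the `PARAMETERS` dict when the dataset is created.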

{% hint style="info" %}
We specifically designed this connector to handle *DATA REFRESH* from on-demand clusters. During this process, the connector will try to start the cluster and wait for it to be ready before running queries.
{% endhint %}

{% hint style="success" %}
After selecting data from your connector you will be able to create a dataset thanks to [YouPrep](https://docs-v3.toucantoco.com/data-management-in-datahub/datasets-in-toucan/preparing-data/overview-of-youprep-tm) using the selection as "source step".
{% endhint %}
