🔌 Setting up a Databricks Connector

Databricks provides an SQL API to read and extract data from clusters hosted on the main cloud providers. This connector provides an interface for executing SQL queries against that API.

Configuring the Databricks connector in Toucan

Retrieve ODBC connection information from Databricks as described here

Fill connection parameters:

  • NAME: name given to your connector

  • HOST: usually in the format my-databricks-cluster.cloudproviderdatabricks.net; you can retrieve it from your cluster’s configuration

  • PORT: default is 443

  • HTTPPATH: usually in the format sql/protocol/v1/o/xxx/yyy; you can retrieve it from the ‘ODBC’ section of your cluster’s configuration in the Databricks UI

  • PWD: your access token (generated from the Databricks UI, in user settings), usually in the format dapixxxxxx

  • ON DEMAND: if your cluster is self-stopping, make sure to tick this option. When it is enabled, the connector will try to start a stopped cluster before running any query

  • Finally, click the TEST CONNECTION button
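To make the role of each field concrete, here is a minimal sketch of how these parameters could map onto a Databricks ODBC connection string. It is an illustration, not Toucan's actual implementation: the function name `build_odbc_connection_string` is hypothetical, the driver path is the usual Linux install location of the Simba Spark ODBC driver and may differ on your machine, and the fixed keys (`SSL`, `ThriftTransport`, `AuthMech`, `UID=token`) follow the settings Databricks documents for token authentication.

```python
def build_odbc_connection_string(
    host,
    http_path,
    token,
    port=443,
    # Assumption: default Linux install path of the Simba Spark ODBC driver.
    driver="/opt/simba/spark/lib/64/libsparkodbc_sb64.so",
):
    """Assemble an ODBC connection string from the connector form fields."""
    params = {
        "Driver": driver,
        "Host": host,            # HOST field
        "Port": port,            # PORT field (443 by default)
        "HTTPPath": http_path,   # HTTPPATH field
        "SSL": 1,                # connect over TLS
        "ThriftTransport": 2,    # HTTP transport
        "AuthMech": 3,           # user name / password authentication
        "UID": "token",          # literal string "token" for access-token auth
        "PWD": token,            # PWD field: the personal access token
    }
    return ";".join(f"{key}={value}" for key, value in params.items())


connection_string = build_odbc_connection_string(
    host="my-databricks-cluster.cloudproviderdatabricks.net",
    http_path="sql/protocol/v1/o/xxx/yyy",
    token="dapixxxxxx",
)
print(connection_string)
```

The placeholder values above are the same ones used in the field descriptions; replace them with the values from your own cluster's ODBC settings.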

Databricks Connection form

Selecting data from Databricks

To create a dataset from Databricks, click on the "create from icon". You can then fill in:

  • QUERY: the SQL query you want to run

  • PARAMETERS (optional): a dict used to parameterize the query
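The idea of pairing a query with a dict of parameters can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module purely as a stand-in database, and the `sales` table and `:region` placeholder are made up for the example. The placeholder syntax accepted by the Databricks connector may differ, so treat this as an analogy rather than the connector's exact syntax.

```python
import sqlite3

# Stand-in database so the example runs anywhere (not Databricks).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 120), ("APAC", 80), ("EMEA", 40)],
)

# QUERY: the SQL text, with a named placeholder instead of a hard-coded value.
query = "SELECT SUM(amount) FROM sales WHERE region = :region"

# PARAMETERS: a dict whose keys match the placeholders in the query.
parameters = {"region": "EMEA"}

total = conn.execute(query, parameters).fetchone()[0]
print(total)
```

Keeping values in the dict rather than in the SQL text lets the same query be reused with different inputs.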

We specifically designed this connector to handle DATA REFRESH from on-demand clusters. During this process, the connector tries to start the cluster and waits for it to be ready before running queries.
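The start-and-wait behaviour described above can be sketched as a simple polling loop. This is a hedged illustration, not the connector's source code: `ensure_cluster_running`, `get_state` and `start_cluster` are hypothetical names, and the two callables are assumed to wrap calls to the Databricks Clusters REST API (reading the cluster `state` and requesting a start). The state names (`TERMINATED`, `PENDING`, `RUNNING`, `ERROR`, `UNKNOWN`) are the ones that API reports.

```python
import time

FAILED_STATES = {"ERROR", "UNKNOWN"}


def ensure_cluster_running(get_state, start_cluster,
                           timeout_s=600, poll_interval_s=10,
                           sleep=time.sleep):
    """Start the cluster if it is stopped, then block until it is RUNNING.

    get_state and start_cluster are caller-supplied callables (assumed to
    wrap the Databricks Clusters API); sleep is injectable for testing.
    """
    if get_state() == "TERMINATED":
        start_cluster()

    waited = 0
    while True:
        state = get_state()
        if state == "RUNNING":
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"cluster entered state {state}")
        if waited >= timeout_s:
            raise TimeoutError(f"cluster not ready after {timeout_s}s")
        sleep(poll_interval_s)
        waited += poll_interval_s


# Usage with fake callables that simulate a stopped cluster booting up.
states = iter(["TERMINATED", "PENDING", "PENDING", "RUNNING"])
start_calls = []
final_state = ensure_cluster_running(
    get_state=lambda: next(states),
    start_cluster=lambda: start_calls.append("start"),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(final_state, start_calls)
```

Passing the API calls in as functions keeps the waiting logic independent of any HTTP client, which also makes it easy to test without a real cluster.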
