🔌 Setting up an AWS S3 connector

Configuring the AWS S3 connector in Toucan

The AWS S3 connector lets you access files hosted in an AWS S3 bucket. It authenticates to the bucket through AWS STS (Security Token Service), using the AssumeRole operation to obtain temporary credentials.
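To illustrate this flow, here is a minimal Python sketch. It assumes boto3 is used, and it is not Toucan's own implementation, which is not documented here. The sketch passes the ROLE ARN and EXTERNAL ID connector parameters to STS AssumeRole, then opens an S3 client with the temporary credentials that STS returns. The session name `toucan-s3-session` and both helper names are hypothetical.

```python
from typing import Any, Dict


def assume_role_params(
    role_arn: str, external_id: str, session_name: str = "toucan-s3-session"
) -> Dict[str, str]:
    """Keyword arguments for an STS AssumeRole request.

    session_name is a hypothetical label; AWS only uses it to identify the session.
    """
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "ExternalId": external_id,
    }


def s3_client_for_role(role_arn: str, external_id: str) -> Any:
    """Assume the role and return an S3 client built from its temporary credentials.

    Needs boto3 and base AWS credentials that are allowed to call sts:AssumeRole.
    """
    import boto3  # imported here so assume_role_params works without boto3

    sts = boto3.client("sts")
    response = sts.assume_role(**assume_role_params(role_arn, external_id))
    creds = response["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration
    return boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```

The temporary credentials expire, so a long-running client would need to assume the role again when they lapse.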

Fill in the connection parameters:

  • NAME* : the name of your connector.

  • BUCKET NAME* : the name of the S3 bucket you want to query data from

  • RetryPolicy: a boolean toggle that lets you configure a retry policy if the connection is unreliable.

    • max_attempts: maximum number of retries before giving up

    • max_delay: maximum delay, in seconds, beyond which the connection is dropped

    • wait_time: time, in seconds, to wait between two retries

  • SLOW QUERIES' CACHE EXPIRATION TIME:

  • PREFIX : a prefix for your objects, such as a folder path, e.g. marketing/

  • ROLE ARN* : the Amazon Resource Name (ARN) of the IAM role to assume, an identifier that grants access to AWS resources through the policies attached to it. It will be given to you by Toucan support.

  • EXTERNAL ID* : already set; an ID referenced in the role's AWS policy configuration and sent with each AssumeRole request
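The RetryPolicy settings describe a standard retry loop. The sketch below is a hypothetical illustration of how a maximum number of attempts, a delay limit and a wait time can interact; it is not Toucan's implementation. It assumes that every exception is retryable, that the attempt count includes the first call, and that max_delay is measured from the first call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    max_delay: float = 30.0,
    wait_time: float = 1.0,
) -> T:
    """Call fn until it succeeds or a retry limit is reached.

    Hypothetical semantics (not Toucan's code):
    - max_attempts counts every call, including the first one;
    - max_delay is measured in seconds from the first call;
    - wait_time is the pause, in seconds, between two calls.
    The last exception is re-raised once a limit is hit.
    """
    start = time.monotonic()
    attempt = 0
    while True:
        attempt += 1
        try:
            return fn()
        except Exception:  # assumption: every error is treated as retryable
            out_of_attempts = attempt >= max_attempts
            out_of_time = time.monotonic() - start >= max_delay
            if out_of_attempts or out_of_time:
                raise
            time.sleep(wait_time)
```

Re-raising the last error, rather than returning a sentinel, keeps the original failure visible once the retry budget is spent.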

After entering this information, you can test the connection to your AWS S3 bucket to make sure your inputs are correct.
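This page does not describe what the connection test checks. To verify bucket access outside Toucan, one option is the S3 HeadBucket call, which succeeds only if the bucket exists and the caller has permission to access it. The sketch below assumes a boto3 S3 client, for example one built from the assumed-role credentials. The helper name is hypothetical.

```python
from typing import Any


def check_bucket_access(s3_client: Any, bucket: str) -> bool:
    """Return True if the bucket can be reached with the client's credentials.

    s3_client is expected to be a boto3 S3 client; only head_bucket is used.
    """
    try:
        s3_client.head_bucket(Bucket=bucket)
    except Exception:  # boto3 raises botocore's ClientError on 403 / 404
        return False
    return True
```

Because the client is passed in, the check can also be exercised with a stub object in tests.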

If all settings are valid, a success message is displayed.

Once the connector is configured, you will find it in the Connector section of the DataHub "Datasource" tab.

Selecting data from AWS S3

To create a dataset from AWS S3, click the "create from" icon. You will then be able to:

  • Select a file hosted in your S3 bucket
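This page does not document how Toucan builds the file list. Assuming it lists the objects under the configured PREFIX, the boto3 sketch below shows the equivalent ListObjectsV2 request, following pagination. S3 has no real folders: Prefix is a plain string match on object keys. For that reason the PREFIX example uses a trailing slash. Both helper names are hypothetical.

```python
from typing import Any, Iterable, Iterator, List


def keys_under_prefix(keys: Iterable[str], prefix: str) -> List[str]:
    """Local illustration of S3's Prefix filter: a plain startswith match."""
    return [key for key in keys if key.startswith(prefix)]


def list_bucket_keys(s3_client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under prefix, following ListObjectsV2 pagination.

    s3_client is expected to be a boto3 S3 client.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages without matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

For example, `keys_under_prefix(["marketing/q1.csv", "marketing-archive/q1.csv", "sales/q1.csv"], "marketing")` returns `["marketing/q1.csv", "marketing-archive/q1.csv"]`. With the prefix `marketing/`, only the first key matches.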

After selecting data from your connector, you can create a dataset with YouPrep, using the selection as the "source step".
