Adding and combining remote files in Toucan
Toucan can read files over a network connection using a wide variety of protocols.
To add a remote file in Toucan, use the code mode of the file settings.
Follow these steps to access it:
Upload an empty CSV file (or any small CSV file) to Toucan
Switch to code mode within the configuration interface
Replace the fields of the code block according to the remote file server and its configuration. Refer to the sections below for the fields to fill in.
Save the file settings. A new file should appear in the list of files (in datasources). A dataset will also be created automatically.
Example with a CSV file on Dropbox.
We support:
ftp (as well as sftp or ftps),
http (and https),
S3 and
a long list of other schemes (‘mms’, ‘hdl’, ‘telnet’, ‘rsync’, ‘gopher’, ‘prospero’, ‘shttp’, ‘ws’, ‘https’, ‘http’, ‘sftp’, ‘rtsp’, ‘nfs’, ‘rtspu’, ‘svn’, ‘git’, ‘sip’, ‘snews’, ‘tel’, ‘nntp’, ‘wais’, ‘svn+ssh’, ‘ftp’, ‘ftps’, ‘file’, ‘sips’, ‘git+ssh’, ‘imap’, ‘wss’).
Required: access to an FTP server and to Toucan staging mode on your workspace
Open FileZilla or any other FTP client
Copy the URL corresponding to the location of the file on the FTP server. It should look like this:
ftp://user:password@example.com/pub/file.txt
"ftp" is the protocol used, "user" and "password" are the login credentials, "example.com" is the domain of the server, and "/pub/file.txt" is the full path to the file on the server.
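To check that a URL is built correctly before pasting it into Toucan, you can split it into its parts. The sketch below uses Python's standard library and a URL assembled from the parts described above; the credentials are placeholders, not real values.

```python
from urllib.parse import urlsplit

# URL assembled from the parts described above; "user" and "password" are placeholders.
url = "ftp://user:password@example.com/pub/file.txt"

parts = urlsplit(url)

# Map each piece of the URL to the name used in the explanation.
components = {
    "protocol": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "domain": parts.hostname,
    "path": parts.path,
}

for name, value in components.items():
    print(f"{name}: {value}")
```

If one of the printed values is not what you expect (for example, part of the password ends up in the domain), the URL probably contains a special character that needs to be encoded.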
Important
💡Contact us via help@toucantoco.com or your Delivery contact to set up a hidden password in the URL
Paste this URL in the file field and modify the other configuration fields as explained above in Adding a remote file in Toucan.
A new file should appear in the list of files (in datasources). A dataset will also be created automatically.
You can send data to the Toucan Toco FTP server with the following credentials:
Host: ftps.toucantoco.com
Port: 990 (for the connection) and range 64000-64321 (for data transfer)
Protocol: FTPS (in FileZilla, this is "implicit FTP over TLS")
Mode: Passive Mode
User: Given by the Toucan Toco Team
Password: Given by the Toucan Toco Team
The access key and secret key for your data files hosted on S3 buckets can be configured this way:
For example:
Note
If your access key or secret key contains special characters such as “/”, “@” or “:”, you have to URL-encode them. URL encoding converts special characters into a format that can be safely used inside a URL. You will find more information about this topic here (as well as an automatic encoder).
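If you prefer to encode a key yourself rather than use an online encoder, here is a minimal sketch using Python's standard library. The secret key shown is made up for illustration, and encode_credential is a hypothetical helper name, not part of Toucan.

```python
from urllib.parse import quote, unquote


def encode_credential(value: str) -> str:
    """URL-encode a credential so it can be embedded in a URL."""
    # safe="" forces "/" to be encoded as well; quote() leaves it as-is by default.
    return quote(value, safe="")


# Made-up secret key containing the special characters mentioned in the note.
secret_key = "abc/def@ghi:jkl"

encoded = encode_credential(secret_key)
print(encoded)  # abc%2Fdef%40ghi%3Ajkl

# Decoding gives back the original key, so nothing is lost.
print(unquote(encoded) == secret_key)  # True
```

Encode the access key and the secret key separately, then place the encoded values in the URL.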
Toucan Toco can provide an S3 bucket with a dedicated AWS IAM user related to your instance.
You will then be able to configure your datasources block with a dedicated configuration, as follows:
Note
If you are using a custom domain name for your S3 bucket (with MinIO, for example), here is the syntax you should use:
In the previous page, we saw how to add remote files in Toucan. Read the previous page first, before going further with this one.
On this page, we will see how to combine several remote files into one file.
You can load multiple files (uploaded to our server or to an FTP/S3 server) into a single file with the option match: true. The dataset created from the file will contain a column __filename__ indicating which file each row comes from.
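To picture the resulting dataset, here is a small Python sketch of the idea. It is only an illustration of the behaviour described above, not Toucan's actual implementation: the file names, their contents and the combine_matching helper are all hypothetical.

```python
import csv
import io
import re

# Hypothetical in-memory "server": file name -> CSV content.
files = {
    "data-product-corporation-201801.csv": "product,sales\nA,10\nB,20\n",
    "data-product-corporation-201802.csv": "product,sales\nA,15\n",
    "other-report.csv": "product,sales\nZ,99\n",
}


def combine_matching(files: dict[str, str], pattern: str) -> list[dict[str, str]]:
    """Concatenate the rows of every file whose name matches the pattern,
    tagging each row with the name of the file it comes from."""
    regex = re.compile(pattern)
    rows = []
    for name in sorted(files):
        if not regex.match(name):
            continue  # file name does not match: skip it
        for row in csv.DictReader(io.StringIO(files[name])):
            row["__filename__"] = name
            rows.append(row)
    return rows


combined = combine_matching(files, r"^data-product-corporation-\d{6}\.csv$")
for row in combined:
    print(row)
```

The three rows from the two matching files end up in one table, each carrying its origin in __filename__, while other-report.csv is left out.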
Tutorial
Your corporation now has a new data file each month: data-product-corporation-201801.csv, data-product-corporation-201802.csv … You want them to be loaded into a single domain called data-product-corpo.
Find the regular expression (regex) that matches your files with regex101.com.
data-product-corporation-\d{6}\.csv
Don’t forget to use ‘^’ and ‘$’ to be more restrictive.
^data-product-corporation-\d{6}\.csv$
Add a backslash to escape each backslash.
^data-product-corporation-\\d{6}\\.csv$
Copy your regular expression in the "file" option of your datasource block
Add the option match: true
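As an alternative to regex101.com, the pattern from the steps above can be checked locally. The sketch below uses Python's re module; the list of candidate file names is made up to show what the anchors ‘^’ and ‘$’ reject.

```python
import re

# Pattern from the tutorial, before the backslashes are doubled.
pattern = r"^data-product-corporation-\d{6}\.csv$"

# Made-up file names: the first two should match, the others should not.
candidates = [
    "data-product-corporation-201801.csv",
    "data-product-corporation-201802.csv",
    "data-product-corporation-2018.csv",        # only 4 digits
    "old-data-product-corporation-201801.csv",  # extra prefix, rejected by ^
    "data-product-corporation-201801.csv.bak",  # extra suffix, rejected by $
]

matches = [name for name in candidates if re.search(pattern, name)]
print(matches)

# Doubling each backslash gives the escaped form to paste into the "file" option.
escaped = pattern.replace("\\", "\\\\")
print(escaped)  # ^data-product-corporation-\\d{6}\\.csv$
```

Without ‘^’ and ‘$’, the last two candidates would also match, which is why the anchored version is safer.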
Example of content for FTP (with authentication):
Example of content for S3 (with authentication):