📁Adding and combining remote files in Toucan
Sometimes your data files are hosted on remote servers, and it is simpler to keep them there. In that case, just use the file's URL as the file path. We will read the file over a network connection according to the URL scheme.
Adding a remote file in Toucan
To add a remote file in Toucan, use the code mode of the file settings.
Follow these steps to access it:
Create an empty csv file on your computer
Upload your empty file (or any random csv file) within Toucan
Switch to code mode within the configuration interface
Replace the code block according to the remote file server and its configuration. Refer to the sections below to determine the code you should use.
Save your file and confirm. A new file should appear in the list of files (in datasources). A dataset is also created automatically.
For example this is how you can read a CSV file directly on Dropbox or your FTP server.
We support ftp (as well as sftp and ftps), http (and https), S3, and a long list of other schemes (‘mms’, ‘hdl’, ‘telnet’, ‘rsync’, ‘gopher’, ‘prospero’, ‘shttp’, ‘ws’, ‘rtsp’, ‘nfs’, ‘rtspu’, ‘svn’, ‘git’, ‘sip’, ‘snews’, ‘tel’, ‘nntp’, ‘wais’, ‘svn+ssh’, ‘file’, ‘sips’, ‘git+ssh’, ‘imap’, ‘wss’).
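To check which scheme a URL uses before pasting it as a file path, a quick sketch with Python's standard library may help. The URLs and paths below are placeholders chosen for illustration, not real endpoints, and this is not how Toucan itself parses URLs.

```python
from urllib.parse import urlparse

# Placeholder URLs for illustration only (the paths are hypothetical).
urls = [
    "ftps://ftps.toucantoco.com:990/data/sales.csv",
    "https://example.com/exports/sales.csv",
    "s3://my-bucket/exports/sales.csv",
]


def scheme_of(url: str) -> str:
    """Return the scheme of a URL, i.e. the part before '://'."""
    return urlparse(url).scheme


schemes = [scheme_of(u) for u in urls]
print(schemes)
```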
FTP Server
When your data files are too big to be transferred via the studio data upload interface, you can store them on an FTP server. The FTP server can be either on the Toucan Toco side (ask support to set it up) or on your side.
Tutorial: Product Corporation
You need access to your FTP server.
Open FileZilla
Connect to your FTP and look at what files are available.
Right-click on the one you want to use
Select “Copy URL(s) to clipboard”
Add your password to the generated URL
Important
💡 If you don’t want to write the password in your etl_config file, contact us via help@toucantoco.com or your Delivery contact to set up a hidden password.
Paste the URL into your datasource block
Your datasource block is now ready
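The "add your password" step above can be sketched in Python as follows. The host, user, password, and path are made up, and `build_ftp_url` is a hypothetical helper, not part of Toucan. Percent-encoding the credentials is an assumption carried over from the S3 guidance on special characters.

```python
from urllib.parse import quote


def build_ftp_url(user, password, host, path, scheme="ftp", port=None):
    """Assemble scheme://user:password@host[:port]/path (hypothetical helper)."""
    # Percent-encode credentials so characters like '@' or ':' cannot be
    # mistaken for URL delimiters.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    netloc = f"{userinfo}@{host}" + (f":{port}" if port else "")
    return f"{scheme}://{netloc}/{path.lstrip('/')}"


# Made-up credentials and host, for illustration only.
url = build_ftp_url(
    "alice",
    "p@ss:word",
    "ftp.example.com",
    "/data/data-product-corporation-201801.csv",
)
print(url)
```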
Toucan Toco FTP Server
You can send data to Toucan Toco FTP Server with the following credentials:
Host: ftps.toucantoco.com
Port: 990 (for the connection) and range 64000-64321 (for data transfer)
Protocol: FTPS (if you use FileZilla, it’s “implicit FTP over TLS”)
Mode: Passive Mode
User: Given by the Toucan Toco Team
Password: Given by the Toucan Toco Team
S3 Bucket
The access key and secret key for your data files hosted on S3 buckets can be configured this way:
For example:
Note
If your access key or secret key contains special characters such as “/”, “@” or “:”, you have to URL-encode them. URL encoding converts special characters into a format that can be transmitted safely inside a URL. You will find more information about this topic here (as well as an automatic encoder).
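If you prefer to encode a key locally instead of using an online encoder, a minimal sketch with Python's standard library looks like this. The secret below is made up, and `encode_credential` is a hypothetical helper name.

```python
from urllib.parse import quote, unquote


def encode_credential(value: str) -> str:
    """Percent-encode every reserved character, including '/'."""
    # safe="" overrides the default, which would leave '/' unencoded.
    return quote(value, safe="")


secret = "abc/def+ghi@jk:l"  # made-up secret key, for illustration only
encoded = encode_credential(secret)
print(encoded)

# Decoding restores the original value.
decoded = unquote(encoded)
```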
Toucan Toco can provide an S3 bucket with a dedicated AWS IAM user tied to your instance.
You can then configure your datasources block with a special configuration, as follows:
Note
If you are using a custom domain name for your S3 bucket (with MinIO, for example), here is the syntax you should use:
Combining a remote file in Toucan
On the previous page, we saw how to add remote files in Toucan. Read it first before going further with this one.
On this page, we will see how to combine several remote files into one file.
You can load multiple files - uploaded on our server or on an FTP/S3 server - into a single file with the option match: true. The dataset created from the file will contain a column __filename__ holding the name of the file each row came from.
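Conceptually, the combined dataset behaves like the rows of every matching file stacked together, with one extra column naming the source file. The sketch below illustrates that idea with in-memory CSV content. The file contents are made up, and this is not Toucan's actual implementation.

```python
import csv
import io

# Made-up contents standing in for two monthly CSV files.
files = {
    "data-product-corporation-201801.csv": "product,sales\nA,10\nB,20\n",
    "data-product-corporation-201802.csv": "product,sales\nA,15\n",
}


def combine(files):
    """Concatenate rows from several CSV files, tagging each row with its source."""
    rows = []
    for name, content in files.items():
        for row in csv.DictReader(io.StringIO(content)):
            row["__filename__"] = name  # remember which file the row came from
            rows.append(row)
    return rows


combined = combine(files)
for row in combined:
    print(row)
```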
Tutorial
Your corporation now has a new data file each month: data-product-corporation-201801.csv, data-product-corporation-201802.csv … You want them to be loaded into a single domain called data-product-corpo.
Find the regular expression (regex) that matches your files with regex101.com.
data-product-corporation-\d{6}\.csv
Don’t forget to use ‘^’ and ‘$’ to be more restrictive.
^data-product-corporation-\d{6}\.csv$
Add a backslash to escape each backslash.
^data-product-corporation-\\d{6}\\.csv$
Copy your regular expression into the "file" option of your datasource block
Add the option match: true
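Before pasting the expression, you can check it locally against a few file names. The sketch below uses Python's `re` module and assumes the regex engine used by Toucan treats this pattern the same way. The non-matching file names are made up to show what the anchors exclude.

```python
import re

# The anchored pattern from the steps above, written as a raw string.
pattern = re.compile(r"^data-product-corporation-\d{6}\.csv$")

candidates = [
    "data-product-corporation-201801.csv",
    "data-product-corporation-201802.csv",
    "data-product-corporation-2018.csv",        # only 4 digits
    "old-data-product-corporation-201801.csv",  # extra prefix, rejected by '^'
    "data-product-corporation-201801.csv.bak",  # extra suffix, rejected by '$'
]

matched = [name for name in candidates if pattern.search(name)]
print(matched)

# In a regular (non-raw) string, each backslash must be doubled;
# both spellings describe the same regular expression.
escaped = "^data-product-corporation-\\d{6}\\.csv$"
same_pattern = escaped == pattern.pattern
```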
Example of content for FTP (with authentication):
Example of content for S3 (with authentication):