Setting up an HTTP API connector
This is a generic connector to get data from any HTTP API (REST-style APIs). It's highly customizable and versatile, but it requires a more complex configuration.
This type of data source combines a Python HTTP library, used to get data from any API, with the jq filtering language for flexible transformations of the responses. Optionally, an xpath string can be provided to first parse an XML response; the jq filter is then applied to get the data in tabular format.
To configure this connector, you will need the documentation of the API you want to connect to.
The type of response the connector has to expect from the queried API.
Make sure you use the correct responsetype, based on the queried API's documentation. Currently JSON and XML are supported, the default being JSON.
Defines how the connector should behave when the network is unreachable:
MAX ATTEMPTS: number of attempts to make before aborting the connection
MAX DELAY: total time to wait before aborting the connection
WAIT TIME: time to wait between each attempt
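The three retry settings above can be read as a simple retry loop. The following is only a minimal sketch of that idea, not Toucan's implementation: the function name call_with_retries, the use of ConnectionError, and the exact stopping rule are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def call_with_retries(
    request: Callable[[], object],
    max_attempts: int = 3,      # MAX ATTEMPTS
    max_delay: float = 10.0,    # MAX DELAY, in seconds
    wait_time: float = 1.0,     # WAIT TIME, in seconds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> object:
    """Call `request` until it succeeds, attempts run out, or max_delay would be exceeded."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = clock()
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except ConnectionError as exc:
            last_error = exc
        # Give up on the last attempt, or if waiting again would exceed the time budget.
        if attempt == max_attempts or clock() - start + wait_time > max_delay:
            break
        sleep(wait_time)
    raise ConnectionError(f"gave up after {attempt} attempt(s)") from last_error
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting.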
If the connector must use a certificate to establish the connection, you can provide the path to the certificate.
The authentication method that the connector should use to query the data. AUTHTYPE can be:
basic: username and password, which you can provide as
  positional arguments: input your username and password in the right order
  named arguments: input them this way: {"username": "myusername", "password": "mypassword"}
digest: same as above
oAuth1:
  positional arguments: input client_id (sometimes named client_key) and client_secret. Both are provided by the service you are trying to access
  named arguments: input {"client_id": your_client_id, "client_secret": your_client_secret}
oAuth2 (deprecated):
  named arguments: input {"client_id": your_client_id, "client_secret": your_client_secret}
CustomTokenServer: provides a flexible mechanism for authenticating API requests using a custom token server. The token you get is then sent in the Authorization header as "Bearer {{your_token}}". In the named arguments section, fill in, as a JSON dict, the elements required to get your token:
method: The HTTP method to use when requesting the token (e.g., 'GET', 'POST').
url: The URL of the token server.
params (optional): Query parameters to include in the token request.
data (optional): Form data to include in the token request body.
headers (optional): Additional headers to include in the token request.
json (optional): JSON payload to include in the token request body.
token_header_name: allows overriding the default Authorization header.
filter (optional): A JQ-style filter to extract the token from the response. Defaults to "." (root of the JSON response).
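As an illustration of the named arguments described above, here is a hedged sketch that builds such a JSON dict and shows what the filter is meant to select. The token server URL, the credentials, and the shape of the token response are all made up for the example; only the key names (method, url, json, filter) come from the list above.

```python
import json

# Hypothetical token server details; replace them with your provider's values.
named_arguments = {
    "method": "POST",
    "url": "https://auth.example.com/token",
    "json": {"user": "my_user", "password": "my_password"},
    "filter": ".access_token",
}

# JSON text that could be pasted into the named arguments section.
print(json.dumps(named_arguments, indent=2))

# Made-up token response, used only to illustrate the filter:
# ".access_token" selects that key from the JSON body.
token_response = {"access_token": "abc123", "expires_in": 3600}
token = token_response["access_token"]

# With the default header name, the token is sent with the Bearer prefix.
auth_header = {"Authorization": f"Bearer {token}"}
print(auth_header)
```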
For now we only support the Grant Type: Authorization Code. This section outlines the fields available for configuring this method:
Configuration Type (dropdown list): AuthorizationCodeOauth2 (only option available for now)
Authentication URL (mandatory): The URL used to initiate the OAuth2.0 authorization process. For example: https://auth.api-acme.com/oauth/authorize
Token URL (mandatory): The URL used to exchange the authorization code for an access token. For example: https://auth.api-acme.com/oauth/token
Scope (mandatory): The permissions requested from the OAuth2.0 provider. For example: read write profile
Additional authentication params (optional): A JSON object containing additional URL parameters to be included in the authentication request. For example: {"add_param1": "value_1", "add_param2": "value_2"}
Client Id (mandatory): The unique identifier for your application, provided by the OAuth2.0 service. For example: client_abc123
Client Secret (mandatory): The secret key associated with your client ID. For example: secret_xyz789
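To show how these fields typically fit together in an Authorization Code flow, the sketch below builds the URL a user would be sent to in order to authorize access. It follows the standard OAuth 2.0 query parameters (response_type, client_id, scope, redirect_uri, state); the redirect_uri and state values are assumptions that do not appear in the field list above, and the function is illustrative rather than what the connector runs internally.

```python
from urllib.parse import urlencode


def build_authorization_url(authentication_url, client_id, scope,
                            redirect_uri, state, additional_params=None):
    """Assemble an OAuth 2.0 authorization request URL (Authorization Code grant)."""
    params = {
        "response_type": "code",   # asks the provider for an authorization code
        "client_id": client_id,
        "scope": scope,
        "redirect_uri": redirect_uri,
        "state": state,            # opaque value echoed back by the provider
    }
    # "Additional authentication params" are appended as extra URL parameters.
    params.update(additional_params or {})
    return f"{authentication_url}?{urlencode(params)}"


url = build_authorization_url(
    "https://auth.api-acme.com/oauth/authorize",
    client_id="client_abc123",
    scope="read write profile",
    redirect_uri="https://example.com/callback",  # hypothetical callback URL
    state="random-state-value",                   # hypothetical state value
    additional_params={"add_param1": "value_1"},
)
print(url)
```

The authorization code returned to the callback is then exchanged at the Token URL, together with the Client Id and Client Secret, for an access token.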
You can use this object to avoid repetition in data sources. The values of these attributes will be used or overridden by all data sources using this connector.
json: a JSON object of parameters to send in the body of every HTTP request made using the configured connector. Example: {"offset": 100, "limit": 50}
headers: a JSON object of parameters to send in the header of every HTTP request made using the configured connector. Example: {"content-type": "application/xml"}
params: a JSON object of parameters to send in the query string of every HTTP request made using the configured connector. Example: {"offset": 100, "limit": 50}
proxies: a JSON object expressing a mapping of protocol or host to corresponding proxy. Example: {"http": "foo.bar:3128", "http://host.name": "foo.bar:4012"}
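The relationship between the template and a data source can be pictured as a fallback: a data source value is used when present, otherwise the template value applies. The sketch below assumes that a data source attribute replaces the template attribute as a whole; whether individual keys are merged instead is not something this page specifies, and the function name effective_settings is hypothetical.

```python
TEMPLATE_ATTRIBUTES = ("json", "headers", "params", "proxies")


def effective_settings(template: dict, data_source: dict) -> dict:
    """Pick, per attribute, the data source value if set, else the template value."""
    result = {}
    for name in TEMPLATE_ATTRIBUTES:
        if data_source.get(name) is not None:
            # Assumption: the data source value overwrites the template value entirely.
            result[name] = data_source[name]
        elif template.get(name) is not None:
            result[name] = template[name]
    return result


template = {
    "headers": {"content-type": "application/xml"},
    "params": {"offset": 100, "limit": 50},
}
data_source = {"params": {"offset": 0, "limit": 10}}

print(effective_settings(template, data_source))
```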
Endpoint URL
headers: a JSON object of parameters to send in the header of every HTTP request made using the configured connector. Example: {"content-type": "application/xml"}. Overwrites the headers parameter in Template.
URL params: a JSON object of parameters to send in the query string of every HTTP request made using the configured connector. Example: {"offset": 100, "limit": 50}. Overwrites the params parameter in Template.
Body: a JSON object of parameters to send in the body of every HTTP request made using the configured connector. Example: {"data": "my_parameters"}.
Advanced
parameters: a JSON object that will be used for variable interpolation in the query string. For testing purposes only. In production mode, it should be left blank, as variable interpolation will be handled by the app requester.
json: a JSON object of parameters to send in the body of every HTTP request made using the configured connector. Example: {"offset": 100, "limit": 50}. Overwrites the json parameter in Template.
proxies: a JSON object expressing a mapping of protocol or host to corresponding proxy. Example: {"http": "foo.bar:3128", "http://host.name": "foo.bar:4012"}. Overwrites the proxies parameter in Template.
flatten column: optional field where you can specify the name of a column that contains nested rows. If specified, the nested rows will be flattened into separate columns in the resulting data frame, and the new column names will be prefixed with the original column name. Specify several columns using a , delimiter. Example: if you have a column orders: [{"id": 3, "product": "Notebook", "price": 5.99}], the results will be separated into orders_id, orders_product and orders_price.
data: Two options, Type1 for a simple string, Type2 for a JSON field. 💡 You can send XML data with the Type1 option.
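The flatten column behaviour described above can be illustrated with plain Python. This is a rough sketch under stated assumptions: the function flatten_column and the customer field are hypothetical, and producing one output row per nested entry is an assumption, since the page only shows a single nested row.

```python
def flatten_column(rows, column):
    """Expand a column of nested rows into prefixed columns (e.g. orders_id)."""
    flattened = []
    for row in rows:
        # Keep every other column untouched.
        base = {key: value for key, value in row.items() if key != column}
        # Assumption: each nested entry yields its own output row.
        for item in row.get(column) or [{}]:
            flat = dict(base)
            for key, value in item.items():
                flat[f"{column}_{key}"] = value
            flattened.append(flat)
    return flattened


rows = [
    {"customer": "Alice",
     "orders": [{"id": 3, "product": "Notebook", "price": 5.99}]},
]
print(flatten_column(rows, "orders"))
```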
In the connector we'll have a response like this:
And we can then apply a filter:
Let's take the JSON defined above.
We apply the filter ".bookstore.book[]", which means that it will extract the book list from the bookstore.
So we end up with a table of results looking like this:
Harry Potter    29.99
Learning XML    39.95
Note: the reason to have a filter option is to allow you to take any API response and transform it into something that fits into a column-based data frame.
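To make the effect of the ".bookstore.book[]" filter concrete, here is a small Python equivalent. The response below is a reconstruction for illustration only: the key names title and price are assumptions, chosen to match the two rows shown in the table above.

```python
# Assumed shape of the parsed response; only the values come from the table above.
response = {
    "bookstore": {
        "book": [
            {"title": "Harry Potter", "price": 29.99},
            {"title": "Learning XML", "price": 39.95},
        ]
    }
}

# Python equivalent of the jq filter ".bookstore.book[]":
# take the "book" list inside "bookstore" and emit each entry as one row.
rows = list(response["bookstore"]["book"])

for row in rows:
    print(f'{row["title"]:<15}{row["price"]}')
```

Each dictionary in rows becomes one line of the resulting table, and each key becomes a column.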
This section presents the pagination support of Toucan. Pagination options allow you to set up a configuration that loops over the results of a query until all results are retrieved.
Throttling and large datasets
Throttling: We do not support throttling, meaning that we do not have a speed limit feature when we request an API. This means we cannot control how quickly requests are sent. As a result, if too many requests are made too quickly, it might trigger an error message saying the system is overloaded.
Large datasets: Toucan execution preview calls are synchronous, which means that we only have 30 seconds to fetch and transform data. Depending on the query, this could be an issue if you are working on live data; prefer stored datasets if that is the case.
Offset Limit (OffsetLimitPaginationConfig)
This configuration type implements the offset/limit pagination pattern.
Parameters
offset_name: (string) Parameter name for offset (default: offset)
limit_name: (string) Parameter name for limit (default: limit)
limit: (int) mandatory. Number of items per request
data_filter: (string) mandatory. JQ filter that determines which part of the data must be used to compute the data length
Use case: APIs using offset/limit style pagination.
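The offset/limit pattern can be sketched as a loop that advances the offset by limit until a page comes back shorter than limit. This is an illustrative sketch, not the connector's code: fetch_page stands in for the HTTP call, data_filter is a Python callable playing the role of the JQ filter, and the "short page means last page" stopping rule is an assumption.

```python
def fetch_all_offset_limit(fetch_page, limit, offset_name="offset",
                           limit_name="limit", data_filter=lambda body: body):
    """Collect items page by page using offset/limit query parameters."""
    results = []
    offset = 0
    while True:
        body = fetch_page({offset_name: offset, limit_name: limit})
        items = data_filter(body)      # the part of the response that holds the rows
        results.extend(items)
        if len(items) < limit:         # assumption: a short page is the last page
            break
        offset += limit
    return results


# Fake API over five records, used only to exercise the loop.
DATA = list(range(5))

def fake_api(params):
    start = params["offset"]
    return {"results": DATA[start:start + params["limit"]]}

print(fetch_all_offset_limit(fake_api, limit=2, data_filter=lambda b: b["results"]))
```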
Page-based pagination (PageBasedPaginationConfig)
This configuration implements page-based pagination.
Parameters:
page_name: (string) Parameter name for the page (default: page)
page: (int) mandatory. Current page number
per_page_name: (string) Parameter name for items per page
per_page: (int) Number of items per page
max_page_filter: (string) JQ filter to extract the maximum page number
can_raise_not_found: (boolean) Whether 404 errors should be treated as the end of pagination; must be set if no max_page_filter is available
Use case: Traditional APIs using page numbers where the information can be found in the response body.
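The two ways of detecting the last page (a maximum page number read from the response, or a 404 treated as the end) can be sketched as follows. The NotFound exception is a stand-in for an HTTP 404, max_page_filter is a Python callable playing the role of the JQ filter, and the "per_page" default name is an assumption since no default is listed above.

```python
class NotFound(Exception):
    """Stand-in for an HTTP 404 response."""


def fetch_all_pages(fetch_page, page=1, per_page=None, page_name="page",
                    per_page_name="per_page", max_page_filter=None,
                    can_raise_not_found=False):
    """Request successive pages until the max page is reached or a 404 occurs."""
    bodies = []
    while True:
        params = {page_name: page}
        if per_page is not None:
            params[per_page_name] = per_page
        try:
            body = fetch_page(params)
        except NotFound:
            if can_raise_not_found:
                break                  # a 404 marks the end of pagination
            raise
        bodies.append(body)
        if max_page_filter is not None and page >= max_page_filter(body):
            break                      # the response told us this was the last page
        page += 1
    return bodies
```

For instance, with a response body such as {"total_pages": 3, ...}, passing max_page_filter=lambda body: body["total_pages"] stops the loop after the third page.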
Cursor based pagination (CursorBasedPaginationConfig)
This configuration implements cursor-based pagination.
Parameters:
cursor_name: (string) mandatory. Parameter name for the cursor (default: cursor)
cursor_filter: (string) mandatory. JQ filter to extract the next cursor
Use case: APIs using cursors/tokens for pagination.
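Cursor-based pagination boils down to reading the next cursor from each response and sending it back with the following request. The sketch below is illustrative only: fetch_page stands in for the HTTP call, cursor_filter is a Python callable playing the role of the JQ filter, and stopping when the cursor is empty or missing is an assumption.

```python
def fetch_all_cursor(fetch_page, cursor_filter, cursor_name="cursor"):
    """Follow cursors from response to response until none is returned."""
    bodies = []
    params = {}                        # the first request carries no cursor
    while True:
        body = fetch_page(params)
        bodies.append(body)
        next_cursor = cursor_filter(body)
        if not next_cursor:            # assumption: no cursor means last page
            break
        params = {cursor_name: next_cursor}
    return bodies


# Fake API with three pages chained by cursors, used only to exercise the loop.
PAGES = {
    None: {"items": ["a"], "next": "c1"},
    "c1": {"items": ["b"], "next": "c2"},
    "c2": {"items": ["c"], "next": None},
}

def fake_api(params):
    return PAGES[params.get("cursor")]

print(fetch_all_cursor(fake_api, cursor_filter=lambda body: body["next"]))
```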
Hyper Media Pagination (HyperMediaPaginationConfig)
This configuration implements HATEOAS-style pagination using next links.
For this pagination type, all URLs need to share the same configured base_url. If the configured base_url is https://my-api.com/data, then all next page URLs must start with it, for example https://my-api.com/data/_whatever.
Parameters:
next_link_filter: (string) mandatory. JQ filter to extract the next page URL
next_link: (string) mandatory. Field which bears the next link URL
Use case: RESTful APIs following HATEOAS principles.
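Next-link pagination, including the base_url constraint described above, can be sketched like this. It is a simplified illustration: fetch_url stands in for the HTTP call, next_link_filter is a Python callable playing the role of the JQ filter, and raising ValueError for a link outside base_url is an assumption about how the constraint might be enforced.

```python
def fetch_all_hypermedia(fetch_url, base_url, next_link_filter):
    """Follow "next" links starting from base_url until no link is returned."""
    bodies = []
    url = base_url
    while url:
        # Every page URL must share the configured base_url.
        if not url.startswith(base_url):
            raise ValueError(f"next link {url!r} is outside base_url {base_url!r}")
        body = fetch_url(url)
        bodies.append(body)
        url = next_link_filter(body)   # None or empty ends the loop
    return bodies


# Fake API with two linked pages, used only to exercise the loop.
PAGES = {
    "https://my-api.com/data": {"items": [1], "next": "https://my-api.com/data/page2"},
    "https://my-api.com/data/page2": {"items": [2], "next": None},
}

print(fetch_all_hypermedia(PAGES.__getitem__, "https://my-api.com/data",
                           lambda body: body["next"]))
```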
The JSON response looks like this:
We apply the filter .records[].fields, which means that for every entry in the records property, it will extract all the properties of the fields object. So we end up with a table of results looking like this (I'm skipping columns in this example, but you see the point):
1094    Enders, Giulia    …
746     Sattouf, Riad     …
Performance: If the HTTP API connector is used in a live context, make sure that the API is performant enough and able to return data quickly. To get suitable performance, retrieve a limited amount of data, since it needs additional transformation to unnest the data (in the case of a JSON response).
positional arguments: enter one by one (in the right order) the URL of the authentication endpoint (e.g. ), the "client_ID" (sometimes named "client_key") and the "client_secret". This information is provided by the service you are trying to access.
We have added a dedicated section to manage OAuth 2.0 authentication for REST APIs. This authentication method enables users to authenticate with a third-party service (an OAuth 2.0 provider). Upon request from our backend, the provider issues a token with a specific scope. This token is then used in the Authorization header with the Bearer scheme to authenticate and access your data on the API. For more detailed information, please refer to the .
Google as an OAuth 2.0 provider requires other parameters. To access a Google API that requires OAuth2.0 as a means of authentication, you will have to fill the Additional authentication params with the following JSON:
url: The API endpoint you want to query; it will be appended to the baseroute URL defined in the connector. ⚠️ As it cannot be empty, when the API doesn't have an endpoint you can split the baseroute URL defined in the connector and put the last part in the datasource. Ex: in connector and /v1 in datasource
xpath: If the reply from the API contains XML data, you can parse it with an xpath string. See documentation: Example:
filter: String containing a jq filter applied to the data to get it in tabular format. See documentation: Example:
After selecting data from your connector, you will be able to create a dataset using the selection as the "source step".