🔌 Setting up an HTTP API connector

Overview

This is a generic connector for getting data from any HTTP API (REST-style APIs). It is highly customizable and versatile, but this comes at the cost of a more complex configuration.

This type of data source combines Python's requests library, to get data from any API, with the jq filtering language, for flexible transformations of the responses. Optionally, an xpath string can be provided to first parse an XML response; the jq filter is then applied to get the data in tabular format.

Information

The connector doesn't handle paginated results.

Configuring the connector

To configure this connector, you will need the documentation of the API you want to connect to.

Responsetype

The type of response the connector should expect from the queried API.

Make sure you use the correct responsetype, based on the queried API's documentation. JSON and XML are currently supported; the default is JSON.

Retrypolicy

Defines how the connector should behave when the network is unreachable:

  • MAX ATTEMPTS: number of attempts to make before aborting the connection

  • MAX DELAY: total time to wait before aborting the connection

  • WAIT TIME: time to wait between each attempt
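The three settings amount to a bounded retry loop. The following is a minimal Python sketch of that behavior, not the connector's code. Several details are assumptions: the helper name `call_with_retry`, the use of `ConnectionError` to stand for "network unreachable", and the reading of MAX DELAY as a total time budget counted from the first attempt.

```python
import time


def call_with_retry(fn, max_attempts=3, max_delay=10.0, wait_time=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fn() until it succeeds or a retry limit is reached (hypothetical helper).

    max_attempts: MAX ATTEMPTS, the total number of calls allowed.
    max_delay: MAX DELAY, total seconds allowed since the first attempt (assumed meaning).
    wait_time: WAIT TIME, seconds to sleep between two attempts.
    """
    start = clock()
    attempt = 0
    while True:
        attempt += 1
        try:
            return fn()
        except ConnectionError:
            # Abort when either limit would be exceeded; re-raise the last error.
            out_of_attempts = attempt >= max_attempts
            out_of_time = clock() - start + wait_time > max_delay
            if out_of_attempts or out_of_time:
                raise
            sleep(wait_time)
```

In practice, `fn` would wrap the actual HTTP call, and the caught exception would be whichever error your HTTP client raises for network failures. The `sleep` and `clock` arguments exist only so the loop can be tested without waiting.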

Certificate

If the connector must use a certificate to establish the connection, you can provide the path to the certificate.

Auth

The authentication method that the connector should use to query the data. AUTHTYPE can be one of:

  • basic: username and password, which you can provide as either:

    • positional arguments: input your username and password, in that order

    • named arguments: input them this way {"username": "myusername", "password": "mypassword"}

  • digest: same as above

  • oAuth1:

    • positional arguments: input client_id (sometimes named client_key) and client_secret. Both are provided by the service you are trying to access.

    • named arguments: input {"client_id": your_client_id, "client_secret": your_client_secret}.

  • oAuth2:

    • positional arguments: enter, one by one and in this order, the URL of the authentication endpoint (e.g. https://login.mywebsite.com/oauth2/token), the "client_ID" (sometimes named "client_key") and the "client_secret". This information is provided by the service you are trying to access.

    • named arguments: input {"client_id": your_client_id, "client_secret": your_client_secret}.

  • CustomTokenServer: provides a flexible mechanism for authenticating API requests using a custom token server. The token you get is then sent in the Authorization header, in the form "Bearer {{your_token}}". In the named arguments section, fill in, as a JSON dict, the elements required to get your token:

    • method: The HTTP method to use when requesting the token (e.g., 'GET', 'POST').

    • url: The URL of the token server.

    • params (optional): Query parameters to include in the token request.

    • data (optional): Form data to include in the token request body.

    • headers (optional): Additional headers to include in the token request.

    • json (optional): JSON payload to include in the token request body.

    • filter (optional): A JQ-style filter to extract the token from the response. Defaults to "." (root of the JSON response).
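The following sketch shows what ends up on the wire for two of these methods, using only the Python standard library. The first is the `Authorization` header that HTTP Basic authentication produces (RFC 7617). The second is the CustomTokenServer flow: fetch a token, extract it with the filter, and send it as a `Bearer` header.

Some parts are assumptions made for illustration: the helper names, the injectable `send` callable, and the restriction of `filter` to simple `.key` paths. Real jq filters are far richer.

```python
import base64


def basic_auth_header(username, password):
    """Build the HTTP Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def extract_with_path(data, path):
    """Follow a simple '.a.b' path; '.' returns the data unchanged.

    Only a tiny subset of jq, used here as a stand-in for the real filter.
    """
    for key in path.strip(".").split("."):
        if key:
            data = data[key]
    return data


def bearer_header_from_token_server(config, send):
    """CustomTokenServer-style flow (hypothetical helper).

    config: the named arguments (method, url, params, data, headers, json, filter).
    send: a callable that performs the HTTP request and returns the decoded JSON body.
    """
    # Forward only the request-related named arguments that were actually set.
    request = {k: config[k] for k in ("method", "url", "params", "data", "headers", "json")
               if config.get(k) is not None}
    response = send(**request)
    token = extract_with_path(response, config.get("filter", "."))
    return {"Authorization": f"Bearer {token}"}
```

With the requests library, `send` could be something like `lambda **kw: requests.request(**kw).json()`.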

Template

You can use this object to avoid repetition across data sources. The values of these attributes are used by all data sources that use this connector, unless a data source overrides them.

  • json: a JSON object of parameters to send in the body of every HTTP request made using the configured connector. Example: { "offset": 100, "limit": 50 }

  • headers: a JSON object of parameters to send in the headers of every HTTP request made using the configured connector. Example: { "content-type": "application/xml" }

  • params: a JSON object of parameters to send in the query string of every HTTP request made using the configured connector. Example: { "offset": 100, "limit": 50 }

  • proxies: a JSON object expressing a mapping of protocol or host to the corresponding proxy. Example: {"http": "foo.bar:3128", "http://host.name": "foo.bar:4012"}
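A data source's own values override the template's. One plausible reading, assumed in this sketch, is a key-by-key merge: for each attribute, start from the template's object and let the data source's keys win. The function name and the sample values are hypothetical.

```python
TEMPLATE_KEYS = ("json", "headers", "params", "proxies")


def apply_template(template, data_source):
    """Return the request settings after applying the connector template (sketch).

    For each template attribute, the data source's keys override the template's,
    while template keys the data source does not set are kept (assumed behavior).
    """
    merged = dict(data_source)
    for key in TEMPLATE_KEYS:
        template_value = template.get(key)
        if not template_value:
            continue  # nothing defined in the template for this attribute
        merged[key] = {**template_value, **(data_source.get(key) or {})}
    return merged
```

With a template `params` of `{"offset": 100, "limit": 50}` and a data source `params` of `{"limit": 10}`, this merge sends `{"offset": 100, "limit": 10}`. If the connector instead replaces the whole object, only `{"limit": 10}` would be sent.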

Selecting data from the API

  • parameters: a JSON object used for variable interpolation in the query string. For testing purposes only: in production, leave it blank, as variable interpolation is handled by the app requester.

  • url: the API endpoint you want to query. It is appended to the baseroute URL defined in the connector. ⚠️ This field cannot be empty: if the API has no separate endpoint, split the baseroute URL defined in the connector and put its last part in the data source. Example: https://example.com/API in the connector and /v1 in the data source.

  • Method: the HTTP method you want the data source to perform: GET, POST or PUT. The default is GET. The documentation of the API you want to query tells you which method to use.

  • headers: a JSON object of parameters to send in the headers of the HTTP request made by this data source. Example: { "content-type": "application/xml" }. Overrides the headers parameter in Template.

  • params: a JSON object of parameters to send in the query string of the HTTP request made by this data source. Example: { "offset": 100, "limit": 50 }. Overrides the params parameter in Template.

  • json: a JSON object of parameters to send in the body of the HTTP request made by this data source. Example: { "offset": 100, "limit": 50 }. Overrides the json parameter in Template.

  • proxies: a JSON object expressing a mapping of protocol or host to the corresponding proxy. Example: {"http": "foo.bar:3128", "http://host.name": "foo.bar:4012"}. Overrides the proxies parameter in Template.

  • flatten column: optional field where you can specify the name of a column that contains nested rows. If specified, the nested rows are flattened into separate columns in the resulting data frame, and the new column names are prefixed with the original column name. To flatten several columns, separate their names with commas. For example, if you have a column orders: [{"id": 3, "product": "Notebook", "price": 5.99}], the results will be split into orders_id, orders_product and orders_price.

  • data: two options, Type1 for a simple string and Type2 for a JSON field. 💡 You can send XML data with the Type1 option.

  • xpath: if the API response contains XML data, you can parse it with an xpath string (see the xpath documentation). Example:

    <?xml version="1.0" encoding="UTF-8"?>
    <result>
    <bookstore>
        <book>
            <title>Harry Potter</title>
            <price>29.99</price>
        </book>
        <book>
            <title>Learning XML</title>
            <price>39.95</price>
        </book>
    </bookstore>
    </result>

In the connector, we'll get a response like this:

{"bookstore": {"book": [{"title":"Harry Potter", "price": "29.99"}, {"title": "Learning XML", "price":"39.95"}]}}
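The XML-to-JSON step can be approximated with Python's standard library. This sketch makes three assumptions:

- The conversion follows the xmltodict style: text stays a string, and repeated tags become lists.
- Attributes are ignored.
- The selection uses ElementTree's limited XPath support with the expression `.//bookstore`. This is an example value, not one taken from the connector.

Run on the sample above, it reproduces the response shown.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element to nested dicts, xmltodict-style.

    Leaf text stays a string, a tag repeated under the same parent becomes a list,
    and attributes are ignored to keep the sketch short.
    """
    children = list(elem)
    if not children:
        return elem.text
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag not in result:
            result[child.tag] = value
        elif isinstance(result[child.tag], list):
            result[child.tag].append(value)
        else:
            result[child.tag] = [result[child.tag], value]
    return result


def parse_xml_response(xml_text, xpath=None):
    """Parse an XML body, optionally narrow it with an XPath, and return a dict."""
    root = ET.fromstring(xml_text)
    selected = root if xpath is None else root.find(xpath)
    if selected is None:
        raise ValueError(f"XPath {xpath!r} matched nothing")
    return {selected.tag: element_to_dict(selected)}


body = """<?xml version="1.0" encoding="UTF-8"?>
<result>
<bookstore>
    <book><title>Harry Potter</title><price>29.99</price></book>
    <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>
</result>"""

print(parse_xml_response(body, ".//bookstore"))
# {'bookstore': {'book': [{'title': 'Harry Potter', 'price': '29.99'}, {'title': 'Learning XML', 'price': '39.95'}]}}
```

As with xmltodict, a bookstore holding a single book would yield a dict rather than a one-element list. A jq filter should account for that if the number of items varies.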

And we can then apply a:

  • filter: a string containing a jq filter applied to the data to get it in tabular format (see the jq documentation). Example:

    filter: ".bookstore.book[]"

Let's take the JSON defined above:

{"bookstore": {"book": [{"title":"Harry Potter", "price": "29.99"}, {"title": "Learning XML", "price":"39.95"}]}}

We apply the filter ".bookstore.book[]", which extracts the book list from the bookstore and outputs each book separately. We end up with a table of results with one row per book and two columns, title and price.
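The connector runs real jq, whose language is much richer than this. To make the effect of `.bookstore.book[]` concrete, this sketch implements only the two constructs used on this page: `.key` steps and the `[]` iterator. The function name and parsing approach are assumptions for illustration.

```python
def apply_path_filter(data, expression):
    """Apply a jq-like path such as '.bookstore.book[]' and return the outputs as a list.

    Supports only '.key' steps and the '[]' iterator, a small subset of jq.
    """
    outputs = [data]
    for step in expression.strip().lstrip(".").split("."):
        if not step:
            continue  # '.' alone is the identity filter
        iterate = step.endswith("[]")
        key = step[:-2] if iterate else step
        next_outputs = []
        for value in outputs:
            if key:
                value = value[key]
            if iterate:
                next_outputs.extend(value)  # '[]' emits every element of the list
            else:
                next_outputs.append(value)
        outputs = next_outputs
    return outputs


response = {"bookstore": {"book": [{"title": "Harry Potter", "price": "29.99"},
                                   {"title": "Learning XML", "price": "39.95"}]}}
rows = apply_path_filter(response, ".bookstore.book[]")
print(rows)
# [{'title': 'Harry Potter', 'price': '29.99'}, {'title': 'Learning XML', 'price': '39.95'}]
```

Each output object becomes one row of the table, and its keys become the columns.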

Note: the reason for having a filter option is to let you take any API response and transform it into something that fits into a column-based data frame.
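The flatten column option described under Selecting data from the API can further split nested rows into prefixed columns. This sketch applies it to the `orders` example given there.

Three things are assumptions: the helper name, the extra `customer` column in the sample, and the rule that each nested item becomes its own output row. The behavior for lists with several items is not documented here.

```python
def flatten_columns(rows, flatten_column):
    """Split nested rows into prefixed columns (a sketch of the flatten column option).

    flatten_column: one column name, or several separated by commas.
    Assumption: every nested item becomes its own output row, its keys are prefixed
    with '<column>_', and the row's other columns are repeated on each new row.
    Nested items are expected to be JSON objects (dicts).
    """
    for column in (name.strip() for name in flatten_column.split(",")):
        flattened = []
        for row in rows:
            nested = row.get(column)
            if isinstance(nested, dict):
                nested = [nested]  # a single nested object counts as one nested row
            if not isinstance(nested, list) or not nested:
                flattened.append(row)  # nothing to flatten in this row
                continue
            base = {k: v for k, v in row.items() if k != column}
            for item in nested:
                prefixed = {f"{column}_{key}": value for key, value in item.items()}
                flattened.append({**base, **prefixed})
        rows = flattened
    return rows


rows = [{"customer": "Ada", "orders": [{"id": 3, "product": "Notebook", "price": 5.99}]}]
print(flatten_columns(rows, "orders"))
# [{'customer': 'Ada', 'orders_id': 3, 'orders_product': 'Notebook', 'orders_price': 5.99}]
```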

Example of connection to Open Data Paris

Setting up the connection to Open Data Paris

name: open-data-paris
baseroute: https://opendata.paris.fr/api/

Selecting data from Open Data Paris

Dataset: books
Method: GET
URL: records/1.0/search/
Dataset: les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret
Facet: auteur
Filter: .records[].fields
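Put together, the connector's baseroute, the data source URL and the query parameters form a single GET request. This sketch builds that URL with the standard library. It makes two assumptions:

- The Dataset and Facet settings above are sent as the query-string parameters `dataset` and `facet`.
- The two URL parts are joined with exactly one slash.

The helper name is hypothetical. The same helper also covers the https://example.com/API plus /v1 split mentioned for the url field.

```python
from urllib.parse import urlencode


def build_url(baseroute, url, params=None):
    """Append the data source URL to the connector baseroute, then add the query string."""
    full = baseroute.rstrip("/") + "/" + url.lstrip("/")
    if params:
        full += "?" + urlencode(params)
    return full


request_url = build_url(
    "https://opendata.paris.fr/api/",
    "records/1.0/search/",
    {"dataset": "les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret",
     "facet": "auteur"},
)
print(request_url)
# https://opendata.paris.fr/api/records/1.0/search/?dataset=les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret&facet=auteur
```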

The JSON response looks like this:

```json
{
  "nhits": 1000,
  "parameters": { ... },
  "records": [
    {
      "datasetid": "les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret",
      "recordid": "4b950c1ac5459379633d74ed2ef7f1c7f5cc3a10",
      "fields": {
        "nombre_de_reservations": 1094,
        "url_de_la_fiche_de_l_oeuvre": "https://bibliotheques.paris.fr/Default/doc/SYRACUSE/1009613",
        "url_de_la_fiche_de_l_auteur": "https://bibliotheques.paris.fr/Default/doc/SYRACUSE/1009613",
        "support": "indéterminé",
        "auteur": "Enders, Giulia",
        "titre": "Le charme discret de l'intestin [Texte imprimé] : tout sur un organe mal aimé"
      },
      "record_timestamp": "2017-01-26T11:17:33+00:00"
    },
    {
      "datasetid": "les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret",
      "recordid": "3df76bd20ab5dc902d0c8e5219dbefe9319c5eef",
      "fields": {
        "nombre_de_reservations": 746,
        "url_de_la_fiche_de_l_oeuvre": "https://bibliotheques.paris.fr/Default/doc/SYRACUSE/1016593",
        "url_de_la_fiche_de_l_auteur": "https://bibliotheques.paris.fr/Default/doc/SYRACUSE/1016593",
        "support": "Bande dessinée pour adulte",
        "auteur": "Sattouf, Riad",
        "titre": "L'Arabe du futur [Texte imprimé]. 2. Une jeunesse au Moyen-Orient, 1984-1985"
      },
      "record_timestamp": "2017-01-26T11:17:33+00:00"
    },
    ...
  ]
}
```

We apply the filter .records[].fields, which means that for every entry in the records property, it extracts all the properties of the fields object. We end up with a table of results with one row per record and one column per property of fields: nombre_de_reservations, url_de_la_fiche_de_l_oeuvre, url_de_la_fiche_de_l_auteur, support, auteur and titre.

Note: the reason for having a filter option is to let you take any API response and transform it into something that fits into a column-based data frame. jq is designed to be concise and easy for simple tasks, but if you dig a little deeper, you'll find a full-featured functional programming language hiding underneath.

Performance: if the HTTP API connector is used in a live context, make sure that the API responds quickly enough. To get suitable performance, retrieve only a limited amount of data, since the data needs additional transformation to be unnested (in the case of a JSON response).

After selecting data from your connector, you can create a dataset with YouPrep, using the selection as the "source step".
