Fetching Data

From third party REST APIs

When we visit a website, our browser makes an HTTP request for that HTML page, followed by further HTTP requests for any assets the page loads (fonts, images, videos, etc.). In the case of our projects (which we host on GitHub's servers), someone visiting our work in their browser sends requests to the GitHub servers, which send back the code we wrote (and any other files/assets we might be storing there). But our code can also make requests to other servers. In this way our work can incorporate data and other assets from various other parts of the web.

One of the most common approaches is to send requests from our JavaScript code to other servers which make data, assets and other services available through an interface known as a REST API. These 3rd party (meaning controlled by someone else) REST APIs usually provide data formatted in JSON (though sometimes you see other formats like XML) which we can access by sending HTTP requests to specific URLs.
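As a rough illustration of what "data formatted in JSON" means in practice, here is a small sketch using a made-up response (the text and property names below are hypothetical, not the output of any real API). JSON arrives as plain text, and JavaScript's built-in JSON.parse() turns that text into an object we can read with dot notation.

```javascript
// A made-up JSON response, as it might arrive over HTTP: plain text.
// (The property names and values here are hypothetical.)
const responseText = '{"title": "example", "count": 3, "tags": ["a", "b"]}';

// JSON.parse() converts that text into a regular JavaScript object.
const data = JSON.parse(responseText);

console.log(data.title);       // read a property with dot notation
console.log(data.tags.length); // arrays inside JSON become real arrays
```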

There are loads of these sorts of APIs online; apilist.fun and publicapis.io are just a couple of sites which attempt to aggregate as many of them as they can. The City of Chicago also has a REST API which gives us access to all sorts of city data at data.cityofchicago.org.

While it's possible to send requests to these REST APIs by entering their URLs in our browser's address bar (a great way to inspect what sort of data you'll be getting back and how it's structured), in order to create work using this data, we need a way to send HTTP requests in our JavaScript code. There have been different ways of accomplishing this over the years, which first began in the early days of Web 2.0 (mid 2000s) with the XMLHttpRequest object, which is a browser API for making HTTP requests. This was followed by an easier to use browser API called Fetch.





Below you'll find 4 netnet examples which send a request to the same REST API, called dog.ceo (also used in the demo above), which returns a random image of a dog. The first example makes use of the nn library and is written in the most modern way possible (this is what our code should look like). That said, I've included three more examples below that one for context, as we may come across code like this online and it's important to be able to recognize these other variations. Of these three, the first uses the older XMLHttpRequest API, the second uses the Fetch API and the third uses the newer "async / await" syntax to use the Fetch API with cleaner and easier to read code. All three examples technically do the same thing (the difference is the syntax).
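For reference, the three browser-API variations can be sketched roughly as follows. This is a minimal sketch, not a copy of the netnet examples: the function names are made up for illustration, and it assumes the dog.ceo endpoint responds with JSON containing a "message" property that holds the image URL. The nn library version is left out here rather than guessing at its exact syntax.

```javascript
// Endpoint from the dog.ceo API. Assumption: it responds with JSON shaped like
// { "message": "<image url>", "status": "success" }
const DOG_URL = 'https://dog.ceo/api/breeds/image/random';

// 1. The older XMLHttpRequest API (browser only), which relies on callbacks.
function getDogWithXHR(callback) {
  const req = new XMLHttpRequest();
  req.open('GET', DOG_URL);
  req.onload = function () {
    const data = JSON.parse(req.responseText);
    callback(data.message);
  };
  req.send();
}

// 2. The Fetch API, chaining promises with .then().
function getDogWithFetch() {
  return fetch(DOG_URL)
    .then((res) => res.json())
    .then((data) => data.message);
}

// 3. The Fetch API again, written with async / await.
async function getDogWithAwait() {
  const res = await fetch(DOG_URL);
  const data = await res.json();
  return data.message;
}

// Example usage in a browser (assumes the page contains an <img> element):
// getDogWithAwait().then((src) => { document.querySelector('img').src = src; });
```

Versions 2 and 3 are the same request; async / await simply lets us write the steps top to bottom instead of nesting them inside .then() callbacks.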


The dog.ceo API used in the examples above is designed to be as easy as possible to use. It's a demo created to help beginners practice working with REST APIs. In most real-world cases there are some additional considerations which are important to keep in mind.

Other REST API Considerations

Documentation

When using an API, meaning a code-based interface created by developers for other developers, we should always look up the documentation first. When using a JavaScript library, for example, there's no way to know what the names of the functions (or methods) inside that library are (nor how we're supposed to use these functions) without some documentation. When it comes to REST APIs the primary interface isn't a set of functions (like it is in a library); instead it's a URL, what we refer to as an endpoint. How do we know what these endpoint URLs are and what sort of data we should expect to receive? By reviewing the API's documentation.





URL Parameters

We've previously discussed URL parameters, specifically how to use "search query" parameters in our own published pages so that the same web page can render differently depending on the value of specific URL parameters. As explained above, REST APIs are essentially a collection of URLs, often referred to as "endpoints", which return data (usually structured in JSON) rather than returning a web page (i.e. an HTML file). Often we can request specific types of data from the same API endpoint by specifying what we want in the URL's parameters. For example, let's take a look at https://restcountries.com/, a REST API which returns information about countries.

To get a list of all the countries we can send a request to https://restcountries.com/v3.1/all. Here we consider /all the "parameter": the first part of the URL, "https://restcountries.com/v3.1", always remains the same, while what follows it changes and affects what sort of data we get back. For example, if we wanted to get data for a specific country, the documentation page explains that we can do that by adding the parameter /name/{name}. The {} syntax in the documentation is telling us that this second part of the parameter should be replaced with a specific value, in this case the "name" of the country we want data for, for example: https://restcountries.com/v3.1/name/Canada
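In code, filling in the {name} placeholder might look something like the sketch below. The function name is made up for illustration; encodeURIComponent() is a built-in JavaScript function which makes values containing spaces or special characters safe to place in a URL.

```javascript
// The part of the URL that always stays the same.
const BASE_URL = 'https://restcountries.com/v3.1';

// Replace the {name} placeholder from the documentation with a real value.
function countryByNameUrl(name) {
  return `${BASE_URL}/name/${encodeURIComponent(name)}`;
}

console.log(countryByNameUrl('Canada'));
// → https://restcountries.com/v3.1/name/Canada
console.log(countryByNameUrl('South Africa'));
// → https://restcountries.com/v3.1/name/South%20Africa
```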

You'll notice that requesting data from the /name/{name} endpoint actually returns a lot of data, which can be hard to visually parse. For this reason it's best to use a browser like Firefox, which will make it easier to explore the data and its structure, or to install a browser extension like the JSON Formatter for Chrome.

That said, this particular API allows us to filter the data in our request, so that it sends back less data (rather than all of a country's information, which is the default). It does this by taking an additional URL parameter, this time a query string like the ones we used to modify our meditations last week. According to the documentation page, we need to add ?fields={field},{field},{field} to the end of our URL, where {field} is the name of a particular field (property) of data returned by the API. For example, if I only wanted the API to return the languages, flag and capital of a particular country, I could add ?fields=languages,flag,capital to the end of the URL like this: https://restcountries.com/v3.1/name/Canada?fields=languages,flag,capital
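A minimal sketch of building and requesting that filtered URL follows. The function names are made up for illustration, and the sketch assumes the endpoint responds with an array of matching country objects (check the structure in your browser first, as described above).

```javascript
// Build a URL for one country, asking only for the listed fields.
function countryFieldsUrl(name, fields) {
  const base = 'https://restcountries.com/v3.1/name/';
  // The documentation asks for a comma separated list: ?fields=a,b,c
  return `${base}${encodeURIComponent(name)}?fields=${fields.join(',')}`;
}

// Request the data and return the first match.
// Assumption: the endpoint responds with an array of country objects.
async function getCountry(name, fields) {
  const res = await fetch(countryFieldsUrl(name, fields));
  const countries = await res.json();
  return countries[0];
}

console.log(countryFieldsUrl('Canada', ['languages', 'flag', 'capital']));
// → https://restcountries.com/v3.1/name/Canada?fields=languages,flag,capital

// Example usage:
// getCountry('Canada', ['languages', 'flag', 'capital']).then((c) => console.log(c));
```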

Here's an example of the Countries API in use on netnet.





API Keys

The dog.ceo and restcountries.com APIs are unusual in that they do not require an API key. In most cases you will first need to register for an API key before you can send requests to the API. This is so that the entity maintaining the API can monitor who the requests are coming from. In some cases this is done to keep track of how many requests you have sent, because while many APIs offer a "free" tier or a free number of queries per hour/month/etc., most of them will charge you some small amount per request beyond that. The API key is used to keep track of how many requests have come from that specific user (or an app they created).

For example, if you visit https://gnews.io/ you can create a free account. Upon creating your account you'll be able to generate your API key. This will grant you 100 free requests per day, meaning you can only send that many requests to their server each day. That includes both entering the endpoint in your address bar (as we've done to inspect data) and running your own code which sends a fetch request. Once signed in you'll have access to a user dashboard which shows you how many requests you've sent. While it's not uncommon for APIs to have personal dashboards like this, they don't always look as nice as this one. If you wanted to increase your rate limit from 100 to something larger, that's when you'd need to start paying for the API.
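Because a key only covers a limited number of requests, it can help to check whether a request actually succeeded before trying to use the data. The sketch below relies on the Fetch API's response.ok and response.status properties. Status 429 is the standard HTTP code for "Too Many Requests", but which code a particular API sends when you run out of requests is an assumption here, so check its documentation. The function name is made up for illustration.

```javascript
// Hypothetical helper: fetch JSON, but stop with a readable error when the
// server reports a problem instead of sending data.
async function fetchJsonOrThrow(url) {
  const res = await fetch(url);
  if (res.status === 429) {
    throw new Error('Too many requests: the rate limit may have been reached.');
  }
  if (!res.ok) {
    // res.ok is true only for status codes in the 200-299 range.
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}

// Example usage (someUrl stands in for any API endpoint):
// fetchJsonOrThrow(someUrl).then(console.log).catch((err) => console.error(err.message));
```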

Once you have your API key, you should refer to the API's documentation on how to "authenticate" (that is, how to use the key). In the case of the GNews API, this is done by adding the apikey={your_key} parameter to the URL. In order to use this API we need a minimum of two parameters: in addition to the "apikey" we also need to specify a search query for the sort of news headlines we want returned, which we do with q={search_term}. When we need to pass more than one query parameter to a URL, the first one always begins with a ? and every subsequent parameter must begin with a &. In this case that would be something like: https://gnews.io/api/v4/search?q=economy&apikey=xxxxxxxxxxx (NOTE: that link won't work unless you replace the xxxxxxxxxxx string with your own API key).

You'll notice from reviewing the documentation that there are a number of other optional parameters we can pass into the URL. For example, if I wanted to filter the results so that only news articles written in English are returned, I could add &lang=en. If I wanted to filter further so that only articles from the United States are returned, I could add &country=us.
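Rather than gluing the ? and & characters together by hand, we can let JavaScript's built-in URL and URLSearchParams classes assemble the query string for us. The sketch below is one way to do that; the function name and the options object are made up for illustration, while the parameter names (q, lang, country, apikey) are the ones described above.

```javascript
// Build a GNews search URL. The URL class adds the ? and & separators
// (and escapes special characters) for us.
function gnewsSearchUrl(apiKey, searchTerm, options = {}) {
  const url = new URL('https://gnews.io/api/v4/search');
  url.searchParams.set('q', searchTerm);
  // Optional parameters, such as { lang: 'en', country: 'us' }.
  for (const [key, value] of Object.entries(options)) {
    url.searchParams.set(key, value);
  }
  url.searchParams.set('apikey', apiKey);
  return url.toString();
}

// Replace the placeholder with your own key before sending a request.
const API_KEY = 'xxxxxxxxxxx';
console.log(gnewsSearchUrl(API_KEY, 'economy', { lang: 'en', country: 'us' }));
// → https://gnews.io/api/v4/search?q=economy&lang=en&country=us&apikey=xxxxxxxxxxx
```

The resulting string can be passed straight to fetch() once the placeholder key has been replaced.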

Here's an example of the GNews API in use on netnet. NOTE: you'll need to add your own API key for this example to work.





CORS Issues: Client vs Server Requests

All the APIs we've discussed so far work on the "client side", meaning the code we write which runs on the viewer's computer can request data from these APIs. This isn't always the case. Some APIs, especially those which are paid or return sensitive data, require that the requests be sent "server side", meaning from the machine running your server. This is because any user can "view source" on a website, and if your request is written in the client-side code, then the user will be able to see your API key, which means they'd be able to use it to send other requests if they so choose. To prevent this you have to keep your API key secret and stored on your server. There, your server-side code can send the request to the API and then pass the response along to your client without revealing your API key.

In this course we're using GitHub as our server. While it's totally possible to write your own server in JavaScript (or a number of other languages), this is a bit beyond the scope of this course. That said, netnet.studio can serve as a playground to experiment with these APIs without having to roll out your own server, using a special version of the fetch API built into the nn library. For example, consider this other API: https://newsapi.org/. Like GNews this API returns breaking news headlines and other info, but unlike GNews it does not have "CORS enabled for all origins" (CORS stands for "cross origin resource sharing" and refers to the rules a server enforces regarding which other domains are allowed to request data from it, and how).

Here's an example of nn.fetch() being used on netnet to get around CORS restrictions. NOTE: you'll need to register your own API key with https://newsapi.org/ to test this out. Also, the nn.fetch() function only works on netnet; it exists to help you test APIs that prevent client-side requests. It does this by acting as a "proxy", where netnet makes the request on your behalf from its own server. If you publish this code to your own project it will no longer work, because it'll then be running on GitHub's servers. For this to work in your own published project you'll either need to create your own server (and send the API requests server-side) or use a paid proxy service like https://corsproxy.io/
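To give a sense of what "creating your own server" could involve, here is a minimal sketch of a proxy written for Node.js (version 18 or newer, which has fetch built in). It is only an illustration under several assumptions, not how netnet's proxy is actually built: the function names and the NEWS_API_KEY environment variable are made up, and it forwards requests to the newsapi.org top-headlines endpoint using that API's apiKey parameter.

```javascript
// Build the NewsAPI URL on the server, where visitors can't see the key.
function newsApiUrl(country, apiKey) {
  const url = new URL('https://newsapi.org/v2/top-headlines');
  url.searchParams.set('country', country);
  url.searchParams.set('apiKey', apiKey);
  return url.toString();
}

// Handle one request from our own front-end, for example: GET /?country=us
async function handleRequest(req, res) {
  const incoming = new URL(req.url, 'http://localhost');
  const country = incoming.searchParams.get('country') || 'us';
  try {
    // Assumption: the key is stored in an environment variable
    // with the made-up name NEWS_API_KEY.
    const upstream = await fetch(newsApiUrl(country, process.env.NEWS_API_KEY));
    const body = await upstream.text();
    res.writeHead(upstream.status, {
      'Content-Type': 'application/json',
      // Allow pages on other origins to read this response (CORS).
      'Access-Control-Allow-Origin': '*'
    });
    res.end(body);
  } catch (err) {
    res.writeHead(502, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'The request to the API failed.' }));
  }
}

// Start the server; nothing runs until startProxy() is called.
async function startProxy(port = 3000) {
  // import() works in both CommonJS and ES module files.
  const http = await import('node:http');
  const server = http.createServer(handleRequest);
  server.listen(port, () => console.log(`Proxy listening on port ${port}`));
  return server;
}
```

With something like this running, the client-side code would send its fetch request to our own server (for example http://localhost:3000/?country=us) instead of to the API directly, so the key never appears in the page's source.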





Other Data Sources

Not all data-driven projects you come across online make use of these 3rd party REST APIs. Sometimes data is made available for download so you can host it locally with your project (i.e. upload it directly to your GitHub project like you would images or other assets). There are lots of places to find datasets online. One popular repository of data is kaggle.com, or check out Jeremy Singer-Vine's Data Is Plural newsletter, where he shares interesting datasets on a weekly basis (every single dataset he's shared in the newsletter previously can be found on this spreadsheet).
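Downloaded datasets often come as CSV (comma separated values) files. As a rough sketch, a file hosted alongside our project can be requested with fetch just like an API endpoint, then converted into an array of objects. The parser below is deliberately naive (it assumes no quoted fields and no commas inside values), and the function names, file path and sample values are all made up for illustration.

```javascript
// Turn simple CSV text into an array of objects, one per row.
// Naive on purpose: it assumes no quoted fields or commas inside values.
function parseSimpleCsv(text) {
  const lines = text.trim().split(/\r?\n/);
  const headers = lines[0].split(',').map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const values = line.split(',');
    const row = {};
    headers.forEach((header, i) => {
      row[header] = (values[i] || '').trim();
    });
    return row;
  });
}

// Load a dataset stored alongside the project.
// 'data/fruit.csv' is a hypothetical path; use your own file's location.
async function loadLocalCsv(path = 'data/fruit.csv') {
  const res = await fetch(path);
  return parseSimpleCsv(await res.text());
}

// Made-up sample data, to show the shape of the result.
const sample = 'name,count\napples,3\npears,5';
console.log(parseSimpleCsv(sample));
```

Note that every value comes back as a string (the count above is '3', not 3), so numbers need converting with Number() before doing any math with them.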

Sometimes the data we want is out there on some website, but there's no "download" button nor is there a REST API to conveniently request the data from. When this is the case you can create a "web scraper", a little bit of JavaScript code which acts as a bot that goes out onto the Internet and downloads the data off the website you want. These typically need to be custom written to ensure you get only the data you want, organized the way you want. This is beyond the scope of what we'll cover in class, but if this is something you're interested in learning more about, email me.

Lastly, you could create a project which generates its own data. This can be an automated process or it could be user generated data (like in Aaron Koblin's Mechanical Turk projects discussed in the Data Driven Compositions notes). In these cases you need somewhere to put the generated data, which requires "server side" or "back-end" JavaScript. In this class we're mostly focused on "client side" or "front-end" JavaScript, so this is again beyond our scope, but email me if you're interested in experimenting with something like this.