HTTP
|
A protocol is a standard for communicating data across a network. A port is a number that identifies which program should process a network connection.
HTTP is the protocol originally designed for requesting and receiving Web pages, but it is now also used as the basis for a variety of APIs. HTTPS is the encrypted version of HTTP.
Every page on the World Wide Web is identified by a URL (Uniform Resource Locator).
A request is how you tell a server what you want to see. A response will either give you what you asked for, or tell you why the server can’t do that. Both requests and responses have a header, and optionally a body.
We can make requests and receive responses, as well as see their headers, using curl.
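The pieces named above (protocol, port, URL, request header and body) can be inspected without touching the network. The following is a minimal sketch using only Python's standard library; the URL and the `Accept` header value are made-up illustrations, not taken from the lesson.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Hypothetical URL, used only to show the parts a URL is made of.
url = "https://example.org:8443/pages/index.html?lang=en"

parts = urlsplit(url)
scheme = parts.scheme   # the protocol, here "https"
host = parts.hostname   # the server to contact
port = parts.port       # the port number given in the URL
path = parts.path       # which resource on that server
query = parts.query     # extra parameters after the "?"

# Build (but do not send) a request object to look at its header and body.
request = Request(url, headers={"Accept": "text/html"})
method = request.get_method()           # "GET", because there is no body
headers = dict(request.header_items())  # the request header as a dict
body = request.data                     # None: this request has no body

print(scheme, host, port, path, query)
print(method, headers, body)
```

Nothing is transmitted here; passing the request to `urllib.request.urlopen` would actually send it and return the response.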
|
What do APIs look like?
|
Interact with web APIs by sending requests to an endpoint representing a function of interest. Parameters can be encoded into the request URL, or attached in the request body, for example as JSON.
Responses are typically plain text or JSON, but could be anything.
Most APIs require some form of authentication. This can be by username and password, or via a token.
Which choices a given API makes for each of these will be described in the API’s documentation.
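As a rough illustration of these points, the sketch below builds two requests for an imaginary API without sending them: one with parameters encoded into the URL, one with a JSON body, both carrying a token. The endpoint, the token, the `Bearer` header scheme, and the sample response text are all assumptions for the example; a real API's documentation states what it expects.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and token, for illustration only.
ENDPOINT = "https://api.example.org/v1/search"
TOKEN = "my-secret-token"
auth_header = {"Authorization": f"Bearer {TOKEN}"}

# Option 1: parameters encoded into the URL as a query string.
query = urlencode({"q": "sea ice", "limit": 5})
get_request = Request(f"{ENDPOINT}?{query}", headers=auth_header)

# Option 2: parameters attached as a JSON body.
payload = json.dumps({"q": "sea ice", "limit": 5}).encode("utf-8")
post_request = Request(
    ENDPOINT,
    data=payload,
    headers={**auth_header, "Content-Type": "application/json"},
)

# A JSON response arrives as text and is decoded into Python objects.
sample_response_text = '{"count": 2, "results": ["a", "b"]}'
decoded = json.loads(sample_response_text)

print(get_request.full_url)
print(post_request.get_method(), post_request.data)
print(decoded["count"], decoded["results"])
```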
|
dicts
|
A dict is a collection of key-value pairs.
Create a dict with the syntax {key1: value1, key2: value2, ...}.
Get and set elements of a dict with square brackets: my_dict[key1] reads a value, and my_dict[key1] = new_value1 sets it.
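A short example of these three operations; the names and values are arbitrary.

```python
# Create a dict mapping protocol names to their usual port numbers.
ports = {"http": 80, "https": 443}

# Get a value by its key.
https_port = ports["https"]

# Set a new key, and overwrite an existing one.
ports["ssh"] = 22
ports["http"] = 8080

print(https_port)
print(ports)
```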
|
Requests
|
GET requests are used to read data from a particular resource.
POST requests are used to write data to a particular resource.
Both GET and POST requests may require some form of authentication (POST usually does).
The Python requests library offers various ways to deal with authentication.
curl can be used instead for shell-based workflows and debugging purposes.
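The sketch below shows one way a GET and a POST might be prepared with the requests library, using two common authentication styles: username and password (HTTP Basic) and a token in a header. The URL, credentials, token, and the `Bearer` scheme are placeholders chosen for illustration. The requests are only prepared, not sent, so the example runs without a network connection.

```python
import json

import requests

# Hypothetical endpoint and credentials, for illustration only.
URL = "https://api.example.org/v1/items"

# GET: read data, authenticating with a username and password.
get_prepared = requests.Request(
    "GET", URL, params={"limit": 5}, auth=("alice", "s3cret")
).prepare()

# POST: write data, authenticating with a token sent in a header.
post_prepared = requests.Request(
    "POST",
    URL,
    json={"name": "example"},
    headers={"Authorization": "Bearer my-secret-token"},
).prepare()

print(get_prepared.method, get_prepared.url)
print(get_prepared.headers["Authorization"])
print(post_prepared.method, post_prepared.headers["Content-Type"])
print(json.loads(post_prepared.body))

# To actually send these, one would typically call, for example:
#   requests.get(URL, params={"limit": 5}, auth=("alice", "s3cret"))
#   requests.post(URL, json={"name": "example"}, headers={...})
```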
|
Elements of Web Scraping with BeautifulSoup
|
A BeautifulSoup object can be navigated in many ways:
Use find to look for the first element that matches the given criteria in a subtree.
Use find_all to obtain a list of the elements that match the given criteria in a subtree.
Use find_parents to get the list of ancestors of the given element.
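The three methods above can be tried on a small, made-up HTML fragment. This sketch assumes the `bs4` package is installed and uses Python's built-in `html.parser`, so no extra parser is needed; the tag names and classes are invented for the example.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document to navigate.
html = """
<html><body>
  <div id="menu">
    <ul>
      <li class="item">Tea</li>
      <li class="item">Coffee</li>
    </ul>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find: the first element matching the criteria.
first_item = soup.find("li", class_="item")

# find_all: every element matching the criteria, as a list.
all_items = soup.find_all("li", class_="item")
item_texts = [li.get_text() for li in all_items]

# find_parents: the ancestors of an element, nearest first.
ancestor_names = [parent.name for parent in first_item.find_parents()]

print(first_item.get_text())
print(item_texts)
print(ancestor_names)
```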
|