4.2. HTTP#

So far, the HTML files we’ve created are just sitting on the computer we made them on. If you wanted to share them with someone, you’d have to send them a copy, like emailing the file or putting it on a USB drive. But that means your HTML files are kind of stuck on their own, unable to connect with anything else.

To solve this problem, HTTP was invented. It allows computers to share HTML files over a network such as the internet. Thanks to HTTP, your files aren’t trapped: they can be shared across the web, and hyperlinks can connect pages from anywhere in the world. This combination of HTML, hyperlinks, and the internet is what we call the “World Wide Web.”

4.2.1. HTTP Protocol Overview#

The Hypertext Transfer Protocol (HTTP) is a client-server protocol for transmitting HTML (and associated) files over standard internet technology.

The term client is used interchangeably for the user’s physical device and for the web browser running on it. Likewise, the term server is used interchangeably for the software that processes requests and returns appropriate responses, and for the physical device on which that software runs.

HTTP uses a “request and response” model. Clients send requests for a particular resource and the server provides the resource in the body of the response message.

[Figure: A client requesting a resource from a server.]

In general, data is exchanged over HTTP in the following steps:

  1. server starts and waits for a new TCP connection

  2. client establishes a TCP connection with the server

  3. client sends a request conforming to HTTP protocol over TCP

  4. server processes request

  5. server sends a response conforming to HTTP protocol over TCP

  6. client and server close the TCP connection
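The steps above can be sketched with Python's standard `socket` module. This is a minimal illustration rather than a production server: the names `serve_once`, `run_demo`, and `BODY` are made up for this example, the server handles exactly one connection, and both sides run in the same process on the local machine. Lines in an HTTP/1.1 message end with a carriage return and line feed (`\r\n`), and a blank line marks the end of the headers.

```python
import socket
import threading

# Hypothetical page content used only for this demonstration.
BODY = "<!DOCTYPE html><html><body>Hello</body></html>"


def serve_once(listener: socket.socket) -> None:
    """Server side: accept one connection, read a request, send a response."""
    conn, _addr = listener.accept()  # blocks until a client connects (step 2)
    with conn:
        request = b""
        # Step 4: read until the blank line that ends the request headers.
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        payload = BODY.encode("utf-8")
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/html\r\n"
            f"Content-Length: {len(payload)}\r\n"
            "Connection: close\r\n"
            "\r\n"
        ).encode("ascii")
        conn.sendall(head + payload)  # step 5
    # Leaving the "with" block closes the server's end of the connection (step 6).


def run_demo() -> str:
    """Run all six steps on the local machine and return the raw response."""
    # Step 1: the server starts and waits for a TCP connection.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()

    # Step 2: the client establishes a TCP connection with the server.
    with socket.create_connection(("127.0.0.1", port)) as client:
        # Step 3: the client sends an HTTP request over TCP.
        client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        # Read until the server closes the connection (step 6).
        chunks = []
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)

    server.join()
    listener.close()
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    print(run_demo())
```

Running the script prints the status line, the response headers, a blank line, and then the HTML body.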

The simplest web server hosts static content, meaning that it reads the requested HTML files and returns them unchanged as the response.

../../_images/static_server.png
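The core of a static server can be sketched as a function that maps a request path to a file. This is only an illustration under a few assumptions: the function name `serve_static` is hypothetical, a request for `/` is mapped to `index.html` (a common convention, not something HTTP requires), and any path that is missing or falls outside the served folder is answered with 404.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def serve_static(root: Path, request_path: str) -> Tuple[int, bytes]:
    """Map a request path to a file under `root` and return (status, body)."""
    # Assumption: "/" is served from index.html.
    relative = request_path.lstrip("/") or "index.html"
    root = root.resolve()
    target = (root / relative).resolve()
    # Refuse paths that escape the root folder (e.g. "/../secret.txt")
    # and paths that do not point at an existing file.
    if root not in target.parents or not target.is_file():
        return 404, b"Not Found"
    return 200, target.read_bytes()


def demo() -> Tuple[Tuple[int, bytes], Tuple[int, bytes]]:
    """Serve one existing and one missing page from a temporary folder."""
    with tempfile.TemporaryDirectory() as folder:
        root = Path(folder)
        (root / "index.html").write_text("<h1>Home</h1>", encoding="utf-8")
        return serve_static(root, "/"), serve_static(root, "/missing.html")


if __name__ == "__main__":
    print(demo())
```

The file's bytes are returned exactly as stored, which is what makes the content static.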

A crucial aspect of HTTP is that it is stateless: each request is independent of the others and cannot reference any previous request. However, as we will see later, we can add state to our websites by sharing information between client and server inside the request data.

4.2.2. HTTP Requests and Responses#

Request and response messages are sent as plain text but follow a very specific format.

Requests#

Let’s start with an example HTTP request. Requesting the Google homepage in your browser would send a request like the following:

GET / HTTP/1.1
Host: www.google.com.au

Let’s look at each line:

  1. The request line GET / HTTP/1.1 consists of

    • the method or type of request: GET

    • the path to the resource, which in this case is at the root of the server: /

    • the version: HTTP/1.1

  2. The host line Host: www.google.com.au, which is a request header field that specifies the domain name the client is requesting the resource from. This is required since a single server may host many websites!

Request Specification#

METHOD PATH VERSION
Host: DOMAIN_NAME
Header-field-1: value1
Header-field-2: value2
...
Header-field-N: valueN

Breakdown:

  • METHOD, typically one of:

    • GET - request that the server returns the specified resource

    • POST - send data

  • PATH - path on the server to a resource

  • VERSION - normally HTTP/1.1

  • Mandatory Host header field

  • Optional header fields and values, e.g.

    • Accept: text/html

    • Accept-Language: en
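To make the specification concrete, here is a small sketch that assembles a request message as text. The helper name `build_request` is made up for this example. It follows the layout above: the request line, the mandatory Host header, any optional headers, and a blank line to end the header section. Each line ends with `\r\n`.

```python
from typing import Dict, Optional


def build_request(
    method: str,
    path: str,
    host: str,
    headers: Optional[Dict[str, str]] = None,
    version: str = "HTTP/1.1",
) -> str:
    """Build the text of an HTTP request without a body."""
    lines = [f"{method} {path} {version}", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line marks the end of the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_request("GET", "/", "www.google.com.au", {"Accept": "text/html"}))
```

Calling `build_request("GET", "/", "www.google.com.au")` reproduces the two-line example request from earlier in this section.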

Response#

Continuing the example from earlier, the Google web server would respond with:

HTTP/1.1 200 OK
Date: Sun, 08 Sep 2024 09:00:00 GMT
Content-Type: text/html

<!DOCTYPE html><html><head>...

where we have truncated the HTML to save page space.

Let’s look at each line:

  1. The status line HTTP/1.1 200 OK consists of:

    • the version HTTP/1.1

    • the status code of 200 meaning the request was successful

    • the status code reason phrase OK

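  2. The Date response header field, which records when the response was generated

  3. The Content-Type response header field, which tells the client that the body is HTML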

  4. The body of the response, which contains the HTML of the page

Response Specification#

VERSION STATUS_CODE REASON_PHRASE
Header-field-1: value1
Header-field-2: value2
...
Header-field-N: valueN

BODY

Breakdown:

  • VERSION - normally HTTP/1.1

  • STATUS_CODE REASON_PHRASE - indicates the status of the request, typically one of:

    • 200 OK

    • 404 Not Found

    • 500 Internal Server Error

  • Optional header fields and values, e.g.

    • Content-Type: text/html
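The response layout can be illustrated by going the other way and splitting a raw response into its parts. The names `Response` and `parse_response` are hypothetical, and the sketch assumes a well-formed message: a status line with all three parts, `\r\n` line endings, and a blank line before the body. Header names are case-insensitive in HTTP, but this sketch stores them exactly as received.

```python
from typing import Dict, NamedTuple


class Response(NamedTuple):
    version: str
    status_code: int
    reason: str
    headers: Dict[str, str]
    body: str


def parse_response(raw: str) -> Response:
    """Split a raw HTTP response into status line, headers, and body."""
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # The reason phrase may contain spaces, so split at most twice.
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return Response(version, int(code), reason, headers, body)


if __name__ == "__main__":
    raw = "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n<h1>Oops</h1>"
    print(parse_response(raw))
```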

Status Codes#

A status code is a three-digit number returned by the server to indicate the result of a request. Status codes are defined as part of the HTTP specification.

The codes are grouped into five classes, each covering a range of codes:

  1. Informational (100 - 199)

  2. Success (200 - 299)

  3. Redirection (300 - 399)

  4. Client error (400 - 499)

  5. Server error (500 - 599)
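Because the class is determined by the first digit, it can be computed with integer division. The sketch below uses hypothetical helper names (`status_class`, `describe`) together with Python's standard `http.HTTPStatus` enumeration, which holds the reason phrases for the registered codes.

```python
from http import HTTPStatus

CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Return the class name for a status code between 100 and 599."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return CLASSES[code // 100]  # the first digit selects the class


def describe(code: int) -> str:
    """Combine the code, its reason phrase, and its class."""
    # HTTPStatus(code) raises ValueError for codes that are not registered.
    return f"{code} {HTTPStatus(code).phrase} ({status_class(code)})"


if __name__ == "__main__":
    for code in (200, 404, 500):
        print(describe(code))
```

A code such as 299 still belongs to the Success class even though no reason phrase is registered for it, which is why `status_class` works on the number alone.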

1xx - Informational#

1xx codes are provisional responses indicating the server has received the request headers and is providing interim status while it continues processing. The connection remains open and the server will normally send a final response (2xx/3xx/4xx/5xx).

These informational codes can also be used to switch protocols.

Examples:

  • 100 Continue - The server says “go ahead”. Commonly used when a client wants to send a large body and checks first by setting Expect: 100-continue in the request headers.

  • 101 Switching Protocols - The server agrees to change the protocol used on the connection, commonly from HTTP to WebSocket after an HTTP upgrade handshake.

2xx - Success#

2xx codes indicate that the request was successfully received, understood, and accepted. The exact meaning depends on the method: for example, a GET typically returns the requested representation, while a POST may create or trigger processing of a resource.

Examples:

  • 200 OK — Standard success response. The response body usually contains the requested resource (for GET) or the result of the operation.

  • 201 Created — A new resource was created, commonly in response to a POST. Often includes a Location header pointing to the new resource.

3xx - Redirection#

3xx codes indicate that the client must take additional action to complete the request. Most commonly, this means the client should make a new request to a different URL provided via the Location header.

Some 3xx codes are used for caching and conditional requests (notably 304), where the client can reuse a cached representation instead of downloading it again.

Examples:

  • 301 Moved Permanently - The resource has a new permanent URL. Clients and caches may remember the new location.

  • 304 Not Modified - The resource has not changed since the client last fetched it, so the client should use its cached copy.

  • 307 Temporary Redirect - The resource is temporarily at a different URL. The client should repeat the request to the new URL using the same method and body.

4xx - Client Error#

4xx codes indicate that the request cannot be fulfilled because of a fault in the client’s request, e.g. bad syntax, invalid data, missing authentication, or insufficient permissions.

Examples:

  • 400 Bad Request — The server cannot process the request due to malformed syntax or invalid input.

  • 401 Unauthorized — Authentication is required or the provided credentials are invalid.

  • 403 Forbidden — The server understood the request but refuses to authorise it.

  • 404 Not Found — The requested resource does not exist or the server is not willing to confirm it exists.

5xx - Server Error#

5xx codes indicate that the server failed to fulfill an otherwise valid request due to an error on the server side. These often represent temporary failures, so retrying later may succeed (especially for 502, 503, and 504), though repeated failures typically require server-side investigation.

Examples:

  • 500 Internal Server Error — A generic server-side failure.

  • 502 Bad Gateway — A gateway or proxy received an invalid response from an upstream server.

  • 503 Service Unavailable — The server is temporarily unable to handle the request e.g. overload or maintenance.

Note

You can find a complete list and description of status codes at https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status

4.2.3. Glossary#

Client#

A client is the device (like your computer or phone), or the web browser running on it, that requests information from a server, such as when you load a website.

Method#

An HTTP method is the action that the client wants to perform, such as GET to request data or POST to send data to the server.

Plain text#

Plain text refers to data that is not encrypted or formatted, such as regular text that can be easily read by both humans and machines.

Server#

A server is the software that receives requests and delivers content (like web pages) to clients, or the computer on which that software runs.

Stateless#

In HTTP, stateless means that each request from a client to a server is independent, and the server does not remember previous interactions with the client.

Static#

Static refers to web content that is returned exactly as it is stored on the server, like a simple HTML page without dynamic features.

Status Code#

A status code is a three-digit number returned by the server to indicate the result of a request, such as 200 for success or 404 when a page doesn’t exist.

Protocol#

A protocol is a set of rules for how data is exchanged over a network, like HTTP, which defines how web clients and servers communicate.

Resource#

A resource is any data or content (like a webpage, image, or file) that is available on a server and can be requested by a client.

Request header field#

A request header field is extra information sent by the client to the server, such as the type of browser being used or the desired content type.