Have you ever thought about how many times a day you visit a website? This is so common in our lives that it wouldn’t be a stretch to say that, for some people, the Internet is as essential as the air we breathe. In the short time between typing the URL (Uniform Resource Locator), the website’s address, and the page actually appearing on your screen, several things happen, and one of them is the execution of the HTTP protocol.
So, if you want to better understand what HTTP is and how crucial it is to the Internet, this post explains it.
And if you already have a website, you need to be even more cautious, because neglecting the protocol itself can expose attack surfaces and even directly affect your sales.
What Is HTTP?
HTTP is the acronym for Hypertext Transfer Protocol. If you don’t remember or haven’t noticed, it appears at the beginning of a website address. HTTP is an application-layer transfer protocol, text-based in its classic versions, and is considered the basis for data communication between networked devices. Communication follows a request-response pattern: the client sends a request and the server returns a response, and HTTP defines the standards and rules for this exchange of information. In general, HTTP is the protocol that clients and servers use to communicate.
In addition, for data exchange to take place, HTTP has traditionally depended on two other network protocols: TCP (Transmission Control Protocol) and IP (Internet Protocol). Together they give their name to the TCP/IP model, which underlies communication between clients and servers and also between servers and mobile/web APIs. HTTP is the most used protocol for web-based applications and APIs.
Note that TCP/IP is a model, but also a protocol stack that HTTP falls into. Regarding the OSI model – which we’ll talk about later – TCP is layer 4 and IP is layer 3.
An important fact about the HTTP protocol is that it is stateless, but what does “stateless” mean? It means that each request the client makes is an independent transaction, unrelated to any previous request, and the server treats each one as new. That is, even if several requests are made at the same time, one doesn’t know that the other exists, and the server doesn’t store any information about the client’s state. Once a request-response exchange is complete, the protocol itself keeps no record of it. The advantages of this are lower memory usage on the server and fewer problems caused by expired sessions.
It’s also important to mention that HTTP is stateless if viewed from a high level of abstraction, but it is also based on TCP (not UDP), and thus is connection based and stateful from a lower-layer standpoint. What this means is that because it is connection based, it is stateful in delivery, which ensures that all packets are received and sequenced correctly.
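To make the idea of statelessness more concrete, here is a minimal sketch in Python. It is not how any particular server is implemented: the `handle_request` function and the `X-User` header are hypothetical names chosen for illustration. The point is only that the reply depends on the contents of the current request and on nothing the handler saw before.

```python
def handle_request(request: dict) -> str:
    """Hypothetical stateless handler: the reply depends only on this request."""
    # "X-User" is an illustrative header name, not a standard one.
    user = request.get("headers", {}).get("X-User")
    if user is None:
        return "Hello, stranger"
    return f"Hello, {user}"


# The first request identifies the user; the second one does not.
first = handle_request({"path": "/", "headers": {"X-User": "ana"}})
second = handle_request({"path": "/", "headers": {}})

print(first)   # Hello, ana
print(second)  # Hello, stranger
```

Because the handler keeps nothing between calls, the second request is answered as if the first had never happened. Any information the server needs has to travel inside each request.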
We already know what a URL is, and that it is the first step in the information exchange process. But, although it is part of our everyday online life, many of us don’t know what its structure means. So, let’s consider the following URL:
In its basic form, we can divide the URL into three parts:
- The protocol (http:// or https://) - tells your browser how to communicate with a website’s server in order to send and retrieve information; when it’s HTTPS, it means that it’s HTTP secured with TLS, which encrypts the data exchanged.
- The domain (azion.com) - has the subdomain (blog.), the name by which the website is known (azion), and the TLD (.com). TLD stands for Top Level Domain, the last part of the domain, which originally indicated the category of the website, such as .com (commercial), .org (organizations), and .net (networks).
- The path (/edge) - directs the browser to a specific page on the website.
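The same breakdown can be sketched with Python’s standard `urllib.parse` module. The URL `https://blog.azion.com/edge` used here is an assumption assembled from the parts listed above, and `split_url` is a hypothetical helper. Splitting the host on dots is a simplification that would not handle multi-part suffixes such as `.co.uk`.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Split a URL into the parts discussed above (simplified sketch)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    return {
        "protocol": parts.scheme,
        "host": host,
        # Naive assumption: the last label is the TLD and the one before it
        # is the site name; everything else is treated as the subdomain.
        "subdomain": ".".join(labels[:-2]),
        "name": labels[-2] if len(labels) >= 2 else "",
        "tld": labels[-1],
        "path": parts.path,
    }


print(split_url("https://blog.azion.com/edge"))
```

For this example URL, the result contains `https` as the protocol, `blog` as the subdomain, `azion` as the name, `com` as the TLD and `/edge` as the path.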
History of HTTP
Let’s begin with the term hypertext, which was created by Ted Nelson in 1965 and was defined as “non-sequential writing.” In other words, it is a type of text that branches, it is not necessarily linear and contains links to other texts. This term, in turn, was inspired by the ideas of Vannevar Bush previously presented in his 1945 paper As We May Think.
Later, in 1989, Tim Berners-Lee proposed the WorldWideWeb project and, in 1991, he and his team created a protocol that would allow the retrieval of texts from other documents via hypertext links, which would become the original HTTP format. These hypertext documents were written in HTML (HyperText Markup Language), a textual format for representing documents in hypertext. HTML is itself text, but it also carries instructions, including references to other assets such as images, video and audio. Its current version is commonly referred to as HTML5.
Like everything else in the Internet world, the HTTP protocol has undergone several transformations and has evolved a lot, as can be seen below.
- HTTP/0.9 - The first version was launched in 1991 and was very simple. It focused on text transfer and only had the GET request method; it had no HTTP headers, status or error codes.
- HTTP/1.0 - This later version is from 1996, and presented, in addition to simple text transfer, the transmission of more sophisticated data, such as request/response metadata and content negotiation.
- HTTP/1.1 - First specified in 1997 and revised in 1999, this version is considered a milestone in the evolution of the Internet, since it eliminated several problems from previous versions and introduced a series of optimizations, such as persistent connections, additional cache mechanisms, chunked transfer encoding and request pipelining. Although there are more recent versions, it is still very widely used.
- HTTP/2 - This version is from 2015 and is better prepared for today’s massive and widespread Internet use. Although there has been no change in semantics from the previous version, it transports data more efficiently, for example by multiplexing several requests over a single connection and compressing headers, which significantly lowers latency. In practice, browsers use it only over encrypted connections.
- HTTP/3 - Deployed in browsers and servers from 2019 and published as a standard in 2022, the HTTP/3 protocol aims to be even faster and more reliable. It presents a fundamental difference from the previous version: a new transport protocol, QUIC. QUIC runs on top of UDP (User Datagram Protocol), which is faster than TCP in transmitting data because it does not establish a connection or verify delivery; this, however, makes UDP on its own less reliable. QUIC compensates by adding a layer on top of UDP that brings features like packet retransmission and congestion control, and it has encryption built in. Note that the HTML version is independent of the HTTP version: HTML5 pages can be served over any of the versions above.
Understanding Where HTTP Fits in the Communication Puzzle
HTTP is a small piece in the communication stack, but how does it fit into it? To better understand it, let’s first take a quick look at the OSI model.
The OSI (Open Systems Interconnection) model, published by the International Organization for Standardization in 1984, is a model of computer networks that serves as a standard, or set of rules, for communication protocols between different systems on a network. That is, it’s as if it were a universal language for computer networks. In this model, the communication system is divided into seven abstract layers, each one with a specific functionality.
The HTTP protocol acts precisely at layer 7, the application layer, where the interaction between users and computers happens, such as when users visit a website or check e-mail. The application layer is responsible for the protocols and data manipulation on which the software depends to present meaningful data to the user. In addition to HTTP, other protocols that operate at the application layer are: HTTPS (HTTP over TLS), DNS, FTP, IMAP, POP, RTP, SMTP, TELNET and TFTP.
The HTTP Communication
When a client wants to communicate with a server, the first thing that happens, after the user types the URL in the browser address bar or follows a link to another page, is that a TCP/IP connection is opened and the HTTP request is sent to the server. That request is a message with a series of data describing what the client has requested. The server then sends the response to the client, which also contains data that can be read. Finally, the request-response process is finalized. Keep in mind that all of this usually takes only milliseconds.
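The exchange described above can be reproduced on a single machine. The sketch below is only an illustration: it starts a tiny local server with Python’s standard `http.server` module, opens a TCP connection to it with `socket`, sends a hand-written GET request and reads the raw response. The names `HelloHandler`, `fetch_raw` and `demo` are hypothetical, and the server only ever answers with the text `hello`.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Toy server side: answers every GET with a short plain-text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_raw(host: str, port: int, path: str = "/") -> bytes:
    """Open a TCP connection, send one HTTP request, return the raw response."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def demo() -> bytes:
    """Run the server on a free local port and perform one exchange."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_raw("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo().decode("latin-1"))
```

Running the script prints the status line, the response headers, a blank line and then the body, which is the message structure discussed in the next sections.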
In the communication process we described above, you might have noticed that there are two essential messages: the request and the response.
What Is a Request?
The request is what the client needs from the server. In that message, there is specific data, which describes what was requested. The main components of a request are described below.
1. The method - indicates the action the client wants to take, such as:
  - GET - to obtain resources or retrieve data from the server; it is the most common;
  - POST - to send data to the server, such as submitting a form;
  - HEAD - to retrieve only the status line and headers, without the response body;
  - PUT - to send files to the server or do a full update of a resource;
  - PATCH - to do a partial update of a resource;
  - DELETE - to delete a resource on the server;
  - OPTIONS - to query which methods are available for a given resource; and
  - TRACE - to debug requests, by having the server echo back the request it received.
2. The path - the URI (Uniform Resource Identifier) of the requested resource.
3. The HTTP version of the protocol.
4. The request headers - containing additional information for servers.
5. The request body - optional, but necessary for some methods, such as POST; it contains the data being sent to the server.
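The components listed above can be seen in a raw HTTP/1.1 request message. The sketch below is a simplified illustration, not a full parser: `parse_request` is a hypothetical helper, and the host `example.com`, the path `/contact` and the form data are made-up values.

```python
def parse_request(raw: bytes) -> dict:
    """Split a raw HTTP/1.1 request into its main components (simplified)."""
    # A blank line separates the request line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    # The request line holds the method, the path and the protocol version.
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "path": path,
        "version": version,
        "headers": headers,
        "body": body,
    }


# Illustrative request: a form submission sent with POST.
raw_request = (
    b"POST /contact HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 8\r\n"
    b"\r\n"
    b"name=Ana"
)

parsed = parse_request(raw_request)
print(parsed["method"], parsed["path"], parsed["version"])
print(parsed["headers"])
print(parsed["body"])
```

Here the first line carries the method, path and version, the following lines are the headers, and everything after the blank line is the body.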
What Is a Response?
The response given by the server contains the information requested by the client or informs that there was an error in relation to what was requested. It contains the elements listed below.
- The status line and response headers - contain the protocol version, the status code and additional information such as the type of content that is included in the body.
- The status code - indicates whether the request was successful or not. The answer is given by specific codes, such as:
  - 200 - the request was answered successfully;
  - 301 - the requested resource was permanently moved to another address;
  - 401 - the request requires authentication and was not authorized;
  - 404 - the requested resource was not found by the server; and
  - 500 - internal server error.
- The response body - optional, as in the request; when present, it contains the requested resource or details about the result.
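As an illustration of these elements, the sketch below reads a raw response and looks up the status code with `HTTPStatus` from Python’s standard `http` module. The `parse_response` helper, the `STATUS_CLASSES` table and the sample message are assumptions made for this example. The grouping by first digit (2xx success, 4xx client error, and so on) follows the usual convention for HTTP status codes.

```python
from http import HTTPStatus

# Conventional meaning of the first digit of a status code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_response(raw: bytes) -> dict:
    """Split a raw HTTP/1.1 response into its main elements (simplified)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    # Status line: protocol version, status code and an optional reason phrase.
    version, code_text, *_ = lines[0].split(" ", 2)
    code = int(code_text)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "version": version,
        "code": code,
        # HTTPStatus raises ValueError for codes it does not know.
        "phrase": HTTPStatus(code).phrase,
        "class": STATUS_CLASSES[code // 100],
        "content_type": headers.get("content-type"),
        "body": body,
    }


# Illustrative response for a page that does not exist.
raw_response = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"Not here!"
)

print(parse_response(raw_response))
```

For the sample message, the parsed result reports code 404, the phrase `Not Found`, the class `client error` and the content type `text/html`.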
Another interesting aspect of the HTTP communication is the use of cache, which keeps copies of data that are frequently accessed. Basically, the cache speeds up the search and delivery of frequently used data and spares the use of resources on a server, improving the performance and speed of the process – but this is a topic for another blog post.
The Disadvantage of HTTP
You have seen that the HTTP protocol is the basis for information exchange on the Web, but, like everything else in this life, there is a catch: on its own, it’s not secure, since plain HTTP traffic is not encrypted. Did you know that this is exactly why it’s the target of malicious actions? Among the cybercrimes that can affect the HTTP protocol, we can mention data interception during information transmission and DDoS attacks on layer 7, also known as HTTP flood attacks, which overload a server with HTTP requests. When the server is no longer able to respond to normal traffic, legitimate user requests will suffer denial of service.
Although all sites and applications are subject to cybercrime, the good news is that there are some ways to protect against these threats, and Azion offers solutions for your security: Web Application Firewall, DDoS Protection and Network Layer Protection.
Azion’s Security Solutions
Azion’s Web Application Firewall (WAF) secures web applications from numerous dangers, ranging from OWASP Top 10 threats to sophisticated zero-day attacks. It also protects the application layer 7, where applications access network services, by acting as a barrier that uses a set of rules to filter and monitor the traffic between your application and the Internet. In other words, WAF operates specifically in the security of HTTP/S protocols, since it analyzes requests, detecting and blocking malicious activities before they reach your infrastructure, without impacting the performance of your applications.
Our DDoS Protection provides multiple levels of protection against DDoS attacks, including attacks on layer 7, where the HTTP protocol operates. DDoS Protection uses our globally distributed network, in addition to several mitigation centers, to provide the intelligence and capacity needed to mitigate even the most massive and sophisticated attacks.
And our Network Layer Protection provides even wider protection, since it is a programmable security perimeter at the edge of the network for inbound and outbound traffic. With it, it’s possible to block, monitor suspicious behavior or apply penalties, such as limiting the number of accesses.
Choosing these mitigation solutions from Azion can help companies guard against recent trends such as larger and more frequent attacks, attacks on APIs, and complex and evolving attack strategies.
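To give a rough idea of what “limiting the number of accesses” can mean in practice, here is a generic sliding-window rate limiter in Python. This is only a sketch of the general technique and does not represent how Azion’s products are implemented. The class name, the parameters and the client address `203.0.113.7` are all hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> recent request times

    def allow(self, client_id: str, now: float) -> bool:
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject this request
        hits.append(now)
        return True


# Hypothetical client sending five requests; timestamps are in seconds.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=1.0)
results = [limiter.allow("203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3, 1.05)]
print(results)  # [True, True, True, False, True]
```

The fourth request is rejected because three requests were already accepted within the same second. The fifth is accepted again because the oldest one has left the window by then.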
The Importance of HTTP
HTTP is a simple protocol, and one of its most striking features is that it is accessible: it was designed so that its messages can be read and understood by people. In addition, its stateless nature simplifies the work of the server and makes it faster, since there is no need to store or clear data between requests. If a transaction is interrupted, no part of the system needs to be responsible for cleaning up the current state of the server. Another fundamental aspect is the fact that it is extensible, which allows the insertion of new functionalities and, consequently, lets HTTP follow the needs and the evolution of the Internet in its mission to transmit the data that practically runs the world.