Autonomous Systems (AS)
Take a look at the following commands
$ curl -H "Host: dave.dev" https://80.203.107.134/
$ ssh [email protected] -p 1138
What do you think is happening here? Where is the server located? How are these requests being routed? While it may look like both of these requests are being sent to the same server, with the same IP address (80.203.107.134), they are actually being sent to two different servers, and neither of them has the IP address 80.203.107.134.
So what's going on here?
What is an IP address?
When you saw the IP address 80.203.107.134 in the commands above, it wasn't the IP address of any one particular server. Instead, 80.203.107.134 is simply a public IP that is owned by an Autonomous System (AS). It does not tell us anything about the physical location of the server or the server itself. It is simply an IP address issued by a Regional Internet Registry (RIR) to an autonomous system. The only thing we can infer from this IP address is that it is being routed to the AS that owns it.
Let's take a closer look at the two commands above. The first command is an HTTPS request. It is made to the IP address 80.203.107.134, with a Host header of dave.dev and a destination port of 443. The second command is an SSH request, made to the IP address 80.203.107.134 with a destination port of 1138. While these two commands look like they are being sent to the same server, they may not be sent to any server at all, or they may be sent to two different servers entirely. They are almost certainly not being sent to a server with an IP address of 80.203.107.134, since this is not the IP of a server. There are multiple factors that determine how these requests are routed and where they end up.
- Protocol: Devices along the path, such as firewalls and load balancers, can use the protocol of the request (HTTP, SSH, etc.) to decide where to send it.
- Destination Port: The destination port of the request can also be used, for example by port-forwarding rules on a router.
- Host Header: For HTTP traffic, a reverse proxy can use the Host header to decide which backend receives the request.
- IP Address: Routers can use both the source and destination IP addresses to determine how to route the request.
Examining our two commands, we see that they use different protocols, different destination ports, and different host headers (the SSH request has no host header at all). This means that they could be routed to different servers entirely. The only thing we know for sure is that they are being routed to the AS that owns the public IP. A common mistake junior network engineers make is assuming things that are not known. The only thing that we should assume when looking at an IP address is that it is a public IP address owned by and routed to an AS.
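To make the comparison concrete, here is a minimal Python sketch that models the two commands as records and lists the fields in which they differ. The Request class and its field names are invented for illustration; they are not part of any networking library.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Request:
    """The handful of attributes the network can act on (illustrative only)."""
    protocol: str
    dst_ip: str
    dst_port: int
    host_header: Optional[str]  # SSH carries no Host header


def differing_fields(a: Request, b: Request) -> list[str]:
    """Return the names of the fields whose values differ between a and b."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


web = Request("https", "80.203.107.134", 443, "dave.dev")
ssh = Request("ssh", "80.203.107.134", 1138, None)

print(differing_fields(web, ssh))
# ['protocol', 'dst_port', 'host_header']
```

The destination IP is the only field the two requests share, which is why it alone says so little about where each request ends up.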
Autonomous Systems
The internet is a network of networks. Each network is owned and operated by an organization or individual. These networks are called Autonomous Systems (AS). Each AS is identified on the internet by a unique AS number (ASN). The Internet Assigned Numbers Authority (IANA) allocates blocks of AS numbers to the Regional Internet Registries, which in turn assign them to individual networks.
There are three main tiers of Autonomous Systems:
- Tier 1: These are the top-level ASes that can reach every other network on the internet without paying for transit. They do this purely through settlement-free peering with other ASes. Examples of Tier 1 ASes include AT&T, Verizon, and Arelion. These are considered the backbone of the internet.
- Tier 2: These ASes are connected to Tier 1 ASes and other Tier 2 ASes. They peer with some networks but still pay for transit to reach the rest of the internet. Examples of Tier 2 ASes include Comcast, Cox Communications, and British Telecom.
- Tier 3: These ASes are connected to Tier 1 and Tier 2 ASes and rely mainly on paid transit to reach the entire internet. Examples of Tier 3 ASes include small ISPs, hosting providers, and data centers.
At its core, an autonomous system is a network. Each AS is registered with a Regional Internet Registry (RIR) and is assigned a unique AS number. Autonomous systems also obtain IP address blocks from RIRs. There are currently five RIRs in the world: ARIN (North America), RIPE NCC (Europe, the Middle East, and parts of Central Asia), APNIC (Asia-Pacific), LACNIC (Latin America and the Caribbean), and AFRINIC (Africa).
These IP address blocks are then assigned to the AS and used to route traffic to and from the AS. Once an AS has an IP address block, it can advertise that block to other ASes on the internet. This is done using the Border Gateway Protocol (BGP). It is kind of like saying "Hey, if anyone sends a packet to any of these IP addresses, send it to me". An AS can advertise these IP addresses because it owns them. When an AS advertises an IP block to its peers or upstream providers, it must be able to demonstrate ownership, for example through the records held by its RIR.
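The "send it to me" idea can be sketched as a lookup from destination address to the AS that advertised a covering block. The sketch below uses Python's standard ipaddress module. The prefixes and AS numbers are invented for illustration (64496 to 64511 are AS numbers reserved for documentation), and when several advertised blocks cover an address it picks the most specific one, which is how routers normally choose between overlapping routes.

```python
import ipaddress
from typing import Optional

# Hypothetical advertisements: IP block -> AS number that announced it.
# Neither the blocks nor the AS numbers describe the real owner of 80.203.107.134.
ANNOUNCEMENTS = {
    "80.203.0.0/16": 64496,
    "80.203.107.0/24": 64497,
}


def origin_as(ip: str) -> Optional[int]:
    """Return the AS whose most specific advertised block contains ip, if any."""
    addr = ipaddress.ip_address(ip)
    best = None  # (network, asn) of the longest matching prefix seen so far
    for prefix, asn in ANNOUNCEMENTS.items():
        network = ipaddress.ip_network(prefix)
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, asn)
    return best[1] if best else None


print(origin_as("80.203.107.134"))  # 64497 (the /24 is more specific than the /16)
print(origin_as("80.203.1.1"))      # 64496
print(origin_as("8.8.8.8"))         # None (nobody in this toy table advertises it)
```

Note that the lookup returns an AS, not a server: nothing in the table says what happens to the packet once it is inside that AS.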
Autonomous systems are usually connected to multiple other autonomous systems, and they advertise their IP address blocks to all of them. This gives the rest of the internet multiple paths to reach their network and provides redundancy.
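As a rough illustration of that redundancy, the sketch below keeps several paths toward the same IP block and falls back to another one when the preferred path disappears. The neighbor names and AS numbers are made up, and preferring the shortest AS path is a simplification: real BGP path selection weighs several other attributes as well.

```python
from typing import Optional

# Hypothetical AS paths toward one IP block, as learned from three neighbors.
# Each path ends at the AS that advertised the block (64496 here).
paths = {
    "upstream-a": [64500, 64496],
    "upstream-b": [64501, 64502, 64496],
    "peer-c": [64503, 64504, 64505, 64496],
}


def best_neighbor(known_paths: dict[str, list[int]]) -> Optional[str]:
    """Pick the neighbor offering the shortest AS path, or None if no path is left."""
    if not known_paths:
        return None
    return min(known_paths, key=lambda name: len(known_paths[name]))


print(best_neighbor(paths))  # upstream-a

# If the link to upstream-a fails, its path is withdrawn and traffic
# shifts to the next best path instead of being dropped.
remaining = {name: p for name, p in paths.items() if name != "upstream-a"}
print(best_neighbor(remaining))  # upstream-b
```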
So let's again revisit our two commands and see how they are being routed.
The first request is an HTTPS request with a Host header of dave.dev and a destination port of 443. The IP address, 80.203.107.134, is owned by an AS, in this case an ISP. Once the request reaches the ISP, it first arrives at one of the ISP's gateway routers and is then forwarded to a load balancer. The load balancer uses the IP and port information to decide where to send it next. It may send it to a customer or handle the request itself. We are going to assume that the request is sent to a customer. The customer has its own router, which receives the request and forwards it to a reverse proxy server listening on port 443. The reverse proxy examines the Host header and the destination port and forwards the request to the appropriate server (e.g. 192.168.1.10:8134). So the final destination of this request is the server with the IP address 192.168.1.10, on port 8134.
Our second request is an SSH request with a destination port of 1138. The IP address, 80.203.107.134, is owned by the same AS, in this case an ISP. Once the request reaches the ISP, it first arrives at one of the ISP's gateway routers and is then forwarded to a load balancer. The load balancer uses the IP and port information to decide where to send it next. It may send it to a customer or handle the request itself. We are going to assume that the request is sent to a customer. The customer has its own router, which receives the request, examines the protocol and destination port, and forwards the request to the appropriate server (e.g. 192.168.1.11:22). So the final destination of this request is the server with the IP address 192.168.1.11, on port 22.
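The customer side of both walkthroughs can be sketched as two lookup tables: port-forwarding rules on the customer's router and a Host-header table on the reverse proxy. This is only a model of the scenario described above, not a real router or proxy configuration. The reverse proxy's own address (192.168.1.2) is an assumption, since the text does not give one; the other addresses come from the examples.

```python
from typing import Optional

Endpoint = tuple[str, int]

# Assumed internal address of the reverse proxy (not given in the text).
REVERSE_PROXY: Endpoint = ("192.168.1.2", 443)

# Hypothetical port-forwarding rules on the customer's router:
# public destination port -> internal (IP, port).
PORT_FORWARDS: dict[int, Endpoint] = {
    443: REVERSE_PROXY,
    1138: ("192.168.1.11", 22),
}

# Hypothetical reverse-proxy table: Host header -> backend (IP, port).
VIRTUAL_HOSTS: dict[str, Endpoint] = {
    "dave.dev": ("192.168.1.10", 8134),
}


def final_destination(dst_port: int, host_header: Optional[str] = None) -> Optional[Endpoint]:
    """Follow a request from the customer's router to the server that handles it."""
    target = PORT_FORWARDS.get(dst_port)
    if target is None:
        return None  # no forwarding rule for this port
    if target == REVERSE_PROXY:
        # The proxy picks a backend from the Host header; unknown hosts get nothing.
        return VIRTUAL_HOSTS.get(host_header) if host_header else None
    return target


print(final_destination(443, "dave.dev"))  # ('192.168.1.10', 8134)
print(final_destination(1138))             # ('192.168.1.11', 22)
```

Both calls start from the same public IP, yet they end at different private addresses, none of which is 80.203.107.134.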
If the 80.203.107.134 IP address were owned by a cloud provider (like DigitalOcean) registered as an AS, the request would be routed to the cloud provider's data center. It would arrive at one of the cloud provider's gateway routers and might then be forwarded to a load balancer. The load balancer would use the IP and port information to decide which part of the data center to send it to, until it reached a physical or virtual server. That server would examine the protocol, destination port, and host header and forward the request to the appropriate service.
In both cases, it is impossible to know how the request is being routed internally within the AS. We can only infer that the request is being sent to the AS that owns the IP address.
Tracing the Route
There are several ways to trace the route of a request. You could use the traceroute command or, if you're using Nginx, you could look at the access logs to see which IP an incoming request came from.
nginx.1 | dave.dev 10.9.1.200 - - [09/Nov/2024:07:37:44 +0000] "GET / HTTP/1.0" 302 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36" "10.89.0.2:80"
Here we can see that the request is coming from the IP address 10.9.1.200, that it has a Host header of dave.dev, and that it is being sent further upstream to 10.89.0.2:80. This is a very simple example, but it shows how you can trace the route of a request.
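If you want to pull those three values out of many log lines, a small parser helps. The Python sketch below assumes the field order seen in the sample line above (host, client address, two placeholders, timestamp, request line, status, bytes, referer, user agent, upstream address). That order is inferred from this one line, so the pattern would need adjusting for a different Nginx log format.

```python
import re
from typing import Optional

# The sample line from above, including the "nginx.1 | " prefix.
LOG_LINE = (
    'nginx.1 | dave.dev 10.9.1.200 - - [09/Nov/2024:07:37:44 +0000] '
    '"GET / HTTP/1.0" 302 0 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/128.0.0.0 Safari/537.36" "10.89.0.2:80"'
)

# Field order is an assumption inferred from the sample line.
PATTERN = re.compile(
    r'(?P<host>\S+) (?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<upstream>[^"]*)"$'
)


def parse(line: str) -> Optional[dict[str, str]]:
    """Return the named fields of one access-log line, or None if it does not match."""
    payload = line.split(" | ", 1)[-1]  # drop a "name | " prefix when present
    match = PATTERN.match(payload)
    return match.groupdict() if match else None


entry = parse(LOG_LINE)
if entry:
    print(entry["client"], "->", entry["host"], "->", entry["upstream"])
# 10.9.1.200 -> dave.dev -> 10.89.0.2:80
```

Keep in mind that the client address in the log is only the previous hop. Here it is a private address, so it belongs to whatever device forwarded the request rather than to the original sender.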