How DNS Works: The "white pages" of the Internet

You may not have heard of DNS...

but it is essential to how we use the internet today. Actually, you may have heard of it, since many browsers now support one form of encrypted DNS, what is called DNS-over-HTTPS, by default. More on this later.

DNS stands for Domain Name System, and it is the protocol through which browsers and other internet-enabled software translate human-readable domain names (like "google.com") into an IP address (like 216.58.193.142). There is a lot more to it, but that's the general idea.

Why we need DNS

DNS allows users to access websites and other services like email without having to know the actual server's address. Since each physical server has its own IP address, DNS also lets the providers of these services maintain more than one server behind a single name, either as a safeguard against hardware failure or to serve content to you faster from many servers spread out over a geographical area.

A decentralized system

DNS is by nature decentralized, meaning it doesn't exist on only one server in one place... This makes it incredibly resilient to failure, since copies of the directory are distributed all around the internet. Every website or other service provides its location to an "authoritative" name server that tells the internet where it can be found. From there, the information propagates across a "web" of many different DNS servers, spread around the world. Your ISP maintains DNS servers, which in turn are fed from others. Major tech entities like Google and Cloudflare maintain their own public DNS servers, and they exist on countless nodes everywhere in between.

The highest servers in the chain are the root DNS servers. These tell a requester where the authoritative top-level domain (TLD) servers are located. The TLD is the .org, .com, .info, etc. The TLD servers then point the requester to the authoritative server for the specific domain, which points to a server for a sub-domain, if any.
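To make the chain a little more concrete, here is a small Python sketch that lists the zones a lookup passes through, from the root down to the full name. The function name delegation_chain is my own invention for illustration, and the sketch assumes every label is its own zone, which real domains don't always do.

```python
def delegation_chain(domain: str) -> list[str]:
    """Return the zones a lookup walks through, from the root downwards."""
    labels = [label for label in domain.strip(".").split(".") if label]
    chain = ["."]  # the root zone
    # Build each zone name starting from the right-most label (the TLD).
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(delegation_chain("stuff.wagno.info"))
# ['.', 'info.', 'wagno.info.', 'stuff.wagno.info.']
```

Reading the output left to right mirrors the paragraph above: root servers, then the TLD servers, then the domain, then the sub-domain.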

In most cases, domain information is stored, or cached, on each server for faster resolution, to reduce traffic overhead, and to reduce load on the name servers. Every domain can set a TTL (time to live) property on its records that tells each name server how long it can keep the info cached before it needs to go back and ask again where to find that domain. This lets domain owners decide how quickly a change to their IP addresses takes effect.
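Here is a minimal sketch of the caching idea in Python. The TTLCache class and its method names are hypothetical, not taken from any real DNS server. It only shows the rule that a cached answer may be reused until its TTL runs out.

```python
import time


class TTLCache:
    """A toy DNS cache: each entry expires once its TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable, so the demo can fake time
        self._entries = {}       # name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        """Return the cached address, or None if missing or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: forget it and force a re-query
            return None
        return address


# Demo with a fake clock so we don't have to wait five minutes.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("stuff.wagno.info", "170.187.155.147", ttl_seconds=300)
print(cache.get("stuff.wagno.info"))  # 170.187.155.147 (still fresh)
now[0] = 301.0
print(cache.get("stuff.wagno.info"))  # None (TTL expired)
```

A short TTL means changes show up quickly but the name servers get asked more often. A long TTL is the reverse.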

How it all works

Source: Aaron Filbert - https://en.wikipedia.org/wiki/File:DNS_Architecture.svg

This is essentially how DNS works. Let's use a web browser on your Desktop PC as an example. Here's what happens when you type stuff.wagno.info into your browser's address bar and hit Enter:

Your browser queries your computer's local resolver, which is part of the operating system.
Your OS first checks what is called a "hosts" file on your filesystem to see if you or your administrator has hard-coded an IP address for the domain. This is very rare. In most cases, the computer forwards the request on to the DNS server defined in its network settings, most likely provided by your router.
The next DNS server in line checks if it has the domain name cached with a TTL that is not expired. If so, it returns the IP address it previously obtained and cached.
If that DNS server cannot resolve the address, it passes the query along on your behalf to the next server in the chain, likely maintained by your ISP. This is known as recursive querying. The process repeats until either a valid cached result is returned, or the query reaches the authoritative server for the domain, which returns the answer back down the line.
Your browser retrieves the answer (170.187.155.147) from the DNS resolver and it can then complete its connection to stuff.wagno.info.
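The steps above can be sketched as a toy simulation in Python. Everything here is made up for illustration: the dictionaries stand in for the hosts file, the router's and ISP's caches, and the authoritative server, and TTLs are left out to keep it short.

```python
HOSTS_FILE = {"localhost": "127.0.0.1"}   # hard-coded local entries
ROUTER_CACHE = {}                          # nothing cached yet
ISP_CACHE = {}
AUTHORITATIVE = {"stuff.wagno.info": "170.187.155.147"}


def resolve(name):
    """Return (address, source), following the lookup order described above."""
    # Step 1: the hosts file wins if it has an entry.
    if name in HOSTS_FILE:
        return HOSTS_FILE[name], "hosts file"
    # Steps 2 and 3: ask each upstream cache in turn.
    upstream = [("router", ROUTER_CACHE), ("ISP", ISP_CACHE)]
    for label, cache in upstream:
        if name in cache:
            return cache[name], f"{label} cache"
    # Nobody had it cached: fall through to the authoritative server.
    address = AUTHORITATIVE.get(name)
    if address is None:
        return None, "not found"
    # The answer travels back down the line, and each server caches it.
    for _, cache in upstream:
        cache[name] = address
    return address, "authoritative server"
```

Calling resolve("stuff.wagno.info") twice shows the effect of caching: the first call is answered by the "authoritative server", and the second comes straight from the "router cache".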

The protocol

There is not just one "DNS protocol". There are actually many different types of queries that can be performed, as well as a few different transport methods the queries use to get to the server. There is too much to cover in the scope of this article, but below are the salient points.

Transport Protocols: A DNS query can reach a DNS server over either unencrypted "plain" DNS, or encrypted "secure" DNS:

Unencrypted - These transport protocols have been in use since 1983, when DNS was first created. This uses either User Datagram Protocol (UDP) port 53, or Transmission Control Protocol (TCP) port 53. These queries are sent in plain text. There are myriad privacy, tracking, and security concerns arising from this, which I will cover in future posts. Trust me, I have a lot to say about that! :-)
Encrypted - Beginning with the unofficial DNSCrypt in 2011, a series of encrypted transport protocols for DNS have been available (though not in wide use until recently). DNSCrypt has been adopted by several major resolvers, like OpenDNS and Quad9. This protocol encrypts communication between the client and the name server, primarily in order to prevent man-in-the-middle attacks, and uses UDP or TCP port 443 (same port as HTTPS). DNS over TLS (DoT) is a standard described in 2016 which uses Transport Layer Security to encrypt DNS traffic on TCP port 853. DNS over HTTPS (DoH), introduced in 2018, is similar to DoT in that it encrypts the traffic between the client and server, but it uses the standard HTTPS protocol on port 443. There are some minor privacy concerns with DoH. There are other transport protocols, such as DNS-over-Tor, but these are the most common ones. Popular browsers like Chrome and Firefox now implement DoH as standard, although the default provider in Firefox's case (Cloudflare) has a few issues of its own that you might want to consider before leaving the default setting. But again, more on that in a future post.
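To see why "plain" DNS is a privacy concern, it helps to look at what a query actually contains. The sketch below builds a minimal query message by hand in Python, following the wire format in RFC 1035. The function name build_query and the fixed query ID are my own choices for illustration, and nothing is sent over the network.

```python
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query message (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = recursion desired), one question, no records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.strip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A record) followed by QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


packet = build_query("stuff.wagno.info")
print(len(packet))         # 34
print(b"wagno" in packet)  # True: the name is readable in the raw bytes
```

Over unencrypted UDP or TCP port 53, these bytes travel as-is, so anyone on the path can read the domain name. DoT and DoH carry this same kind of message, just wrapped inside an encrypted connection.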

Record Types: There are several different record types that are included in the DNS standard, a full list of which can be found here. The most common ones are these:

A or IPv4 Address record - This is used to request and return a simple 32-bit IPv4 IP address mapped to a domain, like 170.187.155.147.
AAAA or IPv6 Address record - Same as the A record, but for Internet Protocol version 6 (IPv6) addresses, which are 128-bit and look like this: 2600:3c02::f03c:93ff:fe28:f182
MX or Mail Exchange record - Used to route email. This record will map to an email server that is responsible for handling email for a domain.
NS or Name Server record - These records are what name servers use to communicate the authoritative name servers for a DNS zone, and propagate information throughout the DNS system.
PTR or Pointer record - These are used to perform reverse DNS (rDNS) lookups, which map an IP address back to a hostname. When you receive an email, for example, the receiving mail server will typically check that the sending server's IP address has a valid PTR record, and that the hostname it returns resolves back to the same IP address, in order to help filter out spam.
TXT or Text record - These can serve many purposes, such as providing humans information about a domain's servers or network, or providing machines information, such as email authentication. Generally it is bad practice to include many TXT records on the same name, as all of that name's TXT records are returned at once whenever any TXT request for it comes into the name server, potentially causing a lot of unnecessary traffic.
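A couple of these record types are easy to poke at with Python's standard ipaddress module, using the example addresses from the list above. This is only a sketch of the address sizes and of the special name a PTR lookup asks for. It does not perform any DNS queries.

```python
import ipaddress

v4 = ipaddress.ip_address("170.187.155.147")
v6 = ipaddress.ip_address("2600:3c02::f03c:93ff:fe28:f182")

# A records hold 32-bit addresses; AAAA records hold 128-bit addresses.
print(v4.max_prefixlen, v6.max_prefixlen)  # 32 128

# The name a PTR lookup asks for: the address reversed under in-addr.arpa.
print(v4.reverse_pointer)  # 147.155.187.170.in-addr.arpa

# IPv6 works the same way, one hex digit per label, under ip6.arpa.
print(v6.reverse_pointer)
```

So a reverse lookup is really just an ordinary query for a PTR record on a funny-looking domain name.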

Format: All DNS resource records follow a specific data format, and contain the following fields:

NAME - This is the fully qualified domain name (e.g. mail.google.com).
TYPE - This is the record type, e.g. A, MX, PTR, etc.
CLASS - This is normally set to 'IN' for internet traffic, but other classes do exist. If you're interested in a deep dive, more on that can be found here and here.
TTL - This is the time to live. It specifies the length of time, in seconds, that the record is allowed to stay valid. Once this expires, the server will need to refresh the record from an authoritative source.
RDLENGTH - This specifies the length, in octets, of the data contained in the RDATA field.
RDATA - This is where the actual value we're looking for lies. The IP address in the case of A or AAAA records, the hostname in the case of MX records, etc.
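As a rough illustration of how those fields sit together on the wire, here is a Python sketch that packs an A record into bytes and parses it back. The helper names are hypothetical, and the sketch ignores name compression, which real DNS messages commonly use.

```python
import socket
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    labels = name.strip(".").split(".")
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def pack_a_record(name: str, ttl: int, ipv4: str) -> bytes:
    """Pack NAME, TYPE, CLASS, TTL, RDLENGTH and RDATA for an A record."""
    rdata = socket.inet_aton(ipv4)  # 4 octets for an IPv4 address
    fixed = struct.pack("!HHIH", 1, 1, ttl, len(rdata))  # TYPE=A, CLASS=IN
    return encode_name(name) + fixed + rdata


def unpack_a_record(record: bytes) -> dict:
    """Parse a record produced by pack_a_record (no name compression)."""
    labels, pos = [], 0
    while record[pos] != 0:
        length = record[pos]
        labels.append(record[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    pos += 1  # skip the zero byte that ends the name
    rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", record[pos:pos + 10])
    rdata = record[pos + 10:pos + 10 + rdlength]
    return {
        "NAME": ".".join(labels),
        "TYPE": rtype,
        "CLASS": rclass,
        "TTL": ttl,
        "RDLENGTH": rdlength,
        "RDATA": socket.inet_ntoa(rdata),
    }


raw = pack_a_record("stuff.wagno.info", ttl=300, ipv4="170.187.155.147")
print(unpack_a_record(raw))
```

Note that TYPE and CLASS are stored as numbers (1 means A, and 1 means IN), and RDLENGTH is 4 here because an IPv4 address is four octets long.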

Wildcard records: DNS is capable of using what are called wildcard records, or records where the domain name starts with an asterisk. These records match any sub-domain asked for that has no record of its own, so even made-up names get an answer, typically routing the requester to the main website. The specification covering wildcard domains is vague, and so varying implementations exist, some incompatible with each other. The gist of it, though, is this: if someone typed "gobbledygook.microsoft.com" into their browser, that domain probably doesn't exist. If Microsoft had a wildcard DNS record with a name of "*.microsoft.com", the request would be answered with whatever address that record holds, which could be the main microsoft.com website.
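Here is a simplified Python sketch of wildcard matching. The zone contents are invented (example.com and the 192.0.2.x documentation addresses), and real name servers apply extra rules that this toy version skips, so treat it as the general idea rather than how any particular server behaves.

```python
RECORDS = {
    "example.com": "192.0.2.10",
    "www.example.com": "192.0.2.10",
    "*.example.com": "192.0.2.99",  # hypothetical wildcard record
}


def lookup(name, records=RECORDS):
    """Return the address for name, falling back to a wildcard record."""
    name = name.rstrip(".").lower()
    if name in records:  # an exact match always wins over a wildcard
        return records[name]
    labels = name.split(".")
    # Replace the left-most labels with "*" one step at a time and retry.
    for i in range(1, len(labels)):
        candidate = "*." + ".".join(labels[i:])
        if candidate in records:
            return records[candidate]
    return None


print(lookup("www.example.com"))           # 192.0.2.10 (exact match)
print(lookup("gobbledygook.example.com"))  # 192.0.2.99 (wildcard)
print(lookup("example.org"))               # None (no record at all)
```

The important detail is the order: the wildcard is only consulted when the exact name has no record of its own.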

How you can explore DNS

Here are some fun things that you can check out if you are interested:

In Windows, you can use nslookup at the command line to:

Find the IP address(es) for a domain:

nslookup google.com

Find the domain associated with an IP (reverse DNS or rDNS):

nslookup 108.177.122.138

Look at other record types like MX or TXT by hitting Enter after nslookup, and typing additional parameters at the > prompt. For example, to look at the TXT records for Google, type set type=TXT at the prompt, then google.com.

In Linux, you can use dig in a similar way: dig google.com. You'll get a result that looks like this:

Note that Google has multiple A records with different IPs. They have way more than this, but only 6 are listed by dig. There is a great explanation of dig and some more examples and usage here.
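If you'd rather explore from a script, Python's standard socket module can ask your operating system's resolver the same kind of question. This is a small sketch, and the function name lookup_addresses is my own. It goes through the OS resolver (hosts file first, then DNS) rather than talking to a DNS server directly, so it can only return addresses, not MX or TXT records.

```python
import socket


def lookup_addresses(hostname):
    """Return the sorted, de-duplicated IP addresses the OS resolver gives."""
    results = socket.getaddrinfo(hostname, None)
    # Each result is (family, type, proto, canonname, sockaddr), and
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in results})


print(lookup_addresses("localhost"))
```

Swap "localhost" for "google.com" to get something like the first nslookup example above. For the reverse direction, socket.gethostbyaddr() takes an IP address and returns the hostname from its PTR record, if it has one.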

Wrap up

Hopefully you found this at least half as interesting as I do, or at least stuck it out until the end. :-)

DNS is one of the fundamental protocols that keep the internet working, and with an understanding of it, you can better understand how our traffic gets out to the wider internet.

Thanks for reading!