DNS: The whole internet runs on it!
- Aniketh Girish
I have been working with, learning about and poking around the DNS protocol for some time now, so I thought I should write things down, both to help me recollect the details and to give readers a small introduction to the protocol. This post goes fairly deep into DNS and touches a few other areas of the networking protocol stack; still, I will try my best to keep it simple and explain the theory in one place.
I would like to start with how and why the DNS protocol was born. In outline, I will talk about the Internet DNS hierarchy with reference to the OSI model, which provides a platform for understanding the domain concept easily.
In the early days, the hostnames (simple computer names) of computers were manually entered into a file called HOSTS, which was located on a central server. Each site or computer that needed to resolve hostnames had to download this file. But as the number of hosts grew, so did the HOSTS file (which still exists in operating systems such as Linux/Unix and Windows), until it was far too large for computers to download and was generating a great amount of traffic. That is how DNS was born.
Next we move on to DNS resolution: what exactly happens when a host requests a DNS resolution? We will walk through the whole resolution process with an example, and then look at name servers and the role they play in the DNS system.
A DNS message is carried as a single buffer with a fixed layout. The message format is divided into five sections: the Header, Question, Answer, Authority and Additional sections. The last three all carry resource records.
PS: In a nutshell, the header section tells the client or the server whether the message is a query or a response, and much more.
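To make the header a little more concrete, here is a minimal sketch of packing and unpacking it in Python. It assumes the 12-byte layout from RFC 1035 (an ID, a 16-bit flags word and four record counts); the function names and the sample ID are my own, chosen only for illustration.

```python
import struct

# The DNS header is 12 bytes: six unsigned 16-bit fields in network byte order.
HEADER = struct.Struct("!6H")

def build_header(msg_id, flags, qdcount=1, ancount=0, nscount=0, arcount=0):
    """Pack the ID, flags and the four section counts into a 12-byte header."""
    return HEADER.pack(msg_id, flags, qdcount, ancount, nscount, arcount)

def parse_header(data):
    """Unpack the first 12 bytes of a DNS message into a dict."""
    msg_id, flags, qd, an, ns, ar = HEADER.unpack_from(data)
    return {
        "id": msg_id,
        "is_response": bool(flags & 0x8000),  # QR is the top bit of the flags
        "qdcount": qd, "ancount": an, "nscount": ns, "arcount": ar,
    }

# Sample values (assumptions): ID 0x1234, flags 0x0100 sets only RD.
header = build_header(0x1234, 0x0100)
print(header.hex())
print(parse_header(header))
```

The four counts say how many entries follow in the question, answer, authority and additional sections, which is how a receiver knows where each section ends.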
The Domain Name System is basically a hierarchical, distributed database, structured like a tree that branches out from a single root.
The DNS protocol comes into play when your computer sends a DNS query to a name server to resolve a domain. For example, when you type "google.co.in" in your web browser, this triggers a DNS request, which your computer sends to a DNS server in order to get the website's IP address.
The DNS protocol normally uses UDP as its transport because of its small overhead in comparison to TCP; the less overhead a protocol has, the faster it is.
If a response is too large to fit in a UDP message, the server sets the truncation (TC) flag in its reply and the client retries the same query over TCP, which can carry larger messages and ensures the data arrives without errors. Zone transfers between servers use TCP as well.
This process, though, depends on the system you are using. Some setups might not allow DNS to use TCP (for example, when a firewall blocks it), limiting it to UDP only. In practice it is rare to get so many errors that you cannot resolve any hostname or domain name to an IP address.
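One common trigger for the switch from UDP to TCP is the truncation (TC) bit in the header of a reply. The sketch below checks that bit and frames a query for TCP, where RFC 1035 puts a 2-byte length in front of every message. The helper names and the hand-built sample messages are assumptions for illustration; nothing is sent over the network.

```python
import struct

TC_FLAG = 0x0200  # truncation bit in the 16-bit flags field

def is_truncated(response):
    """Return True if the TC bit is set in a raw DNS response."""
    (flags,) = struct.unpack_from("!H", response, 2)  # flags follow the 2-byte ID
    return bool(flags & TC_FLAG)

def frame_for_tcp(message):
    """Over TCP, each DNS message is prefixed with its length as 2 bytes."""
    return struct.pack("!H", len(message)) + message

# Hypothetical header-only messages: a reply with QR and TC set, and a query.
truncated_reply = struct.pack("!6H", 0x1234, 0x8200, 1, 0, 0, 0)
query = struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 0)

if is_truncated(truncated_reply):
    tcp_payload = frame_for_tcp(query)
    print(tcp_payload[:2].hex())  # the length prefix that precedes the query
```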
The DNS protocol uses port 53 for its service. This means that a DNS server listens on port 53 and expects any client wishing to use the service to send its queries to that port. There are, however, cases where you might need to use a different port, which is possible depending on the operating system and DNS server you are running.
The DNS structure has been designed in such a way that no DNS server needs to know about all possible domains, but only those immediately above and below it.
At the top are the root DNS servers, which know the authoritative DNS servers for the domains immediately below them. Each domain, including the ones we are talking about, has what we call a "Primary DNS" and a "Secondary DNS". The Primary DNS is the one that holds all the information about its domain. The Secondary acts as a backup in case the Primary DNS fails. The process in which a Primary DNS server sends its copy to the Secondary DNS server is called a zone transfer and is covered in the DNS Database section.
There are two ways for a client to use the domain name system to obtain an answer.
Recursive resolver/server: Recursive servers generally maintain a cache as well. A recursive server checks this cache first to see if it already has the answer to the query. If it does not, it checks whether it has the address of any of the servers that control the higher-level domain components. So if the request is for www.example.com and it cannot find that host address in its cache, it will see if it has the address of the name servers for example.com and, if necessary, com. It will then send a query to the name server of the most specific domain component it can find in order to ask for more information.
If it does not find the address of any of these domain components, it has to start from the very top of the hierarchy by querying the root name servers. The root servers know the addresses of all of the TLD (top-level domain) name servers, which control zones such as .org. It will ask the root servers whether they know the address of www.example.com. The root server will refer the recursive server to the name servers for the com TLD.
Authoritative resolver/server: An authoritative-only DNS server is a server that only concerns itself with answering the queries for the zones that it is responsible for. Since it does not help resolve queries for outside zones, it is generally very fast and can handle many requests efficiently.
Most operating systems and clients use recursive resolution, since they need names resolved across the whole DNS rather than within the scope of just one particular zone.
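The cache lookup a recursive server does, trying the full name first and then each parent domain, can be sketched in a few lines of Python. This is only an illustration of the idea: the cache is a plain dict, the function names are made up, and the addresses come from the ranges reserved for documentation.

```python
def candidate_zones(name):
    """Yield the name and each parent domain, most specific first."""
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])
    yield "."  # the root, as a last resort

def closest_known_zone(name, ns_cache):
    """Return the most specific zone whose name servers are cached."""
    for zone in candidate_zones(name):
        if zone in ns_cache:
            return zone, ns_cache[zone]
    raise LookupError("no name servers known, not even for the root")

# Hypothetical cache contents (documentation addresses, not real servers).
ns_cache = {
    ".": ["198.51.100.1"],
    "com": ["192.0.2.10"],
}

print(closest_known_zone("www.example.com", ns_cache))
```

With this cache, a lookup for www.example.com starts at the com name servers, while a name under a TLD that is not cached falls back to the root.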
So, let me try to explain how DNS resolution works. We open our browser and enter, for example, www.google.com in the URL field. At this point, our computer does not know the IP address of the server it needs to connect to, so it sends a DNS query to our ISP's DNS server. (It queries the ISP's DNS server because that is what has been configured: through the dial-up properties or, if you're on a permanent connection, through your network card's TCP/IP properties.)
If the ISP's DNS server does not already know the IP address of www.google.com, it forwards the query to the root DNS servers. A root DNS server checks its database and replies with a referral to the name servers for the com top-level domain, which in turn point our ISP's server to the primary DNS for google.com. Now our ISP's DNS server knows the IP address of Google's DNS server, so it sends a query to google.com's DNS server, asking it to resolve the fully qualified domain name www.google.com. Google's DNS server checks its database and finds an entry for www.google.com. (If the IP addresses of the DNS server and the web server (www) turn out to be identical, they are likely to be on the same physical server. A load-balancing mechanism can have the same effect, making multiple services and physical machines share the same IP address.)
Now our ISP's DNS server knows the IP address and passes it back to our computer, which can then connect to that address. Naturally, the next step is to send an HTTP request directly to Google's web server and download the webpage.
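The walk from the root down to Google's own name server can be simulated without touching the network. The sketch below uses a toy table in place of real servers; every server name and address in it is made up, and it includes the intermediate com TLD step between the root and google.com's name server.

```python
# Toy data standing in for real servers; every name and address is made up.
SERVERS = {
    "root":      {"referrals": {"com": "tld-com"}},
    "tld-com":   {"referrals": {"google.com": "ns-google"}},
    "ns-google": {"answers": {"www.google.com": "192.0.2.80"}},
}

def ask(server, name):
    """Return ('answer', ip) or ('referral', next_server) from one server."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return "answer", data["answers"][name]
    for zone, next_server in data.get("referrals", {}).items():
        if name == zone or name.endswith("." + zone):
            return "referral", next_server
    raise LookupError(f"{server} cannot help with {name}")

def resolve(name, start="root", max_steps=10):
    """Follow referrals from the root until some server answers."""
    server, path = start, []
    for _ in range(max_steps):
        path.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, path
        server = value  # a referral: ask the next server down the tree
    raise LookupError("too many referrals")

ip, path = resolve("www.google.com")
print(ip, "via", " -> ".join(path))
```

A real resolver would also cache each referral along the way, so that the next lookup under the same domain can skip the upper levels.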
Now let us analyse the DNS packets and see how DNS messages are formatted. Because the DNS message format can vary, depending on the query and the answer, we've broken this analysis into two parts:
First, we look at the format of a DNS query, analysing the contents of a query packet sent to a DNS server to request the resolution of a domain. Then we analyse the format of a DNS response, sent when the DNS server answers our initial DNS query.
A DNS query is generated when the client needs to resolve a domain name into an IP Address.
Most of the time, the destination port is set to 53, the port that the DNS protocol uses.
In the DNS query section, the main fields we are interested in are the query domain name, query type and query class. The rest, such as the opcode, is overhead that gives the server information about the query. All fields in the DNS query section except the DNS Name field have set lengths. The DNS Name field has no set length because it varies with the length of the domain name.
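Here is a small sketch of how the query section could be encoded, assuming the RFC 1035 wire format: the name is written as length-prefixed labels ending in a zero byte, followed by a 16-bit type and a 16-bit class. The function and constant names are my own.

```python
import struct

QTYPE_A = 1    # host address (IPv4)
QCLASS_IN = 1  # the Internet class

def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"bad label length in {name!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def encode_question(name, qtype=QTYPE_A, qclass=QCLASS_IN):
    """Variable-length name followed by two fixed 16-bit fields."""
    return encode_name(name) + struct.pack("!HH", qtype, qclass)

question = encode_question("www.google.com")
print(len(question), question)
```

The name part grows and shrinks with the domain, while the type and class always take four bytes in total, which is why only the name field has no set length.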
The Parameter Field (labelled Flags) is one of the most important fields in DNS because it is responsible for letting the server or client know a lot of important information about the DNS packet. For example, it contains information as to whether the DNS packet is a query or response and, in the case of a query, if it should be a recursive or non-recursive type. This is most important because as we've already seen, it determines how the query is handled by the server.
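To show what is packed into that field, here is a minimal sketch that splits the 16-bit flags word into its parts, assuming the bit layout from RFC 1035. The function name and the two sample values are only illustrations.

```python
def decode_flags(flags):
    """Split the 16-bit flags word of a DNS header into its fields."""
    return {
        "qr":     (flags >> 15) & 0x1,  # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "aa":     (flags >> 10) & 0x1,  # authoritative answer
        "tc":     (flags >> 9) & 0x1,   # message was truncated
        "rd":     (flags >> 8) & 0x1,   # recursion desired
        "ra":     (flags >> 7) & 0x1,   # recursion available
        "rcode":  flags & 0xF,          # 0 = no error, 3 = name does not exist
    }

print(decode_flags(0x0100))  # a query asking for recursion
print(decode_flags(0x8180))  # a response with recursion desired and available
```

The rd bit is the one that tells the server whether the client wants a recursive or a non-recursive lookup.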
Now let us look at the DNS response.
The time it takes to receive an answer to our DNS query is very short; it is lightning fast. A lot of factors contribute to this fast response: the UDP transport protocol, which does not require a 3-way handshake; the load of the initial DNS server queried; the load of the other DNS servers that had to be asked; the connection speeds of everyone involved (our workstation, the DNS servers, etc.); and the traffic load along all the paths our packets have taken during this DNS query/response!
The DNS section in a response packet is considerably larger and more complex than that of a query packet. In addition to the DNS query section, the response provided by the DNS server carries an Answer section, an Authoritative name servers section and an Additional records section.
Each of the last three sections (answers, authoritative name servers and additional records) has identical fields. Even though the information they contain might seem a bit different, the fields are exactly the same.
Next is the type field, which tells us the type of information we are requesting about a domain.
To give a simple example, when we have a Type=A, we are given the IP Address (IPv4) of the domain or host, whereas a Type=NS means we are given the Authoritative Name Servers that are responsible for the domain.
When requesting the name servers for a domain, it is also essential that their IP addresses are provided, so that the client can construct a DNS query and send it to the name servers for that domain.
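Since the answer, authority and additional sections all share the same record fields, one parser can read any of them. The sketch below assumes the RFC 1035 record layout (name, type, class, TTL, data length, data) and handles the compression pointers that responses use to avoid repeating names. The message fragment is hand-built for illustration, with a documentation address rather than a real one.

```python
import socket
import struct

def read_name(msg, offset):
    """Decode a possibly compressed name; return (name, offset after it)."""
    labels, end, hops = [], None, 0
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:      # compression pointer (2 bytes)
            if end is None:
                end = offset + 2       # parsing resumes right after the pointer
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            hops += 1
            if hops > 10:
                raise ValueError("too many compression pointers")
        elif length == 0:              # zero-length root label ends the name
            return ".".join(labels), (end if end is not None else offset + 1)
        else:
            labels.append(msg[offset + 1:offset + 1 + length].decode("ascii"))
            offset += 1 + length

def read_record(msg, offset):
    """Parse one resource record: NAME, TYPE, CLASS, TTL, RDLENGTH, RDATA."""
    name, offset = read_name(msg, offset)
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", msg, offset)
    offset += 10
    rdata = msg[offset:offset + rdlength]
    if rtype == 1 and rdlength == 4:   # Type A: a 4-byte IPv4 address
        rdata = socket.inet_ntoa(rdata)
    record = {"name": name, "type": rtype, "class": rclass, "ttl": ttl, "rdata": rdata}
    return record, offset + rdlength

# Hand-built fragment: a name at offset 0, then an A record pointing back to it.
msg = b"\x03www\x07example\x03com\x00"
record_offset = len(msg)
msg += b"\xc0\x00" + struct.pack("!HHIH", 1, 1, 300, 4) + bytes([192, 0, 2, 1])

record, next_offset = read_record(msg, record_offset)
print(record)
```

Other record types, such as NS, use the same outer fields; only the way the data part is interpreted changes.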
That's it, folks. If I feel that we could add more resources to this post, I will edit it soon. I am closing this down since the post has already become too huge ;). For more in-depth technical detail, you can refer to https://www.ietf.org/rfc/rfc1035.txt