WebSockets
posted on: Oct 11, 2019

In the traditional client-server programming paradigm, the request-response model is pretty common. Whenever the client needs a resource, it sends a request to the server. The server responds with the requested resource. However, this assumes that the client is always the initiator. What if the server has some data to send to the client?

HTTP is a client-server communication protocol, where the browser is the client and a web application is the server. Whenever you browse to a website, your browser sends an HTTP request to the server, and the server responds with an HTML page, which is displayed in your browser. This is straightforward.

The problem arises when the server has some data with which it wants to update the client. For example, the server receives a new email. Typically, the user won’t see the email until they reload the page. It would be nice if the server could update the email client by sending the data and triggering a UI refresh.

There are a couple of ways to handle this situation. Let’s see the inefficient solution first. For this, let me tell you a story.

There’s a gem of a guy in our support team at CityView. We call him the Plumber. The Plumber is the sweetest support staff I have ever come across. He is great at his job. Everyone likes him. The Plumber has a unique (and slightly annoying) habit. Every evening, before he leaves, the Plumber stops by everyone’s desk and asks the following question:

Is there anything I could do to help you leave early?

After the person says “no” (what else could they say?), the Plumber says:

If there’s ever anything, you know where to find me!!

Same question. Same answer. Same response. Each and every freaking day. With each and every staff member who is still working when the Plumber leaves.

I am not kidding.

Now, I felt really special for the first couple of days he asked me this exact question. There’s a support staff who really cares. Which is good. Good for clients. Good for staff. Good for everyone.

Not so good if you have to repeat the same sequence every single day you decide to work past the time the Plumber leaves. Not so good if you are in the middle of fixing a complex bug and are forced out of your focused state just to politely say no. All the context in your brain’s RAM is lost and you have to start over. This is why you shouldn’t interrupt a programmer.

Asking each and every employee if they need help again and again is terribly inefficient. It consumes significant resources for the Plumber, and it’s not scalable. What if we hire twenty more people tomorrow? Is he really going to stop at everyone’s desk and ask if they need help? Seriously?

This brings us to the inefficient solution I mentioned above for the server-to-client communication problem in the client-server paradigm. It’s called polling. In polling, the client effectively keeps bugging the server to check if there is any new information available. If there is, the server responds with the new information.

There are quite a few issues with polling:

  1. As the number of clients grows, the load on the server increases proportionately.
  2. If the polling frequency is low, the client may not be updated immediately when there is new data.
  3. Making repeated requests wastes resources. Unless connections are reused, a new connection must be established for each request. Every time, HTTP headers must be parsed, a query must be performed, a response must be generated, and the connection and any other resources must be cleaned up, and on and on.
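To make the waste concrete, here is a minimal sketch of polling with no real network involved. `MailServer`, `fetch_new`, and `poll` are hypothetical names I made up for illustration; the "server" is just an in-memory object, and each loop iteration stands in for one HTTP request.

```python
from dataclasses import dataclass, field


@dataclass
class MailServer:
    """Hypothetical in-memory stand-in for an HTTP endpoint."""
    inbox: list[str] = field(default_factory=list)
    requests_served: int = 0

    def fetch_new(self) -> list[str]:
        # Every call counts as one full request/response cycle.
        self.requests_served += 1
        new, self.inbox = self.inbox, []
        return new


def poll(server: MailServer, ticks: int, arrivals: dict[int, str]):
    """Poll once per tick; `arrivals` maps a tick to the email arriving then."""
    received: list[str] = []
    empty_responses = 0
    for tick in range(ticks):
        if tick in arrivals:
            server.inbox.append(arrivals[tick])
        new = server.fetch_new()
        if new:
            received.extend(new)
        else:
            empty_responses += 1  # the server had nothing to say
    return received, empty_responses


server = MailServer()
received, empty_responses = poll(server, ticks=10, arrivals={3: "hello", 7: "world"})
print(received, empty_responses, server.requests_served)
```

With two emails arriving over ten ticks, eight of the ten requests come back empty. That is the Plumber walking past eight desks to hear "no".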

A better alternative to polling is long polling. Rather than responding immediately, the server holds the request open until new information becomes available, and only then completes the response. The client then issues a fresh request and waits again. Long polling reduces wasted traffic because the server only responds when there is data to send, so the client doesn’t have to check on a fixed schedule.

Can we do better?

Back to our Plumber story. What if, instead of the Plumber asking everyone if they need any help, the person who actually needs help asks him for it? This changes the game completely. Now the employee gets immediate support from the Plumber and doesn’t have to wait until he has finished his polling. Also, the Plumber can spend his time more effectively, handling our DevOps and all the other things he is great at, rather than polling everyone.

Turns out that there is a similar solution developed to overcome the inefficiencies of the polling technique. It is called WebSockets. As the MDN documentation says,

The WebSocket API is an advanced technology that makes it possible to open a two-way interactive communication session between the user’s browser and a server. With this API, you can send messages to a server and receive event-driven responses without having to poll the server for a reply.

WebSockets provide a persistent connection between a client and a server that both can use to start sending data at any time. Once the connection is open, messages flow in either direction without the per-request overhead (new connections, repeated headers) of traditional HTTP. It’s a game-changer for real-time, event-driven web applications.
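To get a feel for "both sides can send at any time", here is a small in-process simulation. It is not a real WebSocket: two `asyncio.Queue` objects stand in for the two directions of one persistent connection, and the `server`, `client`, and `run_demo` functions are hypothetical names for this sketch only.

```python
import asyncio


async def server(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    # The server speaks first, without being asked.
    await outbox.put("server: you have new mail")
    message = await inbox.get()
    await outbox.put(f"server: got '{message}'")


async def client(inbox: asyncio.Queue, outbox: asyncio.Queue) -> list[str]:
    received = [await inbox.get()]      # pushed by the server, no polling
    await outbox.put("mark as read")    # the client can talk too
    received.append(await inbox.get())
    return received


async def main() -> list[str]:
    to_server: asyncio.Queue = asyncio.Queue()
    to_client: asyncio.Queue = asyncio.Queue()
    server_task = asyncio.create_task(server(to_server, to_client))
    received = await client(to_client, to_server)
    await server_task
    return received


def run_demo() -> list[str]:
    return asyncio.run(main())


print(run_demo())
```

Notice that the first message comes from the server. With plain request-response that could not happen until the client asked.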

This is how WebSockets work:

  1. The client establishes a connection through a WebSocket handshake, which starts as a regular HTTP request. In this request, it lets the server know that it wants to upgrade from HTTP to the WebSocket protocol. This is achieved by using the Connection and Upgrade header fields.

    GET ws://echo.websocket.org/?encoding=text HTTP/1.1
    Origin: http://websocket.org
    Connection: Upgrade    // upgrade this protocol
    Upgrade: websocket    // to WebSocket
    
  2. If the server supports the WebSocket protocol, it agrees to the handshake and communicates back to the client.

    HTTP/1.1 101 WebSocket Protocol Handshake
    Date: Fri, 10 Feb 2012 17:38:18 GMT
    Connection: Upgrade
    Upgrade: WebSocket
    
  3. Once the handshake is complete, the initial HTTP connection is replaced by a WebSocket connection that uses the same underlying TCP/IP connection. Now the client and server can start exchanging data back and forth.
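The headers shown above are abbreviated. In the standardized protocol (RFC 6455), the client also sends a random `Sec-WebSocket-Key`, and the server proves it understood the upgrade by replying with a matching `Sec-WebSocket-Accept`. Below is a minimal sketch of the server side of that check. The helper names `is_upgrade_request` and `accept_value` are my own; the GUID constant and the sample key come from the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; every server appends it to the client's key.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def is_upgrade_request(headers: dict[str, str]) -> bool:
    """Check the Connection and Upgrade headers (names are case-insensitive)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    connection_tokens = [
        token.strip().lower() for token in lowered.get("connection", "").split(",")
    ]
    return (
        "upgrade" in connection_tokens
        and lowered.get("upgrade", "").lower() == "websocket"
    )


def accept_value(sec_websocket_key: str) -> str:
    """Compute Sec-WebSocket-Accept: base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


request_headers = {
    "Connection": "Upgrade",
    "Upgrade": "websocket",
    "Sec-WebSocket-Key": "dGhlIHNhbXBsZSBub25jZQ==",  # sample key from RFC 6455
    "Sec-WebSocket-Version": "13",
}

if is_upgrade_request(request_headers):
    print("Sec-WebSocket-Accept:", accept_value(request_headers["Sec-WebSocket-Key"]))
```

In practice you would let a WebSocket library or your web framework do this for you. The point is only that the handshake is plain HTTP plus a small, well-defined check.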

The important thing to remember is that the client always initiates the connection. This makes sense intuitively, as there can be hundreds or even thousands of clients for a single server. The server would go crazy if it had to initiate the connection to every one of its clients. Once the connection is open, though, either side can send data.

Let’s conclude by devising a WebSocket solution for our Plumber dilemma. We can think of an employee as the server and the Plumber as the client. All the Plumber has to do is initiate the request once, telling the employee that if they need help, he is there and how to reach him. Both can now go back to their business. No more constant polling every evening. When the employee really needs help, they will know how to ping the Plumber, making everyone’s lives easier.