Python: OS, Exception And Urllib

In this post, I’ll share three things –

  • Using OS module to access and print directories,
  • Using exceptions to handle runtime errors and
  • Using urllib module to fetch data from webpage and store it as a string in a variable.

Let’s start with the OS module. Here is how we print a file name from the current directory and then show its relative and absolute paths:

# python program located at: /Users/omkarb/Desktop
 
import os
 
## Example pulls filenames from a dir, prints their relative and absolute paths
def printdir(dir):
	filenames = os.listdir(dir)
 
	print filenames[1] #boarding pass
	print os.path.join(dir, filenames[1]) #./boarding pass
	print os.path.abspath(os.path.join(dir, filenames[1])) #/Users/omkarb/Desktop/boarding pass
 
printdir('./')
 
#explanation
	#for filename in filenames:
		#print filename  ## foo.txt
		#print os.path.join(dir, filename) ## dir/foo.txt (relative to current dir)
		#print os.path.abspath(os.path.join(dir, filename)) ## /home/nick/dir/foo.txt
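
The code in this post uses Python 2 syntax. As a rough sketch, the same idea in Python 3 could look like the block below. The function name describe_entries and the throwaway directory are made up for illustration, so treat this as one possible shape rather than the only way to do it.

```python
import os
import tempfile


def describe_entries(directory):
    """Return (name, relative path, absolute path) for each entry in directory."""
    rows = []
    for name in sorted(os.listdir(directory)):  # sorted for a stable order
        relative = os.path.join(directory, name)
        rows.append((name, relative, os.path.abspath(relative)))
    return rows


# Demo on a throwaway directory so the output does not depend on your machine.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "foo.txt"), "w").close()
    demo_rows = describe_entries(demo_dir)
    for name, relative, absolute in demo_rows:
        print(name, relative, absolute)
```

Returning the rows instead of printing inside the function makes it easier to reuse the result elsewhere.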

You can also use the commands module (Python 2 only, it was removed in Python 3) to run a command in the terminal. For example, this is how you run the pwd command:

import commands
import sys
## Run an external 'pwd' command --
## shows how to call an external program
def listdir():
  cmd = 'pwd' #present working directory
  print "Command to run:", cmd   ## good to debug cmd before actually running it
  (status, output) = commands.getstatusoutput(cmd)
  if status:    ## Error case, print the command's output to stderr and exit
    sys.stderr.write(output)
    sys.exit(status)
  print output  ## Otherwise do something with the command's output
 
listdir() #prints /Users/omkarb/desktop
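
Since the commands module does not exist in Python 3, here is a minimal sketch of the same pattern using the standard subprocess module. The helper name run_command is hypothetical, and the demo runs the Python interpreter itself instead of pwd so that it behaves the same on any operating system.

```python
import subprocess
import sys


def run_command(args):
    """Run an external program and return (exit status, combined output)."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error text into the same output
        text=True,                 # decode bytes to str (Python 3.7+)
    )
    return completed.returncode, completed.stdout


# Stand-in for 'pwd': ask the current interpreter to print a word.
status, output = run_command([sys.executable, "-c", "print('hello')"])
if status:  # non-zero status means the command failed
    sys.stderr.write(output)
else:
    print(output.strip())
```

Passing the command as a list avoids going through a shell, which sidesteps quoting problems.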

If you know a certain piece of code might raise an error, you can put it inside a try-except block as follows:

import sys
filename = 'line.txt'
try:
  ## Either of these two lines could throw an IOError, say
  ## if the file does not exist or the read() encounters a low level error.
  f = open(filename, 'rU')
  text = f.read()
  f.close()
except IOError:
  ## Control jumps directly to here if any of the above lines throws IOError.
  sys.stderr.write('problem reading:' + filename)
## In any case, the code then continues with the line after the try/except
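
For comparison, a Python 3 sketch of the same idea is shown below. It assumes a hypothetical helper called read_text and a file name that is not expected to exist. The with statement closes the file even when read() fails, which the manual f.close() above would skip.

```python
import sys


def read_text(path):
    """Return the file's text, or None if it cannot be read."""
    try:
        with open(path, "r") as f:  # 'with' closes the file even on error
            return f.read()
    except OSError as err:  # IOError is an alias of OSError in Python 3
        sys.stderr.write("problem reading: %s (%s)\n" % (path, err))
        return None


missing = read_text("this-file-should-not-exist.txt")
print(missing)
```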

The above code prints the error message because line.txt doesn’t exist on my computer.

Next, we can read from a URL using the following code:

import urllib
## Given a url, try to retrieve it. If it's text/html,
## print its base url and its text.
def wget(url):
  ufile = urllib.urlopen(url)  ## get file-like object for url
  info = ufile.info()   ## meta-info about the url content
  if info.gettype() == 'text/html':
    print 'base url:' + ufile.geturl()
    text = ufile.read()  ## read all its text
    # print text #prints text
 
wget('https://google.com')

However, it doesn’t include error handling. If the URL doesn’t work for some reason, we can handle it as follows:

import urllib
## Given a url, try to retrieve it. If it's text/html,
## print its base url and its text.
def wget2(url):
  try:
    ufile = urllib.urlopen(url)  ## get file-like object for url
    if ufile.info().gettype() == 'text/html':
      print 'base url:' + ufile.geturl()
      text = ufile.read()  ## read all its text
      # print text #prints text
  except IOError:
    print 'problem reading url:', url
 
wget2('https://google.com')
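
In Python 3 the urlopen function lives in urllib.request. The sketch below mirrors wget2 under that assumption. The name fetch_html is made up, and the demo uses a local file:// URL as a stand-in for a real web address so it can run without a network connection.

```python
import pathlib
import tempfile
import urllib.request


def fetch_html(url, timeout=10):
    """Return the page text if the URL serves text/html, otherwise None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            headers = response.info()  # meta-info about the url content
            if headers.get_content_type() != "text/html":
                return None
            charset = headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except OSError as err:  # urllib.error.URLError is a subclass of OSError
        print("problem reading url:", url, err)
        return None


# Offline demo: one page that exists and one that does not.
with tempfile.TemporaryDirectory() as demo_dir:
    page = pathlib.Path(demo_dir) / "page.html"
    page.write_text("<h1>hi</h1>", encoding="utf-8")
    found = fetch_html(page.as_uri())
    missing = fetch_html((pathlib.Path(demo_dir) / "nope.html").as_uri())
```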

There’s also a simpler way to read the web page using the urllib.urlretrieve method:

import urllib
 
result = urllib.urlretrieve("http://wordpress.org/")
print open(result[0]).read()

That’s all in this post 🙂

Python: Regular Expressions

To use regular expressions we have to import a module called re in Python. Let’s start with a simple example which searches for the pattern “word:” followed by three word characters –

import re
str = 'batman starts with the word:bat!!'
match = re.search(r'word:\w\w\w', str)
# If-statement after search() tests if it succeeded
if match:                      
  print 'found', match.group() ## 'found word:bat'
else:
  print 'did not find'

Why the prefix r?

I was wondering why we have the prefix r in there. Google’s Python course said: The ‘r’ at the start of the pattern string designates a python “raw” string which passes through backslashes without change which is very handy for regular expressions.

I didn’t quite get that, so I searched and found this StackOverflow post. It becomes clear with the following example –

>>> '\n'
'\n'
>>> r'\n'
'\\n'
>>> print '\n'
 
 
>>> print r'\n'
\n
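
Another way to see the difference is to count characters. This is a small Python 3 sketch. The variable names are only for illustration.

```python
newline = '\n'   # one character: a line feed
raw = r'\n'      # two characters: a backslash and the letter n

print(len(newline))   # 1
print(len(raw))       # 2
print(raw == '\\n')   # True: same as writing an escaped backslash plus n
```

This is why regex patterns such as \d or \w are usually written as raw strings: the backslash reaches the re module untouched.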

Search Examples

Here are some more re.search examples which can be used in the second block of code in this post –

re.search(r'iii', 'niiice') # found iii
re.search(r'igs', 'niiice') # did not find
 
## . = any char but \n		
re.search(r'..e', 'niiice') # found ice
 
## \d = digit char, \w = word char
re.search(r'\d\d\d', 'n123ce') # found 123
re.search(r'\w\w\w', '$$batman&&') # found bat

Repetition

Here’s what I learned about finding repeated patterns in a given string:

  • Plus sign (+): 1 or more occurrences of the pattern to its left, e.g. ‘i+’ = one or more i’s
  • Star sign (*): 0 or more occurrences of the pattern to its left
  • Question mark (?): match 0 or 1 occurrences of the pattern to its left

re.search(r'\d\s*\d\s*\d', 'xx1 2   3xx') # found 1 2   3
re.search(r'\d\s*\d\s*\d', 'xx12  3xx') # found 12  3
re.search(r'\d\s*\d\s*\d', 'xx123xx') # found 123
 
re.search(r'^b\w+', 'foobatman') # did not find
re.search(r'b\w+', 'foobatman') # found batman
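
The examples above do not show the question mark, so here is a short Python 3 sketch that exercises all three operators. The helper first_match and the sample strings are made up for this illustration.

```python
import re


def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group() if match else None


print(first_match(r'i+', 'niiice'))         # iii
print(first_match(r'xi*', 'xenon'))         # x (zero i's still counts)
print(first_match(r'colou?r', 'color'))     # color
print(first_match(r'colou?r', 'colour'))    # colour
print(first_match(r'^b\w+', 'foobatman'))   # None
```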

Finding An Email using Regular Expression

import re
 
str = 'contact superman at supes@earth.com'
 
#search 1 or more word characters followed by @ followed by 1 or more word characters
match = re.search(r'\w+@\w+', str) 
 
if match:
  print match.group()  ## 'supes@earth'

But it only returns the email address partially. We need to adjust the code in a way that will allow it to print the .com part as well.

The following code accommodates dots and dashes:

import re
 
str = 'contact superman at supes@g-mail.com'
 
#both sets can contain a word, a dash or a dot
match = re.search(r'[\w.-]+@[\w.-]+', str) 
 
if match:
  print match.group()  ## 'supes@g-mail.com'

Now that we have a way to find the email address, can we extract the username from it? Yes, we can! This can be done using group extraction in Python. Just add parentheses around the username and host as follows:

import re
 
str = 'contact superman at supes@g-mail.com'
 
#both sets can contain a word, a dash or a dot
match = re.search(r'([\w.-]+)@([\w.-]+)', str) 
 
if match:
  print match.group()  ## 'supes@g-mail.com'
  print match.group(1) # supes
  print match.group(2) #g-mail.com

FindAll()

There’s also re.findall(), which finds all matches of a given pattern in the string, as opposed to re.search(), which only finds the first match.

import re
 
str = 'contact superman at supes@g-mail.com and batman at batsy@justice.com'
 
#both sets can contain a word, a dash or a dot
matches = re.findall(r'([\w.-]+)@([\w.-]+)', str) 
 
for match in matches:
	print match
	#print match[0] # prints supes, batsy
	#print match[1] # prints g-mail.com, justice.com
 
##prints
#('supes', 'g-mail.com')
#('batsy', 'justice.com')
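
One detail worth knowing is that the shape of the findall() result depends on how many groups the pattern has. The Python 3 sketch below reuses the string from above. The variable names are only for illustration.

```python
import re

text = 'contact superman at supes@g-mail.com and batman at batsy@justice.com'

# No groups: a list of whole matches.
whole = re.findall(r'[\w.-]+@[\w.-]+', text)

# One group: a list of just that group.
users = re.findall(r'([\w.-]+)@[\w.-]+', text)

# Two groups: a list of tuples, one tuple per match.
pairs = re.findall(r'([\w.-]+)@([\w.-]+)', text)

print(whole)
print(users)
print(pairs)
```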

Bonus

Before ending this post, I want to add a point about the greedy vs non-greedy behaviour of regular expressions, which I learned from Google’s Python course.

Let’s say we want to match html tags in the following string:

<b>boldman</b> and <i>italicman</i>

It’s common to come up with a solution like <.*> – which will match for any string starting and ending with < and >. However, that matches the whole string instead of individual tags as follows:

import re
 
str = '<b>boldman</b> and <i>italicman</i>'
 
#greedy: .* grabs as much text as it can
match = re.findall(r'<.*>', str) 
 
if match:
	print match
 
#result
#['<b>boldman</b> and <i>italicman</i>']

It can be fixed by adding ? in the solution as follows: <.*?>  

import re
 
str = '<b>boldman</b> and <i>italicman</i>'
 
#non-greedy: .*? stops at the first > it finds
match = re.findall(r'<.*?>', str) 
 
if match:
	print match
 
#result
#['<b>', '</b>', '<i>', '</i>']
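
Here is a compact Python 3 sketch that puts the greedy and non-greedy patterns side by side. It also includes a third variant, a negated character class, which is not from the course notes above but is a common alternative that gives the same result for this input.

```python
import re

html = '<b>boldman</b> and <i>italicman</i>'

greedy = re.findall(r'<.*>', html)       # runs to the last > in the string
lazy = re.findall(r'<.*?>', html)        # stops at the first > each time
negated = re.findall(r'<[^>]*>', html)   # "anything except >" between < and >

print(greedy)
print(lazy)
print(negated)
```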

That’s all in this post. Thanks for reading 🙂

Python: Reading Files

This is how you open and read files in Python:

#lines.txt has the following three lines (without #)
#this is line 1
#this is line 2
#this is line 3
 
# Echo the contents of a file
f = open('lines.txt', 'rU')
for line in f:   ## iterates over the lines of the file
  print line,    ## trailing , so print does not add an end-of-line char
                 ## since 'line' already includes the end-of line.
f.close()
 
#output is the text in lines.txt

The second parameter of the open method is the mode. We can open a file using the following modes:

A) Read (r) – for reading from the file
B) Write (w) – for writing to the file
C) Append (a) – for appending text to the file
D) Universal (rU) – for reading (but being smart about different line endings, so they all convert to \n )
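
To see the write, append and read modes working together, here is a small Python 3 sketch that uses a throwaway directory. Note that recent Python 3 versions no longer accept the 'U' flag, because plain text mode already converts line endings to \n. The file name and contents simply mirror the example above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    path = os.path.join(demo_dir, 'lines.txt')

    with open(path, 'w') as f:   # 'w' creates the file or empties it
        f.write('this is line 1\n')

    with open(path, 'a') as f:   # 'a' keeps the content and adds to the end
        f.write('this is line 2\n')

    with open(path, 'r') as f:   # 'r' opens the file for reading
        lines = f.readlines()

print(lines)
```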

Python: Basics

This post is to cover all the basic concepts in Python. It could be helpful to you if you are here to refresh your knowledge about Python.

This short post covers strings, if else, for loop, while loop, list, list methods, sorting, dictionaries, dictionary methods in Python.

The code below is explained in detail with comments over it. I recommend practicing it on your computer for better understanding.


This should be sufficient for you if you are someone who wants to start solving coding problems on sites like LeetCode.

# -----
# IF ELSE statements
def donuts(count):
  if count>=10:
    return 'Number of donuts: many'
  else:
    return 'Number of donuts: ' + str(count)
# -----
# Mix first two and last two chars of a string
# eg: spring > spng
def both_ends(s):
  n = len(s)
  if n>2:
    return s[0:2]+s[n-2:n]
  else:
    return ''
# -----
# Replace all occurrences of first char with * except first char
# eg: babble > ba**le
def fix_start(s):
  n = len(s)
  return s[0] + s[1:n].replace(s[0], "*")
# -----
# Swap first two chars of given strings
# eg: mix, pod > pox, mid
def mix_up(a, b):
  n1 = len(a)
  n2 = len(b)
  return b[0:2] + a[2:n1] + ' ' + a[0:2] + b[2:n2]
# -----
jobs = ['google', 'apple', 'microsoft']
print jobs[0] ## google
print jobs[2] ## microsoft
print len(jobs) ## 3
# An interesting thing about lists is that –
# assigning a list to a new variable does not create a new list,
# instead the new variable points to the same list in memory.
# It’s interesting because, changing the first item in jobs
# will also change the first item in companies.
companies = jobs
print companies # ['google', 'apple', 'microsoft']
jobs[0] = "doodle"
print companies # ['doodle', 'apple', 'microsoft']
# -----
# FOR AND IN
evens = [0, 2, 4, 8]
sum = 0
for num in evens:
  sum += num
print sum ## 14
if 4 in evens:
  print "4 in list"
for i in range(100):
  print i
# -----
# WHILE LOOP
a = [0,1,2,3,4,5,6,7,8,9]
## Access every 2nd element in a list
i = 0
while i < len(a):
  print a[i]
  i = i + 2
# prints 0 2 4 6 8
# -----
# LIST methods
names = ['batman', 'superman', 'spiderman']
print names.append('wonderwoman') #prints none, adds ww at the end
print names.insert(0,'aquaman') #prints none, adds aquaman in the front
print names #prints ['aquaman', 'batman', 'superman', 'spiderman', 'wonderwoman']
print names.remove('spiderman') #prints none, removes spiderman
print names.pop() #removes last item wonderwoman and prints/returns it
# -----
# LIST sorting
names = ['batman', 'atman', 'datman', 'catman']
print names.sort() #sorts but returns none
print names #prints sorted list
names = ['batman', 'atman', 'datman', 'catman']
print sorted(names) #returns sorted list
print sorted(names, reverse=True) #returns reverse sorted list
names = ['aaa', 'a', 'aaaabbbb', 'aaaaa']
print sorted(names, key=len) #sort using length of each item
#prints ['a', 'aaa', 'aaaaa', 'aaaabbbb']
## Say we have a list of strings we want to sort by the last letter of the string.
strs = ['xc', 'zb', 'yd' ,'wa']
## Write a little function that takes a string, and returns its last letter.
## This will be the key function (takes in 1 value, returns 1 value).
def MyFn(s):
  return s[-1]
## Now pass key=MyFn to sorted() to sort by the last letter:
print sorted(strs, key=MyFn) ## ['wa', 'zb', 'xc', 'yd']
# -----
# Python Dictionary
letters = { 'a':'apple', 'b':'banana', 'c':'cat'} #defining a dictionary
print letters['a'] #prints apple
letters ['d'] = 'dog'
letters ['e'] = 'elephant'
print letters ['d'] #prints dog
print letters #prints the dictionary
# {'a': 'apple', 'c': 'cat', 'b': 'banana', 'e': 'elephant', 'd': 'dog'}
letters = { 'c':'cat', 'b':'ball', 'a':'apple'} #defining a dictionary
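# Note: Python 2 dictionaries are unordered, so the key order you see
# may differ from the order used when the dictionary was defined.
#1
for key in letters:
  print key #prints each key, e.g. a, c, b
#2
for key in letters.keys():
  print key #prints each key, in the same order as #1
#3
print letters.keys() #prints a list of keys, e.g. ['a','c','b']
#4
print letters.values() #prints the values in matching order, e.g. ['apple','cat','ball']
#5
for key in sorted(letters.keys()):
  print key, letters[key] #prints a apple b ball c cat
#6
print letters.items() # [('a', 'apple'), ('c', 'cat'), ('b', 'ball')]
#7
for k, v in letters.items():
  print k, '>', v
#prints one "key > value" pair per line, e.g. a > apple, c > cat, b > ball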
hash = {}
hash['word'] = 'watermelons'
hash['count'] = 500
s = 'I want %(count)d of those %(word)s' % hash # %d for int, %s for string
print s # 'I want 500 of those watermelons'
Gist – Python Basics

I personally refer to this gist to refresh my Python knowledge whenever I start solving LeetCode problems. I hope you too will find this helpful. If you do, let me know by leaving a comment. Thanks.
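
The point about lists sharing memory trips up a lot of people, so here is a short Python 3 sketch that contrasts an alias with a real copy. The variable names alias and snapshot are made up for this illustration.

```python
jobs = ['google', 'apple', 'microsoft']

alias = jobs            # same list object, just a second name for it
snapshot = list(jobs)   # new list with the same items (jobs[:] also works)

jobs[0] = 'doodle'

print(alias[0])          # doodle
print(snapshot[0])       # google
print(alias is jobs)     # True
print(snapshot is jobs)  # False
```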

Python: Stack And Queue

Stack and Queue are abstract data types in Computer Science. In this short post, I’ll try to cover them in as few words as possible.

A stack follows the LIFO principle (last in, first out). The last item that goes in is the first item that comes out. Example: a stack of books or plates.

A queue follows the FIFO principle (first in, first out). The first item that goes in is the first item that comes out. Example: a queue of people at a counter.

# Stack implementation
class Stack(object):
  def __init__(self):
    self.items = []
  def isEmpty(self):
    return self.items == []
  def push(self, item):
    self.items.append(item) #imp
  def pop(self):
    return self.items.pop() #imp
  def peek(self):
    return self.items[-1]
  def size(self):
    return len(self.items)
# Queue implementation
class Queue(object):
  def __init__(self):
    self.items = []
  def isEmpty(self):
    return self.items == []
  def enqueue(self, item):
    self.items.append(item) #imp
  def dequeue(self):
    return self.items.pop(0) #imp
  def size(self):
    return len(self.items)
# implementing a queue with 2 stacks
class QueueTwoStacks(object):
  def __init__(self):
    self.in_stack = []
    self.out_stack = []
  def enqueue(self, item):
    self.in_stack.append(item)
  def dequeue(self):
    if len(self.out_stack) == 0:
      # Move items from in_stack to out_stack, reversing order
      while len(self.in_stack) > 0:
        newest_in_stack_item = self.in_stack.pop()
        self.out_stack.append(newest_in_stack_item)
      # If out_stack is still empty, raise an error
      if len(self.out_stack) == 0:
        raise IndexError("Can't dequeue from empty queue!")
    return self.out_stack.pop()
Gist – Stack And Queue

I hope this helps to refresh your knowledge on Stack and Queues in under a minute (or a few minutes).
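
One caveat about the list-based queue: pop(0) has to shift every remaining item, so it gets slower as the list grows. A common alternative, sketched here in Python 3 under the hypothetical name DequeQueue, is collections.deque, which removes items from the left end in constant time.

```python
from collections import deque


class DequeQueue(object):
    """FIFO queue backed by collections.deque instead of a list."""

    def __init__(self):
        self.items = deque()

    def is_empty(self):
        return len(self.items) == 0

    def enqueue(self, item):
        self.items.append(item)        # add at the right end

    def dequeue(self):
        return self.items.popleft()    # remove from the left end

    def size(self):
        return len(self.items)


q = DequeQueue()
for item in ['laptop', 'mouse', 'keyboard']:
    q.enqueue(item)
first = q.dequeue()
print(first, q.size())
```

Like the list version, dequeue() raises IndexError when the queue is empty.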

Troubleshooting HTTP 503 Errors

HTTP 503 (like all other 5XX errors) is a server side error message. But 503 is a bit more descriptive.

It basically tells us that the server or service is temporarily unavailable to handle the request but is otherwise functioning normally.

It could mean the server or service is down for scheduled maintenance but hasn’t necessarily crashed.

The key point to note here is that it implies “temporary” unavailability and that we expect the service to be back to normal with time.

For example, if you try loading the site a couple of times, it may work, let’s say, 10% of the time and return a 503 error the other times. This could indicate the servers are overloaded and unable to handle the extra load. This is just a possibility and we need to gather more evidence like CPU/Memory/Resource usage to confirm that this is indeed the case.
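
Because a 503 is meant to be temporary, a client can reasonably retry after a short pause. The Python 3 sketch below shows one way to do that with increasing delays. The function name fetch_with_retry, the delay values and the fake list of responses are all assumptions made for this illustration, not something a particular server requires.

```python
import time


def fetch_with_retry(fetch, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it stops returning 503 or attempts run out.

    fetch is any zero-argument callable that returns an HTTP status code.
    Returns (last status, number of calls made).
    """
    status = None
    for attempt in range(attempts):
        status = fetch()
        if status != 503:
            return status, attempt + 1
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))  # wait 0.5s, 1s, 2s, ...
    return status, attempts


# Fake server for the demo: overloaded twice, then fine.
responses = iter([503, 503, 200])
waits = []  # record the delays instead of actually sleeping
result = fetch_with_retry(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

Doubling the delay gives an overloaded server more breathing room with each failed attempt instead of adding to the load.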

If I see this error, I would roughly run through this list –

  • Low server resources? Heavy usage? Try again and see if it loads
  • Non-responsive scripts
  • External resource is slow or non-responsive
  • Site is under brute force or DDOS attack?
  • Check custom code. For example, I’d check PHP scripts if the site was running WordPress. I’d check plugins/themes, disable them one by one and try loading the site again.
  • .htaccess issue?
  • Digging deeper:
    • Check site’s server log
    • Check temp files/cache
    • Contact the hosting company
    • Check analytics to see if there’s more traffic than usual
    • Check for slow database queries
  • Check for other documentation specific to host or the software that is running on the site.
  • Do a quick Google search, check Stackoverflow and other such sites with the information I have acquired so far in my investigation.

And that’s about it. With that you should be equipped to handle HTTP 503 errors. If you find this post helpful or if you have anything else to add to it, please leave a comment below. Thanks!

SQL Cheatsheet

I have put together a list of SQL statements that cover SQL basics and it is something you can use to refresh your SQL knowledge in a couple of minutes.

-- Creating a table
CREATE TABLE users(
  id INT AUTO_INCREMENT,
  first_name VARCHAR(100),
  email VARCHAR(70),
  date DATETIME,
  PRIMARY KEY(id)
);
-- Foreign key reference (a posts table that points back to users)
CREATE TABLE posts(
  id INT AUTO_INCREMENT,
  user_id INT,
  title VARCHAR(100),
  publish_date DATETIME,
  PRIMARY KEY(id),
  FOREIGN KEY(user_id) REFERENCES users(id)
);
-- Inserting into a table
INSERT INTO users(column1, column2) VALUES(value1, value2);
-- Selecting from a table
SELECT * FROM users;
-- Deleting from a table
DELETE FROM users WHERE id = 6;
-- Update one row in a table
UPDATE users SET email = 'omkar@gmail.com' WHERE id = 2;
-- Alter the table
ALTER TABLE users ADD age VARCHAR(3);
ALTER TABLE users MODIFY COLUMN age INT(3);
-- More Select operations
SELECT * FROM users WHERE location = 'mumbai';
SELECT * FROM users WHERE location = 'mumbai' AND size = 100;
SELECT * FROM users ORDER BY last_name DESC;
SELECT CONCAT(first_name, ' ', last_name) AS name FROM users;
SELECT DISTINCT location FROM users;
SELECT * FROM users WHERE age BETWEEN 20 AND 25;
SELECT * FROM users WHERE dept IN ('design', 'sales');
SELECT * FROM users WHERE dept LIKE 'd%';
-- A bit of Regex
SELECT * FROM users WHERE first_name REGEXP '^o';
-- ^ starts with
-- $ ends with
-- .* any characters
-- [abc] any from a given list of a, b, c
-- For example, the line below selects every first_name that starts and ends with a vowel:
SELECT * FROM users WHERE first_name REGEXP '^[aeiou].*[aeiou]$';
-- Indexes
CREATE INDEX LIndex ON users(location);
DROP INDEX LIndex ON users;
-- Inner Join
SELECT
  u.first_name,
  u.last_name,
  p.title,
  p.publish_date
FROM users AS u
INNER JOIN posts AS p
ON u.id = p.user_id
ORDER BY p.title;
-- Outer Join
SELECT
  comments.body,
  posts.title
FROM comments
LEFT JOIN posts
ON posts.id = comments.post_id
ORDER BY posts.title;
-- Joining the comments and posts tables with a LEFT JOIN makes sense here
-- because it fetches every comment along with the title of its post.
-- A RIGHT JOIN would instead fetch every post, even those with no comments,
-- which is not what we want here.
-- Processing Order
-- WHERE gets processed before GROUP BY
-- HAVING gets processed after GROUP BY
SELECT store_id, COUNT(active)
FROM customer
WHERE active = 1
GROUP BY store_id;
SELECT rating, COUNT(rating) AS number
FROM film
GROUP BY rating
HAVING number > 80
ORDER BY number DESC;
Gist – SQL Cheatsheet
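
To try the WHERE vs HAVING processing order without installing a database server, here is a small Python 3 sketch using the built-in sqlite3 module. SQLite is only a stand-in for MySQL here, and the table contents are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # temporary in-memory database
conn.execute(
    'CREATE TABLE customer (id INTEGER PRIMARY KEY, store_id INT, active INT)'
)
conn.executemany(
    'INSERT INTO customer (store_id, active) VALUES (?, ?)',
    [(1, 1), (1, 1), (1, 0), (2, 1), (2, 0), (2, 0)],
)

# WHERE filters rows first, then GROUP BY counts what is left.
where_rows = conn.execute(
    'SELECT store_id, COUNT(active) FROM customer '
    'WHERE active = 1 GROUP BY store_id ORDER BY store_id'
).fetchall()

# HAVING filters the groups after they have been counted.
having_rows = conn.execute(
    'SELECT store_id, COUNT(active) FROM customer '
    'WHERE active = 1 GROUP BY store_id '
    'HAVING COUNT(active) > 1 ORDER BY store_id'
).fetchall()
conn.close()

print(where_rows)
print(having_rows)
```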

That’s all in this post. I may add more in a separate post later.

In case you’re interested in learning more, there are a few good SQL crash course videos on YouTube. If you ask me, I recommend this one by Traversy Media and the freeCodeCamp one by Mike Dane.

HTTP Request Methods

Each HTTP request has its own headers which tell the server what the request is all about. You can see this by opening developer tools in your browser, heading to the network tab, loading a page and then inspecting each request.

HTTP methods are usually described using two properties (a method can have both, one or neither): 

  • Safe methods: They don’t change anything on the server side.
  • Idempotent methods: They produce the same results no matter how many times they are called. 

Idempotence catchphrase → Send and send and send my friend, it makes no difference in the end.

Now let’s look at each of these methods:

GET → Fetch a resource from the server. 
Example: GET /users/207 
Safe: ✅
Idempotent: ✅

POST → Create a new resource (or submit data for processing) on the server. 
Example: POST /users
Safe: ❌
Idempotent: ❌

PUT → Create a new resource or overwrite if one is present. 
Example: PUT /users/207 
Safe: ❌
Idempotent: ✅

DELETE → Remove a resource from the server.
Example: DELETE /users/207 
Safe: ❌
Idempotent: ✅

PATCH → Apply partial modifications to a resource.
Safe: ❌
Idempotent: ❌
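
Idempotence is easier to see with a toy model. The Python 3 sketch below fakes a server with an in-memory dictionary. The class FakeUserStore and its methods are hypothetical and only imitate how PUT and POST usually behave.

```python
class FakeUserStore(object):
    """In-memory stand-in for a server's /users collection."""

    def __init__(self):
        self.users = {}
        self.next_id = 1

    def post(self, data):
        """POST /users: the server picks a new id on every call."""
        user_id = self.next_id
        self.next_id += 1
        self.users[user_id] = dict(data)
        return user_id

    def put(self, user_id, data):
        """PUT /users/<id>: create or overwrite at the id the client chose."""
        self.users[user_id] = dict(data)


store = FakeUserStore()

for _ in range(3):
    store.put(207, {'name': 'Omkar'})
count_after_puts = len(store.users)    # repeating PUT changed nothing more

for _ in range(3):
    store.post({'name': 'Omkar'})
count_after_posts = len(store.users)   # every POST added another user

print(count_after_puts, count_after_posts)
```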

POST vs PUT vs PATCH

There is an excellent StackOverflow post that covers this in detail. Here’s an excerpt from that post →

  • POST to a URL creates a child resource at a server defined URL.
  • PUT to a URL creates/replaces the resource in its entirety at the client defined URL.
  • PATCH to a URL updates part of the resource at that client defined URL.

If the endpoint on the server is idempotent (i.e. it is safe to repeat the request over and over again) and the URI is the address of the resource being updated then we can safely use PUT, else use POST. 

PUT will essentially take an object and replace the entire resource at the URI. For example, if we had to update the email of a user, we would send the entire resource:

PUT /users/207
{
name: "Omkar Bhagat",
email: "omkar@bhagat.com", // only updating email in this request
city: "Mumbai"
}

If we only send email (a partial resource) as follows →

PUT /users/207
{
email: "omkar@google.com" // only updating email in this request
}

Then the rest of the properties will have NULL values. That’s where PATCH comes to our rescue. 
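
The difference can be sketched with plain dictionaries in Python 3. The helpers apply_put and apply_patch are made up for this illustration. In this simplified model the fields left out of a PUT simply disappear, while a real server might store them as NULL instead.

```python
def apply_put(resource, body):
    """PUT: the request body replaces the whole resource."""
    return dict(body)


def apply_patch(resource, body):
    """PATCH: only the fields in the request body are changed."""
    updated = dict(resource)
    updated.update(body)
    return updated


user = {'name': 'Omkar Bhagat', 'email': 'omkar@bhagat.com', 'city': 'Mumbai'}
change = {'email': 'omkar@google.com'}

after_put = apply_put(user, change)
after_patch = apply_patch(user, change)

print(after_put)
print(after_patch)
```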

Reference: https://medium.com/@kumaraksi/using-http-methods-for-restful-services-e6671cf70d4d

HTTPS Encryption Explained

HTTPS is a way of encrypting HTTP. It basically wraps HTTP messages up in an encrypted format using SSL/TLS. 

SSL stands for Secure Sockets Layer and TLS for Transport Layer Security. Both are cryptographic protocols designed to provide communication security over a computer network.

SSL is basically deprecated and TLS is what we all use (so if someone says SSL in 2019, they’re most likely referring to TLS).

Source: https://hpbn.co/transport-layer-security-tls/

Note that TLS is not mandatory according to the HTTP/2 spec but most browsers will only allow HTTP/2 over TLS.

Encryption: Asymmetric vs Symmetric

The TLS handshake uses asymmetric encryption where the server has two keys: public and private. Data can be locked/encrypted using the server’s public key but can only be unlocked/decrypted using the server’s private key.

The server sends server’s public key to the client. It can be seen by anyone on the network but it doesn’t matter because it is a public key (which can only be used to lock/encrypt things).

Client now creates a new symmetric key on its side (a key that can be used to both lock and unlock). Client locks this symmetric key in a box using the server’s public key and sends it to the server.

People can again see this box on the network but cannot unlock it without the server’s private key. (Note: The act of secretly listening to a private conversation is called eavesdropping).

Once server receives this box, it unlocks it using server’s private key and finds a symmetric key with a note from the client which says “we shall now communicate using this symmetric key”.

At this point, client and server don’t need to send keys anymore. They can send data by locking it with the symmetric key and unlock it using the same symmetric key. 
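
The box analogy above can be imitated in a few lines of Python 3. This is a toy only: the key numbers are tiny textbook values and the XOR "cipher" is not secure, so nothing here should be used for real encryption. All names and numbers are assumptions picked to keep the flow readable.

```python
# Toy numbers only: real TLS keys are thousands of bits long.
PUBLIC_KEY = (17, 3233)      # (e, n): the server shares this with everyone
PRIVATE_KEY = (2753, 3233)   # (d, n): never leaves the server


def lock_with_public_key(number, public_key):
    e, n = public_key
    return pow(number, e, n)


def unlock_with_private_key(number, private_key):
    d, n = private_key
    return pow(number, d, n)


def symmetric_xor(data, key):
    """The same function locks and unlocks: XOR every byte with the key."""
    return bytes(b ^ key for b in data)


# 1. The client picks a symmetric key and locks it with the public key.
client_key = 42
box = lock_with_public_key(client_key, PUBLIC_KEY)

# 2. The server opens the box with its private key.
server_key = unlock_with_private_key(box, PRIVATE_KEY)

# 3. Both sides now use the same symmetric key for the actual data.
ciphertext = symmetric_xor(b'hello server', client_key)
plaintext = symmetric_xor(ciphertext, server_key)

print(server_key, plaintext)
```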

💥 Boom! That’s how data is securely sent over the network 🙂

Reference: https://www.cloudflare.com/learning/cdn/cdn-ssl-tls-security/

HTTP Versions Explained

What is HTTP?

HTTP is Hyper Text Transfer Protocol. What is a protocol? Protocol is a set of rules that tells two machines how to talk to each other (similar to the rules in our language).

HTTP is a stateless protocol, which means it doesn’t keep any state between requests. This also results in HTTP being connectionless: the client (your browser/machine) and the server are only aware of each other during the current request.

To fix this, HTTP operates over TCP (Transmission Control Protocol) which opens and keeps the connection alive. 

HTTP/1.0 → A new TCP connection was required for each request/response pair. That leads to poor performance because it is time and resource expensive to create a TCP connection for each request. 

HTTP/1.1 (persistent connections are default) → We still have to wait for a response before sending the next request but we can have multiple request/response pairs in a single connection. 

Also, we can open multiple connections but most browsers support up to 6 parallel connections per domain. 

To fix this limitation, a technique called domain sharding is used where resources are delivered from multiple subdomains. For example, you may have seen images being served from i1.wp.com, i2.wp.com on WordPress.com. 

HTTP/1.1 (with pipelining) → Multiple requests can be sent without waiting for a response but the responses have to be in order they were requested. This is poorly supported by browsers and servers and is almost never used.

HTTP/2 (multiplexing) → Multiple requests can be sent without waiting for a response and the responses can be in any order. This vastly improves performance (see a demo: here and here). It can also break the responses into smaller items and send them as soon as they are ready.

HTTP/2 (with push) → Let’s say we request index.html. The server can see that this file needs a few more files like style.css and scripts.js, and it automatically “pushes” these to us. Thus the browser doesn’t need to make separate requests for them.

HTTP Connections Analogy 

Let’s say you want to order a laptop, a mouse and a keyboard from our imaginary store Anazom. You can only make phone calls to order something and note that making a call is expensive (the fewer calls you make, the better).

HTTP/1.0 → You make a call. Order a laptop. You get the laptop. The call ends. You make a new call to order mouse. You get the mouse. The call ends. Same thing for keyboard. You make 3 calls here.

HTTP/1.1 (with persistent connection)→ You make a call. Order a laptop. Once you get the laptop, you order a mouse. Once you get the mouse, you order a keyboard. The call ends. 

You make 1 call here but wait for each order to complete before making another order.

HTTP/1.1 (with pipelining) → You make a call. Order a laptop, mouse and keyboard. But they have to arrive in the same order. If 3rd order is ready and 2nd order isn’t complete, it waits. The entire connection remains FIFO (first in first out) and can lead to HOL (head of line) blocking. 

You make 1 call here and send all orders at the same time but the delivery of each order is blocked till all the previous orders are delivered.

HTTP/2 (with multiplexing) → You make a call. Order a laptop, mouse, keyboard. They don’t have to arrive in the same order. Whichever order is ready will be sent to you asap.

In fact, if the store determines that the keys of the keyboard are ready but the rest of the keyboard isn’t, it’ll send the keys separately first and then the rest of the keyboard. This isn’t practical in the real world but works very well in the computing world. This means your keyboard order was broken into smaller orders and sent back (reducing wait time). 

HTTP/2 (with push) → You make a call. Order a laptop. With push, the store can read your first order and understand that you are about to ask for a mouse and a keyboard. So it sends a laptop, a mouse and a keyboard without you making two separate requests (for mouse and keyboard). 
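
The difference between pipelining and multiplexing in this analogy can be put into numbers. The Python 3 sketch below is a deliberately simplified model: each order has a time at which it is ready, and the two functions work out when it can actually be delivered. The function names and the timings are invented for the illustration.

```python
def pipelined_delivery(ready_times):
    """Orders must leave in request order, so a slow one blocks the rest."""
    delivered = []
    latest = 0
    for ready in ready_times:
        latest = max(latest, ready)  # cannot leave before earlier orders
        delivered.append(latest)
    return delivered


def multiplexed_delivery(ready_times):
    """Each order leaves as soon as it is ready."""
    return list(ready_times)


# Seconds until each item is ready, in the order it was requested:
# laptop (slow), mouse (fast), keyboard (fast).
ready = [5, 1, 2]

print(pipelined_delivery(ready))    # [5, 5, 5]
print(multiplexed_delivery(ready))  # [5, 1, 2]
```

With pipelining the mouse and keyboard wait behind the laptop (head of line blocking), while with multiplexing they arrive as soon as they are ready.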

HTTP/3 → I’m writing this post in Nov 2019. At this point, HTTP/3 is pretty new so I admit I am not completely up to date with this protocol. But based on my limited understanding, HTTP/3 works with something called QUIC (Quick UDP Internet Connections), which is basically UDP (User Datagram Protocol) along with congestion control. Again, I don’t know much about this so please Google it if you’re interested to know more.

Reference: https://stackoverflow.com/questions/36517829/what-does-multiplexing-mean-in-http-2

Refer to this sweet animation which explains all this beautifully: https://freecontent.manning.com/animation-http-1-1-vs-http-2-vs-http-2-with-push/