CS Mumbai

Setting Up Jetpack Developer Environment

I'm breaking the flow of the Python posts I have written so far to write a little bit about Jetpack (a product from Automattic, the company I work for).

This post covers how you can set up a Jetpack developer environment locally. The first step is to install git, node, npm and Homebrew on your computer. Next, install composer and yarn using Homebrew as follows:

$ brew install composer
$ brew install yarn

Without Docker

Note: Skip this step if you’ll use the same docker dev environment that is used by Jetpack dev team!

Have a working WordPress installation with one of the popular local servers like MAMP, Vagrant, etc.

Go to the plugins folder wp-content/plugins/ in your WordPress installation, clone the Jetpack repo there, and then cd into it using cd jetpack.

Finally, enter yarn build and once that finishes, your Jetpack dev environment is ready.

With Docker

Download and install Docker on your machine. Next, simply clone the Jetpack repo anywhere on your computer and then cd into it using cd jetpack.

Next enter yarn docker:up. When that completes, your Jetpack dev environment will be ready. Just go to localhost in your web browser.

At this point, you’re all done. You can now start contributing to it.

Errors

If you run into any errors with yarn build or yarn docker:up, that’s probably because you cloned a forked repo which is not up to date with master.

To fix that, simply add a new remote called upstream that links to the original repo:

$ git remote add upstream https://github.com/Automattic/jetpack.git

And then fetch the updated code and merge it with your local master branch as follows:

$ git fetch upstream
$ git checkout master
$ git merge upstream/master

Now try yarn build again and everything should work just fine.

Python: BFS And DFS

BFS and DFS are two different ways you can traverse a Tree. BFS is Breadth First Search and DFS is Depth First Search.

To explain this further, consider the following tree:

You can see how BFS traversal works there. Also, you see there are three ways to do DFS traversal:

pre-order
in-order
and post-order.

Depth First Search

class BTnode:
 
	def __init__(self, value):
		self.value = value
		self.left = None
		self.right = None
 
def DFS(node):
	#left
	if node.left:
		DFS(node.left)
	#right
	if node.right:
		DFS(node.right)
	#root
	print(node.value, end=' ') # no new line after print
 
b = BTnode(1)
b.left = BTnode(2)
b.right = BTnode(3)
b.left.left = BTnode(4)
b.left.right = BTnode(5)
b.right.left = BTnode(6)
b.right.right = BTnode(7)
 
DFS(b)
print() #manually adding new line after result

The DFS above is post-order (left, right, root). The order of left, right and root can be changed there to turn it into pre-order (root, left, right) or in-order (left, root, right).
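To make the three orders concrete, here is a minimal sketch that runs all of them on the same 7-node tree. The function names preorder, inorder, postorder and build_tree are my own for this illustration, and the functions collect values into a list instead of printing them so the results are easy to compare.

```python
class BTnode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def preorder(node, out):
    """Root first, then the left subtree, then the right subtree."""
    if node:
        out.append(node.value)
        preorder(node.left, out)
        preorder(node.right, out)
    return out

def inorder(node, out):
    """Left subtree first, then the root, then the right subtree."""
    if node:
        inorder(node.left, out)
        out.append(node.value)
        inorder(node.right, out)
    return out

def postorder(node, out):
    """Left subtree, then right subtree, then the root (same as DFS above)."""
    if node:
        postorder(node.left, out)
        postorder(node.right, out)
        out.append(node.value)
    return out

def build_tree():
    # same tree as in the post: 1 at the root, 2 and 3 below it, 4..7 as leaves
    root = BTnode(1)
    root.left, root.right = BTnode(2), BTnode(3)
    root.left.left, root.left.right = BTnode(4), BTnode(5)
    root.right.left, root.right.right = BTnode(6), BTnode(7)
    return root

root = build_tree()
print(preorder(root, []))   # [1, 2, 4, 5, 3, 6, 7]
print(inorder(root, []))    # [4, 2, 5, 1, 6, 3, 7]
print(postorder(root, []))  # [4, 5, 2, 6, 7, 3, 1]
```

Only the position of the out.append line changes between the three functions.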

Breadth First Search

class Queue:
	def __init__(self):
		self.items = []
 
	def put(self, item):
		self.items.append(item)
 
	def get(self):
		if self.items:
			return self.items.pop(0)
 
	def isEmpty(self):
		return self.items == []
 
class BTnode:
 
	def __init__(self, value):
		self.value = value
		self.left = None
		self.right = None
 
def BFS(node):
	q = Queue()
	q.put(node) #enqueue
 
	while not q.isEmpty():
		current = q.get() #dequeue
		print(current.value, end=' ') #prints without adding new line
 
		if current.left:
			q.put(current.left)
		if current.right:
			q.put(current.right)
 
b = BTnode(1)
b.left = BTnode(2)
b.right = BTnode(3)
b.left.left = BTnode(4)
b.left.right = BTnode(5)
b.right.left = BTnode(6)
b.right.right = BTnode(7)
 
BFS(b)
print() #manually adding new line after result

We’ve used a Queue there to implement BFS. Here’s how it works:

Add 1
Pop 1 and add its children 2 and 3
Pop 2 and add its children 4 and 5
Pop 3 and add its children 6 and 7
Pop all other items (as none have children)
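The steps above can also be sketched with collections.deque from the standard library instead of the hand-written Queue class. This is just an alternative, not a correction: list.pop(0) has to shift every remaining item, while deque.popleft() does not. The name bfs_order is my own, and it returns the values as a list rather than printing them.

```python
from collections import deque

class BTnode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def bfs_order(root):
    """Return the node values level by level, left to right."""
    if root is None:
        return []
    order = []
    queue = deque([root])            # add the root
    while queue:
        current = queue.popleft()    # pop from the front of the queue
        order.append(current.value)
        for child in (current.left, current.right):
            if child:
                queue.append(child)  # add its children at the back
    return order

b = BTnode(1)
b.left, b.right = BTnode(2), BTnode(3)
b.left.left, b.left.right = BTnode(4), BTnode(5)
b.right.left, b.right.right = BTnode(6), BTnode(7)

print(bfs_order(b))  # [1, 2, 3, 4, 5, 6, 7]
```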

This is illustrated in the image below:

That’s all in this post! 🙂

Python: Binary Search

This is a short post to share a Python implementation of binary search. What’s binary search you ask? It’s simple.

Imagine you have a list of numbers and you want to check if a particular number exists in it. Normally you’d search it one by one. That’s fine and this approach is called linear search.

The problem with linear search is that if the list has 1 million numbers, it can take up to 1 million checks before we can confidently say whether the number exists in it or not.

That’s where binary search is helpful. But in order for binary search to work, the list must be sorted.

In binary search, you look at the middle element of a sorted list and see if it is equal to the number you are looking for. If it isn’t, you only look at the first half of the list (if the number is smaller) or the second half (if it is larger), cutting the list in half at each check.

Thus, if the length of the list is N, binary search will take O(log N) time whereas linear search takes O(N). For a list of 1 million numbers, that is about 20 checks instead of up to 1 million.

Here’s a Python implementation of binary search:

def binary_search(arr, item):
 
	l = 0
	r = len(arr)-1
 
	while l<=r:
 
		mid = l + (r-l)//2
 
		if item == arr[mid]:
			return True
 
		elif item < arr[mid]:
			r = mid-1
 
		else:
			l = mid+1
 
	return False
 
 
#testing
arr = [2,5,6,9,11,15,18,20]
print(binary_search(arr, 1)) # prints False
print(binary_search(arr, 21)) # prints False
print(binary_search(arr, 20)) # prints True
print(binary_search(arr, 2)) # prints True
print(binary_search(arr, 11)) # prints True

Highlighted lines are really important to remember. We usually make mistakes there. For example, forgetting to return False if we don’t find the item in the given list.

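Also, the explanation for writing mid = l + (r-l)//2 instead of (l+r)//2 can be found in a Stack Overflow post, which is funny because it is an overflow related issue: in languages with fixed-width integers, l+r can overflow for very large lists. Python integers don’t overflow, so this mostly matters when you port the code to another language. This is also addressed in a Google blog post.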
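If you don’t need to write the loop yourself, the standard library’s bisect module does the same halving for you. Here is a minimal sketch; the wrapper name contains is my own, and it assumes the list is already sorted, just like binary_search above.

```python
from bisect import bisect_left

def contains(sorted_items, item):
    """Binary search using bisect_left; sorted_items must be sorted."""
    i = bisect_left(sorted_items, item)  # leftmost position where item could go
    return i < len(sorted_items) and sorted_items[i] == item

arr = [2, 5, 6, 9, 11, 15, 18, 20]
print(contains(arr, 1))   # False
print(contains(arr, 21))  # False
print(contains(arr, 11))  # True
```

The i < len(sorted_items) check is needed because bisect_left returns len(sorted_items) when the item is larger than everything in the list.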

Python: Reverse A Linked List

The code given below will reverse a Linked List in place. In place means we won’t need additional space for reversing this list.

This will however destroy or modify our original list (so keep this in mind when you’re doing an in-place operation on a given input).

Ok so here’s the code to reverse a given Linked List:

class Node:
	
	def __init__(self, data):
		self.value = data
		self.next = None
 
def reverse_list(head):
 
	current = head
	previous = None
	nextnode = None
 
	while current:
 
		nextnode = current.next
		current.next = previous
 
		previous = current
		current = nextnode
 
	return previous
 
def print_list(head):
	p = head
	while p:
		print(p.value)
		p = p.next
 
# testing
a = Node(1)
b = Node(2)
c = Node(3)
 
a.next = b
b.next = c
 
print_list(a)
newhead = reverse_list(a)
print('after reversing: ')
print_list(newhead)
 
# output
# 1
# 2
# 3
# after reversing: 
# 3
# 2
# 1

The trick here is to maintain three pointers: current, previous, nextnode
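To watch the pointers move, here is a small sketch of the same loop that records previous and current after every pass. The names reverse_list_traced and label are my own helpers for this illustration.

```python
class Node:
    def __init__(self, data):
        self.value = data
        self.next = None

def label(node):
    """Readable name for a node: its value, or None at the end of the list."""
    return None if node is None else node.value

def reverse_list_traced(head):
    """Reverse in place and return (new head, list of (previous, current) per pass)."""
    current, previous = head, None
    steps = []
    while current:
        nextnode = current.next    # remember the rest of the list
        current.next = previous    # flip this node's link backwards
        previous, current = current, nextnode  # move both pointers one step on
        steps.append((label(previous), label(current)))
    return previous, steps

a, b, c = Node(1), Node(2), Node(3)
a.next, b.next = b, c

newhead, steps = reverse_list_traced(a)
for previous, current in steps:
    print('previous =', previous, '| current =', current)

# output
# previous = 1 | current = 2
# previous = 2 | current = 3
# previous = 3 | current = None
```

When current runs off the end of the list, previous is sitting on the old last node, which is why the function returns previous as the new head.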

And the process can be visualized on paper as follows:

That’s all in this post. Thanks for reading! 🙂

Python: Linked List

In arrays the size is fixed. This is why we need to know the upper limit of elements we can add.

If we have to add more than that, we’ll need to copy the elements from the first array into a bigger array. This resizing of the array is expensive.

As an alternative, we can use a Linked List. Here each node can stay anywhere in memory. The items/nodes need not be consecutive because each item/node holds a pointer to the next one. The image below illustrates my point:

To implement such a structure, we’ll make use of something called Node which can be a data structure in its own separate class. Then we’ll use LinkedList class which will set the head and have insert and remove functions.

Here’s how you can visualize a Linked List:

And here’s the full Python implementation:

class Node:
	def __init__(self, data):
		self.value = data
		self.next = None
 
class LinkedList:
	def __init__(self, data=None):
		self.head = Node(data) if data is not None else None # head must be a Node, not raw data
 
	def printlist(self):
		p = self.head
 
		while p:
			print(p.value)
			p = p.next
 
	def insert(self, data):
		newnode = Node(data)
		newnode.next = self.head
		self.head = newnode
 
	def remove(self, data):
 
		p = self.head
		found = False
 
		if not p:
			return 'Empty List'
 
		if p.value == data:
			self.head = p.next # the head itself is the match
			return True
 
		while p.next:
 
			if p.next.value == data:
				p.next = p.next.next
				found = True
				break
			else: 
				p = p.next
 
		return found
 
# testing
l = LinkedList()
l.insert(1)
l.insert(2)
l.insert(3)
 
l.printlist()
 
for i in range(4):
	item = i+1
	print(l.remove(item))
 
# output:
# 3
# 2
# 1
# True
# True
# True
# Empty List

The code is again self explanatory. But to give you something to visualize, here’s the insertion of item 4 on paper:

And here’s the removal of 2 on paper:

That’s all in this post!

Python: Bitwise Magic

In this post, let’s have a look at bitwise XOR, bitwise AND and a program that calculates number of 1’s in a given integer’s binary form.

Here’s XOR ^ and AND & operations table:

bitwise XOR will return 0 if inputs are same, else it’ll return 1
bitwise AND will return 1 if both inputs are 1
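Here is a tiny sketch that prints both tables for single bits, so you can check the two rules above yourself. The function name truth_table is my own.

```python
def truth_table():
    """Return (a, b, a XOR b, a AND b) for every pair of single bits."""
    rows = []
    for a in (0, 1):
        for b in (0, 1):
            rows.append((a, b, a ^ b, a & b))
    return rows

print('a b | a^b a&b')
for a, b, xor_result, and_result in truth_table():
    print(f'{a} {b} |  {xor_result}   {and_result}')

# output
# a b | a^b a&b
# 0 0 |  0   0
# 0 1 |  1   0
# 1 0 |  1   0
# 1 1 |  0   1
```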

Now look at the magic in the count_ones code below. To be specific, check out the line z = z & (z-1):

def count_ones(z):
 
	count = 0
 
	while z:
		print(z, bin(z))
		count += 1
		z = z & (z-1)
 
	return count
 
print('The number of 1s in 5: ', count_ones(5))
print()
print('The number of 1s in 28: ', count_ones(28))
 
#output
 
# 5 0b101
# 4 0b100
# The number of 1s in 5:  2
 
# 28 0b11100
# 24 0b11000
# 16 0b10000
# The number of 1s in 28:  3

The code above takes an integer, finds out how many 1s are there in its binary form and returns the number of 1s.

This is done using z = z & (z-1) which basically removes the rightmost one from the integer’s binary form.

For example, 28 can be written as 11100. It turns into 11000 in the first pass. 10000 in the second and 00000 in the third pass. Then the loop stops to return the count which is 3.
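One way to convince yourself the trick works is to compare it against counting the '1' characters in bin(z). This is only a sanity-check sketch, and it assumes z is not negative: with a negative Python integer the loop would never reach 0.

```python
def count_ones(z):
    """Count set bits by clearing the rightmost 1 until nothing is left."""
    count = 0
    while z:
        z = z & (z - 1)  # clears the rightmost 1 bit
        count += 1
    return count

def count_ones_slow(z):
    """Reference version: count the '1' characters in the binary string."""
    return bin(z).count('1')

for n in (0, 1, 5, 28, 255, 1024):
    assert count_ones(n) == count_ones_slow(n)

print(count_ones(28), count_ones_slow(28))  # 3 3
```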

This can be very useful in calculating hamming distance. What is hamming distance you say? Here’s a quote from Wikipedia:

The Hamming distance between two integers is the number of positions at which the corresponding bits are different.

For example, the Hamming distance between 1 (001 in binary) and 4 (100 in binary) is 2, because the two numbers differ in the first and the last bit.

The simple logic to calculate this is to use bitwise XOR there. In this case 1 ^ 4 will return 5 which is 101 in binary. It has 2 ones in it.

How do we find these 2 ones in there using code? We combine bitwise XOR with bitwise AND as follows:

def hamming(x,y):
 
	count = 0
	z = x ^ y
 
	while z:
		print(z, bin(z))
		count += 1
		z = z & (z-1)
 
 
	return count
 
print(hamming(1,4))
 
#output
# 5 0b101
# 4 0b100
# 2

This is pretty cool right? That’s all I have in this post. See you tomorrow with another interesting CS concept. Thanks for reading!

Python: Measure Runtime

Just a short post on finding the execution time of a given code in Python. Here’s how to do it:

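import time
 
def demoFunctionSum(x, y): # placeholder, swap in the function you want to time
	return x + y
 
start = time.time()
result = demoFunctionSum(2, 3) # measuring the exec time of this code
end = time.time()
 
total = end-start
total = int(round(total*1000*1000)) # seconds * 10^6 = micro-seconds
print(total)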

That’s all.

Just replace demoFunctionSum() with your function call 🙂
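For very short functions a single time.time() measurement can be noisy. Here is a hedged sketch of two standard library alternatives: time.perf_counter(), which is meant for measuring short durations, and timeit, which repeats the call many times. The names demo_function_sum and time_call_us are my own stand-ins.

```python
import time
import timeit

def demo_function_sum(x, y):
    # stand-in for whatever function you want to measure
    return x + y

def time_call_us(func, *args):
    """Run func(*args) once and return the elapsed time in microseconds."""
    start = time.perf_counter()
    func(*args)
    end = time.perf_counter()
    return (end - start) * 1_000_000

print(time_call_us(demo_function_sum, 2, 3))

# timeit runs the call many times; dividing gives an average per call
runs = 100_000
total_seconds = timeit.timeit(lambda: demo_function_sum(2, 3), number=runs)
print(total_seconds / runs * 1_000_000)  # average micro-seconds per call
```

The numbers will differ from run to run and from machine to machine, so I haven’t listed any output here.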

Python: OS, Exception And Urllib

In this post, I’ll share three things –

Using OS module to access and print directories,
Using exceptions to handle runtime errors and
Using urllib module to fetch data from webpage and store it as a string in a variable.

Starting with OS module – you can see how we print a file name in the current directory and then show its relative and absolute path:

# python program located at: /Users/omkarb/Desktop
 
import os
 
## Example pulls filenames from a dir, prints their relative and absolute paths
def printdir(dir):
	filenames = os.listdir(dir)
 
	print filenames[1] #boarding pass
	print os.path.join(dir, filenames[1]) #./boarding pass
	print os.path.abspath(os.path.join(dir, filenames[1])) #/Users/omkarb/Desktop/boarding pass
 
printdir('./')
 
#explanation
	#for filename in filenames:
		#print filename  ## foo.txt
		#print os.path.join(dir, filename) ## dir/foo.txt (relative to current dir)
		#print os.path.abspath(os.path.join(dir, filename)) ## /home/nick/dir/foo.txt
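The code above uses the Python 2 print statement. Here is a rough Python 3 sketch of the loop from the commented explanation; the function name describe_dir is my own, and it returns the paths instead of only printing them.

```python
import os

def describe_dir(directory):
    """Return (name, relative path, absolute path) for each entry in directory."""
    entries = []
    for name in sorted(os.listdir(directory)):
        relative = os.path.join(directory, name)  # e.g. ./foo.txt
        absolute = os.path.abspath(relative)      # full path from the root
        entries.append((name, relative, absolute))
    return entries

for name, relative, absolute in describe_dir('.'):
    print(name, relative, absolute)
```

I sorted the names because os.listdir returns them in arbitrary order, which is also why filenames[1] in the original may point at a different file on your machine.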

Then you can also use the commands module to run a command in the terminal (commands exists only in Python 2; it was removed in Python 3). For example, this is how you run pwd command:

import commands
import sys
## Run an external command (here 'pwd') --
## shows how to call an external program
def listdir():
  cmd = 'pwd' #present working directory
  print "Command to run:", cmd   ## good to debug cmd before actually running it
  (status, output) = commands.getstatusoutput(cmd)
  if status:    ## Error case, print the command's output to stderr and exit
    sys.stderr.write(output)
    sys.exit(status)
  print output  ## Otherwise do something with the command's output
 
listdir() #prints /Users/omkarb/desktop
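In Python 3 the usual replacement is the subprocess module. Below is a minimal sketch of the same idea, assuming Python 3.7 or newer (for capture_output) and a Unix-like system where pwd exists. The function name run_command is my own.

```python
import subprocess
import sys

def run_command(args):
    """Run an external program and return its standard output as text."""
    print('Command to run:', ' '.join(args))  # good to debug before running it
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        # error case: write the command's error output to stderr and exit
        sys.stderr.write(result.stderr)
        sys.exit(result.returncode)
    return result.stdout

print(run_command(['pwd']))  # prints the present working directory
```

Passing the command as a list (instead of one string) means no shell is involved, so spaces in arguments don’t need quoting.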

If you know a certain piece of code is possibly going to return an error then you can put it inside try-except block as follows:

import sys
filename = 'line.txt'
try:
  ## Either of these two lines could throw an IOError, say
  ## if the file does not exist or the read() encounters a low level error.
  f = open(filename, 'rU')
  text = f.read()
  f.close()
except IOError:
  ## Control jumps directly to here if any of the above lines throws IOError.
  sys.stderr.write('problem reading:' + filename)
## In any case, the code then continues with the line after the try/except

The above code writes the error message to stderr because line.txt doesn’t exist on my computer.
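For comparison, here is how I would sketch the same thing in Python 3, where IOError is just another name for OSError and a with block closes the file for you. The function name read_text is my own.

```python
import sys

def read_text(filename):
    """Return the file's text, or None if the file cannot be opened or read."""
    try:
        with open(filename) as f:  # the file is closed when the block ends
            return f.read()
    except OSError as err:
        # control jumps here if open() or read() fails, e.g. file not found
        sys.stderr.write('problem reading: ' + filename + ' (' + str(err) + ')\n')
        return None

text = read_text('line.txt')
print(text is None)  # True when line.txt doesn't exist
```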

Next, we can read from URL using the following code:

import urllib
## Given a url, try to retrieve it. If it's text/html,
## print its base url and its text.
def wget(url):
  ufile = urllib.urlopen(url)  ## get file-like object for url
  info = ufile.info()   ## meta-info about the url content
  if info.gettype() == 'text/html':
    print 'base url:' + ufile.geturl()
    text = ufile.read()  ## read all its text
    # print text #prints text
 
wget('https://google.com')

However, it doesn’t include error handling. If the URL doesn’t work for some reason, we can handle it as follows:

import urllib
## Given a url, try to retrieve it. If it's text/html,
## print its base url and its text.
def wget2(url):
  try:
    ufile = urllib.urlopen(url)  ## get file-like object for url
    if ufile.info().gettype() == 'text/html':
      print 'base url:' + ufile.geturl()
      text = ufile.read()  ## read all its text
      # print text #prints text
  except IOError:
    print 'problem reading url:', url
 
wget2('https://google.com')
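urllib.urlopen is the Python 2 spelling. In Python 3 the function lives in urllib.request, and the type check becomes get_content_type(). Here is a hedged sketch of wget2 along those lines; fetch_html is my own name, and decoding with the declared charset (falling back to utf-8) is my assumption about what you’d want.

```python
import urllib.request

def fetch_html(url):
    """Return the page text if the URL serves text/html, otherwise None."""
    try:
        with urllib.request.urlopen(url) as response:
            info = response.info()  # meta-info about the url content
            if info.get_content_type() != 'text/html':
                return None
            print('base url:', response.geturl())
            charset = info.get_content_charset() or 'utf-8'
            return response.read().decode(charset, errors='replace')
    except OSError:
        # urllib.error.URLError is a subclass of OSError
        print('problem reading url:', url)
        return None
```

You would call it as fetch_html('https://google.com'). I left the call out of the block so the sketch doesn’t need a network connection just to load.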

There’s also a simpler way to read the web page using urllib.urlretrieve method:

import urllib
 
result = urllib.urlretrieve("http://wordpress.org/")
print open(result[0]).read()

That’s all in this post 🙂

Python: Regular Expressions

To use regular expressions we have to import a module called re in Python. Let’s start with a simple example which searches the pattern “word” followed by three letters –

import re
str = 'batman starts with the word:bat!!'
match = re.search(r'word:\w\w\w', str)
# If-statement after search() tests if it succeeded
if match:                      
  print 'found', match.group() ## 'found word:bat'
else:
  print 'did not find'

Why the prefix r?

I was wondering why do we have a prefix r in there? Google’s Python course said: The ‘r’ at the start of the pattern string designates a python “raw” string which passes through backslashes without change which is very handy for regular expressions.

I didn’t quite get that, so I searched and found this StackOverflow post. It becomes clear with the following example –

>>> '\n'
'\n'
>>> r'\n'
'\\n'
>>> print '\n'
 
 
>>> print r'\n'
\n
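Another way to see the difference is to count characters. This small Python 3 sketch (the session above is Python 2) shows that the raw string keeps the backslash as a real character, which is exactly what the regex engine needs to see for patterns like \d.

```python
import re

plain = '\n'   # one character: an actual newline
raw = r'\n'    # two characters: a backslash followed by the letter n

print(len(plain), len(raw))  # 1 2

# without the r prefix you'd have to double every backslash yourself
print(r'\d\d' == '\\d\\d')   # True

match = re.search(r'\d\d', 'room 42')
print(match.group())         # 42
```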

Search Examples

Here are some more re.search examples which can be used in the second block of code in this post –

re.search(r'iii', 'niiice') # found iii
re.search(r'igs', 'niiice') # did not find
 
## . = any char but \n		
re.search(r'..e', 'niiice') # found ice
 
## \d = digit char, \w = word char
re.search(r'\d\d\d', 'n123ce') # found 123
re.search(r'\w\w\w', '$$batman&&') # found bat

Repetition

Here’s what I learned about finding repeated pattern in a given string:

Plus sign (+): 1 or more occurrences of the pattern to its left, e.g. ‘i+’ = one or more i’s
Star sign (*): 0 or more occurrences of the pattern to its left
Question mark (?): match 0 or 1 occurrences of the pattern to its left

re.search(r'\d\s*\d\s*\d', 'xx1 2   3xx') #found 1 2   3
re.search(r'\d\s*\d\s*\d', 'xx12  3xx') # found 12  3
re.search(r'\d\s*\d\s*\d', 'xx123xx') # found 123
 
re.search(r'^b\w+', 'foobatman') # did not find
re.search(r'b\w+', 'foobatman') # found batman
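Here is a short Python 3 sketch that exercises +, * and ? one at a time. The helper found is my own; it returns the matched text, or None when nothing matches.

```python
import re

def found(pattern, text):
    """Return the text matched by the first match of pattern, or None."""
    match = re.search(pattern, text)
    return match.group() if match else None

print(found(r'pi+', 'piiig'))       # piii  (+ takes as many i's as it can)
print(found(r'pi+', 'pg'))          # None  (+ needs at least one i)
print(found(r'ab*', 'a'))           # a     (* is happy with zero b's)
print(found(r'ab*', 'abbb'))        # abbb
print(found(r'colou?r', 'color'))   # color (? allows the u to be missing)
print(found(r'colou?r', 'colour'))  # colour
print(found(r'i+', 'piigiii'))      # ii    (search returns the leftmost match first)
```

The last line is worth remembering: search finds the leftmost match and only then makes the repetition as long as possible, so it returns ii and not the longer iii.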

Finding An Email using Regular Expression

import re
 
str = 'contact superman at supes@earth.com'
 
#search 1 or more word characters followed by @ followed by 1 or more word characters
match = re.search(r'\w+@\w+', str) 
 
if match:
  print match.group()  ## 'supes@earth'

But it only returns the email address partially. We need to adjust the code in a way that will allow it to print the .com part as well.

The following code accommodates dots and dashes:

import re
 
str = 'contact superman at supes@g-mail.com'
 
#both sets can contain a word, a dash or a dot
match = re.search(r'[\w.-]+@[\w.-]+', str) 
 
if match:
  print match.group()  ## 'supes@g-mail.com'

Now that we have a way to find the email address, can we extract the username from it? Yes, we can! This can be done using group extraction in Python. Just add parentheses around the username and host as follows:

import re
 
str = 'contact superman at supes@g-mail.com'
 
#both sets can contain a word, a dash or a dot
match = re.search(r'([\w.-]+)@([\w.-]+)', str) 
 
if match:
  print match.group()  ## 'supes@g-mail.com'
  print match.group(1) # supes
  print match.group(2) #g-mail.com

FindAll()

There’s also re.findall(), which finds all matches of a given pattern in the string, as opposed to re.search(), which only finds the first match of the given pattern. When the pattern contains groups, findall returns a list of tuples with one item per group.

import re
 
str = 'contact superman at supes@g-mail.com and batman at batsy@justice.com'
 
#both sets can contain a word, a dash or a dot
matches = re.findall(r'([\w.-]+)@([\w.-]+)', str) 
 
for match in matches:
	print match
	#print match[0] # prints supes, batsy
	#print match[1] # prints g-mail.com, justice.com
 
##prints
#('supes', 'g-mail.com')
#('batsy', 'justice.com')

Bonus

Before ending this post, I want to add a point about greedy vs non-greedy matching in regular expressions, which I learned from Google’s Python course.

Let’s say we want to match html tags in the following string:

<b>boldman</b> and <i>italicman</i>

It’s common to come up with a solution like <.*> – which will match for any string starting and ending with < and >. However, that matches the whole string instead of individual tags as follows:

import re
 
str = '<b>boldman</b> and <i>italicman</i>'
 
#greedy: .* grabs as much as it can, up to the last >
match = re.findall(r'<.*>', str) 
 
if match:
	print match
 
#result
#['<b>boldman</b> and <i>italicman</i>']

It can be fixed by adding ? in the solution as follows: <.*?>

import re
 
str = '<b>boldman</b> and <i>italicman</i>'
 
#non-greedy: .*? stops at the first > it can
match = re.findall(r'<.*?>', str) 
 
if match:
	print match
 
#result
#['<b>', '</b>', '<i>', '</i>']

That’s all in this post. Thanks for reading 🙂

Python: Reading Files

This is how you open and read files in Python:

#lines.txt has the following three lines (without #)
#this is line 1
#this is line 2
#this is line 3
 
# Echo the contents of a file
f = open('lines.txt', 'rU')
for line in f:   ## iterates over the lines of the file
  print line,    ## trailing , so print does not add an end-of-line char
                 ## since 'line' already includes the end-of line.
f.close()
 
#output is the text in lines.txt

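The second parameter in the open method is the mode. We can open a file using the following modes:

A) Read (r) – for reading from the file
B) Write (w) – for writing to the file (this erases whatever the file already contains)
C) Append (a) – for appending text to the file
D) Universal (rU) – for reading (but being smart about different line endings, so they all convert to \n ). This is a Python 2 idiom: in Python 3, plain r already does this for text files, and the U flag was removed in Python 3.11.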
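The example above is Python 2. Here is a hedged Python 3 sketch of the same echo, using a with block so the file is closed automatically. The function name echo_file is my own, and the demo writes its own lines.txt into a temporary folder so it runs anywhere.

```python
import os
import tempfile

def echo_file(path):
    """Print each line of a text file and return the lines that were read."""
    lines = []
    with open(path, 'r') as f:    # plain 'r' already normalises line endings
        for line in f:            # iterates over the lines of the file
            print(line, end='')   # 'line' already includes the end-of-line
            lines.append(line)
    return lines

# demo: create lines.txt in a temporary folder and echo it
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'lines.txt')
    with open(path, 'w') as f:
        f.write('this is line 1\nthis is line 2\nthis is line 3\n')
    echo_file(path)

# output
# this is line 1
# this is line 2
# this is line 3
```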