Python: OS, Exception And Urllib

In this post, I’ll share three things –

  • Using the os module to list a directory and print file paths,
  • Using exceptions to handle runtime errors, and
  • Using the urllib module to fetch data from a web page and store it as a string in a variable.

Let’s start with the os module. The code below prints a file name from the current directory and then shows its relative and absolute paths:

# python program located at: /Users/omkarb/Desktop
 
import os
 
## Example pulls filenames from a dir, prints the relative and absolute path of one entry
def printdir(dir):
	filenames = os.listdir(dir)  ## entries come back in arbitrary order
 
	## index 1 is the second entry; raises IndexError if dir has fewer than two entries
	print filenames[1] #boarding pass
	print os.path.join(dir, filenames[1]) #./boarding pass
	print os.path.abspath(os.path.join(dir, filenames[1])) #/Users/omkarb/Desktop/boarding pass
 
printdir('./')
 
#explanation
	#for filename in filenames:
		#print filename  ## foo.txt
		#print os.path.join(dir, filename) ## dir/foo.txt (relative to current dir)
		#print os.path.abspath(os.path.join(dir, filename)) ## /home/nick/dir/foo.txt
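
The listing above is Python 2. For comparison, here is a minimal Python 3 sketch of the loop from the explanation. The function name describe_entries is my own (hypothetical), and I sort the names only because os.listdir returns them in arbitrary order.

```python
import os


def describe_entries(directory):
    """Return (name, relative path, absolute path) for each entry in directory."""
    rows = []
    for name in sorted(os.listdir(directory)):
        relative = os.path.join(directory, name)  # e.g. ./foo.txt
        rows.append((name, relative, os.path.abspath(relative)))
    return rows


# Print every entry of the current directory with both of its paths.
for name, relative, absolute in describe_entries("."):
    print(name, relative, absolute)
```

Returning the rows instead of printing inside the function makes it easier to reuse the paths later.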

You can also use the commands module to run a shell command from Python. For example, this is how you run the pwd command:

import commands
import sys  ## needed for sys.stderr and sys.exit below
## Run an external command ('pwd' here) and print its output --
## shows how to call an external program
def listdir():
  cmd = 'pwd' #present working directory
  print "Command to run:", cmd   ## good to debug cmd before actually running it
  (status, output) = commands.getstatusoutput(cmd)
  if status:    ## Error case, print the command's output to stderr and exit
    sys.stderr.write(output)
    sys.exit(status)
  print output  ## Otherwise do something with the command's output
 
listdir() #prints /Users/omkarb/desktop
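
One caveat: the commands module exists only in Python 2. On Python 3 the subprocess module covers the same ground. Below is a rough sketch, assuming Python 3.7 or newer (for capture_output); run_command is a hypothetical helper name of mine, not something from the standard library.

```python
import subprocess
import sys


def run_command(args):
    """Run an external command and return (exit status, captured stdout)."""
    completed = subprocess.run(args, capture_output=True, text=True)
    if completed.returncode != 0:
        # Error case: pass the command's error output on to stderr.
        sys.stderr.write(completed.stderr)
    return completed.returncode, completed.stdout.strip()


# Same idea as the pwd example: run it and print the status and output.
status, output = run_command(["pwd"])
print(status, output)
```

Passing the command as a list avoids going through a shell, which is usually the safer default.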

If you know that a piece of code might raise an exception, you can put it inside a try-except block as follows:

import sys
filename = 'line.txt'
try:
  ## Either of these two lines could throw an IOError, say
  ## if the file does not exist or the read() encounters a low level error.
  f = open(filename, 'rU')
  text = f.read()
  f.close()
except IOError:
  ## Control jumps directly to here if any of the above lines throws IOError.
  sys.stderr.write('problem reading: ' + filename + '\n')
## In any case, the code then continues with the line after the try/except

The above code prints the error message because line.txt doesn’t exist on my computer.
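
Here is how the same pattern might look in Python 3, as a sketch rather than a drop-in replacement. In Python 3, IOError is an alias of OSError, so I catch OSError; the with statement closes the file even when read() fails. The helper name read_text is my own.

```python
import sys


def read_text(filename):
    """Return the file's contents, or None if it cannot be opened or read."""
    try:
        with open(filename) as f:  # closed automatically, even on error
            return f.read()
    except OSError as err:
        # Control jumps here if open() or read() fails.
        sys.stderr.write("problem reading: %s (%s)\n" % (filename, err))
        return None


text = read_text("line.txt")  # None if line.txt doesn't exist
```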

Next, we can read from a URL using the following code:

import urllib
## Given a url, try to retrieve it. If it's text/html,
## print its base url and its text.
def wget(url):
  ufile = urllib.urlopen(url)  ## get file-like object for url
  info = ufile.info()   ## meta-info about the url content
  if info.gettype() == 'text/html':
    print 'base url:' + ufile.geturl()
    text = ufile.read()  ## read all its text
    # print text #prints text
 
wget('https://google.com')

However, wget doesn’t include any error handling. If the URL can’t be opened for some reason, we can catch the error as follows:

import urllib
## Given a url, try to retrieve it. If it's text/html,
## print its base url and its text.
def wget2(url):
  try:
    ufile = urllib.urlopen(url)  ## get file-like object for url
    if ufile.info().gettype() == 'text/html':
      print 'base url:' + ufile.geturl()
      text = ufile.read()  ## read all its text
      # print text #prints text
  except IOError:
    print 'problem reading url:', url
 
wget2('https://google.com')
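
In Python 3, urlopen lives in urllib.request, and the response headers expose get_content_type() instead of gettype(). A minimal sketch of wget2 under those assumptions follows; fetch_html is a hypothetical name, and falling back to UTF-8 when the server doesn't declare a charset is my own choice.

```python
import urllib.request


def fetch_html(url):
    """Return the page text if the URL serves text/html, otherwise None."""
    try:
        with urllib.request.urlopen(url) as response:
            if response.headers.get_content_type() != "text/html":
                return None
            print("base url:", response.geturl())
            # Assumption: fall back to UTF-8 if no charset is declared.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except OSError:  # urllib.error.URLError is a subclass of OSError
        print("problem reading url:", url)
        return None
```

Calling fetch_html('https://google.com') should return the page as a string, or None if the request fails or the content isn't HTML.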

There’s also a simpler way to read a web page: the urllib.urlretrieve function downloads the page to a temporary file and returns a (filename, headers) tuple:

import urllib
 
result = urllib.urlretrieve("http://wordpress.org/")
print open(result[0]).read()
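
On Python 3 the function moved to urllib.request.urlretrieve, and the official docs describe it as a legacy interface. A small sketch, with a hypothetical helper name (download) and an explicit destination path instead of a temporary file:

```python
import urllib.request


def download(url, destination):
    """Save the resource at url to destination and return its text."""
    path, headers = urllib.request.urlretrieve(url, destination)
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()
```

For example, download("http://wordpress.org/", "wordpress.html") would save the page next to the script and return its contents.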

That’s all for this post 🙂
