Python: The Swiss Army Knife, Part 1

My first exposure to programming during my Ph.D. came in the form of reading a bunch of NetCDF files containing precipitation time series, averaging the data over time and visualizing it spatially. Initially, I used NetCDF Operators (NCO) and Climate Data Operators (CDO) to perform averaging and other statistical analyses, saved the results in separate NetCDF files, read those files in NCL and visualized them. But this whole procedure felt cumbersome and inefficient, and every time I did my analysis and visualization, I wished I could do all of it in the same programming environment!

As luck would have it, I found the solution to my predicament in Python. Data reading and analysis packages such as cdms2, cdutil and genutil in CDAT, together with the power of numpy and scipy and visualization packages such as PyNGL/PyNIO, matplotlib and seaborn, streamlined my workflow considerably.

But to leverage the power of these packages, you first need to grasp the basic syntax and nuances of Python. Here I will share the basics of Python, and in subsequent posts we will see how these packages simplify the whole process of data analysis and visualization. Please be aware that the syntax in this post and later posts is based on Python 3. Some syntax, such as the print statement or iterating over a dictionary, differs between Python 2 and Python 3.


Python Introduction

Python jargon

Python is a dynamically typed language, just like MATLAB. There is no need to declare the type of a variable as we do in C or FORTRAN. For example:

a = 2.0
b = 'hello'
print('a =', a, 'and', 'b =', b)
## a = 2.0 and b = hello

Every value in Python has a data type. We do not declare it; Python assigns it automatically. To check the data type of a variable at any moment, use the built-in type function. For example:

print(type(b))
## <class 'str'>
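Because the type belongs to the value rather than to the name, the same variable can later be rebound to a value of a different type. A minimal sketch (the name x is arbitrary):

```python
x = 42
print(type(x))   # <class 'int'>

x = 'forty-two'  # the same name now refers to a string
print(type(x))   # <class 'str'>

x = [4, 2]       # ...and now to a list
print(type(x))   # <class 'list'>
```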

The basic data types are:

  • String
  • Number
    • Integer (arbitrary precision in Python 3, which merged Python 2's separate Long type into int)
    • Float
    • Complex
  • Set
  • Boolean
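To see these types in practice, type can be applied to a literal of each kind. The values below are arbitrary examples; the 2 ** 100 line shows why Python 3 needs no separate long type:

```python
big = 2 ** 100          # far larger than a 64-bit integer
unique = {1, 2, 2, 3}   # a set literal with a duplicate item

print(type('hello'))    # <class 'str'>
print(type(42))         # <class 'int'>
print(type(big))        # <class 'int'>: integers simply grow as needed
print(type(2.0))        # <class 'float'>
print(type(1 + 2j))     # <class 'complex'>
print(type(unique))     # <class 'set'>
print(unique)           # {1, 2, 3}: a set keeps only one copy of each item
print(type(True))       # <class 'bool'>
```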

Each object comes with associated methods, which are operations you can perform on it using dot notation. For example:

print(b.split('e'))  # splits the string into the parts before and after 'e'
## ['h', 'llo']
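A few more string methods, as a quick sketch. String methods return new strings rather than changing b, because strings are immutable. The one-line filter over dir(b) is a list comprehension that keeps only the names without leading underscores:

```python
b = 'hello'
print(b.upper())            # HELLO
print(b.replace('l', 'L'))  # heLLo
print(b.startswith('he'))   # True
print(b.count('l'))         # 2
print(b)                    # hello: the original string is unchanged

# dir() lists every attribute; names without a leading underscore are the usual methods
public = [name for name in dir(b) if not name.startswith('_')]
print('split' in public)    # True
```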

For scalars, the basic arithmetic operators are much the same as in MATLAB, except that the exponent operator is ** rather than ^. The basic operators are:

  1. Multiplication *
  2. Division /
  3. Addition +
  4. Subtraction -
  5. Exponent **
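A quick check of these operators, plus two behaviours that often surprise MATLAB (and Python 2) users: in Python 3, / always performs true division (in Python 2, 7 / 2 with integer operands gave 3), with // available for floor division, and ^ is bitwise XOR rather than exponentiation:

```python
print(7 * 2)    # 14
print(7 / 2)    # 3.5: true division, even for two integers
print(7 // 2)   # 3: floor division
print(7 + 2)    # 9
print(7 - 2)    # 5
print(2 ** 10)  # 1024: exponent
print(2 ^ 10)   # 8: bitwise XOR of 0b0010 and 0b1010, not a power
```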

Data structures in Python

Python has three important built-in data structures:

  • List
  • Tuple
  • Dictionary

The use of each data structure is explained below.

List

A list is a data structure that can hold items of any type: integers, strings, complex numbers and so on. It can even hold other lists. A list is created by enclosing comma-separated items in square brackets. For example:

a_list = [1, 2, 3, 'hello']
print(a_list)
## [1, 2, 3, 'hello']

Check the methods associated with the list by typing dir(a_list).
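Running dir(a_list) also lists many special names that start with underscores. A small sketch that filters those out with a list comprehension, and shows a list holding another list; the names list_methods and nested are just for illustration:

```python
a_list = [1, 2, 3, 'hello']

# dir() also returns special names such as '__len__'; keep only the public ones
list_methods = [name for name in dir(a_list) if not name.startswith('_')]
print(list_methods)
# ['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

# A list can also hold other lists; each inner list counts as a single item
nested = [a_list, [4, 5]]
print(len(nested))  # 2
print(nested)       # [[1, 2, 3, 'hello'], [4, 5]]
```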

Before going further, let's talk about mutable and immutable data structures.

  • Mutable data structures can be altered in place. This means we can delete, replace, modify or append items within the same data structure.

  • Immutable data structures cannot be altered in place. We can make a copy with the changes and assign it to a variable, but we cannot make any in-place changes to the original.

The examples below make this concrete.
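The difference is easiest to see with two names bound to the same object. A minimal sketch; the names numbers, alias, point and original are illustrative only:

```python
# A list is mutable: an in-place change is visible through every name bound to it
numbers = [1, 2, 3]
alias = numbers            # both names refer to the same list object
numbers.append(4)          # modifies that list in place
print(alias)               # [1, 2, 3, 4]
print(alias is numbers)    # True

# A tuple is immutable: "changing" it means building a new tuple
point = (1, 2)
original = point
point = point + (3,)       # creates a new tuple and rebinds the name point
print(original)            # (1, 2): the original tuple is untouched
print(point is original)   # False
```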

Tuple

A tuple can also hold items of any type, just like a list. It is created by enclosing comma-separated items in parentheses. For example:

a_tuple = (1, 2, 3, 'hello', 1.2)
print(a_tuple)
## (1, 2, 3, 'hello', 1.2)

Again, check the methods associated with the tuple by typing dir(a_tuple). You will find that methods available for a list, such as pop, remove and append, are missing for a tuple. This is because a tuple is immutable: to change a tuple, you need to create another tuple with the changes. A list, on the other hand, can be modified in place:

a_list.append(10)
print(a_list)
## [1, 2, 3, 'hello', 10]

If you try a_tuple.append(10), Python raises an error (an AttributeError), because a tuple is immutable and has no append method. Instead, build a new tuple by concatenation:

print(a_tuple)
## (1, 2, 3, 'hello', 1.2)
new_tuple = (10, 11)
print(new_tuple)
## (10, 11)
final_tuple = a_tuple + new_tuple
print(final_tuple)
## (1, 2, 3, 'hello', 1.2, 10, 11)
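For completeness, here is what the failed attempts look like when the errors are caught with try/except. The messages in the comments are the ones CPython 3 prints; the wording could differ slightly in other versions or implementations:

```python
a_tuple = (1, 2, 3, 'hello', 1.2)

# Tuples have no append method, so the attribute lookup itself fails
try:
    a_tuple.append(10)
except AttributeError as err:
    append_error = str(err)
    print('AttributeError:', append_error)
# AttributeError: 'tuple' object has no attribute 'append'

# Replacing an item in place is not allowed either
try:
    a_tuple[0] = 100
except TypeError as err:
    assign_error = str(err)
    print('TypeError:', assign_error)
# TypeError: 'tuple' object does not support item assignment

# A tuple exposes only the two public methods that do not modify it
tuple_methods = [name for name in dir(a_tuple) if not name.startswith('_')]
print(tuple_methods)  # ['count', 'index']
```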

The key difference between a list and a tuple is that, all else being equal, a list is a mutable container of items, whereas a tuple is immutable.

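Because tuples are immutable, CPython can store them a little more compactly than lists and can build a tuple of constants once, ahead of time. Iterating over a tuple is not meaningfully faster than iterating over a list, though, so the main reason to choose one over the other is whether the data should be allowed to change.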
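One concrete, measurable difference in CPython is memory: a tuple object is smaller than a list holding the same items, partly because a list keeps extra bookkeeping (and often spare capacity) so that it can grow in place. A small sketch using sys.getsizeof, which reports the size of the container itself, not of the items it holds; the exact byte counts depend on the Python version and platform, so only the comparison is shown as a result:

```python
import sys

as_list = [1, 2, 3, 'hello', 1.2]
as_tuple = (1, 2, 3, 'hello', 1.2)

# Sizes in bytes of the container objects only (version- and platform-dependent)
print(sys.getsizeof(as_list))
print(sys.getsizeof(as_tuple))
print(sys.getsizeof(as_tuple) < sys.getsizeof(as_list))  # True in CPython
```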

We will talk about iterating over lists, tuples and dictionaries later.

Dictionary

The dictionary is arguably the most versatile and helpful data structure in Python.

A dictionary is also a mutable container of items, but in a dictionary each item (which can itself be a list or tuple of items) is assigned a key. The key acts as an identifier for that item. A dictionary is created by enclosing comma-separated key-value pairs in curly brackets, with a colon between each key and its value. For example:

a_dict = {'Name':'Puneet', 'Date of Birth':6, 'Month of Birth':11, 'Year of Birth':1988, 'Lab mates': ['Ram','Amit','Satya']}

print(a_dict.keys())
## dict_keys(['Name', 'Date of Birth', 'Month of Birth', 'Year of Birth', 'Lab mates'])
print(a_dict.values())
## dict_values(['Puneet', 6, 11, 1988, ['Ram', 'Amit', 'Satya']])

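In Python 3.7 and later, a dictionary preserves insertion order, so the keys are printed in the order they were entered. In Python 2 (and in Python 3 before 3.7) the printed order could differ from the entered order. Either way, there is no need to worry: you can always access any item with its key.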

print(a_dict['Name'])
## Puneet

Please note that a dictionary is a mutable data structure.
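Since a dictionary is mutable, entries can be added, replaced and deleted in place. A small sketch using a trimmed-down version of a_dict; the 'Research interests' entry and the 'Email' key are illustrative assumptions, not part of the original example:

```python
a_dict = {'Name': 'Puneet', 'Year of Birth': 1988}

a_dict['Research interests'] = 'cloud & aerosol modeling'  # add a new key-value pair
a_dict['Name'] = 'Puneet Sharma'                            # replace the value under an existing key
del a_dict['Year of Birth']                                  # remove a key and its value

print(a_dict)
# {'Name': 'Puneet Sharma', 'Research interests': 'cloud & aerosol modeling'}
print('Name' in a_dict)                  # True: membership tests look at keys
print(a_dict.get('Email', 'not given'))  # not given: .get returns a default for a missing key
```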

Subsetting, iterating over data structures and different control statements will be covered in the next post.

Phew!! That was a lot to talk about. I need a drink.

Puneet Sharma
Research Scholar

My research interests include cloud & aerosol modeling and statistics.
