Python basics
In this lesson, we will add a couple of additional data types to our repertoire and learn how to convert from one data type to another. As in the previous lesson, you should start a Spyder session, follow along with the content below in a file, and then run that file to test different parts of the code. Don’t forget to use the console to interact with the variables that you create in your file.
The best way to learn how to program is to actually do something useful. So, in this lesson we will work with data from The National Centers for Environmental Information (NCEI). But first, a note on white space in python!
Indenting
Most of the time, python doesn’t care about white space. You can have 0 blank lines in your code or you could put 5 blank lines between every line of code and python wouldn’t care at all. The one place white space does matter, though, is at the beginning of a line. Python uses indentation to specify certain blocks of code. What this means is that you should only use white space at the beginning of a line if you are sure it’s supposed to be there. Where python isn’t expecting indentation, 1 blank space at the beginning of a line causes the same error as 2 blank spaces or a full tab (which many editors insert as 4 blank spaces). So, while this will work:
[ ]:
print("no blank spaces here!!")
This:
[ ]:
 print("uh-oh...")
and this:
[ ]:
    print("still no")
will not. So now, if you get an IndentationError you will know why!
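For contrast, here is a minimal sketch of a place where indentation is supposed to be there. It uses an if statement, which we haven’t covered yet, and the variable names and values are made up purely for illustration:

```python
# Hypothetical value, chosen only to illustrate indentation
latitude = 42.2416

hemisphere = "unknown"
if latitude > 0:
    # These indented lines form the block that belongs to the if statement
    hemisphere = "northern"
    print("This line only runs when the condition is true")

# Back at the left margin, so this line runs no matter what
print("Hemisphere:", hemisphere)
```

The indented lines only run when the condition holds, while the last line always runs because it sits back at the left margin.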
The type() function
Ok, back to the data. The land station closest to Ypsilanti, MI is the SE Ann Arbor station, so let’s save information about that station. In the NCEI, every station has a name and a unique ID.
[1]:
stationName = "Ann Arbor SE, MI US"
stationID = "GHCND:USC00200228"
stationLat = "42.2416"
stationLon = "-83.6933"
Here we’ve created four variables that provide information about the Ann Arbor station. Based on those lines of code, you should be able to identify the data type of each of those variables. If you aren’t sure, then remember, we can use the type() function:
[2]:
print(type(stationName))
<class 'str'>
[3]:
print(type(stationLat))
<class 'str'>
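As a quick sketch of how type() responds to other kinds of values, compare a whole number, a number with a decimal point, and the same number wrapped in quotation marks. The variable names here are invented for this example only:

```python
wholeNumber = 42          # no quotes, no decimal point -> int
decimalNumber = 42.2416   # no quotes, has a decimal point -> float
numberAsText = "42.2416"  # quotes -> str, even though it looks like a number

print(type(wholeNumber))
print(type(decimalNumber))
print(type(numberAsText))
```

The quotation marks, not the characters inside them, are what make a value a string.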
Type conversion
All four of our variables are of type str, that is, character strings. This makes sense for stationName and stationID, less so for the other two. Of course we could just enter the lat and lon without the quotation marks, but instead of doing that, we can easily use python’s type conversion functions.
[4]:
stationLat = float(stationLat)
print(stationLat)
42.2416
[5]:
stationLon = float(stationLon)
print(stationLon)
-83.6933
Here, we use the float() function because we want the result of the conversion to be a real number (type float). For our purposes, the most used type conversion functions are float(), int(), and str(), which convert to a float, an integer, and a character string, respectively.
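Here is a small sketch of the three conversion functions side by side. The variable names are hypothetical and exist only for this example:

```python
latText = "42.2416"

latFloat = float(latText)   # string -> float
latBack = str(latFloat)     # float -> string
wholeNumber = int("42")     # a string of digits -> integer

print(latFloat, type(latFloat))
print(latBack, type(latBack))
print(wholeNumber, type(wholeNumber))

# Note: int("42.2416") raises a ValueError, because int() only accepts
# strings that look like whole numbers. Use int(float("42.2416")) instead.
```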
Note
Our conversion operation above was destructive; in the first example, we took the variable stationLat, which was originally a string, and converted it to a float with the same name. The original information was lost as a result of this operation. In this case, that makes sense. We didn’t need the string representation of the latitude anyway. But you could imagine a situation where you performed a similar operation but didn’t want to lose the original information. In that case we would simply have called the result of the operation something different.
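A non-destructive version might look something like the sketch below, where the names stationLatText and stationLatNumber are my own invention for this example:

```python
stationLatText = "42.2416"

# Store the result under a new name so the original string survives
stationLatNumber = float(stationLatText)

print(stationLatText, type(stationLatText))
print(stationLatNumber, type(stationLatNumber))
```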
We can also use the int() function here:
[6]:
stationLat_rounded = int(stationLat)
print(stationLat_rounded)
42
where I’ve converted the float stored in the stationLat variable to an integer stored in the stationLat_rounded variable. The result of the int function must be an integer, so python simply drops everything after the decimal point. Despite the variable name, this is truncation rather than rounding to the nearest whole number: 42.2416 becomes 42, but 42.9 would become 42 as well.
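To see how int() differs from actual rounding, here is a short sketch that compares it with python’s built-in round() function and with math.floor(). The value 42.9 is made up to make the difference visible:

```python
import math

truncatedPositive = int(42.9)            # drops the decimals
roundedPositive = round(42.9)            # rounds to the nearest whole number

truncatedNegative = int(-83.6933)        # truncates toward zero
roundedNegative = round(-83.6933)        # rounds to the nearest whole number
flooredNegative = math.floor(-83.6933)   # always rounds down

print(truncatedPositive, roundedPositive)                      # 42 43
print(truncatedNegative, roundedNegative, flooredNegative)     # -83 -84 -84
```

If you want the nearest whole number, round() is usually the function you are after.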
So, during type conversion, python has some rules that it applies to make the conversion happen. What happens then if we try to convert our station name to a number data type?
[7]:
print(float(stationName))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-f2aaf7f3ff9d> in <module>
----> 1 print(float(stationName))
ValueError: could not convert string to float: 'Ann Arbor SE, MI US'
As you can see, we get a ValueError with the message that we can’t do that conversion. The reason should be obvious. Our station name is made up of a bunch of letters! It doesn’t make sense to represent these as a real number.
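If you would rather have your program keep going when a conversion fails, one option is python’s try/except syntax, which we haven’t covered yet. This is only a sketch, and the variable nameAsNumber is invented for the example:

```python
stationName = "Ann Arbor SE, MI US"

try:
    # This line raises a ValueError, just like in the cell above
    nameAsNumber = float(stationName)
except ValueError:
    # Runs only if the conversion failed
    nameAsNumber = None
    print("Could not convert", stationName, "to a float")
```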
Working with variables
In the previous lesson, we saw that python can handle standard mathematical operations like addition and subtraction. Python uses those same operators to perform other operations as well:
[8]:
nameAndID = stationName+"_"+stationID
print(nameAndID)
Ann Arbor SE, MI US_GHCND:USC00200228
In this operation, I’ve combined the station name with the station ID and separated the two with an underscore using the + operator.
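The + operator isn’t the only one that works on strings. As a small sketch, the * operator repeats a string a given number of times. The header and underline variables are hypothetical names for this example:

```python
stationName = "Ann Arbor SE, MI US"
stationID = "GHCND:USC00200228"

# + joins strings end to end
header = stationName + " (" + stationID + ")"

# * repeats a string; len() gives the number of characters in header
underline = "-" * len(header)

print(header)
print(underline)
```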
However, python will throw an error if you try to combine different data types this way:
[9]:
print(stationName+"_"+stationLat)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-9-a7519279e542> in <module>
----> 1 print(stationName+"_"+stationLat)
TypeError: can only concatenate str (not "float") to str
As you can see from the TypeError, this type of operation is called concatenation, and we can only concatenate strings. So you might ask: what if we want to efficiently print the values of our variables to the screen (or a file)? Python’s print() function allows us to do this in a few different ways. First, the easy way:
[10]:
print(stationName+"_",stationLat)
Ann Arbor SE, MI US_ 42.2416
print() can take multiple arguments! Then, based on those arguments it will print strings, or string representations of non-strings, to the screen in a way that is more or less expected. But is there a better way? Yes! The slightly less-easy but better way:
[11]:
print("{}_{}".format(stationName,stationLat))
Ann Arbor SE, MI US_42.2416
This is a very “pythonic” way of printing information to the screen (or files). The details of what’s happening here are a little bit more than we need to worry about, but basically, we are using the curly brackets as placeholders. The .format() method substitutes its arguments into those placeholders, and print() then displays the finished string. This may not look all that useful, but printing information in this manner is way, way more flexible. For example, when used this way, I can format my output to look whatever way I want very quickly (e.g. do I want to show only 1 decimal place instead of 4? Do I want to repeat the use of a certain variable? Do I want to quickly change the order of the variables printed?). Again, I won’t go into detail here, but you should be aware that this exists and is useful when you want to print more than 1 variable and format your output in a certain way.
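To give a flavor of that flexibility, here is a brief sketch of the three cases mentioned above. The variable names oneDecimal, reordered, and repeated are made up for this example:

```python
stationName = "Ann Arbor SE, MI US"
stationID = "GHCND:USC00200228"
stationLat = 42.2416
stationLon = -83.6933

# :.1f inside the brackets means "show this float with 1 decimal place"
oneDecimal = "{}_{:.1f}".format(stationName, stationLat)
print(oneDecimal)   # Ann Arbor SE, MI US_42.2

# Numbers inside the brackets pick arguments by position, so order can change
reordered = "{1} {0}".format(stationLat, stationLon)
print(reordered)    # -83.6933 42.2416

# The same numbered placeholder can be used more than once
repeated = "{0} is listed as {0}".format(stationID)
print(repeated)
```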
Try it yourself
Try to print the station name and the station latitude and longitude separated by a hyphen using the .format notation as above.
A quick intro to object oriented programming
There is one major thing that we haven’t talked about yet that is somewhat important. In the last example, we did something kind of weird. We used a function that had a dot in front of it, and then I called it a “method”. This syntax is the first time that we’ve really been exposed to the object orientedness of python. Python is an object oriented programming language. That means that we can create things called objects, which bundle data together with the operations that can be performed on that data. In fact, python is famous because it takes its object orientedness very seriously, in that everything in python is considered an object. Once again, I’m not going to go into a lot of detail here. I mention this now because I want you to get familiar with the jargon of object oriented programming so that it is less confusing when looking at documentation or code examples.
In short, object oriented programming (OOP) places emphasis on the associations between the data and the operations that are done on them. An example: suppose you are analyzing a population of students and tracking how well they performed on a certain exam based on which curriculum they took. Every student would have the same set of parameters (GPA, academic year, age, major), but the values of those parameters would of course differ from student to student. The object oriented approach to this example would be to define an object, maybe called student, and associate those parameters with the student object. So you could imagine creating a specific student object called “Ben” and specifying his major as “physics”. This would look something like (note that this won’t work in python because we would have to do a few more things, namely specify what a Student is. But this is the gist):
[12]:
Ben = Student()
Ben.major = "Physics"
Ben.GPA = 3.6
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-12-d7ad3fc836a2> in <module>
----> 1 Ben = Student()
2 Ben.major = "Physics"
3 Ben.GPA = 3.6
NameError: name 'Student' is not defined
In OOP, we try to maintain the natural links between data and the tools that we use to describe the data and do work with the data. In OOP jargon, Ben is an instance of the Student class (that is, an object of type Student) and it has certain attributes: major, GPA, etc. We access those attributes using dot notation as demonstrated above. Certain attributes can be methods:
[ ]:
Ben.add_grade("B")
which is basically the OOP word for a function that belongs to a specific object. Whenever you see the dot notation in python, you are seeing the attributes of a specific instance of an object. Those attributes only mean something for objects of a certain type.
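For the curious, here is a minimal sketch of what specifying a Student could look like. Everything in this class (the attribute names, the grades list, and the add_grade method) is a hypothetical design for illustration, and we won’t need to write classes ourselves in this lesson:

```python
class Student:
    """A hypothetical, bare-bones Student for illustration only."""

    def __init__(self):
        # Attributes that every Student instance starts with
        self.major = None
        self.GPA = None
        self.grades = []

    def add_grade(self, grade):
        # A method: a function that belongs to Student objects
        self.grades.append(grade)


Ben = Student()        # Ben is an instance of Student
Ben.major = "Physics"  # set attributes with dot notation
Ben.GPA = 3.6
Ben.add_grade("B")     # call a method with dot notation

print(Ben.major, Ben.GPA, Ben.grades)   # Physics 3.6 ['B']
```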
Back to our weather station example. When I used the .format() method, I was using it on an object of type str (see those quotation marks?). The format() method only makes sense in the context of building a string, for example to print to the screen or a file, so objects that have data type str have access to that method while objects of a different type do not.
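You can check this for yourself with a short sketch like the one below. It uses the built-in hasattr() function, which we haven’t discussed, to ask whether an object has an attribute with a given name. The variable upperName is invented for the example:

```python
stationName = "Ann Arbor SE, MI US"
stationLat = 42.2416

# .upper() is a method that str objects have
upperName = stationName.upper()
print(upperName)                        # ANN ARBOR SE, MI US

# .is_integer() is a method that float objects have
print(stationLat.is_integer())          # False

# str objects have a .format() method, float objects do not
print(hasattr(stationName, "format"))   # True
print(hasattr(stationLat, "format"))    # False
```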