Chapter 5 Python

5.1 Goal

Gain a basic familiarity with what kind of language Python is and why we are using it.

5.2 Learning Objectives

After going through this chapter, students should be able to:

  • List the advantages and limitations of Python as a programming language
  • Explain why Python is a complement to bash
  • Understand why Python is being used for this class

5.3 What Python is

Python is a popular language for many applications, including bioinformatics, because it is easy to learn and read. In addition, it is well documented and supported by a large community. There are a vast number of packages extending Python’s functionality, including many specifically geared towards biological and bioinformatic applications.

Python is a high-level programming language. This means that many of the fiddly details needed to make your code work, such as memory management, are handled under the hood, so you rarely need to worry about them. This makes writing code simple and fast. In addition, Python does not require a separate compilation step: the interpreter runs your code directly, which allows you to execute code as you write it.
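As a small illustration of what "high-level" means in practice, the sketch below summarizes a short DNA sequence. The sequence and the variable names are made up for this example. Notice that there is no memory management and no compile step: the lines can be typed into an interactive Python session or saved to a file and run directly.

```python
# A made-up DNA sequence, used only for illustration.
sequence = "ATGCGCGATTA"

# Built-in string methods do the counting for us.
gc_count = sequence.count("G") + sequence.count("C")
gc_fraction = gc_count / len(sequence)

print(f"Length: {len(sequence)}")
print(f"GC fraction: {gc_fraction:.2f}")
```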

Finally, Python is an object-oriented programming language, meaning that every value in Python, including numbers, strings, and functions, is an object: an instance of a class that has built-in attributes and methods. While this may not make sense yet, it makes Python a very powerful language.
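To make the idea of objects a little more concrete, here is a minimal sketch. The names `sequence` and `reverse` are invented for the example; `type`, `upper`, `count`, and `__name__` are standard Python. Each value is an object, and its class supplies the attributes and methods you can use on it.

```python
sequence = "atgc"           # a string object

# Every value has a type (its class) ...
print(type(sequence))       # <class 'str'>
print(type(42))             # <class 'int'>

# ... and that class supplies built-in methods.
print(sequence.upper())     # ATGC
print(sequence.count("g"))  # 1


# Functions are objects too, with attributes of their own.
def reverse(text):
    """Return the text reversed."""
    return text[::-1]


print(type(reverse))        # <class 'function'>
print(reverse.__name__)     # reverse
print(reverse(sequence))    # cgta
```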

5.4 What Python is not

Although it is a clear language to read and write, Python is not a miracle cure for programming. You will still need to learn the syntax of the language, as well as its relatively small base vocabulary. You will also still need to be able to translate strategies for answering questions or performing tasks into a logical set of steps that can be executed.

Python is also generally slower than compiled languages, especially when dealing with loops, and it can be slower than bash when bash is simply chaining together compiled command-line tools. However, there are ways to significantly speed up the execution of Python. Many Python packages have compiled C code hidden within their function calls, so that you have the convenience of Python but the speed of a compiled language.
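The effect of compiled code behind a function call can be seen without installing anything. The sketch below compares an explicit Python loop with the built-in `sum`, which is implemented in C in the standard CPython interpreter; packages such as NumPy rely on the same idea. The function name `loop_sum` and the list size are arbitrary choices for this example, and the timings will vary from machine to machine.

```python
import timeit

numbers = list(range(1_000_000))


def loop_sum(values):
    """Add the values one at a time with an explicit Python loop."""
    total = 0
    for value in values:
        total += value
    return total


# Both approaches give the same answer.
assert loop_sum(numbers) == sum(numbers)

# Time each approach over 5 runs; exact numbers depend on your machine.
loop_time = timeit.timeit(lambda: loop_sum(numbers), number=5)
builtin_time = timeit.timeit(lambda: sum(numbers), number=5)

print(f"Python loop:  {loop_time:.3f} s")
print(f"Built-in sum: {builtin_time:.3f} s")
```

On most systems the built-in version should finish noticeably sooner, even though both lines of Python look equally simple.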

5.5 Why do we need Python if we have bash?

While bash has many powerful functions and basic programming logic, it is primarily a command-line interpreter rather than a general-purpose programming language. There are several advantages and some disadvantages to using Python over bash.

Bash                              Python
--------------------------------  ----------------------------------
Difficult to write                Easy to write
Hard to read                      Easy to read
Less powerful                     More powerful
Faster                            Slower
No third-party programs needed    Third-party programs may be needed

Ultimately, you should be using these two approaches together to take advantage of the strengths of each. You will find that creating an analysis pipeline is quite effective when using bash to connect the inputs and outputs of multiple programs (including your own Python scripts) to take raw data through processing and analysis.
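As a rough sketch of how a Python script can take part in such a pipeline, the example below reads lines from standard input and writes the ones it keeps to standard output, which is what lets bash connect it to other programs with pipes. The function name `keep_long_lines`, the length cutoff, and the file names mentioned afterwards are all hypothetical.

```python
import sys


def keep_long_lines(lines, min_length=10):
    """Yield only the lines whose length (without the newline) is at least min_length."""
    for line in lines:
        stripped = line.rstrip("\n")
        if len(stripped) >= min_length:
            yield stripped


if __name__ == "__main__":
    # Read from standard input and write to standard output so that
    # bash can connect this script to other programs with pipes.
    for kept in keep_long_lines(sys.stdin):
        print(kept)
```

If this were saved as `filter_long.py`, it could sit in the middle of a bash pipeline such as `cat sequences.txt | python filter_long.py | sort > long_sequences.txt`, with bash handling the plumbing and Python handling the logic.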

5.6 Why Python is right for this class

Given the range of student backgrounds and exposure to coding, Python offers a useful middle ground: code that is easy to read and learn, with the power and flexibility that more advanced coders may need.

If you perform any data analysis in biology (and you will), it is inevitable that you will encounter programs written in Python. You may never see the code, but exposure to programming concepts now will help you to

  1. understand how the programs work
  2. troubleshoot issues when running them
  3. tackle common problems like reformatting data

Most importantly, writing code in Python should be fun, not scary, and that is what will keep you using the skills that you learn in this class.

5.7 Other programming languages, pseudocoding, and learning to write code specific to Python

There are other programming languages besides Python that we will not be teaching in bootcamp, including R, C++, Java, and many more. Each language has its benefits. For example, R is a powerful language for statistics, and C++ is well suited to writing programs that run fast. We’ve described above why Python is right for this class.

Before we delve into the fundamentals of writing Python code, the next chapter (Chapter 6) gives an overview of pseudocoding, with some examples and guidelines. The advantage of learning to pseudocode before actually coding is that pseudocode is language agnostic: it doesn’t use language-specific syntax. Rather, it focuses on logically breaking down a problem into smaller, manageable tasks that you express in plain, natural language. Then, in Chapters 7-10, we’ll cover the fundamentals and specifics of Python syntax needed to write code. Finally, in Chapters 12-18, you’ll get concrete practice pseudocoding and writing Python code.