Part 4 : Loops and control structures

Why Python ?

In the first part, we discussed variables that can store data and values, and extended this to lists and dictionaries.

A very common task in programming is to test the value of a variable, and decide what to do depending on this test. To perform such tests, we need conditional structures or if ... else ... structures.

When we have a list of values, we also need to iterate over all values contained in the list: for example, iterate over all students stored in a list and print the scores of each student. To perform this, we need loops (for or while loops).


For and while loops

Intro

It is often necessary to repeat instructions a certain number of times. There are two major types of loops, the for loop and the while loop, both of which are available in Python.

For loops

The for loop can be used to repeat an instruction a certain number of times. It can also be used to iterate over all elements of a list, all characters of a string, all keys of a dictionary, all lines of a file, etc.

# The variable v takes all values in the list
for v in liste:            # For each element of the list ...
    instruction(v)         # ... do something with it

# The variable v takes all values in the list
# p contains the index of the element
for p, v in enumerate(liste):    # For each element of the list, retrieve its index (p) and its value (v)
    instruction(v)               # Do something with v
    instruction(p)               # Do something with p
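
The list of things a for loop can iterate over also includes the lines of a file. As a minimal sketch of that case, the loop below reads a file-like object line by line. Here `io.StringIO` stands in for a real file opened with `open()`, and the names `fake_file` and `line_lengths` as well as the file content are made up for this illustration.

```python
import io

# io.StringIO behaves like an open text file; it is used instead of
# open("some_file.txt") so that the example runs without a file on disk.
fake_file = io.StringIO("Chr1\t1200\nChr2\t950\nChr3\t430\n")

line_lengths = []
for n, line in enumerate(fake_file):    # each iteration yields one line
    line = line.rstrip("\n")            # remove the trailing newline
    line_lengths.append(len(line))
    print(n, line)
```

With a real file, only the first statement would change; the loop itself stays the same.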

In the following, we give some examples of for loops; copy and paste the instructions and answer the questions:

   ###################################################
   ### Repeat an operation n times                 ###
   ###################################################
   n = 10
   for i in range(n):
           print("Let's do it !")


   ###################################################
   ### Function range()                            ###
   ###################################################
   help(range)

   # Exercise 1 : Write a program which prints the multiples of 3 smaller than 20


   ###################################################
   ### Iterate over the elements of a list         ###
   ###################################################
   chroms = ["Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "Chr6", "Chr7"]

   for c in chroms:
           print(c)

   for p,c in enumerate(chroms):
           print(p, c)

   ###################################################
   ### Iterate over the characters of a string     ###
   ###################################################
   dna = 'ATGCTCGCTCGCTCGATGAAAATGTG'

   # go through all characters of the string
   for nuc in dna:
       print("The sequence contains " + nuc)

   # Exercise 2: Go through all the characters of the string and print
   # "the sequence contains character x at position y"

   ###################################################
   ### Iterate through a dictionary                ###
   ###################################################
   # A dictionary containing the user names as keys and
   # the passwords as values

   login2passwd = {"Tim":"abc123", "John":"qwerty", "Alice":"1234567"}

   # Print keys
   for key in login2passwd:
       print(key)

   # Print key and value
   # Use the method items()
   for key, value in login2passwd.items():
       print("Value is " + value + " for key " + key)

   # Exercise 3: Use a loop to print the names of the users
   # whose password contains the string "123"
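
The range() function used in the block above accepts up to three arguments: range(stop), range(start, stop) and range(start, stop, step). The following short sketch shows the three forms; the variable names are only illustrative. In every case, the stop value itself is not included.

```python
# range(stop): from 0 up to, but not including, stop
first_five = list(range(5))

# range(start, stop): from start up to, but not including, stop
five_to_nine = list(range(5, 10))

# range(start, stop, step): every step-th value; a negative step counts down
even_numbers = list(range(0, 10, 2))
countdown = list(range(5, 0, -1))

for name, values in [("range(5)", first_five),
                     ("range(5, 10)", five_to_nine),
                     ("range(0, 10, 2)", even_numbers),
                     ("range(5, 0, -1)", countdown)]:
    print(name, "->", values)
```

Wrapping range() in list() is only needed to display the values; a for loop can iterate over range() directly.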
  1. Start again with the array containing the number of birds:

    number_of_birds = [28, 32, 1, 0, 10, 22, 30, 19, 145, 27, 36,
    25, 9, 38, 21, 12, 122, 87, 36, 3, 0, 5, 55,
    62, 98, 32, 900, 33, 14, 39, 56, 81, 29, 38,
    1, 0, 143, 37, 98, 77, 92, 83, 34, 98, 40,
    45, 51, 17, 22, 37, 48, 38, 91, 73, 54, 46,
    102, 273, 600, 10, 11]
    
  2. Use a for loop to iterate over all values in this list, and for each value print the message:

    "At site xx : yy birds counted"
    
Solution
    number_of_birds = [28, 32, 1, 0, 10, 22, 30, 19, 145, 27, 36,
                       25, 9, 38, 21, 12, 122, 87, 36, 3, 0, 5, 55,
                       62, 98, 32, 900, 33, 14, 39, 56, 81, 29, 38,
                       1, 0, 143, 37, 98, 77, 92, 83, 34, 98, 40,
                       45, 51, 17, 22, 37, 48, 38, 91, 73, 54, 46,
                       102, 273, 600, 10, 11]

    # enumerate() gives the position of each value together with the value
    for position, number in enumerate(number_of_birds):
        print("At position", position, ": Number of birds =", number)

While loops

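   while condition:          # while the condition is true ...
        statement(s)         # ... execute these instructions

   Example:

   a = 0                                    # a must exist before the condition is tested
   while a < 10:
        a = int(input("Enter a number : "))  # input() returns a string, so convert it to an integer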
   #########################################
   ###     An example of while loop      ###
   #########################################

   password = None   # The variable is initialized to None

   # Do a while loop

   while password != "^^@@??))":                       ## As long as the password is NOT ^^@@??))
        password = input("Enter a password please: ")   ## Ask for the password
   print("Welcome")                                     ## Print "Welcome" once the password is right !

   ## Example of infinite loop (interrupt it with Ctrl-C) ...

   while True:
      print("This is True and will remain so for a while ...")
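
A while loop needs something inside its body that eventually makes the condition false; otherwise it never ends, as in the infinite loop above. The sketch below replays the password loop without keyboard input: the list `attempts` is a hypothetical stand-in for what a user would type, and `tries` counts the iterations.

```python
# Hypothetical user inputs, used instead of input() so the example runs unattended
attempts = ["1234", "qwerty", "^^@@??))", "never reached"]

password = None
tries = 0
while password != "^^@@??))":      # repeat as long as the password is wrong
    password = attempts[tries]     # stand-in for input("Enter a password please: ")
    tries += 1                     # count this attempt
print("Welcome after", tries, "tries")
```

The loop stops as soon as the third attempt matches, so the fourth entry of the list is never read.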

Exercise 2 : extracting the codons

Advice : try to solve each step independently; make sure each step works before trying to combine the different tasks.

  1. Define (again...) a longer DNA sequence:

    dna = "ttcacctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctgtgtgtctagctaagatgtattattctgctgtggatcccactaaagatatattcactgggcttattgggccaatgaaaatatgcaagaaaggaagtttacatgcaaatgggagacagaaagatgtagacaaggaattctatttgtttcctacagtatttgatgagaatgagagtttactcctggaagataatattagaatgtttacaactgcacctgatcaggtggataaggaagatgaagactttcaggaatctaataaaatgcactccatgaatggattcatgtatgggaatcagccgggtctcactatgtgcaaaggagattcggtcgtgtggtacttattcagcgccggaaatgaggccgatgtacatggaatatacttttcaggaaacacatatctgtggagaggagaacggagagacacagcaaacctcttccctcaaacaagtcttacgctccacatgtggcctgacacagaggggacttttaatgttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcggcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgggattattccccacaaagggagtgggattaggagctgcatcatttacaagagcagaatgtttcaaatgcatttttagataagggagagttttacataggctcaaagtacaagaaagttgtgtatcggcagtatactgatagcacattccgtgttccagtggagagaaaagctgaagaagaacatctgggaattctaggtccacaacttcatgcagatgttggagacaaagtcaaaattatctttaaaaacatggccacaaggccctactcaatacatgcccatggggtacaaacagagagttctacagttactccaacattaccaggtaaactctcacttacgtatggaaaatcccagaaagatctggagctggaacagaggattctgcttgtattccatgggcttattattcaactgtggatcaagttaaggacctctacagtggattaattggccccctgattgtttgtcgaagaccttacttgaaagtattcaatcccagaaggaagctggaatttgcccttctgtttctagtttttgatgagaatgaatcttggtacttagatgacaacatcaaaacatactctgatcaccccgagaaagtaaacaaagatgatgaggaattcatagaaagcaataaaatgcatgctattaatggaagaatgtttggaaacct"
    
  2. Extract and print all codons (you can use a for or while loop...)

  3. Test if the codon corresponds to a start codon (ATG) ; if yes, print its position

  4. Define a dictionary containing the correspondence between codons and amino acids:

    map = {"TTT":"F", "TTC":"F", "TTA":"L", "TTG":"L",
    "TCT":"S", "TCC":"S", "TCA":"S", "TCG":"S",
    "TAT":"Y", "TAC":"Y", "TAA":"STOP", "TAG":"STOP",
    "TGT":"C", "TGC":"C", "TGA":"STOP", "TGG":"W",
    "CTT":"L", "CTC":"L", "CTA":"L", "CTG":"L",
    "CCT":"P", "CCC":"P", "CCA":"P", "CCG":"P",
    "CAT":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
    "CGT":"R", "CGC":"R", "CGA":"R", "CGG":"R",
    "ATT":"I", "ATC":"I", "ATA":"I", "ATG":"M",
    "ACT":"T", "ACC":"T", "ACA":"T", "ACG":"T",
    "AAT":"N", "AAC":"N", "AAA":"K", "AAG":"K",
    "AGT":"S", "AGC":"S", "AGA":"R", "AGG":"R",
    "GTT":"V", "GTC":"V", "GTA":"V", "GTG":"V",
    "GCT":"A", "GCC":"A", "GCA":"A", "GCG":"A",
    "GAT":"D", "GAC":"D", "GAA":"E", "GAG":"E",
    "GGT":"G", "GGC":"G", "GGA":"G", "GGG":"G"}
    
  5. Use this dictionary to translate the sequence (start from the first position; alternatively, start from the first ATG)

  6. Additional things to implement :

    • detect open reading frame (= ORF) in all 3 frames
    • print length of ORF
Solution
dna = "ttcacctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctgtgtgtctagctaagatgtattattctgctgtggatcccactaaagatatattcactgggcttattgggccaatgaaaatatgcaagaaaggaagtttacatgcaaatgggagacagaaagatgtagacaaggaattctatttgtttcctacagtatttgatgagaatgagagtttactcctggaagataatattagaatgtttacaactgcacctgatcaggtggataaggaagatgaagactttcaggaatctaataaaatgcactccatgaatggattcatgtatgggaatcagccgggtctcactatgtgcaaaggagattcggtcgtgtggtacttattcagcgccggaaatgaggccgatgtacatggaatatacttttcaggaaacacatatctgtggagaggagaacggagagacacagcaaacctcttccctcaaacaagtcttacgctccacatgtggcctgacacagaggggacttttaatgttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcggcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgggattattccccacaaagggagtgggattaggagctgcatcatttacaagagcagaatgtttcaaatgcatttttagataagggagagttttacataggctcaaagtacaagaaagttgtgtatcggcagtatactgatagcacattccgtgttccagtggagagaaaagctgaagaagaacatctgggaattctaggtccacaacttcatgcagatgttggagacaaagtcaaaattatctttaaaaacatggccacaaggccctactcaatacatgcccatggggtacaaacagagagttctacagttactccaacattaccaggtaaactctcacttacgtatggaaaatcccagaaagatctggagctggaacagaggattctgcttgtattccatgggcttattattcaactgtggatcaagttaaggacctctacagtggattaattggccccctgattgtttgtcgaagaccttacttgaaagtattcaatcccagaaggaagctggaatttgcccttctgtttctagtttttgatgagaatgaatcttggtacttagatgacaacatcaaaacatactctgatcaccccgagaaagtaaacaaagatgatgaggaattcatagaaagcaataaaatgcatgctattaatggaagaatgtttggaaacct"

dna = dna.upper()

map = {"TTT":"F", "TTC":"F", "TTA":"L", "TTG":"L",
       "TCT":"S", "TCC":"S", "TCA":"S", "TCG":"S",
       "TAT":"Y", "TAC":"Y", "TAA":"STOP", "TAG":"STOP",
       "TGT":"C", "TGC":"C", "TGA":"STOP", "TGG":"W",
       "CTT":"L", "CTC":"L", "CTA":"L", "CTG":"L",
       "CCT":"P", "CCC":"P", "CCA":"P", "CCG":"P",
       "CAT":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
       "CGT":"R", "CGC":"R", "CGA":"R", "CGG":"R",
       "ATT":"I", "ATC":"I", "ATA":"I", "ATG":"M",
       "ACT":"T", "ACC":"T", "ACA":"T", "ACG":"T",
       "AAT":"N", "AAC":"N", "AAA":"K", "AAG":"K",
       "AGT":"S", "AGC":"S", "AGA":"R", "AGG":"R",
       "GTT":"V", "GTC":"V", "GTA":"V", "GTG":"V",
       "GCT":"A", "GCC":"A", "GCA":"A", "GCG":"A",
       "GAT":"D", "GAC":"D", "GAA":"E", "GAG":"E",
       "GGT":"G", "GGC":"G", "GGA":"G", "GGG":"G"}

protein = ""

for i in range(0, len(dna), 3):
    codon = dna[i:i+3]
    if len(codon) == 3:      # ignore an incomplete codon at the end
        AA = map[codon]
        print("Codon=", codon, "-> Amino Acid=", AA)
        protein = protein + AA
print("The protein sequence is", protein)


in_orf = 0     # is 0 if outside an ORF, and 1 if in an ORF
for i in range(0, len(dna), 3):
    codon = dna[i:i+3]
    if len(codon) == 3:
        if codon == 'ATG' and in_orf == 0:    # a start codon is encountered, and we are not in an ORF yet !
            orf = map[codon]
            in_orf = 1    # mark that we have entered a new ORF
        elif codon in ['TAA', 'TAG', 'TGA'] and in_orf == 1:    # if stop codon AND we are in an ORF
            print("New ORF : ", orf)    # print the ORF
            orf = ""      # reset for the next ORF
            in_orf = 0    # we are not in an ORF anymore
        elif in_orf == 1:    # we are in an ORF and add the new codon
            orf = orf + map[codon]
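
The solution above scans only the first reading frame. For the additional tasks (all 3 frames, ORF length), one possible approach is sketched below. It is an illustration rather than the reference solution: the function name `find_orfs`, the returned tuple layout and the short test sequence are assumptions made for this example. Only the forward strand is scanned, and an ORF that has no stop codon before the end of the sequence is ignored.

```python
def find_orfs(dna):
    """Return a list of (frame, start, end, length_in_codons), one per ORF.

    An ORF here runs from an ATG to the next in-frame stop codon.
    start and end are 0-based positions in dna; end is exclusive and
    includes the stop codon. The length does not count the stop codon.
    """
    dna = dna.upper()
    stop_codons = ("TAA", "TAG", "TGA")
    orfs = []
    for frame in range(3):                       # the three forward frames
        start = None                             # None means "outside an ORF"
        for i in range(frame, len(dna) - 2, 3):  # complete codons only
            codon = dna[i:i + 3]
            if codon == "ATG" and start is None:
                start = i                        # entering an ORF
            elif codon in stop_codons and start is not None:
                length = (i - start) // 3        # number of codons before the stop
                orfs.append((frame, start, i + 3, length))
                start = None                     # leaving the ORF
    return orfs


# Made-up test sequence with one ORF in frame 0 and one in frame 2
for frame, start, end, length in find_orfs("ccATGaaaTGAtATGccctttTAAg"):
    print("Frame", frame, ": ORF from", start, "to", end, "-", length, "codons")
```

Using None for "outside an ORF" replaces the separate in_orf flag, because the start position is needed anyway to compute the length.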

Break and continue

As in many other languages, the command break can be used to exit a loop. The command continue skips the rest of the current iteration and jumps directly to the next one.

##########################
###  Break and continue  ###
###########################
## Example 1
# The output will be
# 0 1 2 3 4

for i in range(10):
   if i==5:
      break
   print(i)

## Example 2
# The output will be
# 1 2 8 9 10

i = 0
while i < 10:
   i += 1
   if  2 < i < 8:
       continue
   print(i)
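
Both commands can appear in the same loop. As a small sketch on a made-up sequence (the names `kept` and `stop_codons` are illustrative), the loop below reads codons one by one, uses continue to skip any codon containing the ambiguous base N, and uses break to stop at the first stop codon.

```python
dna = "ATGGCNTTTAAATAGGGC"              # made-up sequence: ATG GCN TTT AAA TAG GGC
stop_codons = ("TAA", "TAG", "TGA")

kept = []
for i in range(0, len(dna), 3):
    codon = dna[i:i + 3]
    if "N" in codon:
        continue            # ambiguous codon: skip it and go on with the next one
    if codon in stop_codons:
        break               # stop codon: leave the loop entirely
    kept.append(codon)      # reached only for unambiguous, non-stop codons
print(kept)
```

The codon GGC after the stop codon is never examined, because break ends the loop before the range is exhausted.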

Conditional structures

Introduction

With a conditional structure, one can test whether a condition is true or false. We evaluate a condition (e.g. a < b, len(a) == 10, etc.) and perform a different set of instructions depending on whether the test returned True or False.

A simple if structure :

   if test:                        # 'test' is a boolean expression (e.g. a<b, b>= 10,...)
           instruction(s)          # a block of instructions to be executed if the test is true
           more instructions

An if ... else structure:

   if test:                              # test is a boolean expression
        instruction(s)                   # what to do if the expression is True
   else:                                 # alternative
        instruction(s)                   # what to do if the expression is False

An if ... elif ... else structure :

   if test1:                  # 'test1' is a boolean expression
      instruction(s)          # what to do if test1 is True
   elif test2:                # if test1 is False, check test2
      instruction(s)          # what to do if test2 is True
   elif test3:                # if test2 is False, check test3
      instruction(s)          # what to do if test3 is True
   else:                      # if all tests are false ...
      instruction(s)          # ... execute this instruction
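
As a concrete sketch of the if ... elif ... else structure, the loop below assigns a label to a few bird counts. The thresholds and label texts are arbitrary choices made for this illustration. Only the first test that is True is executed; the remaining branches are skipped.

```python
labels = []
for n in [0, 28, 145, 900]:
    if n == 0:
        label = "no birds"
    elif n < 50:              # reached only if n is not 0
        label = "few birds"
    elif n < 200:             # reached only if n >= 50
        label = "many birds"
    else:                     # all tests above were False
        label = "huge flock"
    labels.append(label)
    print(n, "->", label)
```

Because the tests are checked in order, the second branch does not need to repeat the condition n > 0.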

Nested if-conditions

Tests for conditions can be nested in a hierarchical manner:

   if test:                          # 'test' is a boolean expression
           instruction(s)            # what to do if test is True
           if test2:                 # test2 is a boolean expression
                   instruction(s)    # what to do if test2 is True
   else:                             # if test is False ...
           instruction(s)            # ... execute these instructions

Operators

To evaluate the conditions in the test, we can use the following comparison operators:

Operator       Description
a == b         Returns True if the two values are equal
a != b         Returns True if the two values are different
a > b          Returns True if a is greater than b
a < b          Returns True if a is smaller than b
a >= b         Returns True if a is greater than or equal to b
a <= b         Returns True if a is smaller than or equal to b
a < b < c      Returns True if a is smaller than b and b is smaller than c
a <= b <= c    Returns True if a is smaller than or equal to b and b is smaller than or equal to c
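
A short sketch of these operators on made-up values (all variable names are illustrative). Each comparison produces a boolean, which can be stored in a variable or used directly in an if statement. Note that == tests equality, whereas a single = assigns a value.

```python
a, b, c = 3, 7, 10

is_equal = (a == b)          # False: 3 and 7 are different
is_different = (a != b)      # True
in_order = (a < b < c)       # True, shorthand for (a < b) and (b < c)
out_of_order = (a < c < b)   # False, because c < b is not satisfied

print(is_equal, is_different, in_order, out_of_order)

# Comparisons also work on the result of len() and on strings
dna = "ATGC"
has_four_bases = (len(dna) == 4)      # True
chr1_first = ("Chr1" < "Chr2")        # True: strings compare character by character
print(has_four_bases, chr1_first)
```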

Exercise 1

  1. Open a text editor in which you will write the lines of code.
  2. Define a variable dna containing a DNA sequence of your choice
  3. Test if the length of the sequence is between 5 and 10: if not, display an error message “Length incorrect”
  4. If the length is correct, convert the sequence to upper case using the appropriate method, and print it.

To test your script, go to a bash console (beware: not the IPython console !!) and enter

python script.py

Does it work ? If so, try modifying the script:

  1. If the length is correct, count the number of A, C, G and T and their sum, and display the results

Does it work ? If so, try modifying the script even more:

  1. If the length is correct, test if the sequence contains only A,C,G,T: if yes, display the number of A,C,G,T; if not, display an error message “Non valid sequence”
  2. Save the script under sequence.py and execute it in the terminal.
Solution
    # This script checks a DNA sequence

    # here we define the DNA sequence
    #dna = 'ag'

    dna = input("Please give a sequence : ")

    # Turn the sequence into upper case
    dna = dna.upper()

    # Count each nucleotide
    nA = dna.count('A')
    nC = dna.count('C')
    nG = dna.count('G')
    nT = dna.count('T')

    # Total number of A, C, G, T
    nACGT = nA + nC + nG + nT
    l = len(dna)

    if 5 <= l <= 10:
        print("The length is correct !")
        if l != nACGT:      # some characters are not A, C, G or T
            print("Beware : The alphabet is incorrect !!")
        else:
            print("There are :")
            print("Number of A:", nA, ", number of C:", nC, ", number of G:", nG, ", number of T:", nT)
    else:
        print("The length is incorrect !")

Exercise 3 : For the more advanced: reading a fasta file and annotations

The goal is to translate an mRNA sequence read from a file, using the annotations about the location of the exons and coding sequences from an annotation file.

  • Download the files from here:
  • Goals:
    • define functions to read the files and get the sequence and annotation
    • define a function get_coding_exons to extract the exons from the fasta sequence
    • define a function translate_sequence to translate a given DNA sequence.
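
As a possible starting point for the first goal, here is a minimal sketch of a function reading the first record of a FASTA file. The function name `read_fasta` and the example record are assumptions made for this illustration, and `io.StringIO` stands in for a real file opened with `open()`. The format of the annotation file is not described here, so it is not handled.

```python
import io


def read_fasta(handle):
    """Return (header, sequence) for the first record of a FASTA file-like object."""
    header = None
    seq_parts = []
    for line in handle:
        line = line.strip()
        if not line:
            continue                  # skip empty lines
        if line.startswith(">"):
            if header is not None:
                break                 # a second record starts: stop reading
            header = line[1:]         # drop the leading ">"
        else:
            seq_parts.append(line.upper())
    return header, "".join(seq_parts)


# io.StringIO replaces open("some_file.fasta"); this record is invented
example = io.StringIO(">demo_mRNA made-up sequence\natggcc\nttttaa\n")
header, sequence = read_fasta(example)
print(header)
print(sequence)
```

The sequence lines are collected in a list and joined once at the end, which avoids building many intermediate strings for long sequences.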