=======================================
Part 4 : Loops and control structures
=======================================

.. sidebar:: Goals of this session

   * learn how to use the basic control structures ``if..else``
   * learn about the different types of loops
   * `Official Python documentation `_
   * `Good overview of python structures `_
   * :download:`Lecture: First commands with Python <../documents/Vorlesung_programming101.pdf>`

--------------
Why Python ?
--------------

In the first part we discussed **variables**, which store data and values, and extended this to **lists** and **dictionaries**. A very common task in programming is to test the value of a variable and decide what to do depending on the result. To perform such tests, we need *conditional structures*, or ``if ... else ...`` structures.

When we have a list of values, we often also need to iterate over all the values it contains: for example, iterate over all students and print the score of each student stored in a list. To do this, we need *loops* (``for`` or ``while`` loops).

----

---------------------
For and while loops
---------------------

Intro
"""""

It is often necessary to repeat instructions a certain number of times. There are two major types of loops, the ``for`` loop and the ``while`` loop, and both are available in Python.

For loops
"""""""""

The ``for`` loop can be used to repeat an instruction a certain number of times. It can also be used to iterate over all elements of a list, all characters of a string, all keys of a dictionary, all lines of a file, etc.

.. code-block:: python
   :linenos:

   # The variable v takes all values in the list
   for v in liste:                 # For each element of the list ...
       instruction(v)              # ...
                                   # do something with it

   # The variable v takes all values in the list
   # p contains the index of the element
   for p, v in enumerate(liste):   # For each element of the list, retrieve its value (v) and index (p)
       instruction(v)              # Do something with v
       instruction(p)              # Do something with p

Below are some examples of ``for`` loops; copy and paste the instructions and answer the questions:

.. code-block:: python
   :linenos:

   ###################################################
   ###   Repeat an operation n times               ###
   ###################################################
   n = 10
   for i in range(n):
       print("Let's do it !")

   ###################################################
   ###   Function range()                          ###
   ###################################################
   help(range)

   # Exercise 1 : Write a program which prints the multiples of 3 smaller than 20

   ###################################################
   ###   Iterate over the elements of a list       ###
   ###################################################
   chroms = ["Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "Chr6", "Chr7"]
   for c in chroms:
       print(c)

   for p, c in enumerate(chroms):
       print(p, c)

   ###################################################
   ###   Iterate over the characters of a string   ###
   ###################################################
   dna = 'ATGCTCGCTCGCTCGATGAAAATGTG'

   # go through all elements of the string
   for nuc in dna:
       print("The sequence contains " + nuc)

   # Exercise 2: Go through all the characters of the string and print
   # "the sequence contains character x at position y"

   ###################################################
   ###   Iterate through a dictionary              ###
   ###################################################
   # A dictionary containing the user names as keys and
   # the passwords as values
   login2passwd = {"Tim":"abc123", "John":"qwerty", "Alice":"1234567"}

   # Print keys
   for key in login2passwd:
       print(key)

   # Print key and value
   # Use the method items()
   for key, value in login2passwd.items():
       print("Value is " + value + " for key " + key)

   # Exercise 3:
   # Use a loop to print the names of the users
   # whose password contains the string "123"

.. container:: exo

   1. Start again with the list containing the number of birds::

         number_of_birds = [28, 32, 1, 0, 10, 22, 30, 19, 145, 27, 36, 25, 9, 38, 21, 12, 122, 87, 36, 3, 0, 5, 55, 62, 98, 32, 900, 33, 14, 39, 56, 81, 29, 38, 1, 0, 143, 37, 98, 77, 92, 83, 34, 98, 40, 45, 51, 17, 22, 37, 48, 38, 91, 73, 54, 46, 102, 273, 600, 10, 11]

   2. Use a ``for`` loop to iterate over all values in this list, and for each value print the message::

         "At site xx : yy birds counted"

.. container:: toggle

   .. container:: header

      **Solution** (click to open)

   .. code-block:: python
      :linenos:

      number_of_birds = [28, 32, 1, 0, 10, 22, 30, 19, 145, 27, 36, 25, 9, 38, 21, 12, 122, 87, 36, 3, 0, 5, 55, 62, 98, 32, 900, 33, 14, 39, 56, 81, 29, 38, 1, 0, 143, 37, 98, 77, 92, 83, 34, 98, 40, 45, 51, 17, 22, 37, 48, 38, 91, 73, 54, 46, 102, 273, 600, 10, 11]

      for position, number in enumerate(number_of_birds):
          print("At site", position, ":", number, "birds counted")

While loops
"""""""""""

.. code-block:: python
   :linenos:

   while condition:      # while the condition is true ...
       statement(s)      # ... execute these instructions

   # Example: keep asking until the number entered is 10 or more
   a = 0
   while a < 10:
       a = int(input("Enter a number : "))

.. code-block:: python
   :linenos:

   #########################################
   ###   An example of while loop        ###
   #########################################

   password = None     # The variable is initialized to None

   # Do a while loop
   while password != "^^@@??))":                      ## As long as the password is NOT ^^@@??))
       password = input("Enter a password please:")   ## Ask for the password
   print("Welcome")                                   ## print "Welcome" if the password is right !

   ## Example of infinite loop (press Ctrl-C to stop it) ...
   while True:
       print("This is True and will remain so for a while ...")

.. container:: exo

   **Exercise 2 : extracting the codons**

   **Advice** : try to solve each step independently; make sure each step works before trying to combine the different tasks.

   1. Define (again...)
      a longer DNA sequence::

         dna = "ttcacctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctgtgtgtctagctaagatgtattattctgctgtggatcccactaaagatatattcactgggcttattgggccaatgaaaatatgcaagaaaggaagtttacatgcaaatgggagacagaaagatgtagacaaggaattctatttgtttcctacagtatttgatgagaatgagagtttactcctggaagataatattagaatgtttacaactgcacctgatcaggtggataaggaagatgaagactttcaggaatctaataaaatgcactccatgaatggattcatgtatgggaatcagccgggtctcactatgtgcaaaggagattcggtcgtgtggtacttattcagcgccggaaatgaggccgatgtacatggaatatacttttcaggaaacacatatctgtggagaggagaacggagagacacagcaaacctcttccctcaaacaagtcttacgctccacatgtggcctgacacagaggggacttttaatgttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcggcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgggattattccccacaaagggagtgggattaggagctgcatcatttacaagagcagaatgtttcaaatgcatttttagataagggagagttttacataggctcaaagtacaagaaagttgtgtatcggcagtatactgatagcacattccgtgttccagtggagagaaaagctgaagaagaacatctgggaattctaggtccacaacttcatgcagatgttggagacaaagtcaaaattatctttaaaaacatggccacaaggccctactcaatacatgcccatggggtacaaacagagagttctacagttactccaacattaccaggtaaactctcacttacgtatggaaaatcccagaaagatctggagctggaacagaggattctgcttgtattccatgggcttattattcaactgtggatcaagttaaggacctctacagtggattaattggccccctgattgtttgtcgaagaccttacttgaaagtattcaatcccagaaggaagctggaatttgcccttctgtttctagtttttgatgagaatgaatcttggtacttagatgacaacatcaaaacatactctgatcaccccgagaaagtaaacaaagatgatgaggaattcatagaaagcaataaaatgcatgctattaatggaagaatgtttggaaacct"

   2. Extract and print all codons (you can use a ``for`` or ``while`` loop...)
   3. Test if the codon corresponds to a start codon (ATG); if yes, print its position
   4.
      Define a dictionary containing the correspondence between codons and amino acids::

         map = {"TTT":"F", "TTC":"F", "TTA":"L", "TTG":"L",
                "TCT":"S", "TCC":"S", "TCA":"S", "TCG":"S",
                "TAT":"Y", "TAC":"Y", "TAA":"STOP", "TAG":"STOP",
                "TGT":"C", "TGC":"C", "TGA":"STOP", "TGG":"W",
                "CTT":"L", "CTC":"L", "CTA":"L", "CTG":"L",
                "CCT":"P", "CCC":"P", "CCA":"P", "CCG":"P",
                "CAT":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
                "CGT":"R", "CGC":"R", "CGA":"R", "CGG":"R",
                "ATT":"I", "ATC":"I", "ATA":"I", "ATG":"M",
                "ACT":"T", "ACC":"T", "ACA":"T", "ACG":"T",
                "AAT":"N", "AAC":"N", "AAA":"K", "AAG":"K",
                "AGT":"S", "AGC":"S", "AGA":"R", "AGG":"R",
                "GTT":"V", "GTC":"V", "GTA":"V", "GTG":"V",
                "GCT":"A", "GCC":"A", "GCA":"A", "GCG":"A",
                "GAT":"D", "GAC":"D", "GAA":"E", "GAG":"E",
                "GGT":"G", "GGC":"G", "GGA":"G", "GGG":"G"}

   5. Use this dictionary to translate the sequence (start from the first position; alternatively, start from the first ATG)
   6. Additional things to implement :

      * detect open reading frames (= ORFs) in all 3 frames
      * print the length of each ORF

.. container:: toggle

   .. container:: header

      **Solution** (click to open)

   ..

   .. code-block:: python
      :linenos:

      dna = "ttcacctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctgtgtgtctagctaagatgtattattctgctgtggatcccactaaagatatattcactgggcttattgggccaatgaaaatatgcaagaaaggaagtttacatgcaaatgggagacagaaagatgtagacaaggaattctatttgtttcctacagtatttgatgagaatgagagtttactcctggaagataatattagaatgtttacaactgcacctgatcaggtggataaggaagatgaagactttcaggaatctaataaaatgcactccatgaatggattcatgtatgggaatcagccgggtctcactatgtgcaaaggagattcggtcgtgtggtacttattcagcgccggaaatgaggccgatgtacatggaatatacttttcaggaaacacatatctgtggagaggagaacggagagacacagcaaacctcttccctcaaacaagtcttacgctccacatgtggcctgacacagaggggacttttaatgttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcggcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgggattattccccacaaagggagtgggattaggagctgcatcatttacaagagcagaatgtttcaaatgcatttttagataagggagagttttacataggctcaaagtacaagaaagttgtgtatcggcagtatactgatagcacattccgtgttccagtggagagaaaagctgaagaagaacatctgggaattctaggtccacaacttcatgcagatgttggagacaaagtcaaaattatctttaaaaacatggccacaaggccctactcaatacatgcccatggggtacaaacagagagttctacagttactccaacattaccaggtaaactctcacttacgtatggaaaatcccagaaagatctggagctggaacagaggattctgcttgtattccatgggcttattattcaactgtggatcaagttaaggacctctacagtggattaattggccccctgattgtttgtcgaagaccttacttgaaagtattcaatcccagaaggaagctggaatttgcccttctgtttctagtttttgatgagaatgaatcttggtacttagatgacaacatcaaaacatactctgatcaccccgagaaagtaaacaaagatgatgaggaattcatagaaagcaataaaatgcatgctattaatggaagaatgtttggaaacct"
      dna = dna.upper()

      map = {"TTT":"F", "TTC":"F", "TTA":"L", "TTG":"L",
             "TCT":"S", "TCC":"S", "TCA":"S", "TCG":"S",
             "TAT":"Y", "TAC":"Y", "TAA":"STOP", "TAG":"STOP",
             "TGT":"C", "TGC":"C", "TGA":"STOP", "TGG":"W",
             "CTT":"L", "CTC":"L", "CTA":"L", "CTG":"L",
             "CCT":"P", "CCC":"P", "CCA":"P", "CCG":"P",
             "CAT":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
             "CGT":"R", "CGC":"R", "CGA":"R", "CGG":"R",
             "ATT":"I", "ATC":"I", "ATA":"I", "ATG":"M",
             "ACT":"T", "ACC":"T", "ACA":"T", "ACG":"T",
             "AAT":"N", "AAC":"N", "AAA":"K", "AAG":"K",
             "AGT":"S", "AGC":"S", "AGA":"R", "AGG":"R",
             "GTT":"V", "GTC":"V", "GTA":"V",
             "GTG":"V",
             "GCT":"A", "GCC":"A", "GCA":"A", "GCG":"A",
             "GAT":"D", "GAC":"D", "GAA":"E", "GAG":"E",
             "GGT":"G", "GGC":"G", "GGA":"G", "GGG":"G"}

      # Translate the whole sequence, codon by codon
      protein = ""
      for i in range(0, len(dna), 3):
          codon = dna[i:i+3]
          if len(codon) == 3:
              AA = map[codon]
              print("Codon=", codon, "-> Amino Acid=", AA)
              protein = protein + AA
      print("The protein sequence is", protein)

      # Detect the ORFs in the first reading frame
      in_orf = 0   # is 0 if outside an ORF, and 1 if in an ORF
      for i in range(0, len(dna), 3):
          codon = dna[i:i+3]
          if len(codon) == 3:
              if codon == 'ATG' and in_orf == 0:                       # a start codon is encountered, and we are not in an ORF yet !
                  orf = map[codon]
                  in_orf = 1                                           # mark that we have entered a new ORF
              elif codon in ['TAA', 'TAG', 'TGA'] and in_orf == 1:     # stop codon AND we are in an ORF
                  print("New ORF : ", orf)                             # print the ORF
                  orf = ""                                             # initiate a new ORF
                  in_orf = 0                                           # we are not in an ORF anymore
              elif in_orf == 1:                                        # we are in an ORF and add the new codon
                  orf = orf + map[codon]

Break and continue
""""""""""""""""""

As in many other languages, the ``break`` statement can be used to exit a loop. The ``continue`` statement jumps to the next iteration, skipping the rest of the block of instructions.

.. code-block:: python
   :linenos:

   ##########################
   ### Break and continue ###
   ##########################

   ## Example 1
   # The output will be
   # 0 1 2 3 4
   for i in range(10):
       if i == 5:
           break
       print(i)

   ## Example 2
   # The output will be
   # 1 2 8 9 10
   i = 0
   while i < 10:
       i += 1
       if 2 < i < 8:
           continue
       print(i)

-------

---------------------------
Conditional structures
---------------------------

Introduction
""""""""""""

With a conditional structure, one can test whether a condition is true or false. We evaluate a condition (e.g. ``a < b``, ``len(a) == 10``, etc.) and perform a different set of instructions depending on whether the test returned *True* or *False*.

..

.. warning::

   Beware that in Python the line indentation is compulsory to define a **block** of instructions; all consecutive lines with the same indentation belong to the same block.

A simple ``if`` structure:

.. code-block:: python
   :linenos:

   if test:                 # 'test' is a boolean expression (e.g. a == 10, ...)
       instruction(s)       # a block of instructions to be executed if the test is true
       more instructions

An ``if .. else ..`` structure:

.. code-block:: python
   :linenos:

   if test:                 # test is a boolean expression
       instruction(s)       # what to do if the expression is True
   else:                    # alternative
       instruction(s)       # what to do if the expression is False

An ``if .. elif .. else`` structure:

.. code-block:: python
   :linenos:

   if test1:                # 'test1' is a boolean expression
       instruction(s)       # what to do if test1 is True
   elif test2:              # if test1 is False, check test2
       instruction(s)       # what to do if test2 is True
   elif test3:              # if test2 is False, check test3
       instruction(s)       # what to do if test3 is True
   else:                    # if all tests are false ...
       instruction(s)       # ... execute this instruction

Nested if-conditions
""""""""""""""""""""

Tests for conditions can be nested in a hierarchical manner:

.. code-block:: python
   :linenos:

   if test:                 # 'test' is a boolean expression
       instruction(s)       # what to do if test is True
       if test2:            # test2 is a boolean expression
           instruction(s)   # what to do if test2 is True
   else:                    # if test is False ...
       instruction(s)       # ...
                            # execute these instructions

Operators
"""""""""

To evaluate the conditions in a test, we can use the following comparison operators:

=============  ==========================================================================================
Operator       Description
=============  ==========================================================================================
a == b         Returns ``True`` if the two values are equal
a != b         Returns ``True`` if the two values are different
a > b          Returns ``True`` if a is greater than b
a < b          Returns ``True`` if a is smaller than b
a >= b         Returns ``True`` if a is greater than or equal to b
a <= b         Returns ``True`` if a is smaller than or equal to b
a < b < c      Returns ``True`` if a is smaller than b and b is smaller than c
a <= b <= c    Returns ``True`` if a is smaller than or equal to b and b is smaller than or equal to c
=============  ==========================================================================================

.. container:: exo

   **Exercise 1**

   1. Open a text editor in which you will write the lines of code.
   2. Define a variable ``dna`` containing a DNA sequence of your choice
   3. Test if the length of the sequence is between 5 and 10: if not, display an error message "Length incorrect"
   4. If the length is correct, convert the sequence to upper case using the appropriate method, and print it.

..

.. container:: toggle

   .. container:: header

      **Solution** (click to open)

   .. code-block:: python
      :linenos:

      dna = "accgattgcta"
      if 5 <= len(dna) <= 10:
          print("length ok !")
          print(dna.upper())
      else:
          print("Length is not correct !")

To test your script, you need to go to a *bash* console (beware: not the ipython console !!), and enter

.. code-block:: bash

   python script.py

Does it work ? If so, try modifying the script:

.. container:: exo

   5. If the length is correct, count the number of A, C, G and T as well as their sum, and display the results

Does it work ? If so, try modifying the script even more:

.. container:: exo

   6.
      If the length is correct, test if the sequence contains only A, C, G, T: if yes, display the number of A, C, G, T; if not, display an error message "Non valid sequence"
   7. Save the script under ``sequence.py`` and execute it in the terminal.

.. container:: toggle

   .. container:: header

      **Solution** (click to open)

   .. code-block:: python
      :linenos:

      # This script checks a DNA sequence

      # here we define the DNA sequence
      #dna = 'ag'
      dna = input("Please give a sequence : ")

      # Turn the sequence into upper case
      dna = dna.upper()

      nA = dna.count('A')
      nC = dna.count('C')
      nG = dna.count('G')
      nT = dna.count('T')

      # Total number of A, C, G, T
      nACGT = nA + nC + nG + nT
      length = len(dna)

      if 5 <= length <= 10:
          print("The length is correct !")
          if length != nACGT:
              # some characters are neither A, C, G nor T
              print("Beware : The alphabet is incorrect !!")
          else:
              print("There are :")
              print("Number of A:", nA, ", number of C:", nC, ", number of G:", nG, ", number of T:", nT)
      else:
          print("The length is incorrect !")

----

.. container:: exo

   **Exercise 3 : For the more advanced: reading a fasta file and annotations**

   The goal is to translate an mRNA sequence read from a file, using the annotations about the location of the exons and coding sequences from an annotation file.

   * Download the files from here:

     * :download:`fasta sequence of the gene <../documents/TBX2_dna.fa>`
     * :download:`Annotation file <../documents/TBX2_mRNA.gff>`

     The gff file contains the location of the exons.

   * Goals:

     * define functions to read the files and get the sequence and the annotation
     * define a function ``get_coding_exons`` to extract the exons from the fasta sequence
     * define a function ``translate_sequence`` to translate a given DNA sequence.
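
As a possible starting point for Exercise 3, the sketch below shows one way the requested functions could fit together. It is only a sketch under several assumptions: ``read_fasta`` and ``read_gff`` are hypothetical helper names (only ``get_coding_exons`` and ``translate_sequence`` are named in the exercise), the data are a made-up toy gene given as lists of lines rather than the TBX2 files, the gene is assumed to lie on the forward strand, the annotation is assumed to follow the usual tab-separated GFF layout (feature type in column 3, 1-based inclusive start and end in columns 4 and 5), and ``toy_codons`` is a small subset of the codon dictionary of Exercise 2.

.. code-block:: python
   :linenos:

   def read_fasta(lines):
       """Return the sequence as one upper-case string (assumes a single record)."""
       parts = []
       for line in lines:
           line = line.strip()
           if not line or line.startswith(">"):    # skip empty lines and the header
               continue
           parts.append(line.upper())
       return "".join(parts)

   def read_gff(lines, feature="CDS"):
       """Return the sorted (start, end) pairs of all rows of the given feature type."""
       regions = []
       for line in lines:
           if line.startswith("#") or not line.strip():   # skip comments and empty lines
               continue
           fields = line.rstrip("\n").split("\t")
           if len(fields) >= 5 and fields[2] == feature:
               regions.append((int(fields[3]), int(fields[4])))
       return sorted(regions)

   def get_coding_exons(sequence, regions):
       """Cut the regions out of the sequence (GFF positions are 1-based, inclusive)."""
       return [sequence[start - 1:end] for start, end in regions]

   def translate_sequence(dna, codon_table):
       """Translate codon by codon, stopping at the first stop codon."""
       protein = ""
       for i in range(0, len(dna) - 2, 3):
           amino_acid = codon_table[dna[i:i + 3]]
           if amino_acid == "STOP":
               break
           protein = protein + amino_acid
       return protein

   # Toy data (made up for illustration, NOT the TBX2 files)
   fasta_lines = [">toy_gene", "ccATGAAAgg", "ggTGGTAAcc"]
   gff_lines = ["toy\tdemo\texon\t1\t10\t.\t+\t.\tID=exon1",
                "toy\tdemo\tCDS\t3\t8\t.\t+\t0\tID=cds1",
                "toy\tdemo\tCDS\t13\t18\t.\t+\t0\tID=cds2"]
   toy_codons = {"ATG": "M", "AAA": "K", "TGG": "W", "TAA": "STOP"}

   sequence = read_fasta(fasta_lines)
   regions = read_gff(gff_lines, feature="CDS")
   exons = get_coding_exons(sequence, regions)
   protein = translate_sequence("".join(exons), toy_codons)
   print(exons)      # ['ATGAAA', 'TGGTAA']
   print(protein)    # MKW

With the real files, the same functions could be given the lines of an opened file instead of a list, for example ``read_fasta(open("TBX2_dna.fa"))``, together with the full codon dictionary. Which feature type to select (``exon`` or ``CDS``) depends on the content of the annotation file, so it is left as a parameter here.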