Bioinformatics Exercises
Prof. Maria Lucila Hernandez Macedo and Prof. Leandro E. C. Diniz
Exercise 1
Analyze the sequence below and answer the questions that follow:
1 acatccgcgg caacgcctcc ttggtgtcgt ccgcttccaa taacccagct tgcgtcctgc
61 acacttgtgg cttccgtgca cacattaaca actcatggtt ctagctccca gtcgccaagc
121 gttgccaagg cgttgagaga tcatctggga agtcttttac ccagaattgc tttgattcag
181 gccagctggt ttttcctgcg gtgattcgga aattcgcgaa ttcctctggt cctcatccag
241 gtgcgcggga agcaggtgcc caggagagag gggataatga agattccatg ctgatgatcc
301 caaagattga acctgcagac caagcgcaaa gtagaaactg aaagtacact gctggcggat
361 cctacggaag ttatggaaaa ggcaaagcgc agagccacgc cgtagtgtgt gccgcccccc
421 ttgggatgga tgaaactgca gtcgcggcgt gggtaagagg aaccagctgc agagatcacc
481 ctgcccaaca cagactcggc aactccgcgg aagaccaggg tcctgggagt gactatgggc
541 ggtgagagct tgctcctgct ccagttgcgg tcatcatgac tacgcccgcc tcccgcagac
601 catgttccat gtttctttta ggtatatctt tggacttcct cccctgatcc ttgttctgtt
661 gccagtagca tcatctgatt gtgatattga aggtaaagat ggcaaacaat atgagagtgt
721 tctaatggtc agcatcgatc aattattgga cagcatgaaa gaaattggta gcaattgcct
781 gaataatgaa tttaactttt ttaaaagaca tatctgtgat gctaataagg aaggtatgtt
841 tttattccgt gctgctcgca agttgaggca atttcttaaa atgaatagca ctggtgattt
901 tgatctccac ttattaaaag tttcagaagg cacaacaata ctgttgaact gcactggcca
961 ggttaaagga agaaaaccag ctgccctggg tgaagcccaa ccaacaaaga gtttggaaga
1021 aaataaatct ttaaaggaac agaaaaaact gaatgacttg tgtttcctaa agagactatt
1081 acaagagata aaaacttgtt ggaataaaat tttgatgggc actaaagaac actgaaaaat
1141 atggagtggc aatatagaaa cacgaacttt agctgcatcc tccaagaatc tatctgctta
1201 tgcagttttt cagagtggaa tgcttcctag aagttactga atgcaccatg gtcaaaacgg
1261 attagggcat ttgagaaatg catattgtat tactagaaga tgaatacaaa caatggaaac
1321 tgaatgctcc agtcaacaaa ctatttctta tatatgtgaa catttatcaa tcagtataat
1381 tctgtactga tttttgtaag acaatccatg taaggtatca gttgcaataa tacttctcaa
1441 acctgtttaa atatttcaag acattaaatc tatgaagtat ataatggttt caaagattca
1501 aaattgacat tgctttactg tcaaaataat tttatggctc actatgaatc tattatactg
1561 tattaagagt gaaaattgtc ttcttctgtg ctggagatgt tttagagtta acaatgatat
1621 atggataatg ccggtgagaa taagagagtc ataaacctta agtaagcaac agcataacaa
1681 ggtccaagat acctaaaaga gatttcaaga gatttaatta atcatgaatg tgtaacacag
1741 tgccttcaat aaatggtata gcaaatgttt tgacatgaaa aaaggacaat ttcaaaaaaa
1801 taaaataaaa taaaaataaa ttcacctagt ctaaggatgc taaaccttag tactgagtta
1861 cattgtcatt tatatagatt ataacttgtc taaataagtt tgcaatttgg gagatatatt
1921 tttaagataa taatatatgt ttacctttta attaatgaaa tatctgtatt taattttgac
1981 actatatctg tatataaaat attttcatac agcattacaa attgcttact ttggaataca
2041 tttctccttt gataaaataa atgagctatg tattaacaaa aaaaaaaaaa aaaaaaaaaa
2101 aaaaaaaaaa aaaaaa
What to do?
You have received the sequence, in the 5'→3' direction, of an unknown cDNA (a copy of a messenger RNA). The questions are:
1. Which gene does the sequence correspond to?
2. Which species is it from?
3. What are the first 12 nucleotides of the translated region?
4. What is the amino acid sequence encoded by the RNA? (Present it as in the example below.)
Tips:
1. Run a BLAST search with the sequence at: http://www.ncbi.nlm.nih.gov/BLAST/
Choose: Nucleotide-nucleotide BLAST (blastn)
Paste the sequence into the upper box
Click the on-screen button:
On the next page, click the button and wait (do not click repeatedly, as that only makes the search slower)...
Just below you will find the DNA sequence most similar to your query. You will see its name, its species and a significance value (E value), which gets smaller as the match gets better (identical sequences give an E value of ZERO).
2. To obtain the amino acid sequence, go to: http://us.expasy.org/tools/dna.html
The program analyzes all possible ORFs (Open Reading Frames).
To explain: the ribosome reads the RNA three letters at a time, and reading can start at the first, second or third letter of the sequence. Depending on the starting position, the resulting amino acid sequence will be different. Pay attention to the Met (abbreviation of methionine) and Stop (abbreviation of the translation termination codon) signals. The protein sequence (if the sequence you are analyzing is the full-length cDNA, end to end) always starts with a methionine and ends with a Stop. The longer the stretch of amino acids from a Met to a Stop, the more likely it is to be the correct ORF (the polypeptide, or protein, encoded by the sequence in question).
Another remark: in some cases it is not known whether the cDNA sequence is the + strand or the - strand, so the program reports translations in both directions of the given sequence.
You will find another way to see which ORF is the correct one at the site below (the ORF is the green box; click it to see the sequence):
http://www.ncbi.nlm.nih.gov/gorf/gorf.html
There is also a web page with a very interesting exercise in which you do the transcription and then the translation. Try it!
http://learn.genetics.utah.edu/content/begin/dna/transcribe/
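The reading-frame logic described in tip 2 can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code; the helper names (clean, reverse_complement, longest_orf) are invented for this example and do not reproduce how the ExPASy or NCBI tools work internally. It strips the position numbers from a GenBank-style listing, translates all six frames and keeps the longest Met-to-Stop stretch.

```python
import re

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in T, C, A, G order.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def clean(raw):
    """Drop position numbers, spaces and line breaks; return uppercase DNA."""
    return re.sub(r"[^ACGTacgt]", "", raw).upper()


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def longest_orf(dna):
    """Return (strand, frame, protein) for the longest Met...Stop stretch."""
    best = ("+", 0, "")
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            # Each match runs from a Met to the first Stop that follows it.
            for m in re.finditer(r"M[^*]*\*", protein):
                if len(m.group()) - 1 > len(best[2]):
                    best = (strand, frame + 1, m.group()[:-1])
    return best


if __name__ == "__main__":
    toy = clean("1 ccatggctaa ataagg")  # toy input, not the exercise sequence
    print(longest_orf(toy))
```

To use it on the exercise, paste the numbered sequence into clean() and pass the result to longest_orf(); the answer should still be checked against the web tools named above.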
Exercise 2
After sequencing the DNA of a microorganism isolated from soil, the sequence below
was obtained. Identify the organism to which this sequence belongs. If the sequence
is coding, what type of protein would it be related to?
ATGGCCATACTTTGCAGTACTGCATTGGCTCTGGGCGCATGCGGAAGTATGGGGAAAGCGGGCGGCCCGACAATGCGTTGTCTATAGAACAAACAACGCAGATAGACAGCGCAGACGGGATTGATGCCTCCAAACTGCTCTTTTCCTC
TTCCCAAGCCGTTGTTATCGCCGGTGATTCTGTGGGGCAGAAGTGGGAGGGCGCGAAAGCAGCGGTGAAGCGGGGCGCGCCGCTGCTGGTGCGCACTGCCGATAACGCGTCGGCCATTGATTCGGAGATAAAGCGCCTCGGGGCTCA
AGACGTTATTAAGATTGACGAGCCTCAGGCCCCGGACCCGGAAATTTCCGAGGCAACATTGCCGATAAAATCTC
GCAGCTCACGCCCGAATCCCCGCTTTTTAACGGCGGCGCGTCCATCCTGGTCTCCGGGCACACCACGGCCGCTGATGTAGCCACCGCACGCGCGTCGGGGGCCAATGTGGAGTACCTGTCTTCGGGCGATGCGCGTGAAAGCTCTGCG
CTATCCGCTGATCCCGACGCTCATGTGGTTGCCCTGGGTCCAAGTTTTGCCAACAAAGAACGCTTTAATCGCCAG
GTAGAGATGATTAGCCATGGTGAGGTCCCCGGTGGCGGGCATCTCATTTTCCCCTCGCATCGCGTGGTAGCTCTCTACGGTCATCCTTCCGGCGGGGCGCTGGGAGTGCTTGGCGAGCAACCTGCTGAGGAAGCCGTAAACAGGGTGA
ATGATTTAGTGGGTAAGTATCAGGCCATTGCACCGGAAGAGAGCATGATCCCCGCCTTTGAGATCATTGCTACC
GTGGCGAGCTCGTCAGCAGGGCCGGATGGCAATTATTCCAATGAGGGGAACGTTGATGAGCTGCGCCCGTGGGT
TGAAGCGATTGGTGATGCTGGAGGCATAGCGATTCTTGATCTACAACCTGGCAGCGCAAGCTTCCTTGAACAGG
CACAACAATTTGAGGAATTGCTGAAACTACCGCACGTCGGACTGGCGATAGATCCCGAGTGGCGGCTTAAGCCG
GGGGAGAAACCCATGGAGAGGGTCGGCAGTGTTGGGGCGGGGGAAGTGAACCAGACTGCTGCGTGGCTGCGGGACCTGGTAAAAGATAACGAGCTCCCGCAGAAAGTCTTTGTTGTGCACCAATTTCAGCATCAGATGGTGCAGAAC
AGGGAAACCTTGGACACCACGGCACCGGAACTTTCGTGGGTTCTTCACGCAGATGGCCACGGAACCGCGGGCG
ATAAGTTTGCCACGTGGGATATGGTGCGGAAGAATCTGCAGCCCGAGTTCTACCTTGCGTGGAAGAACTTTATCGATGAGGATCAGCCGATGTTCACCCCCGAGCAGACGTTTAAGATCGAGCCTCGGCCTTGGTTTGTGTCCTATCA
ATAA
Exercise 3:
Perform an in silico digestion of the fragment amplified by primers Eub-8f and 1492r
from the genome of the organism identified in the exercise above, using the
restriction enzymes HaeIII, HhaI, MseI, MspI and RsaI.
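An in silico digestion amounts to finding every recognition site in the amplicon and cutting at the known position inside it. The sketch below is a minimal illustration for a linear fragment and a single enzyme at a time; the function name digest and the toy amplicon are assumptions made for this example. The recognition sites used are the commonly listed ones for these five enzymes, and because each site is palindromic, scanning one strand is enough.

```python
import re

# Recognition site and cut offset within the site (top strand, 5'->3').
ENZYMES = {
    "HaeIII": ("GGCC", 2),  # GG^CC
    "HhaI":   ("GCGC", 3),  # GCG^C
    "MseI":   ("TTAA", 1),  # T^TAA
    "MspI":   ("CCGG", 1),  # C^CGG
    "RsaI":   ("GTAC", 2),  # GT^AC
}


def digest(dna, enzyme):
    """Return fragment lengths after cutting a linear molecule with one enzyme."""
    site, offset = ENZYMES[enzyme]
    dna = dna.upper()
    # The lookahead also finds sites that overlap each other.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", dna)]
    edges = [0] + cuts + [len(dna)]
    return [b - a for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    amplicon = "AAAGGCCTTTTTAACCGGAGTACGCGCAA"  # toy sequence for illustration
    for name in ENZYMES:
        print(name, digest(amplicon, name))
```

For the exercise, the amplicon would be the region of the identified genome lying between the two primer binding sites, primers included.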
Exercise 4:
Which organisms do the sequences below correspond to?
If each one corresponded to a gene, what would be the function and structure of the
protein encoded by each sequence?
Sequence 1
CGAAAAATAAGCCATAGTCGGCACCATAAGCATAACCTAGCTCTGCGATTATCTCTAACATAATTAACTT
AAGCAGCCGTATTTATAAAGAAATTTCCAAAATAAAGCGAATATTCTAGAATCCCAAAACAAACTGGTTG
TTGCGGTAGGTCATTTGTTTGGCAGAAAGAAAACTCGAGAAATTTCTCTGGCCGTTATTCTCTATTCGTT
TTGTGACTCTCCCTCTTTGTACTATTGCTCTCTCACTCTGTCACACAGTAAACGGCGCACTGTTCTCGTT
GCTTCGAGAGAGCGCGCCTCGAATGTTCGCGAAAAGAGCGCCGGAGTATAAATAGAGGAGCTTCGTCGAC
GGAGAGTCAATTCTATTCAAACAAGCAAAGTGAACACATCGCTAAGCGAAAGCTAAGCAAACAAACAAGC
GCAGCTGAACAAGCTAAACAATCTGCAATAAAGTGCAAGTTAAAGTGAATCAATTAAAAGTAACCAACAA
CCAAGTAATTAAACTAAAAACTGCAACTACTGAAATCAACCAAGAAGTAATTATTGAAGACAAGAAGAGA
ACTCTGAATACTTTCAACAAGTCGTTACCGAGGAAGAAGAACTCACACACAATGCCTGCTATTGGAATCG
ATCTGGGCACCACCTACTCCTGCGTGGGTGTCTACCAACATGGCAAGGTGGAGATTATCGCCAACGACCA
GGGCAACCGCACCACGCCGTCCTACGTGGCTTTCACAGATTCGGAACGCCTCATCGGCGATCCGGCTAAG
AACCAGGTGGCCATGAACCCCAGAAACACAGTGTTTGACGCCAAGCGACTGATCGGCCGAAAATACGACG
ACCCCAAGATCGCAGAGGACATGAAGCACTGGCCTTTCAAGGTTGTAAGCGACGGCGGAAAGCCCAAGAT
CGGGGTGGAGTATAAGGGTGAGTCCAAGAGATTTGCCCCCGAGGAGATCAGCTCGATGGTACTGACCAAG
ATGAAGGAGACGGCGGAGGCATATCTGGGCGAGAGCATCACAGACGCAGTCATCACAGTTCCAGCCTACT
TCAACGACTCCCAGCGCCAGGCTACCAAAGACGCCGGTCACATCGCCGGCCTGAATGTGCTCCGCATCAT
CAATGAGCCCACGGCGGCAGCACTGGCCTACGGACTGGACAAGAACCTCAAGGGTGAGCGCAATGTGCTT
ATCTTCGACTTGGGCGGCGGCACCTTCGATGTCTCCATCCTGACCATCGACGAGGGATCACTGTTCGAGG
TGCGCTCCACCGCCGGAGACACACACTTGGGCGGCGAGGACTTTGACAACCGGCTAGTCACTCATCTGGC
GGACGAGTTCAAGCGCAAGTACAAGAAGGATCTGCGCTCCAACCCTCGCGCCCTACGACGCCTCAGAACA
GCAGCTGAACGGGCCAAGCGCACACTCTCCTCCAGCACGGAGGCCACCATCGAGATTGACGCACTGTTTG
AGGGCCAAGACTTCTACACCAAAGTGAGCCGCGCCAGGTTTGAGGAGCTGTGCGCGGACCTCTTCCGCAA
CACCCTGCAGCCTGTGGAGAAGGCCCTCAACGATGCCAAGATGGATAAGGGTCAGATCCACGACATCGTG
CTCGTCGGCGGATCCACTCGCATTCCCAAGGTGCAAAGTCTGCTGCAGGACTTCTTCCACGGCAAGAACC
TCAACCTATCCATCAACCCAGACGAGGCAGTTGCATACGGAGCTGCTGTGCAGGCCGCTATCCTCAGCGG
AGACCAGAGCGGCAAGATCCAGGACGTGCTGCTGGTGGACGTGGCCCCACTTTCATTGGGAATTGAGACC
GCTGGAGGTGTAATGACCAAGCTGATCGAGCGCAACTGCCGCATTCCGTGCAAGCAGACTAAGACGTTCT
CCACATACGCGGACAACCAGCCCGGAGTCTCCATTCAGGTGTATGAGGGCGAACGTGCGATGACGAAGGA
CAACAATGCATTGGGCACCTTCGATCTGTCCGGCATTCCACCTGCACCAAGGGGTGTGCCCCAGATAGAA
GTTACCTTCGACTTGGACGCCAATGGAATCCTGAACGTCAGCGCCAAGGAGATGAGCACGGGCAAGGCCA
AGAACATCACGATCAAGAACGACAAGGGACGGCTCTCGCAGGCCGAGATTGATCGCATGGTGAACGAGGC
TGAAAAGTACGCCGACGAGGACGAGAAGCATCGCCAGCGAATAACCTCTAGAAATGCCCTGGAGAGCTAC
GTCTTCAATGTGAAGCAGGCCGTGGAACAGGCACCTGCTGGCAAATTGGACGAGGCTGACAAGAACTCCG
TCTTGGACAAGTGCAACGACACTATCCGGTGGCTGGACAGCAACACCACTGCCGAGAAGGAGGAGTTCGA
CCACAAGCTGGAGGAGCTCACCCGCCACTGCTCCCCCATCATGACCAAGATGCATCAGCAGGGTGCGGGA
GCTGGAGCTGGTGGTCCGGGAGCAAACTGCGGCCAGCAGGCGGGAGGATTTGGAGGCTACTCTGGACCCA
CGGTCGAGGAGGTCGACTAAGGCCAAAGAGTCTAATTTTTGTTCATCAATGGGTTATAACATATGGGTTA
TATTATAAGTTTGTTTTAAGTTTTTGAGACTGATAAGAATGTTTCGATCGAATATTCCATAGAACAACAA
TAGTATTACCTAATTACCAAGTCTTAATTTAGCAAAAATGTTATTGCTTATAGAAAAAATAAATTATTTA
TTTGAAATTTAAAGTCAACTTGTCATTTAATGTTTTGTAGACTTTTGAAAGTCTTACGATACAATTAGTA
TCTAATATACATGGGTTCATTCTACATTCTATATTAGTGATGATTTCTTTAGCTAGTAATACATTTTAAT
TATATTCGGCTTTGATGATTTTCTGATTTTTTCCGAACGGATTTTCGTAGACCCTTTCGATCTCATAATG
GCTCATTTTATTGCGATGGACGGTCAGGAGAGCTCCACTTTTGAATTTCTGTTCGCAGACACCGCATTTG
TAGCACATAGCCGGGACATCCGGTTTGGGGAGATTTTCCAGTCTCTGTTGCAATTGGTTTTCGGGAATGC
GTTGCAG
Sequence 2
GGTTCCAATCCTGCCTCTGCCACTTCTCAGTTGTATGCCCCAACCCAACCTGTCTGGCTCTGTCCTCCTT
AACAGAAGGACGGCCCTGGCCACGGGCCACAGCCAGCAACGCTTAAGCACCAGGGCCGGCGAGTGCCCTG
CCGTGGCACGGCTCCAGCGTCGCGCTCTCGAATTCATTTGCTTTCCTTAACGAGAGAAGGTTCCAGATGA
GGGCTGAACCCTCTTCGCCCCGCCCACGGCCCCTGAACGCTGGGGGAGGAGTGCATGGGGAGGGGCGGCC
CTCAAACGGGTCATTGCCATTAATAGAGACCTCAAACACCGCCTGCTAAAAATACCCGACTGGAGGAGCA
TAAAAGCGCAGCCGAGCCCAGCGCCCCGCACTTTTCTGAGCAGACGTCCAGAGCAGAGTCAGCCAGCATG
ACCGAGCGCCGCGTCCCCTTCTCGCTCCTGCGGGGCCCCAGCTGGGACCCCTTCCGCGACTGGTACCCGC
ATAGCCGCCTCTTCGACCAGGCCTTCGGGCTGCCCCGGCTGCCGGAGGAGTGGTCGCAGTGGTTAGGCGG
CAGCAGCTGGCCAGGCTACGTGCGCCCCCTGCCCCCCGCCGCCATCGAGAGCCCCGCAGTGGCCGCGCCC
GCCTACAGCCGCGCGCTCAGCCGGCAACTCAGCAGCGGGGTCTCGGAGATCCGGCACACTGCGGACCGCT
GGCGCGTGTCCCTGGATGTCAACCACTTCGCCCCGGACGAGCTGACGGTCAAGACCAAGGATGGCGTGGT
GGAGATCACCGGTGAGCCCCCCTGCTCCTGCAGGGGAGAGGAGGAGGCTAGCAGGGCGGGCAGGGCCGGG
GGCGTGCGGTTGAAACGGGGGTCCCGGGGGCCTGGGGAGTTAAACGTTGGCCCAGCACCGGGAAAAACAG
GACTCCTGATTCCCTTGCTCAGGAATTGGGAGTGCGGGTCGCTTCTAAGGGCGCTTTCTGCTCTGTAATC
CCAGCGCTTTGGGAGGCCGAGACGGGAGGATCGCTTGAGGCCAGGAGTTCAAGACTAGCCTGGGCAACAT
AGCGAGACGCGCCCCCCCGCCCCGACCCCGCGCCATTACAAAAAAAAAGCAAACAAAAATTTTTTTAAAG
ATCATCGATGAAGAGAGAAAATGCGCTTTTCTACAGAGTCCCCTTCCCACCCACAGCCCCATCCCCAGAT
AAGCGGGGAGTTCCCTGGCGCGGTGCCAGTTTCTAGCCGCTGAGTGGGCGTGTGCGCGGCTCCAAGTGCG
CCTGCGTACTGCTCACTCCCCAGCTCCGCGCCCTGCTCCGTTCCTCCCAAAACTCTGAATCGAAGAACTT
TCCGGAAGTTTCTGAGAGCCCAGACCGGCGGGCACGCCCCCATCCCCAACCCCCTCTGTTAATCCCTACC
AGCCTGCAGTCCTGGCTGCTTCCAAGCAGGAGGTGGGGCCTCTGGCCTAGCGGGGCCGAAAGGCAGTCCC
CTCCCCCGCAGTCTGATTTCCCTCTTCCCCCCAAAGGCAAGCACGAGGAGCGGCAGGACGAGCATGGCTA
CATCTCCCGGTGCTTCACGCGGAAATACACGTGAGTCCTGGCGCCAGGTCGGGGTGGGTGGGTGGCGTGG
GGGTGGGGTCAGGGAAGAGGGCACAGGGACCCACCCGGTGTGTAATGTAACGCTTGCCTTTCCTCTCTGC
ACGTCCAGGCTGCCCCCCGGTGTGGACCCCACCCAAGTTTCCTCCTCCCTGTCCCCTGAGGGCACACTGA
CCGTGGAGGCCCCCATGCCCAAGCTAGCCACGCAGTCCAACGAGATCACCATCCCAGTCACCTTCGAGTC
GCGGGCCCAGCTTGGGGGCCCAGAAGCTGCAAAATCCGATGAGACTGCCGCCAAGTAAAGCCTTAGCCCG
GATGCCCACCCCTGCTGCCGCCACTGGCTGTGCCTCCCCCGCCACCTGTGTGTTCTTTTGATACATTTAT
CTTCTGTTTTTCTCAAATAAAGTTCAAAGCAACCACCTGTCACTGGCCCAGGCCCTGGTGTTTGTGGAAG
GAAGCCTCAGGCACCTGCCATTTGCTGGCTTTCAGGAGTCATCTTTGCTCAGGCCCGTGCTGGGCCATGT
GGGTACACTGGTGTAGGTTGCTGGACACAGGCTGACTCACATCCATAAAGACAGAGGTCTTAGGGCCGGG
CGCAGTGGCTCATACCTACAATCCCAGCACTTTGGGGGGTTGAAGCAGGAGGAGTGCTTGAAGCCAAGAG
TTCTAGACCAGCCTGGACAACA
Sequence 3
AAACTTTCTGCGTCCGCCATCCTGTAGGAAGGATTTGTACACTTTAAACTCCCTCCCTGGTCTGAGTCCC
ACACTCTCACCACCCAGCACCTTCAGGAGCTGACCCTTAACAGCTTCACCCACAGGGACCCCGAAGTTGC
GTCGCCTCCGCAACAGTGTCAATAGCAGCACCAGCACTTCCCCACACCCTCCCCCTCAGGAATCCGTACT
CTCTAGCGAACCCCAGAAACCTCTGGAGAGTTCTGGACAAGGGCGGAACCCACAACTCCGATTACTCAAG
GGAGGCGGGGAAGCTCCACCAGACGCGAAACTGCTGGAAGATTCCTGGCCCCAAGGCCTCCTCCGGCTCG
CTGATTGGCCCAGCGGAGAGTGGGCGGGGCCGGTGAAGACTCCTTAAAGGCGCAGGGCGGCGAGCAGGGC
ACCAGACGCTGACAGCTACTCAGAATCAAATCTGGTTCCATCCAGAGACAAGCGAAGACAAGAGAAGCAG
AGCGAGCGGCGCGTTCCCGATCCTCGGCCAGGACCAGCCTTCCCCAGAGCATCCACGCCGCGGAGCGCAA
CCTTCCCAGGAGCATCCCTGCCGCGGAGCGCAACTTTCCCCGGAGCATCCACGCCGCGGAGCGCAGCCTT
CCAGAAGCAGAGCGCGGCGCCATGGCCAAGAACACGGCGATCGGCATCGACCTGGGCACCACCTACTCGT
GCGTGGGCGTGTTCCAGCACGGCAAGGTGGAGATCATCGCCAACGACCAGGGCAACCGCACGACCCCCAG
CTACGTGGCCTTCACCGACACCGAGCGCCTCATCGGGGACGCCGCCAAGAACCAGGTGGCGCTGAACCCG
CAGAACACCGTGTTCGACGCGAAGCGGCTGATCGGCCGCAAGTTCGGCGATGCGGTGGTGCAGTCCGACA
TGAAGCACTGGCCCTTCCAGGTGGTGAACGACGGCGACAAGCCCAAGGTGCAGGTGAACTACAAGGGCGA
GAGCCGGTCGTTCTTCCCGGAGGAGATCTCGTCCATGGTGCTGACGAAGATGAAGGAGATCGCTGAGGCG
TACCTGGGCCACCCGGTGACCAACGCGGTGATCACGGTGCCCGCCTACTTCAACGACTCTCAGCGGCAGG
CCACCAAGGACGCGGGCGTGATCGCCGGTCTAAACGTGCTGCGGATCATCAACGAGCCCACGGCGGCCGC
CATCGCCTACGGGCTGGACCGGACCGGCAAGGGCGAGCGCAACGTGCTCATCTTCGACCTGGGGGGCGGC
ACGTTCGACGTGTCCATCCTGACGATCGACGACGGCATCTTCGAGGTGAAGGCCACGGCGGGCGACACGC
ACCTGGGAGGGGAGGACTTCGACAACCGGCTGGTGAGCCACTTCGTGGAGGAGTTCAAGAGGAAGCACAA
GAAGGACATCAGCCAGAACAAGCGCGCGGTGCGGCGGCTGCGCACGGCGTGTGAGAGGGCCAAGAGGACG
CTGTCGTCCAGCACCCAGGCCAGCCTGGAGATCGACTCTCTGTTCGAGGGCATCGACTTCTACACATCCA
TCACGCGGGCGCGGTTCGAAGAGCTGTGCTCGGACCTGTTCCGCGGCACGCTGGAGCCCGTGGAGAAGGC
CCTGCGCGACGCCAAGATGGACAAGGCGCAGATCCACGACCTGGTGCTGGTGGGCGGCTCGACGCGCATC
CCCAAGGTGCAGAAGCTGCTGCAGGACTTCTTCAACGGGCGCGACCTGAACAAGAGCATCAACCCGGACG
AGGCGGTGGCCTACGGGGCGGCGGTGCAGGCGGCCATCCTGATGGGGGACAAGTCGGAGAACGTGCAGGA
CCTGCTGCTGCTGGACGTGGCGCCGCTGTCGCTGGGCCTGGAGACTGCGGGCGGCGTGATGACGGCGCTC
ATCAAGCGCAACTCCACCATCCCCACCAAGCAGACGCAGACCTTCACCACCTACTCGGACAACCAGCCCG
GGGTGCTGATCCAGGTGTACGAGGGCGAGAGGGCCATGACGCGCGACAACAACCTGCTGGGGCGCTTCGA
GCTGAGCGGCATCCCGCCGGCGCCCAGGGGCGTGCCGCAGATCGAGGTGACCTTCGACATCGACGCCAAC
GGCATCCTGAACGTCACGGCCACCGACAAGAGCACCGGCAAGGCCAACAAGATCACCATCACCAACGACA
AGGGCCGCCTGAGCAAGGAGGAGATCGAGCGCATGGTGCAGGAGGCCGAGCGCTACAAGGCCGAGGACGA
GGTGCAGCGCGACAGGGTGGCCGCCAAGAACGCGCTCGAGTCCTATGCCTTCAACATGAAGAGCGCCGTG
GAGGACGAGGGTCTCAAGGGCAAGCTCAGCGAGGCTGACAAGAAGAAGGTGCTGGACAAGTGCCAGGAGG
TCATCTCCTGGCTGGACTCCAACACGCTGGCCGACAAGGAGGAGTTCGTGCACAAGCGGGAGGAGCTGGA
GCGGGTGTGCAGCCCCATCATCAGTGGGCTGTACCAGGGTGCGGGTGCTCCTGGGGCTGGGGGCTTCGGG
GCCCAGGCGCCGCCGAAAGGAGCCTCTGGCTCAGGACCCACCATCGAGGAGGTGGATTAGAGGCCTCTGC
TGGCTCTCCCGGTGTGGTCTAGAAAACAGACTCTTTGCACTTGATAGCTGCTTGGGCACCGATTACTGTC
AAGGTTATTTAAAGTCTTCTTCATGGTTCAGTTTAAAGTTACAGTCTTTCTTAAGGTAATTGCGTTGACT
GTTAAATTTTGTATGCATATATATATATATATATATATATATATATATATATTCAAATATATTCAAAGTA
ATGTTGGGAGCAGCACTGTGCACTGTACCAGGGGATTATGTTTTATAGCTAATGATGTGTAAAGTCTAAA
GATTTTTTTGTAATTTTTATATCAGTGTTCCAGTAGCCTGGGAAGACATATAGTCTAGCTGCCCAGTTCC
CTGGAGATGGTCATCTCTAAGACAAAGTGTCTTAAACAAACGTCTTGGCACTGTGTACTACATAACTTTA
CTCTTTTGTACTTAAAACTTTATCTGCTTGTCCATGTTAAGGTTTTGTGGTATAACCAGTATGTTCTTTG
CATTTAATCTAAGTAGGTTAAAGATGGTGTATCCTTCCTGCATACATGTCTACACTGCCACCCTGTGTAC
ATTTTTTTCTTTGCATCACTACAAACTAATGAAAAAAACTTTTATGACTTAAATATTCAAAATAAAAGGT
TACAAGTATATTTTGTCTGTTTGTATGTTGGAAGGGCTAATGGATTCTGGGCTTCTGTGGATTTCTTAAG
TTTTTTTTAAGATTTATTATTATATGTGAACACATTGTAGCTATCTTCAGACACACCAGAAAAGGGCATC
AGATCTCATTACAGATGGGTGTGAGCCACCATGTGGTTCCTGGGATTTGAACTCAGGACCTTCGGAAGAG
TAGTCAGTGCTCTTAACTGCTGAGCTGTCTCTCCAGCCCCCGGATTTCTTAGTTTTGTGATAACTGGAAA
AGGGATTTTTTTTTGGTGGATTTTCAGTGCAGTTATGCAGGGAGTACAGGTATTTACTTTGAGGGTCGGG
CTCATTCATGGGAAAAAGTAGGGTGGTGCTGTTGTTTGGGGTCAGTGGAAGAGGGACTCAAGGGGCTATG
AGAGCTCAGCTC
Exercise 5: alignment exercises
Using the sequences below, answer the questions that follow them:
>seq 1
1 tgtgttcact agcaacctca aacagacacc atggtgcacc tgactcctga ggagaagtct
61 gccgttactg ccctgtgggg caaggtgaac gtggatgaag ttggtggtga ggccctgggc
121 aggctgctgg tggtctaccc ttggacccag aggttccttg agtcctttgg ggatctgtcc
181 actcctgatg ctgttatggg caaccctaag gtgaaggctc atggcaagaa agtgctcggt
241 gcctttagtg atggcctggc tcacctggac aacctcaagg gcacctttgc cacactgagt
301 gagctgcact gtgacaagct gcacgtggat cctgagaact tcaggctcct gggcaacgtg
361 ctggtctgtg tgctggccca tcactttggc aaagaattca ccccaccagt gcaggctgcc
421 tatcagaaag tggtggctgg tgtggctaat gccctggccc acaagtatca ctaagctcgc
481 tttcttgctg tccaatttct attaaaggtt cctttgttcc ctaagtccaa ctactaaact
541 gggggatatt atgaagggcc tt
>seq 2
1 aacgtggatg aagttggtgg tgaggccctg ggcaggctgc tggtggtcta cccttggacc
61 cagaggttct ttgagtcctt tggggatctg tccactcctg atgctgttat gggcaaccct
121 aaggtgaagg ctcatggcaa gaaagtgctc ggtgccttta gtgatggcct ggctcacctg
181 gacaacctca agggcacctt tgccacactg agtgagctgc actgtgacaa gctgcacgtg
241 gatcctgaga acttcaggct cctgggcaac gtgctggtct gtgtgctgga ccatcacttt
301 ggcaaagaat tcaccccacc agtgcaggct gcctatcaga aagtggtggc tggtgtggct
361 aatgccctgg cccacaagta tcactaagct cgctttcttg ctgtccaatt tctattaaag
421 gttcctttgt tccctaagtc caactactaa actgggggat attatgaagg gccttgagca
481 tctggatt
1. Align the sequences above.
2. What type of alignment did you perform?
3. At which positions do the two sequences differ?
4. Obtain the protein sequence of the genes.
5. Are there differences in the protein-coding region?
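Because the two sequences only partly overlap, a local pairwise alignment is one reasonable way to approach questions 1 to 3. The following is a minimal Smith-Waterman sketch with a linear gap penalty; the scoring values and the function names are arbitrary choices made for this illustration, and a dedicated tool will use more refined scoring.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment: (score, aligned_a, aligned_b, start_a, start_b).

    start_a and start_b are 0-based offsets of the aligned region.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0, score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b)), i, j


def differences(aln_a, aln_b, start_a, start_b):
    """List (pos_a, base_a, pos_b, base_b), 1-based, for columns that differ.

    For a gap column, the position is that of the last base already consumed.
    """
    pos_a, pos_b, out = start_a, start_b, []
    for x, y in zip(aln_a, aln_b):
        pos_a += x != "-"
        pos_b += y != "-"
        if x != y:
            out.append((pos_a, x, pos_b, y))
    return out


if __name__ == "__main__":
    s, al_a, al_b, off_a, off_b = smith_waterman("TTTACGTGCATGGA", "ACGTCCATGG")
    print(s, al_a, al_b, differences(al_a, al_b, off_a, off_b))
```

The score matrix grows with the product of the two lengths, which is acceptable for sequences of a few hundred bases like the ones above.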
Exercise 6:
Using the sequence below:
ctactggtacttcgatctctggggccgtggcaccctggtcactgtctcctcagagtcttctctgtccaggcacc
1. Identify the gene and the organism it belongs to.
2. Obtain the complete sequence of the gene from NCBI.
3. Align the two sequences using the CLUSTALW program and check whether the alignment contains mismatches and gaps.
4. Is there any change in the protein-coding region?
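Once the alignment has been produced, mismatches and gaps can be tallied column by column. The sketch below assumes the two aligned rows have already been copied out of the alignment output as plain strings of equal length, with "-" marking gaps; the function name alignment_stats is made up for this example.

```python
def alignment_stats(row_a, row_b):
    """Count matches, mismatches and gap columns in two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    stats = {"match": 0, "mismatch": 0, "gap": 0}
    for x, y in zip(row_a.upper(), row_b.upper()):
        if x == "-" or y == "-":
            stats["gap"] += 1       # a gap in either row
        elif x == y:
            stats["match"] += 1
        else:
            stats["mismatch"] += 1
    return stats


if __name__ == "__main__":
    print(alignment_stats("ACG-TAC", "ACGGTTC"))
```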
Exercise 7:
Define the following items and give examples of their application:
a) Sequence comparison;
b) Multiple alignment;
c) Primer;
d) Boolean search;
e) Metabolic pathway;
f) Patent;
g) Domain.
Exercise 8:
Carry out the following steps and record (print screen) the results of each step
marked with *:
a) Choose a gene encoding a protein of 100 or more amino acids and include its sequence in FASTA format in your answer;
b) Run BlastP (NCBI BLAST) and record: the search configuration screen; the list of results (after the graphic); and the second and third alignments;
c) Briefly interpret the result (ignoring the first alignment), confirming or not the identity of the protein and justifying your answer;
d) Run a multiple alignment with seven (7) closely related proteins from different organisms and record the alignment result. Comment on the result;
e) Design a primer pair to amplify a region of 190 to 238 bp in the gene for this protein, and record the tool configuration and the information on the chosen primers;
f) Check whether there are patents related to this protein, and describe:
g) Details of the search performed;
h) Number of patents found;
i) Briefly comment on one of these patents.
j) Identify the EC number of this protein;
k) Classify the protein according to the Gene Ontology ontologies.
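For item e), a quick sanity check of a candidate primer pair can be scripted. The sketch below is only a rough aid, not a replacement for a primer design tool: it uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is meant for short oligonucleotides, and it requires exact primer matches. All function names and the toy template are assumptions made for this illustration.

```python
def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_percent(primer):
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def product_size(template, forward, reverse):
    """Amplicon length, or None if either primer does not match exactly."""
    template = template.upper()
    start = template.find(forward.upper())
    # The reverse primer anneals to the top strand as its reverse complement.
    end = template.find(reverse_complement(reverse.upper()), start + 1)
    if start < 0 or end < 0:
        return None
    return end + len(reverse) - start


if __name__ == "__main__":
    template = "AAAA" + "ATGCATGC" + "T" * 10 + "GGCCAATT" + "AAAA"  # toy template
    size = product_size(template, "ATGCATGC", "AATTGGCC")
    print(size, wallace_tm("ATGCATGC"), gc_percent("ATGCATGC"))
```

With real primers, the returned size can be compared directly with the 190 to 238 bp window requested in the exercise.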
Exercise 9:
The sequence below is given in two versions: the first unedited (containing
introns and exons) and the second edited (spliced). Using one of these sequences,
indicate:
a) Where the gene starts and ends.
b) Having identified this in item a, also indicate the position of the complete gene in the
attached 79,629 bp BAC clone (e.g. +50 bp to +350 bp, or -1800 bp to -1350 bp).
c) The orientation of the gene: + strand or - strand?
d) Mark the introns and exons in the complete gene.
e) Using the BAC clone and the gene location found in item b, identify and
select the 2 kb region upstream of the gene. What is the function of this 2 kb
upstream region?
f) Starting from the selected upstream region, assemble the complete sequence comprising the
2 kb region and the gene. Do this with both the complete gene and the edited gene.
g) Determine whether there are promoter regions in each of the sequences (gene + upstream).
If there are, indicate their positions in the sequence (use colors).
h) Given the two gene sequences, if we had to design one primer pair
for use in real-time PCR, what would be the best strategy for designing these
primers? Justify your answer. And how can one find out whether the cDNA used in the real-time PCR is
contaminated with gDNA?
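For item d), exons can be located by mapping the edited (spliced) sequence back onto the complete gene. The sketch below is a deliberately simple greedy approach that assumes both sequences are on the same strand and match exactly; map_exons and introns are names invented for this example. Where an intron happens to begin with the same bases as the following exon, the reported boundary can be off by a base or two, so the usual GT...AG intron ends should be used to adjust it by eye.

```python
def map_exons(genomic, cdna, seed=12):
    """Greedy exact mapping of a spliced sequence onto its genomic sequence.

    Returns a list of (start, end) genomic intervals, 1-based and inclusive.
    """
    genomic, cdna = genomic.upper(), cdna.upper()
    exons, g, c = [], 0, 0
    while c < len(cdna):
        # Locate the next stretch of the spliced sequence downstream of g.
        g = genomic.find(cdna[c:c + seed], g)
        if g < 0:
            raise ValueError(f"position {c + 1} of the spliced sequence not found")
        start = g
        # Extend the exact match base by base for as long as it holds.
        while c < len(cdna) and g < len(genomic) and genomic[g] == cdna[c]:
            g += 1
            c += 1
        exons.append((start + 1, g))
    return exons


def introns(exons):
    """Intervals lying between consecutive exons, 1-based and inclusive."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


if __name__ == "__main__":
    exon1, intron, exon2 = "ATGGCCAAGTTC", "GTAAGTTTTTTTCAG", "CGATCCGATTAA"  # toy data
    found = map_exons(exon1 + intron + exon2, exon1 + exon2, seed=8)
    print(found, introns(found))
```

The same exon coordinates are also useful when thinking about item h), since they show where exon-exon junctions fall in the edited sequence.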
>Complete gene
CGTGGCTGATGTTGCCAGTTGAAATCGGTAAAGCAACTGCATCGCTGTATCCTGACGAGAATAATGATAC
ATCTCAAAATAATGTTATAAGTGCATTAGCCCCTCCGAAGGATGTGGATGATCTGCGGCTGATATCTGGA
TATGGAAATGTCAATATATTCACGTATAGTGAATTGAGAGCTGCTACCAAGAATTTCCGGCCAGATCAGG
TTCTTGGAGAGGGTGGCTTTGGGGTTGTATATAAAGGTGTTATTGATGAGAGTGTCAGGCCAGGTTCTGA
AACCATCCAAGTTGCTGTGAAAGAGCTAAAGTCAGATGGCTTGCAGGGAGACAAAGAGTGGCTGGTAATT
TCTCCCTTGTACTTTTTCAATATTTCTAAGCCAAATTTGTGATATTTTTTCTATCAGCATGAAATTTTCT
TCACTGCACCATGTATGAGTTGGCAATTGTAGTGAAAAACAAGTTGTCTGAAAAATGTTGTGGAGATTAC
ACCTAAGGTCTTATCTATTTCATCCTCTTGTATGTAGTTGAGTTAATGCTGTAAACACCCTCAACTTATA
TAAAATGAAGTCACCTACACAACAAACGGTTGCATTCTGCAGATCATGATTATGTTATTTAATGGCAATA
TCACTGATTGTATAAGTTGGAAACACTTTTATTGTTCTTTGGGAAATATATATTACATATTTGCTCTTCT
TTGACTTTAGGCTTAAAGGGATGTTCATCCCTAAAAGCAGATATCTAAAGATGTATGGGAATATGAAATT
TCTTCAAGTAACCACGGTATCTTCCAAGATGTAATTTATGTTACCTTATTTAGATATCTTATGGAAATCC
AGAGAGGCATTACTGTCAAGAGAAATCGTTGTCGGTCCCCCTTTGGCTCCATCTCAACTTAAAATTCAGG
AGTACATGACATGTGTAGTAAACTAATACCAAGTTTTGTCTTGGTCAGCTCCTCTATTATAAGATATATC
CATATATCAGTTTATTTGGAAGTCCAAGATGCTTAGTAACCTGTGACAAAACCCTATAATAGAAATTTGG
ACAGCCATATGATTACCAAAGATGGATTCCTTTTCCAATACCTCTGTGATCCATCTAGAATACTAATTAA
ATGAGTTACTTTTTGGGTTTTTCAGGCAGAAGTAAACTATCTGGGGCAACTTAGTCATCCCAATCTTGTC
AAACTTATTGGTTACTGCTGTGAAGGTGATCACAGGCTGCTAGTTTATGAGTATATGGCTTCTGGCAGCC
TTGACAAGCACCTATTCCGACGTAAGTACTTCTCTGACACGAATATCATTACAATACTTAACTAGTTATT
GTATAAAAATATTAATATCTGAGTTTGTGGAATTTAGTAAATGGTCATTTGTGGCCATCTATCCATTATA
ACTTCATTTTTCGTTTTTGACAACAGATAAATAATTTTTTAATCTACTATGAAGAAGGAAGACTTTATTA
CTAGAAACTAGATATTGCTGGGATATATATACCTTTTGTATATATGCATGGATTTGTTTATTATGCCTTT
GTATAACAAGAGCAATTAATTGTAACTACTAACTATCTTCATCATATCATTCCTTCCTGCCACTGGGCTC
TGTTAAGAGCAATGTTAGTAGTTTATAGTTGTTTATGGACTCTTTCCCCTGGAACCACTAATCTGGAATA
ACCTACTTATACATCTAGATATCTTTATGTATGAAAGAATTATGAAAAAGATCTCATCTATCCTCAAGTT
TTTTCATATATTCAAATAAATTTTATCTTATATTTTTTAATAATAACATAAATCTATCTTGTTATCCTCA
TATGTGGCTAACAATTAATAATAATTAATCCTTTATCCTTCCTAGTTACTAAGTGACAAGTAAACCGGTG
GGTAAGTAGTAAGCTAGGGGCAAAATTAGTAAGTATTGGGAATTGTACATGATGAGTTGTTGCCTTTTGC
TTATGAGCACAATTGCCCCTATCTAATTTCCCATATAATAAACATAAGTTTCTAAAATATACAATGATTA
AGTATAAGACAATACACTAAAAACAACTTTGCAACTTGGTAACTTTCTATATTCATTGTGTGTGTCTGAA
ATATTTCTACCTTCTTGACAGGGGTTTGTCTTACAATGCCATGGTCTACTCGAATGAAAATTGCCCTTGG
TGCTGCAAAAGGACTAGCCTTTCTTCATGCAGCTGAAAGATCAATCATCTATCGTGACTTCAAGACATCA
AATATCTTACTGGATGAAGTTTGTATCTCTTCCTGCACCATTGGACTGGATATTTAAATTCCCTAGAACT
ATCTCCATTATCATTATCATATTTAGCAAATACAACTCGGTTACAGATTATTAAGTCCCTATATATATTT
GTAGTCCATAAATTTTATGGCCATAACATTGGCAAAATTAGATAAATTTGTTATATGGTTAAGAGCACAA
TTACTTACATAAAAAAGATTTCATTCAGTGTGAACTCATCAATTTCTAAATAATGAGTCTGTTCCATTAA
AAAAAAATGCATACTTATTATTTGAAAAAGAAAATTGCAAATGTCCAGTATGTGAGCAACAAAAGTGGTT
ACTGAATCAATGAAAACAAGTAACTAAGGAACTCCATCGTATAATAATATTAAGGATACCCTTTTGAAGC
ATGCCCATACTGTGAAAGGTCATTTATATGTTTTTCATAACCTGAAAATATAGAAATCATGAAAACATAG
TTGTTTCTCATCAATTGTCTAATTCTTGGATTCACTTTGCAGGATTACAATGCAAAGCTCTCAGACTTTG
GCCTTGCAAAAGAGGGGCCTACAGGTGACCAAACTCACGTTTCCACTCGGGTCGTTGGTACATATGGATA
TGCAGCTCCCGAGTATATAATGACTGGTGAGCTTCTCAAACCACAATCCCAACTTTATGCAAAAGGGAGT
GCAGATAATTAAGCGCTACTAGCCACTTTCTATCGAGTGTCTCAAATGTGTAGCGATTACATGTCGTTTT
GTATATTTCTCTAGATGCCTTGCAGTAGGTGACTCTTTCTGGTTCTCTATTCCTTTTCTCTAAAGAAAAC
CCATGTTTCTGAGCAACTTTACCTCTATTTTAGGGCGTATAGCTTAAATTAATAACTTCTTTCATTTATT
CCAGGCCATTTAACTGCAAGGAGTGATGTTTACGGATTTGGAGTTGTATTGCTGGAGATGCTTTTAGGGA
GAAGGGCAATGGACAAGAGCAGGCCCAGCAGACACCAGAACCTCGTTGAGTGGGCTCGACCACTCCTGAT
CAATGGTCGGAAGTTGCTAAAGATCTTGGATCCAAGAATGGAAGGGCAATATTCTAACAGAGTTGCAACA
GATGTGGCTAGTTTAGCATATCGATGCCTGAGCCAGAACCCGAAAGGGAGGCCAACAATGAACCAAGTAG
TCGAGTCGCTTGAGAGCCTTCAAGACCTGCCTGAGAACTGGGAAGGCATCCTGTTTCAGAGCAGTGAAGC
TGCTGTGACTCTCTATGAGGCTCCAAAAGAGATTGCGAGTGACCATTTAGAAAAGAACTCCAGCGAGAAT
GGAGAGAATGGATCAAATGTGCATGCCAAGGGAAGAAAGAAGCTTGGAAATGGCAGAAGCAACAGCGAGC
CACCGCCGGTGGAGTTCAGTCAGTACAGTCCTTCACCTGAGTCAGAGAGACATGAGCCAAGTAGAAGATC
AATCGATCATGACAGAATTCCAAGGCCACCTGCCTATTGACGTGGCTG
>Edited gene
CGTGGCTGATGTTGCCAGTTGAAATCGGTAAAGCAACTGCATCGCTGTATCCTGACGAGAATAATGATAC
ATCTCAAAATAATGTTATAAGTGCATTAGCCCCTCCGAAGGATGTGGATGATCTGCGGCTGATATCTGGA
TATGGAAATGTCAATATATTCACGTATAGTGAATTGAGAGCTGCTACCAAGAATTTCCGGCCAGATCAGG
TTCTTGGAGAGGGTGGCTTTGGGGTTGTATATAAAGGTGTTATTGATGAGAGTGTCAGGCCAGGTTCTGA
AACCATCCAAGTTGCTGTGAAAGAGCTAAAGTCAGATGGCTTGCAGGGAGACAAAGAGTGGCTGGCAGAA
GTAAACTATCTGGGGCAACTTAGTCATCCCAATCTTGTCAAACTTATTGGTTACTGCTGTGAAGGTGATC
ACAGGCTGCTAGTTTATGAGTATATGGCTTCTGGCAGCCTTGACAAGCACCTATTCCGACGGGTTTGTCT
TACAATGCCATGGTCTACTCGAATGAAAATTGCCCTTGGTGCTGCAAAAGGACTAGCCTTTCTTCATGCA
GCTGAAAGATCAATCATCTATCGTGACTTCAAGACATCAAATATCTTACTGGATGAAGATTACAATGCAA
AGCTCTCAGACTTTGGCCTTGCAAAAGAGGGGCCTACAGGTGACCAAACTCACGTTTCCACTCGGGTCGT
TGGTACATATGGATATGCAGCTCCCGAGTATATAATGACTGGCCATTTAACTGCAAGGAGTGATGTTTAC
GGATTTGGAGTTGTATTGCTGGAGATGCTTTTAGGGAGAAGGGCAATGGACAAGAGCAGGCCCAGCAGAC
ACCAGAACCTCGTTGAGTGGGCTCGACCACTCCTGATCAATGGTCGGAAGTTGCTAAAGATCTTGGATCC
AAGAATGGAAGGGCAATATTCTAACAGAGTTGCAACAGATGTGGCTAGTTTAGCATATCGATGCCTGAGC
CAGAACCCGAAAGGGAGGCCAACAATGAACCAAGTAGTCGAGTCGCTTGAGAGCCTTCAAGACCTGCCTG
AGAACTGGGAAGGCATCCTGTTTCAGAGCAGTGAAGCTGCTGTGACTCTCTATGAGGCTCCAAAAGAGAT
TGCGAGTGACCATTTAGAAAAGAACTCCAGCGAGAATGGAGAGAATGGATCAAATGTGCATGCCAAGGGA
AGAAAGAAGCTTGGAAATGGCAGAAGCAACAGCGAGCCACCGCCGGTGGAGTTCAGTCAGTACAGTCCTT
CACCTGAGTCAGAGAGACATGAGCCAAGTAGAAGATCAATCGATCATGACAGAATTCCAAGGCCACCTGC
CTATTGACGTGGCTG