BIOINFORMÁTICA UFMG
A TGC
Genomics and Bioinformatics
ESTs, even if redundant
Complete genome or death!
1995, 2000
The end of an EST
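[Figure: an mRNA with start codon AUG and poly(A) tail (A)200 is primed with (T)18 to give cDNA (- strand), followed by cDNA (+ strand) ending in (A)18. 5’ ESTs and 3’ ESTs are read from the two ends of the cDNA, between "start" and "end". Example reads ending in runs of x: ATGATCATGACTTACGGGCGCGCGATxxxxxx, GGCGCGCGATATCCxxxx, AAATTTATTATCCxxxxx, AAATTTATTATCCATCTACGxxxx.]
A snapshot of a new transcriptome [otorrin...] [...damonh...]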
Life after PHRED 15
Query: 469  TTAGGAGGATCGTTTTTAGAATCCCCTGCAACGTTACCACGGTGGATTTCACTGACTGCG 528
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1038 ttaggaggatcgtttttagaatcccctgcaacgttaccacggtggatttcactgactgcg 979

Query: 529  ACGTTCTTAACGTTGAATCCAACGTTGCTACCAgggagagcctcagtaagtgcttcatga 588
            ||||||||||||||||| || |||||||||||||||||| ||||||||||||||||||||
Sbjct: 978  acgttcttaacgttgaagcccacgttgctaccagggagaccctcagtaagtgcttcatga 919

Query: 589  tgcatttcgacagaattgacttcagtcgacaaaccttgcggagcaaaagtgacgaccata 648
            |||||||||||||| |||||||||| |||| ||||||||||| |||||||||||||||||
Sbjct: 918  tgcatttcgacagacttgacttcagccgaccaaccttgcggaccaaaagtgacgaccata 859

Query: 649  ccaggcttgatgataccagtttcaacgc 676
            ||||||||||||||||||||||||||||
Sbjct: 858  ccaggcttgatgataccagtttcaacgc 831
Query: untrimmed read. Sbjct: published sequence.
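The mismatches visible in the alignment above can also be located programmatically. The following is a minimal Python sketch, assuming the two aligned rows are available as equal-length strings without gaps; the function name `mismatch_coordinates` is illustrative and not part of BLAST.

```python
def mismatch_coordinates(query, subject, query_start):
    """Return query coordinates where two gap-free aligned rows differ.

    The comparison is case-insensitive, because the query row mixes upper-
    and lowercase letters while the subject row is all lowercase.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    return [
        query_start + i
        for i, (q, s) in enumerate(zip(query.upper(), subject.upper()))
        if q != s
    ]


# Second row of the alignment (query positions 529 to 588).
query_row = "ACGTTCTTAACGTTGAATCCAACGTTGCTACCAgggagagcctcagtaagtgcttcatga"
sbjct_row = "acgttcttaacgttgaagcccacgttgctaccagggagaccctcagtaagtgcttcatga"

positions = mismatch_coordinates(query_row, sbjct_row, 529)
identity = 1 - len(positions) / len(query_row)
print(positions, f"{identity:.1%}")
```

For this row the sketch reports three mismatches out of 60 aligned bases.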
When PHRED meets BLAST
pUC18 (published sequence)
Sequencing reaction:
– single pool distributed over three 96-well plates
– 3 MegaBACE
– 3 reads each, 846 reads total
Processing:
– MegaBLAST (BLASTn, SWAT)
– Phred
– trim: a chromatogram analyzer
– trim_alt: increasing trim_cutoff from 1% up to 25%
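phred's documentation describes the trim_alt option as a modified Mott trimming, with trim_cutoff as the error probability threshold. The sketch below is a plain-Python approximation of that idea, not phred's own code: each base scores `cutoff - p_error`, and the kept region is the contiguous segment with the largest total score (found here with Kadane's maximum-sum scan). The function name `mott_trim` and the toy quality values are illustrative assumptions.

```python
def mott_trim(qualities, cutoff=0.05):
    """Return (start, end) of the segment to keep, with end exclusive.

    Each base contributes cutoff - p_error, where p_error = 10 ** (-Q / 10).
    The kept segment is the contiguous run with the maximum summed score.
    Returns (0, 0) when no base scores above zero.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(qualities):
        run_sum += cutoff - 10 ** (-q / 10)
        if run_sum <= 0:
            # A non-positive running score cannot help any later segment.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


# Toy read (hypothetical values): poor quality at both ends, good in the middle.
quals = [4, 6, 8, 30, 35, 40, 40, 33, 12, 9, 5]
for cutoff in (0.01, 0.05, 0.25):
    start, end = mott_trim(quals, cutoff)
    print(f"cutoff {cutoff:.0%}: keep bases {start}..{end - 1}")
```

Raising the cutoff makes low-quality bases cheaper to include, so the kept segment can only stay the same or grow toward the read ends, which is the effect the 1% to 25% sweep explores.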
The end of an EST
[Figure: number of bases as a function of the trim_cutoff parameter value. X axis: trim_cutoff parameter value (%), 1% to 25%. Y axis: number of bases, -500 to 200. Series: Included (trim), Discarded (trim), Included (TrimAlt), Discarded (TrimAlt).]
PHRED 10 (10% error): only losses
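The percentage in parentheses follows from the Phred quality definition, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. A short Python sketch of the conversion in both directions (the function names are illustrative):

```python
import math


def phred_to_error(q):
    """Error probability for a Phred quality value: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality value for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


for q in (8, 10, 15, 20):
    print(f"PHRED {q}: {phred_to_error(q):.1%} error")
```

Under this definition PHRED 10 corresponds to 10% error, PHRED 15 to about 3.2%, and PHRED 8 to about 15.8%.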
Error occurrence:
[Figure: error rate as a function of trim_cutoff. X axis: 1% to 25%. Y axis: 0% to 30%. Series: total miscall (% error in the sequence, trimmed reads) and stepwise miscall (% error in the tip, added bases). Annotated values: 3% error in the sequence; 16% to 17% error in the tip.]
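One way to read the two curves: "total miscall" is the error rate over all bases kept at a given trim_cutoff, while "stepwise miscall" is the error rate of only the bases newly added when the cutoff is raised by one step (the tip). The Python sketch below computes both from cumulative counts. This interpretation, the function name, and the input numbers are illustrative assumptions, not the data behind the slide.

```python
def miscall_rates(kept_bases, kept_errors):
    """Compute (total, stepwise) miscall rates for each cutoff step.

    kept_bases[i] and kept_errors[i] are cumulative counts of bases kept and
    of miscalled bases at the i-th trim_cutoff value. The total rate uses the
    cumulative counts; the stepwise rate uses only the increment since the
    previous cutoff (None when no bases were added).
    """
    rates = []
    prev_bases, prev_errors = 0, 0
    for bases, errors in zip(kept_bases, kept_errors):
        total = errors / bases if bases else None
        added = bases - prev_bases
        stepwise = (errors - prev_errors) / added if added > 0 else None
        rates.append((total, stepwise))
        prev_bases, prev_errors = bases, errors
    return rates


# Made-up counts for three consecutive cutoffs (not data from the slides).
bases = [400, 450, 480]
errors = [4, 9, 14]
rates = miscall_rates(bases, errors)
for total, stepwise in rates:
    print(f"total {total:.1%}, stepwise {stepwise:.1%}")
```

With counts like these the stepwise rate climbs much faster than the total rate, because the few error-prone bases added at the tip are diluted by the many accurate bases already kept.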
Virtual pUC18 protein: STOP = *
>protein_puc18
RQGFPSHDVVKRRPVPSLHACRSTLEDPRVPSSNS*SWS*LFPV*NCYPLTIPHNIRAGSIKCKAWGA**VS*LTLIALRSLPAFQSGNLSCQLH**IGQRAGRGGLRIGRSSASSLTDSLRSVVRLRRAVSAHSKAVIRLSTESGDNAGKNM*AKGQQKARNRKKAALLAFFHRLRPPDEHHKNRRSSQRWRNPTGL*RYQAFPPGSSLVRSPVPTLPLTGYLSAFLPSGSVALSHSSRCRYLSSV*VVRSKLGCVHEPPVQPDRCALSGNYRLESNPVRHDLSPLAAATGNRISRARYVGGATEFLKWWPNYGYTRRTVFGICALLKPVTFGKRVGSS*SGKQTTAGSGGFFVCKQQITRRKKGSQEDPLIFSTGSDAQWNENSR*GILVMRLSKRIFT*ILLN*K*SFKSI*SIYE*TWSDSYQCLISEAPISAICLFRSSIVA*LPVV*ITTIREGLPSGPSAAMIPRDPRSPAPDLSAINQPAGRAERRSGPATLSASIQSINCCREARVSSSPVNSLRNVVAIATGIVVSRSSFGMASFSSGSQRSRRVT*SPMLCKKAVSSFGPPIVVRSKLAAVLSLMVMAALHNSLTVMPSVRCFSVTGEYSTKSF*E*CMRRPSCSCPASIRDNTAPHSRTLKVLIIGKRSSGRKLSRILPLLRSSSM*PTRAPN*SSASFTFTSVSG*AKTGRQNAAKKGIRATRKC*ILILFLFQYY*SIYQGYCLMSGYIFECI*KNKQIGVPRTFPRKVPPDV*ETIIIMTLTYKNRRITRPFRLARFGDDGENL*HMQLPETVTACL*ADAGSRQARQGASAGVGGCRGWLNYAASEQIVLRVHHMRCEIPHRCVRRKYRIRRHSPFRLRNCWEGRSVRASSLLRQLAKGGCAARRLSWV
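A virtual protein like the one above can be produced by translating a DNA sequence codon by codon with the standard genetic code, writing `*` for each stop codon. The following is a minimal Python sketch; the function name `translate` and the short input sequence are illustrative, and the reading frame used for the pUC18 translation is not stated in the slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate DNA in the given frame (0, 1 or 2); stops become '*'.

    Codons containing letters other than A, C, G, T are written as 'X'.
    A trailing partial codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# Short made-up sequence, translated in two of the three forward frames.
example = "ATGACCATGATTACGTAAGCTTGA"
print(translate(example))
print(translate(example, frame=1))
```

Because a vector such as pUC18 is not a single open reading frame, a continuous translation like this runs straight through stop codons, which is why `*` appears throughout the virtual protein.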
tBLASTn (BLASTx) scores are maximized at PHRED 8
[Figure: score variation using tblastn on pUC18 sequences trimmed with phred at various cutoff values. X axis: trim_cutoff parameter value (% error), 0 to 24. Y axis: BLASTx score, 0 to 500. Marked values: 15 and 8.]
Summarizing
– PHRED meets BLAST as errors in the tip are 16%
– Molecules carry 3% global error
– And scores for EST vs. aa comparisons are maximized
– Real life: crossmatch ends with X’s
Authors:
– Fabiano Peixoto (CENAPAD)
– Francisco Prosdocimi (Lab Biodiversidade)
– Maurício Mudado (Lab Biodados)
Virtual pUC18 protein