Combinatorial Pattern Matching BLAST

Combinatorial Pattern Matching

BLAST

Topics
• Introduction
• Gene Repeats
• Combinatorial Pattern Matching
– Exact Pattern Matching
– Approximate Pattern Matching
– Query Matching
• BLAST

Introduction
• Sequenced genomes generate enormous databases that grow every day
• It is important to compare each new genome with the existing ones, looking for similarities and answers
• Many diseases can be identified through the genome, which further increases this need
• Increasingly efficient algorithms are needed to meet this growing demand for comparisons and to suit today's computers

GenBank
[Figure: Growth of GenBank. Sequences (y-axis, 2,000,000 to 16,000,000) by Year (x-axis, 1982 to 2001); data labels: 606 and 15 million. Other labels in the figure: European, Japanese, 24h.]

History

Gene Repeats – Motivation
• Gene rearrangements are often associated with repeats
• They reveal evolutionary secrets
• Many tumors are characterized by explosions of repeats
• ATGGTCTAGGTCCTAGTGGTC
• ATGGTCTAGGACCTAGTGTTC

Finding repeats can be quite difficult, especially inexact ones

Combinatorial Pattern Matching – Motivation
• A major problem in computational biology is searching for a pattern in a large database
• Combinatorial Pattern Matching covers several algorithms that perform this kind of search / comparison
• Some of these algorithms are exact (they search for the pattern without errors) and others are approximate (they allow substitutions and sometimes gaps)
• There are also algorithms that search for multiple patterns in a text (Multiple Pattern Matching)

EXACT PATTERN MATCHING

Problem Description
• Given a pattern p and a text t, find all exact occurrences of p in t.
• Brute-force algorithm:
– Pattern GCAT
– Text CGCATC

[Figure: the pattern GCAT aligned against the text CGCATC at successive positions]
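The brute-force search described above can be sketched in a few lines of Python. This is only a minimal illustration, not code from the slides; the function name `brute_force_search` and the 0-based positions are assumptions made for the example.

```python
def brute_force_search(pattern: str, text: str) -> list[int]:
    """Return the 0-based start position of every exact occurrence of pattern in text."""
    n, m = len(pattern), len(text)
    positions = []
    # Try every possible alignment of the pattern against the text.
    for start in range(m - n + 1):
        # Compare letter by letter and stop at the first mismatch.
        k = 0
        while k < n and text[start + k] == pattern[k]:
            k += 1
        if k == n:  # every letter of the pattern matched
            positions.append(start)
    return positions


# The slide's example: pattern GCAT, text CGCATC.
print(brute_force_search("GCAT", "CGCATC"))  # [1]
```

With the slide's example, the only match starts at the second letter of the text (position 1 when counting from 0).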

Algorithm and Complexity
• Typically O(m)
• Worst case: O(m·n)

Problem
• Problem:
– If n is too large, the algorithm becomes impractical
• Example:
– Pattern: AAAAAA...AAA
– Text: AAAAAAAAAAAAAAA...AAAAA
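To see why the all-A example is the bad case, one can count letter comparisons. The sketch below assumes that n is the pattern length and m the text length (the slides do not define them explicitly). The lengths 100 and 10,000 are hypothetical values chosen for illustration, since the slide elides the real ones.

```python
def count_comparisons(pattern: str, text: str) -> int:
    """Count the letter comparisons made by the brute-force search."""
    n, m = len(pattern), len(text)
    comparisons = 0
    for start in range(m - n + 1):
        for k in range(n):
            comparisons += 1
            if text[start + k] != pattern[k]:
                break  # mismatch: move on to the next alignment
    return comparisons


# Hypothetical lengths; the slide only shows "AAA...AAA".
n, m = 100, 10_000
worst = count_comparisons("A" * n, "A" * m)
print(worst == (m - n + 1) * n)  # True: every alignment needs all n comparisons
```

When most alignments fail on their first letter, the count stays close to m, which is the "typical" behaviour; here no alignment ever fails, so the count grows as the product of the two lengths.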

Solution
• Solution:
– 1973, Peter Weiner:
• A new data structure: Suffix Trees
– They solve this problem in O(m + n) for any text and any pattern
• Suffix Tree for ATCATG
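The idea of searching through the suffixes of ATCATG can be illustrated with a simplified structure. The sketch below is not Weiner's linear-time construction: it builds an uncompressed suffix trie by inserting each suffix letter by letter, which takes quadratic time and space. It only shows how a pattern is then found by walking down from the root. The names `build_suffix_trie` and `occurrences`, and the `"$"` end marker, are assumptions for this example.

```python
def build_suffix_trie(text: str) -> dict:
    """Naive, uncompressed suffix trie (quadratic; NOT Weiner's O(m) algorithm)."""
    root: dict = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
        node["$"] = i  # end marker storing where this suffix starts (assumes no "$" in text)
    return root


def occurrences(trie: dict, pattern: str) -> list[int]:
    """Walk the pattern from the root, then collect every suffix start below that node."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []  # the pattern does not occur in the text
        node = node[ch]
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "$":
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


trie = build_suffix_trie("ATCATG")
print(occurrences(trie, "AT"))  # [0, 3]
```

The walk for the pattern costs one step per pattern letter regardless of the text length. A true suffix tree additionally compresses single-child paths and is built in linear time, which is what gives the O(m + n) bound on the slide.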

(... Keyword Tree / Suffix Tree

Keyword Tree
• A set of patterns placed in a tree with a single root
– Each edge is labeled with a letter of the alphabet
– Any two edges leaving the same node have different labels
– Each pattern can be read by traversing the tree from the root to a leaf

Keyword Tree – Construction
• Apple
• Apropos
• Banana
• Bandana
• Orange
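The construction can be sketched with nested dictionaries, inserting one pattern at a time in the order shown on the slides. This is a minimal sketch: the function name `build_keyword_tree` is an assumption, and patterns are lowercased for simplicity.

```python
def build_keyword_tree(patterns: list[str]) -> dict:
    """Insert each pattern letter by letter, reusing edges that already exist."""
    root: dict = {}
    for pattern in patterns:
        node = root
        for letter in pattern.lower():
            # Follow the edge labeled with this letter, creating it if missing.
            node = node.setdefault(letter, {})
    return root


tree = build_keyword_tree(["Apple", "Apropos", "Banana", "Bandana", "Orange"])
print(sorted(tree))                  # ['a', 'b', 'o']
print(sorted(tree["a"]["p"]))        # ['p', 'r']
print(sorted(tree["b"]["a"]["n"]))   # ['a', 'd']
```

Apple and Apropos share the path "ap" and then branch, and Banana and Bandana share "ban". Because a dictionary cannot hold the same key twice, two edges leaving the same node always have different labels.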

Searching the Keyword Tree
• Search for “Appeal”
– appeal
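The search for "appeal" can be traced with a short sketch. It assumes the tree holds the five patterns from the construction slides, stored as nested dictionaries, and that no pattern is a prefix of another, so every pattern ends at a leaf. The names `build_keyword_tree` and `search` are hypothetical.

```python
def build_keyword_tree(patterns: list[str]) -> dict:
    """Keyword tree as nested dictionaries, one edge per letter."""
    root: dict = {}
    for pattern in patterns:
        node = root
        for letter in pattern.lower():
            node = node.setdefault(letter, {})
    return root


def search(tree: dict, word: str) -> tuple[bool, str]:
    """Follow the word from the root; return (is a stored pattern, prefix matched)."""
    node, matched = tree, ""
    for letter in word.lower():
        if letter not in node:
            return False, matched  # fell off the tree
        node = node[letter]
        matched += letter
    # A stored pattern ends at a leaf, that is, a node with no outgoing edges.
    return not node, matched


tree = build_keyword_tree(["Apple", "Apropos", "Banana", "Bandana", "Orange"])
print(search(tree, "Appeal"))  # (False, 'app')
print(search(tree, "Apple"))   # (True, 'apple')
```

The walk follows a, p, p and then stops, because the only edge leaving that node is labeled "l" (from Apple), not "e".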