Bioinformatics: Functional Annotation of Proteins
Rafael Dias Mesquita, rdmesquita@iq.ufrj.br
Laboratório de Bioinformática, Departamento de Bioquímica, Instituto de Química - UFRJ


FAT (Functional Analysis Tool)

Strategy:
1. Choice of the conserved-domain pattern.
2. Automatic annotation: BLAST searches plus identification of other domains.
3. Organization of the results into a web page.
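The three steps can be pictured as a small pipeline. The sketch below is not FAT's code (its implementation is not described in these slides); it assumes the domain hits were already produced by an external search and are given as hypothetical `(sequence_id, domain_accession, evalue)` tuples. It keeps the sequences that carry the chosen domain below an assumed E-value cutoff and writes them to a minimal HTML page. `select_by_domain` and `render_html` are illustrative names.

```python
from html import escape

# Hypothetical hit record: (sequence_id, domain_accession, evalue).
Hit = tuple[str, str, float]


def select_by_domain(hits: list[Hit], domain: str, max_evalue: float = 1e-5) -> list[Hit]:
    """Steps 1-2: keep hits for the chosen conserved domain below an assumed E-value cutoff."""
    return sorted(
        (h for h in hits if h[1] == domain and h[2] <= max_evalue),
        key=lambda h: h[2],  # best (smallest) E-value first
    )


def render_html(title: str, hits: list[Hit]) -> str:
    """Step 3: organize the selected hits into a simple web page."""
    rows = "\n".join(
        f"<tr><td>{escape(seq)}</td><td>{escape(dom)}</td><td>{ev:.1e}</td></tr>"
        for seq, dom, ev in hits
    )
    return (
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f"<h1>{escape(title)}</h1>\n"
        "<table>\n<tr><th>Sequence</th><th>Domain</th><th>E-value</th></tr>\n"
        f"{rows}\n</table>\n</body></html>"
    )


if __name__ == "__main__":
    # Toy input; PF00102 is the Pfam tyrosine phosphatase domain, PF00069 the kinase domain.
    hits = [("seqA", "PF00102", 1e-40), ("seqB", "PF00069", 1e-30), ("seqC", "PF00102", 0.5)]
    print(render_html("Y-PTP domain hits", select_by_domain(hits, "PF00102")))
```

Identifiers are passed through `html.escape` so that unusual characters in sequence names cannot break the generated page.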


Features:
1. Does not overwrite previous results.
2. Translation, for when nucleotide sequences (EST libraries) are used.
3. Download of the conserved domains for reuse.
4. Recovery of partial analyses.
5. Modular design.
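Feature 2 (using EST nucleotide sequences) is commonly handled by translating all six reading frames before searching for protein domains. The sketch below illustrates that idea with the standard genetic code (NCBI translation table 1). It is an assumption about the approach, since the slides do not say how FAT translates ESTs (it could, for example, rely on ORF prediction instead); `six_frames` and `translate` are hypothetical names.

```python
from itertools import product

# NCBI standard genetic code (table 1); codons enumerated in TCAG order (TTT, TTC, TTA, ...).
_AAS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), _AAS)}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks stop codons, ambiguous codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Translations of the three forward (+1..+3) and three reverse (-1..-3) frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frames("ATGGCCTAA").items():
        print(frame, protein)
```

Each of the six peptides can then be submitted to the domain search; stop codons (`*`) usually need to be handled (split or masked) by whatever search tool comes next.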

FAT (Functional Analysis Tool)


How it works:

FAT (Functional Analysis Tool)


Validation:

Sequences used: classical tyrosine phosphatases (PTPs) of Aedes aegypti (receptor and non-receptor)

Method | Filter or query | Selected seqs | False negatives | False positives
FAT | Y-PTP domain | 36 (15 ok) | 1 + 3* | 21
BLAST | Human PTPs (86 seqs) | 6329 (16 + 3*) | 0 | 6313

* Fragments (PTP exons predicted as genes)
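The validation counts can be summarized as precision and recall. The sketch below assumes that only full-length sequences count (the three starred fragments are left out of both true positives and false negatives), which gives TP = 15, FP = 21, FN = 1 for FAT and TP = 16, FP = 6313, FN = 0 for BLAST. This reading of the table is an interpretation, not a calculation shown on the slide.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall (sensitivity) = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


# Counts read from the validation table, full-length sequences only
# (assumption: the 3 starred fragments are excluded from TP and FN).
results = {
    "FAT (Y-PTP domain)": precision_recall(tp=15, fp=21, fn=1),
    "BLAST (86 human PTPs)": precision_recall(tp=16, fp=6313, fn=0),
}

for method, (p, r) in results.items():
    print(f"{method}: precision = {p:.3f}, recall = {r:.3f}")
```

Under that assumption FAT keeps most of the recall (15/16) while its precision is about 0.42 (15/36), against roughly 0.0025 (16/6329) for the BLAST query set.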

FAT (Functional Analysis Tool)


Usage: results

FAT (Functional Analysis Tool)


Registration request filed with INPI (Brazil's National Institute of Industrial Property): 16/11/2010
Registration granted: 22/02/2011


EXAMPLES

FAT (Functional Analysis Tool)


Protein Kinases and Phosphatases

•  Enzymes responsible for adding or removing a phosphate group on Ser, Thr and Tyr residues of proteins and on other molecules.

[Diagram: Protein ⇄ Protein-PO4³⁻; the kinase transfers the phosphate from ATP, the phosphatase removes it.]


Kinases, Phosphatases and Cell Signaling

PROBLEM:
•  The kinase and phosphatase sequences of disease vectors are unknown.
•  Their relationship with physiological processes and with infection is unknown.

IDENTIFICATION:
•  279 protein kinases identified in Aedes using FAT.
•  125 tyrosine phosphatases identified in all mosquitoes: 48 in Aedes, 38 in Anopheles, 37 in Culex, 2 in others.
•  Sequences classified.
•  Conserved domains aligned using ClustalW and PRALINE.
•  Dendrograms constructed.
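Dendrograms like these are built from pairwise distances between the aligned domains. The sketch below computes the simplest such measure, the p-distance (fraction of differing sites), assuming the aligned sequences have already been loaded as equal-length strings (for example from a ClustalW alignment). Columns where either sequence has a gap are skipped. Whether the original analysis used p-distances or a model-corrected distance is not stated, so treat this as an illustration.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # pairwise gap deletion
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def distance_matrix(aligned: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Symmetric matrix of p-distances for all pairs of aligned sequences."""
    names = list(aligned)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = p_distance(aligned[names[i]], aligned[names[j]])
            matrix[i][j] = matrix[j][i] = d
    return names, matrix


if __name__ == "__main__":
    toy = {"s1": "MKV-LA", "s2": "MKVILA", "s3": "MRV-LG"}  # hypothetical aligned fragments
    names, m = distance_matrix(toy)
    for name, row in zip(names, m):
        print(name, [round(v, 2) for v in row])
```

The resulting matrix is the input expected by distance-based tree methods such as UPGMA or neighbor joining.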


Typical Protein Kinases, Aedes aegypti
Conserved domain: PF00069

Group | A. aegypti | Drosophila
AGC | 40 | 30
CAMK | 36 | 32
CK1 | 11 | 10
CMGC | 41 | 33
Other | 43 | 45
STE | 22 | 18
TK | 36 | 32
TKL | 19 | 17
RGC | 7 | 6
Total | 255 | 223

Typical protein kinases


Family | Subfamily | Models | A. aegypti | Drosophila
A6 | - | A6*, A6 double* | 1 | 1
ABC1 | ABC1a, ABC1b, ABC1c | ABC1 family | 4 | 3
Alpha | ChaK, eEF2K, other | Alpha kinase family | 0 | 1
BCR | - | RhoGAP, RhoGEF, C2 | 0 | 0
BRD | - | Bromodomain | 0 | 1
FAST | - | FAST1, FAST2 | 4 | 0
G11 | - | G11* | 0 | 0
H11 | - | Alpha-crystallin-Hsps | 25 ??? | 0 ???
HistK | - | Response_reg, HisKA, HATpase_c | 1 | 0
PDHK | - | HATpase_c | 4 | 0
PIKK | ATM, ATR, DNAPK, SMG1, TRRAP, FRAP | PI3-PI4 kinase, FAT, FATC, UME | 7 | 5
RIO | RIO1, RIO2, RIO3 | RIO1 family | 3 | 3
TAF1 | - | Bromodomain | 0 | 1
TIF | - | Tif* | 0 | 0

Atypical protein kinases

Atypical Protein Kinases, Aedes aegypti


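Tyrosine Phosphatases

Class | Group | Family | Human | Aedes aegypti | Anopheles gambiae | Culex quinquefasciatus | Culicoides sonorensis | Simulium nigrimanum
Class I | Classical | PTP receptor | 21 | 8 | 9 | 6 | 1 | -
Class I | Classical | PTP non-receptor | 17 | 8 | 8 | 7 | - | 1
Class I | DSPs | MKPs | 11 | 2 | 2 | 1 | - | -
Class I | DSPs | Atypical | 19 | 7 | 4 | 5 | - | -
Class I | DSPs | Slingshots | 3 | 1 | 1 | 2 | - | -
Class I | DSPs | PRL | 3 | 1 | 1 | 1 | - | -
Class I | DSPs | CDC14 | 4 | 3 | 1 | 2 | - | -
Class I | DSPs | PTEN | 5 | 6 | 2 | 2 | - | -
Class I | DSPs | Myotubularins | 16 | 7 | 6 | 4 | - | -
Class II | - | LMPTP | 1 | 2 | 2 | 4 | - | -
Class III | - | CDC25 | 3 | 1 | 1 | 1 | - | -
Class IV | - | Asp-based | 4 | 2 | 1 | 2 | - | -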
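As a consistency check, the per-species columns of the tyrosine phosphatase table can be summed and compared with the counts on the earlier identification slide (48 in Aedes, 38 in Anopheles, 37 in Culex, 2 in others, 125 in total). The sketch below simply transcribes the table; a "-" entry is read as zero.

```python
# Per-family counts from the tyrosine phosphatase table (None = "-", none found).
SPECIES = ("Human", "A. aegypti", "A. gambiae", "C. quinquefasciatus",
           "C. sonorensis", "S. nigrimanum")
PTP_COUNTS = {
    "PTP receptor":     (21, 8, 9, 6, 1, None),
    "PTP non-receptor": (17, 8, 8, 7, None, 1),
    "MKPs":             (11, 2, 2, 1, None, None),
    "Atypical":         (19, 7, 4, 5, None, None),
    "Slingshots":       (3, 1, 1, 2, None, None),
    "PRL":              (3, 1, 1, 1, None, None),
    "CDC14":            (4, 3, 1, 2, None, None),
    "PTEN":             (5, 6, 2, 2, None, None),
    "Myotubularins":    (16, 7, 6, 4, None, None),
    "LMPTP":            (1, 2, 2, 4, None, None),
    "CDC25":            (3, 1, 1, 1, None, None),
    "Asp-based":        (4, 2, 1, 2, None, None),
}


def column_totals(counts: dict[str, tuple]) -> dict[str, int]:
    """Sum each species column, treating '-' (None) as zero."""
    totals = [0] * len(SPECIES)
    for row in counts.values():
        for k, value in enumerate(row):
            totals[k] += value or 0
    return dict(zip(SPECIES, totals))


totals = column_totals(PTP_COUNTS)
print(totals)
```

The columns add up to exactly the numbers on the identification slide (48, 38, 37 and 1 + 1 = 2, for 125 in total), and the human column gives 107.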


Classical Tyrosine Phosphatases: A. aegypti, C. quinquefasciatus, A. gambiae


Esterases and Insecticide Resistance

•  Esterases are hydrolases that contain the carboxylesterase domain (PF00135) and take part in insecticide detoxification.

•  Frequent insecticide use imposes selective pressure on mosquitoes.

•  Metabolic resistance can include changes such as increased esterase expression.


PROBLEM:
•  The enzymatic activities of esterases are difficult to classify.
•  Their substrate specificity is unknown.
•  The esterase sequences of Aedes aegypti are unknown.

IDENTIFICATION:
•  66 sequences identified using FAT; 6 eliminated (truncated domains) => 60 esterases.
•  Full-length sequences and conserved domains aligned using ClustalW and PRALINE.
•  Search for the catalytic triad described in the literature.
•  Dendrograms constructed.
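Eliminating sequences with truncated domains (66 → 60 here) can be expressed as a coverage filter on the domain alignment. In the sketch below, the hit coordinates on the domain model (`model_from`, `model_to`) and the model length are assumed inputs, for instance taken from a profile search, and the 80% coverage threshold is a hypothetical choice, not the criterion used in the study.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    seq_id: str
    model_from: int    # first domain-model position covered by the alignment (1-based)
    model_to: int      # last domain-model position covered (inclusive)
    model_length: int  # length of the domain model

    @property
    def coverage(self) -> float:
        """Fraction of the domain model covered by this hit."""
        return (self.model_to - self.model_from + 1) / self.model_length


def split_truncated(hits: list[DomainHit], min_coverage: float = 0.8):
    """Return (complete, truncated) lists according to domain-model coverage."""
    complete = [h for h in hits if h.coverage >= min_coverage]
    truncated = [h for h in hits if h.coverage < min_coverage]
    return complete, truncated


if __name__ == "__main__":
    hits = [DomainHit("est1", 1, 500, 520), DomainHit("est2", 300, 520, 520)]
    complete, truncated = split_truncated(hits)
    print("kept:", [h.seq_id for h in complete], "eliminated:", [h.seq_id for h in truncated])
```

A coverage cutoff is only one possible criterion; missing catalytic residues or internal deletions would need the alignment itself.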

Esterases and Insecticide Resistance


CATALYTIC TRIAD

Esterases and Insecticide Resistance

1yaj
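Locating the triad often starts from the nucleophilic serine, which in carboxylesterases and other serine hydrolases typically sits in a G-X-S-X-G motif. The acidic residue (Glu/Asp) and the histidine are then checked at the alignment positions that correspond to a reference structure. The sketch below covers only the first step: a regular-expression scan for G-x-S-x-G that reports the 1-based position of each candidate serine. It is a simplification, since not every match is catalytic, and `candidate_serines` is a hypothetical name.

```python
import re

# G-x-S-x-G; the lookahead makes overlapping candidates visible too.
NUCLEOPHILE_MOTIF = re.compile(r"(?=(G.S.G))")


def candidate_serines(protein: str) -> list[int]:
    """1-based positions of the serine in every G-x-S-x-G match."""
    # m.start() is the 0-based index of the first G; the serine is 2 residues later.
    return [m.start() + 3 for m in NUCLEOPHILE_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    print(candidate_serines("AAGESAGKK"))
```

The lookahead is used because a plain `finditer` on `G.S.G` would skip a second motif that starts on the last glycine of the first one.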


PRONEX – Rede Dengue (Dengue Network)

BEST BIOINFORMATICS POSTER AWARD

SBBq 2011

Esterases and Insecticide Resistance


HMGB1 – transcription factor involved in maintaining chromatin structure


DNA Repair

DNA Repair – Human

Page 23: Bioinformática: Anotação Funcional de Proteínas · Bioinformática: Anotação Funcional de Proteínas Rafael Dias Mesquita rdmesquita@iq.ufrj.br Laboratório de Bioinformática

DNA Repair

[Diagram: phosphorylation-dependent signaling toolkit. Writers: Tyr kinases (growth factor signaling) and PI3K-like kinases (DNA damage signaling) produce pY and pS. Readers: SH2 (pY) and tBRCT (pS). Erasers: Tyr phosphatases and Ser/Thr phosphatases.]

BRCT domain – PF00533

Page 24: Bioinformática: Anotação Funcional de Proteínas · Bioinformática: Anotação Funcional de Proteínas Rafael Dias Mesquita rdmesquita@iq.ufrj.br Laboratório de Bioinformática

PROBLEM:
•  Confirm by bioinformatics the literature data on BRCT-containing proteins.
•  Study the relationship between the sequences and the conservation of BRCTs and their functions, including the tandems.
•  Study the evolution of BRCT domain function.

IDENTIFICATION:
•  25 sequences identified using FAT; 2 eliminated (truncated domains) => 23 BRCT-containing proteins.
•  Conserved domains aligned using ClustalW and PRALINE.
•  Dendrograms constructed.
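A common way to quantify how conserved each position of the aligned BRCT domains is uses the Shannon entropy of each alignment column: 0 bits for an invariant column, higher values for variable ones. The sketch below is a generic illustration, not the measure used in this work. Gaps are ignored, and the input is assumed to be a list of equal-length aligned strings.

```python
import math
from collections import Counter


def column_entropies(alignment: list[str]) -> list[float]:
    """Shannon entropy (in bits) of every alignment column; gap characters are ignored."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("expected a non-empty list of equal-length aligned sequences")
    entropies = []
    for column in zip(*alignment):
        residues = [r.upper() for r in column if r != "-"]
        total = len(residues)
        counts = Counter(residues)
        # H = sum_i p_i * log2(1 / p_i); an all-gap column yields 0.0.
        entropies.append(math.fsum((n / total) * math.log2(total / n) for n in counts.values()))
    return entropies


if __name__ == "__main__":
    print(column_entropies(["MKVL", "MRVL", "MKIL"]))
```

Low-entropy columns point to positions likely to be structurally or functionally constrained, which can then be compared between singleton and tandem BRCTs.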

DNA Repair

Page 25: Bioinformática: Anotação Funcional de Proteínas · Bioinformática: Anotação Funcional de Proteínas Rafael Dias Mesquita rdmesquita@iq.ufrj.br Laboratório de Bioinformática

Proteins involved in DNA repair pathways that carry tandem BRCT domains.


[Tree labels: RFC1, LIG4_22, LIG4_12]

RFC1 has the BRCT closest to the ancestral BRCT.


PROTEIN DOMAINS

Charting the Landscape of Tandem BRCT Domain–Mediated Protein Interactions

Nicholas T. Woods,1 Rafael D. Mesquita,2* Michael Sweet,1,3 Marcelo A. Carvalho,2,4 Xueli Li,1 Yun Liu,5 Huey Nguyen,1 C. Eric Thomas,6 Edwin S. Iversen Jr.,7 Sylvia Marsillac,1 Rachel Karchin,5 John Koomen,6 Alvaro N. A. Monteiro1†

Eukaryotic cells have evolved an intricate system to resolve DNA damage to prevent its transmission to daughter cells. This system, collectively known as the DNA damage response (DDR) network, includes many proteins that detect DNA damage, promote repair, and coordinate progression through the cell cycle. Because defects in this network can lead to cancer, this network constitutes a barrier against tumorigenesis. The modular BRCA1 carboxyl-terminal (BRCT) domain is frequently present in proteins involved in the DDR, can exist either as an individual domain or as tandem domains (tBRCT), and can bind phosphorylated peptides. We performed a systematic analysis of protein-protein interactions involving tBRCT in the DDR by combining literature curation, yeast two-hybrid screens, and tandem affinity purification coupled to mass spectrometry. We identified 23 proteins containing conserved BRCT domains and generated a human protein-protein interaction network for seven proteins with tBRCT. This study also revealed previously unknown components in DNA damage signaling, such as COMMD1 and the target of rapamycin complex mTORC2. Additionally, integration of tBRCT domain interactions with DDR phosphoprotein studies and analysis of kinase-substrate interactions revealed signaling subnetworks that may aid in understanding the involvement of tBRCT in disease and DNA repair.

INTRODUCTION

Cells are constantly subjected to DNA damage from external as well as internal causes. The chemical and physical changes associated with DNA damage compromise the faithful transmission of genetic information to daughter cells, and thus, cells have evolved an intricate system to coordinate damage sensing, signal transduction, repair processes, and cell cycle progression (1). Defects in the DNA damage response (DDR) caused by germline or somatic changes have important implications for disease, and the DDR has been proposed to constitute an early barrier to tumorigenesis (2, 3). Current cancer therapy regimens exploit weaknesses in this system to selectively kill cancer cells (4, 5). Thus, the establishment of a platform for the identification of potential sensitizers of therapy should accelerate the development of new treatment strategies.

Transmission of intracellular signals often relies on the coordinated interactions between protein modular domains and linear peptide motifs (6). A large number of modular domains, defined as units of ~100 amino acids that can independently fold in isolation, have been identified, and several of them, such as the Src homology 2 (SH2) domain and the BRCA1 C-terminal (BRCT) domain, recognize phosphorylated linear motifs (7–11). The orchestrated recognition of linear motifs by 14-3-3 proteins, proteins with the BRCT domain, and proteins with the forkhead-associated (FHA) domain is at the core of DNA damage signaling (12). Modular domains that recognize phosphorylated linear motifs are components of three-part signaling toolkits that also include protein kinases and phosphatases (10). Thus, to achieve a systems view of the DDR, it is important to understand not only which modular domains, linear motifs, kinases, and phosphatases participate in conveying DNA damage signals but also how their dynamic interactions orchestrate the response.

BRCT domains were initially recognized in the C-terminal region of BRCA1, a protein encoded by the major breast and ovarian susceptibility gene with pleiotropic roles in DNA damage repair (13). BRCT domains are present in a large superfamily of ~40 nonorthologous proteins that participate in cell cycle checkpoints and in the DDR (14, 15). BRCT domains are versatile modules that mediate protein-protein and protein nucleic acid interactions (16). Several BRCT domains recognize linear motifs phosphorylated by kinases that are activated by DNA damage (8, 9, 17). Therefore, proteins found to interact with BRCT domains are likely to have roles in the cellular response to DNA damage.

To gain further insight into the determinants of DDR signal transduction, we generated a human protein-protein interaction network (PIN) centered on interactions mediated by the BRCT domain. This network contains kinases, phosphatases, and other potential BRCT targets previously unknown to participate in the DDR. In addition, bioinformatics analysis of the constituents of the network enabled the identification of biological processes and protein complexes that integrate the DDR with other cellular activities, such as cell cycle regulation and transcription. This interaction network has implications for cancer therapy for which understanding the cellular response to DDR-inducing chemotherapy and radiation therapy can be used to improve patient survival and quality of life during treatment.

RESULTS

Analysis of the minimal complement of human genes encoding BRCT-containing proteins

Using a combination of search strategies (Materials and Methods), we identified 23 human genes encoding BRCT domain–containing proteins

1 Cancer Epidemiology Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA. 2 Instituto Federal de Educação, Ciência e Tecnologia, Rio de Janeiro, RJ 20270, Brazil. 3 Graduate Program in Biomedical Sciences, College of Medicine, University of South Florida, Tampa, FL 33612, USA. 4 Instituto Nacional do Câncer, Rio de Janeiro, RJ 20231, Brazil. 5 Institute for Computational Medicine, Department of Biochemical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA. 6 Molecular Oncology Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA. 7 Department of Statistical Science, Duke University, Durham, NC 27708, USA.
*Departamento de Bioquímica, Instituto de Química, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ 21941, Brazil.
†To whom correspondence should be addressed.

RESEARCH RESOURCE

www.SCIENCESIGNALING.org 18 September 2012 Vol 5 Issue 242 rs6 1


(Fig. 1A and table S1). Of those, only DBF4B had not been previously recognized as encoding a BRCT-containing protein. Because DBF4B has some, but not all, of the structural hallmarks of BRCT domains, final determination of the classification of this protein as a BRCT-containing one requires structural and functional confirmation. We also identified a group of eight genes that encode proteins with truncated or degenerated (lacking conserved residues or structural motifs) BRCT domains (Fig. 1A and table S1). BRCT-encoding genes were located scattered along different autosomal chromosomes; none was located in sex chromosomes (table S1). Several of the genes encode for BRCT domain–containing proteins involved in DNA metabolism, such as DNA polymerases and polymerase-associated proteins (encoded by POLL, POLM, REV1, RFC1, and DNTT) and DNA ligases (encoded by LIG4 and LIG3). Unlike some other modular domains that preferentially co-occur with other specific modules, BRCT domains co-occurred with many different domains (Fig. 1A).

BRCT in tandem: A different class of BRCT domains

BRCT domains in several proteins recognize and bind phosphorylated serine residues (phosphoserines) (8, 9). BRCT domains found in tandem (Fig. 1A) can behave as a single structural unit (18, 19) that binds phosphopeptides through recognition pockets formed by both BRCT units (20–23). The structural basis for phosphopeptide recognition by single BRCTs is presently unknown (9, 24). Therefore, we propose that these tandem BRCT (tBRCT) domains represent a distinct class of BRCT domains. An unrooted tree for all of the individual human BRCT units revealed that they group in several distinct branches (Fig. 1B, top). To explore the relationship between tBRCT domains, we generated an unrooted tree in which the units were not the individual BRCTs but rather the tBRCT domains (defined as two singleton BRCTs connected by a linker region) (Fig. 1B, bottom). The two singletons from a tandem frequently originated from different branches of the unrooted BRCT singleton tree (Fig. 1B). Indeed, only in the tandem domains for TP53BP1 did the two singletons come from the same branch. Thus, we hypothesize that the sequence asymmetry arising from tBRCT domain composed of BRCT singletons with different evolutionary origins reflects the structural determinants for phosphopeptide recognition and specificity (20–23).
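The paper defines a tBRCT as two singleton BRCTs connected by a linker region. Given the start and end coordinates of the BRCT singletons predicted on a protein, consecutive singletons can be paired as tandem candidates when the linker between them is short. The sketch below assumes that input format, and the 60-residue maximum linker is a hypothetical threshold. The paper itself relied on structural or functional evidence from the literature to decide which tandems act as a single unit.

```python
def pair_tandems(domains: list[tuple[int, int]], max_linker: int = 60):
    """Pair consecutive BRCT singletons (start, end; 1-based, inclusive) into tandem candidates.

    Greedy left-to-right pairing: each singleton is used at most once.
    Returns a list of ((start1, end1), (start2, end2)).
    """
    ordered = sorted(domains)
    tandems = []
    i = 0
    while i < len(ordered) - 1:
        first, second = ordered[i], ordered[i + 1]
        linker = second[0] - first[1] - 1  # residues strictly between the two domains
        if 0 <= linker <= max_linker:
            tandems.append((first, second))
            i += 2  # both singletons consumed by this tandem
        else:
            i += 1
    return tandems


if __name__ == "__main__":
    # Hypothetical singleton coordinates on one protein.
    print(pair_tandems([(1650, 1730), (1760, 1850), (100, 180)]))
```

Greedy pairing is a simplification: for proteins with many BRCTs (TOPBP1 has seven), which singletons form a functional tandem cannot be decided from spacing alone.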

To probe the biology of tBRCT domains, we generated a comprehensive PIN centered on tBRCT domain–mediated interactions. We focused our search on proteins for which there was structural or functional evidence in the literature that the tBRCT functioned as a single unit (Materials and Methods). Because these domains might display phosphorylation-independent modes of binding, we did not initially restrict the analysis to phosphorylation-dependent interactions. Our final list contained seven tBRCT domains from ECT2, LIG4, MDC1, BARD1, TP53BP1, BRCA1, and the C-terminal tandem from PAXIP1.

[Figure 1A: human BRCT domain-containing proteins drawn to scale (scale marks at 100, 500 and 1000 residues), lengths in amino acid residues: TOPBP1 (1522), TP53BP1 (1972, 1977, 1975), MDC1 (2089), NBN (754), PARP1 (1014), PARP4 (1724), PAXIP1 (1069), PES1 (588), POLL (575), POLM (494), REV1 (1251, 1250), RFC1 (1147), XRCC1 (633), ANKRD32 (1058), BARD1 (777), BRCA1 (1863), CTDP1 (961, 867), DBF4B (615, 170), ECT2 (883), LIG3 (1009, 949), MCPH1 (835), DNTT (509, 508), LIG4 (911). Truncated or degenerated BRCT: DDX39B (428), DBF4 (674), RB1 (928), UBE3C (1083), GAS8 (478), RBL1 (1068), RBL2 (1139), TEF2IP (399). Co-occurring domains: DH (DBL homology), PH (Pleckstrin homology), FHA, TUDOR, Ankyrin, DEADc, Helicase c, Biotinyl domain, FCP1 c, RING, DBF zf, PolXc, PARP and DNA lig zf, Ligase, PARP, Pol zeta, AAA+ ATPase, Myb-like, HECT; BRCTw or truncated.]
[Figure 1B: unrooted trees of BRCT singletons (top) and BRCT tandems (bottom). Bootstrap support key: a: 91-100%, b: 81-90%, c: 71-80%, d: 61-70%, e: 51-60%, f: 41-50%. tBRCT module key: Name_t1, N-term to C-term; colors determined by the singleton tree.]

Fig. 1. The BRCT superfamily. (A) Depiction of the minimal complement of all human proteins containing BRCT domains. Individual diagrams represent the predominant splice forms in the published literature with the number of amino acid residues in parentheses. BRCT domains (blue boxes) are illustrated with other co-occurring domains. Proteins at the bottom left (dashed line box) represent those with truncated or degenerated BRCT domains (tan boxes). The proteins are drawn to scale. (B) Human sequences for the isolated BRCT domains were used to cluster singleton and tBRCT domains on the basis of sequence alignments using the neighbor-joining method to generate unrooted trees. Top (BRCT singletons): Unrooted tree illustrating amino acid sequence conservation between individual BRCT singleton domains with bootstrapping scores represented by lowercase letters (see key for percent support). Numbers shown after underscore in protein names refer to BRCT relative position (for example, TOPBP1_57 refers to the fifth BRCT in TOPBP1, which contains seven BRCTs, starting from the N terminus). Bottom (BRCT tandems): Unrooted tree illustrating amino acid sequence conservation between tBRCT domains. Numbers shown after underscore in protein names refer to tBRCT's relative position (for example, PAXIP1_t3 refers to the third tBRCT in PAXIP1). Colored bars below protein names represent the position of the individual BRCT domains in different branches from the BRCT singleton tree above.
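The caption states that the trees were built with the neighbor-joining method. To illustrate what that algorithm does, here is a compact, unoptimized implementation of standard neighbor joining that takes a symmetric distance matrix (for instance, distances derived from a BRCT alignment) and returns an unrooted tree in Newick format. It is a sketch only: it computes no bootstrap support (which the figure reports), and it is not the software used in the paper.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Build an unrooted tree by neighbor joining; return it as a Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        sums = [sum(row) for row in d]
        # Pick the pair (i, j) minimizing Q(i, j) = (n - 2) * d(i, j) - S_i - S_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - sums[i] - sums[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to the new internal node u.
        li = d[i][j] / 2 + (sums[i] - sums[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from u to every remaining node k.
        du = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [du[idx]] for idx, a in enumerate(keep)]
        d.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"]
    # Three nodes left: join them at one central node.
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
    print(neighbor_joining(taxa, matrix))
```

Ties in Q are broken by taking the first pair found, so the printed topology can depend on the input order when several pairs are equally good.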


Acknowledgments

Students: Eloy, Tathyanne, Rafael, Thayani, Rodrigo and Gabriel

Funding
•  INCT Entomologia Molecular - CNPq+FAPERJ
•  Grupos Emergentes - FAPERJ
•  PRONEX Dengue - CNPq
•  APQ1 - FAPERJ
•  IC - FAPERJ
•  PIBIC - UFRJ

Collaborations
Bioinformatics
•  José Marcos Ribeiro (NIH)
•  Glória Braz (IQ-UFRJ)
Functional annotation
•  Mário Silva-Neto (IBqM-UFRJ)
•  Alvaro Monteiro (Moffitt Cancer Center)
•  Marcelo Alex Carvalho (IFRJ)
•  Denise Valle (Fiocruz)
•  Renata Schama (Fiocruz)
•  Fernando Genta (Fiocruz)
•  Marcelo Fantapiee (IBqM – UFRJ)


Acknowledgments
Students
•  Eloy Seabra-Junior (IC tec)
•  Hanna Condelo (IC)
•  Julia Miranda (IC)
•  Nathalia Souza (IC)
•  Eduardo Matos

Collaborations
Rhodnius genome
•  Pedro Oliveira (IBqM-UFRJ)
•  José Marcos Ribeiro (NIH)
•  Glória Braz (IQ-UFRJ)
Functional annotation
•  Mário Silva-Neto (IBqM-UFRJ)
•  Alvaro Monteiro (Moffitt Cancer Center)
•  Marcelo Alex Carvalho (IFRJ)
•  Denise Valle (Fiocruz)
•  Renata Schama (Fioc

Laboratório de Bioinformática: team (photos: Eloy, Hanna, Julia, Nathalia, Eduardo)


We have openings… for master's students

Rafael Dias Mesquita, rdmesquita@iq.ufrj.br