The world of non-protein-coding RNAs

Francis de Morais Franco Nunes

18/07/14

_________________________________________________

Focus on protein-coding RNAs


DNA → (transcription) → mRNA → (translation) → Proteins

Gene A, Gene B, Gene C

Temporal differential gene expression

Spatial differential gene expression
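As a rough illustration of the DNA → mRNA → protein flow on this slide, the Python sketch below transcribes a coding-strand sequence and translates it with the standard genetic code. The function names and the short example sequence are hypothetical and do not come from the lecture, and the sketch ignores RNA processing (capping, splicing, polyadenylation).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding (sense) strand: T becomes U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    dna = mrna.upper().replace("U", "T")
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy gene, used only to exercise the two functions.
toy_gene = "ATGGCCTGGTAA"
toy_mrna = transcribe(toy_gene)
toy_protein = translate(toy_mrna)
```

For the toy sequence, `toy_mrna` is `AUGGCCUGGUAA` and `toy_protein` is `MAW` (Met-Ala-Trp). A non-coding RNA would go through `transcribe` only.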


... focus also on the non-coding genetic elements associated with protein synthesis

Classes of RNAs

rRNA locus → transcription → rRNA molecule (final product)

tRNA locus → transcription → tRNA molecule (final product)

mRNA locus → transcription → mRNA molecule → translation → protein (final product)


Comparative distribution of genome size (red bars, values on the left vertical axis) versus number of genes (blue bars, values on the right vertical axis) in different organisms. Data compiled from NCBI GenBank (http://www.ncbi.nlm.nih.gov/genbank) and from the scientific literature (searches at http://www.ncbi.nlm.nih.gov/pubmed) on the sequenced genome of each species: human (Homo sapiens), mouse (Mus musculus), cat (Felis catus), horse (Equus caballus), dog (Canis lupus familiaris), chicken (Gallus gallus), frog (Xenopus tropicalis), fish (Danio rerio), sea urchin (Strongylocentrotus purpuratus), mosquito (Aedes aegypti), fly (Drosophila melanogaster), worm (Caenorhabditis elegans), rice (Oryza sativa), yeast (Saccharomyces cerevisiae) and bacterium (Escherichia coli).

Nunes, 2011 – Rev. Genética na Escola 6(1):67-70.

Amoeba = 290× Onion = 43×

Potato = 890 Mb / 39,000 genes
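The potato figures above can be turned into a gene density, which makes the mismatch between genome size and gene number easier to compare across species. This is a minimal sketch; the function name is hypothetical and only the two potato values come from the slide.

```python
def genes_per_megabase(genome_size_mb: float, gene_count: int) -> float:
    """Gene density as annotated genes per megabase of genome."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return gene_count / genome_size_mb


# Values from the slide: potato, 890 Mb and about 39,000 genes.
potato_density = genes_per_megabase(890, 39_000)
print(f"Potato: {potato_density:.1f} genes per Mb")
```

This gives roughly 43.8 genes per Mb for potato. The same function could be applied to the other species in the figure once their values are read from the original chart.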

The proportion of ncRNA increases with the complexity of the organism

Prokaryotes < 25%; simple eukaryotes 25-50%; complex eukaryotes > 50%; humans ~ 98%

Mattick 2007 – Genome Biol.


Genes < 2%; non-coding > 98%

'Junk DNA' (…)

>90% of the human genome (euchromatin) is transcribed

Birney et al. 2007 - Nature; Kapranov et al. 2007 – Nature Rev. Genet.

[Figure labels: genome, transcribed; ~75%, >90%]

- Non-repetitive sequences
- Functional vs. "noise"?
- Numerous classes of ncRNAs
- Complexity = genetic information + fine regulation

Next-generation sequencing

Nagalakshmi et al. 2008 - Science


Source: http://www2.uah.es/biologia_celular/LaCelula/Cel4Nuc.html

Chromatin of different cells

E = euchromatin; H = heterochromatin

PERVASIVE TRANSCRIPTION

Why weren't ncRNAs discovered earlier?

- They lack the typical features used to recognize protein-coding genes (mRNA):

Start codon for methionine (ATG)

Stop codons (TAA, TAG, TGA)

ORFs

Canonical splice sites (5' ... GT ... AG ... 3')

Methods based on polyadenylated transcripts?
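To see why a gene finder built around these features overlooks ncRNAs, consider a naive open reading frame (ORF) scan. The sketch below is an illustration under stated assumptions, not a method from the lecture: the function names are hypothetical, the 30-codon minimum is an arbitrary threshold, and only the three forward-strand frames are scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon. `min_codons` counts the
    codons before the stop (the ATG included); it is an arbitrary cutoff.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the candidate and keep scanning
    return orfs


def lacks_long_orf(dna: str, min_codons: int = 30) -> bool:
    """True when no ORF of the requested length is found (forward strand only)."""
    return not find_orfs(dna, min_codons)


# Hypothetical sequences: one with a 31-codon ORF, one with no start codon.
coding_like = "ATG" + "GCC" * 30 + "TAA"
noncoding_like = "GCC" * 40
```

Here `find_orfs(coding_like)` returns `[(0, 96, 0)]`, while `lacks_long_orf(noncoding_like)` is `True`. A transcript failing such a test was simply not called a gene, which is one reason ncRNAs went unnoticed.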


Next Generation Sequencing: mapping reads to the reference genome

Secondary structure

microRNA

mRNA

5.8S rRNA

tRNA


Conformations - RNA secondary structure

Source: Robert Giegerich – EMBO RNA Course - 2008

Non-protein-coding RNAs

ncRNA


BIFUNCTIONAL RNAs

2008

p53 - protein (N ... C)

p53 - mRNA

Mdm2 domain

With binding = active protein

Without binding = degradation

Non-coding RNA

- "Unannotated"

- Trans-activation vs. cis-activation (or 'reg')

- Differential expression: temporal (development); spatial (cell- or tissue-specific)

- Genomic location: intergenic, intragenic, intronic, antisense

- Size (arbitrary): small < 200 nucleotides < long
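The arbitrary size cutoff in the last bullet can be written as a one-line rule. In this minimal sketch the function name is hypothetical, and treating a transcript of exactly 200 nt as "small" is an assumption, since the slide only gives the strict inequalities on either side.

```python
def classify_by_length(length_nt: int, cutoff_nt: int = 200) -> str:
    """Label an ncRNA as 'small' or 'long' using the arbitrary 200 nt cutoff."""
    if length_nt <= 0:
        raise ValueError("transcript length must be positive")
    # Assumption: a transcript of exactly `cutoff_nt` is labelled 'small'.
    return "long" if length_nt > cutoff_nt else "small"


# Hypothetical lengths: a miRNA-sized and an lncRNA-sized transcript.
labels = {n: classify_by_length(n) for n in (22, 1500)}
```

The cutoff is a convention rather than a biological boundary, which is why it is kept as a parameter.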


Non-coding RNA

- They orchestrate DIVERSE biological processes and molecular functions

Replication, transcription, translation, silencing, chromosome stability, RNA modification/processing/stability, protein stability/localization…

Behavior, senescence, reproduction, metabolism, apoptosis, cell proliferation and differentiation, morphogenesis, stress responses…

Disruptions of these regulators result in disorders!


June 2011

Expression level

Oligoribonuclease

Exonuclease


PIWI-interacting RNAs (piRNAs)
Size = 28–33 nt
Origin = germline cells. In Drosophila, they have also been found in somatic cells. Some mouse piRNAs originate from repetitive sequences.
Action = they associate with proteins of the PIWI family and control transposon activity in the germline, as well as germline viability, in C. elegans, Drosophila, fish and mammals.

S. Valadkhan and L.S. Gunawardane 81

© 2013 Biochemical Society

Figure 1. U6 and U2 snRNAs and the mRNA at the time of first and second steps of splicing
The location of U6, U2 and the U6 ISL is shown. The intron is shown by a thick light blue line connecting the two exons. Position of the 5′ splice site (5′SS), 3′ splice site (3′SS) and branch site are shown. Solid arrows point to the site of the nucleophilic attack during the two steps of splicing. The first step involves a nucleophilic attack by the 2′ hydroxy group of a specific adenosine residue in the intron, the branch site adenosine (the bulged A), on the 5′ splice site. This leads to a trans-esterification reaction in which the 2′ oxygen of the branch-site adenosine replaces the 3′ oxygen of the last nucleotide of the upstream exon. The result of this reaction is the release of the first exon and the formation of an unusual 2′–5′ linkage between the branch site adenosine and the first nucleotide of the intron (right-hand panel). During the second step, the free 3′ hydroxyl moiety of the newly released exon is activated for a similar nucleophilic attack on the 3′ splice site, resulting in ligation of the two exons and release of the intron as a branched lariat. Base-pairing interactions are shown by short black lines. The location of the 2′–5′ linkage formed after the first step of splicing at the branch site is shown.

Figure 2. The structural organization of the group II self-splicing intron aI5γ
The location of domains I–VI, the two exons (shown in green), the splice sites and the branch site (5′SS, 3′SS and BS respectively) are shown. The position of J2/3 and the AGC sequence are indicated. The metal-binding site of domain V is shown by a red 'Mg' sign. Broken lines connect regions which are juxtaposed to form the catalytic core. The circles denote the functional equivalent of each domain or subdomain in the spliceosome. ε and ε′ sites, which are involved in an interaction important in recognition of the 5′ splice site, are shown.

© The Authors Journal compilation © 2013 Biochemical Society
Essays Biochem. (2013) 54, 79–90: doi: 10.1042/BSE0540079

6 Role of small nuclear RNAs in eukaryotic gene expression

Saba Valadkhan1 and Lalith S. Gunawardane
Center for RNA Molecular Biology, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, U.S.A.

Abstract
Eukaryotic cells contain small, highly abundant, nuclear-localized non-coding RNAs [snRNAs (small nuclear RNAs)] which play important roles in splicing of introns from primary genomic transcripts. Through a combination of RNA–RNA and RNA–protein interactions, two of the snRNPs, U1 and U2, recognize the splice sites and the branch site of introns. A complex remodelling of RNA–RNA and protein-based interactions follows, resulting in the assembly of catalytically competent spliceosomes, in which the snRNAs and their bound proteins play central roles. This process involves formation of extensive base-pairing interactions between U2 and U6, U6 and the 5′ splice site, and U5 and the exonic sequences immediately adjacent to the 5′ and 3′ splice sites. Thus RNA–RNA interactions involving U2, U5 and U6 help position the reacting groups of the first and second steps of splicing. In addition, U6 is also thought to participate in formation of the spliceosomal active site. Furthermore, emerging evidence suggests additional roles for snRNAs in regulation of various aspects of RNA biogenesis, from transcription to polyadenylation and RNA stability. These snRNP-mediated regulatory roles probably serve to ensure the co-ordination of the different processes involved in biogenesis of RNAs and point to the central importance of snRNAs in eukaryotic gene expression.

Keywords:

group II intron, ribozyme, small nuclear RNA, spliceosome.

1To whom correspondence should be addressed (email [email protected]).


104 Essays in Biochemistry volume 54 2013

© The Authors Journal compilation © 2013 Biochemical Society

abolish either transcription or translation (Figure 1A) [3]. A second class of pseudogene, the duplicated pseudogene (Figure 1B), is formed when replication of the chromosome is performed incorrectly [2]. Such duplication events often lead to the formation of functional gene families, such as those found in the Hox gene clusters, but if part of the gene is not faithfully copied then these can lead to frameshift mutations or the loss of a promoter or enhancer, thus resulting in a non-functional duplicated pseudogene. The final class, known as the processed pseudogene (Figure 1C), is formed when an mRNA molecule is reverse-transcribed and integrated into a new location in the parental genome [4]. Because processed pseudogenes are produced from mRNA, they usually lack introns and a promoter, and are therefore only transcribed if they become integrated close to a pre-existing promoter [5].

The sequencing of a range of genomes, including the human genome, has revealed the extent of pseudogene abundance [6–8]. Estimates for the number of human pseudogenes range from 10000 to 20000, making them almost as prevalent as coding genes [9]. The majority of these are processed pseudogenes [6–8] and less than 100 are unitary pseudogenes [3]. Interestingly, the processed pseudogenes found in the human genome have been formed from just 10% of the coding genes [6,8], suggesting that either not all genes are capable of producing processed pseudogenes, or that only the processed pseudogenes produced by certain types of gene are selected for by evolution. The types of genes that produce processed pseudogenes are predominantly highly expressed housekeeping genes or shorter RNAs such as genes encoding ribosomal proteins [10]. It is of note that whereas mammalian genomes are particularly well endowed with pseudogene numbers [9], they are by no means the only species that harbour them. Pseudogenes have been found in various species [11], including bacteria, plants, insects and nematode worms, examples of which can be found in various databases [12].

Pseudogenes have often been labelled as 'junk DNA' because they lack protein-coding capacity. In fact, some genes that appear to be pseudogenized may in fact code for proteins

Figure 1. Different classes of pseudogene
(A) A unitary pseudogene is formed when a spontaneous mutation occurs in a coding gene. Such mutations may ablate transcription from the promoter or cause premature stop codons or frameshifts to occur. (B) A duplicated pseudogene is formed when a gene is duplicated, but in such a way that mutations in the copy prevent formation of a protein. (C) Processed pseudogenes arise when DNA is transcribed into RNA, which is then reverse-transcribed into copy DNA (cDNA) and integrated into the genome. Such pseudogenes often lack promoter activity and may have deletions or truncations that prevent protein formation. Closed boxes depict exons; open boxes depict introns; 'X' shows a mutation that prevents the DNA from being able to make a protein.


114 Essays in Biochemistry volume 54 2013

© The Authors Journal compilation © 2013 Biochemical Society

advocating that all are functional to those proposing that, without strong experimental evidence, they should be considered as being functionally inert; others' opinions lie between these polar extremes [4–6]. Transcripts exceeding 200 nt in length and that are apparently non-coding may be categorized as lncRNAs [long ncRNAs (non-coding RNAs); also previously designated 'large' ncRNAs]. A subset of these which do not overlap known protein-coding gene loci are known as lincRNAs (long intergenic ncRNAs) [7]. These have been preferred for investigation because their transcripts and functions are more likely to be independent of known protein-coding genes. It is this subset of the larger class of lncRNAs that we shall mainly discuss in the present chapter. Two examples of mouse lncRNA loci are shown in Figure 1. Most such loci generate apparently non-coding transcripts and can be complex, with transcripts produced whose genomic sequences overlap on both strands. Non-coding transcript maps are also extensive: the majority of nucleotides have been suggested to be transcribed at some point during normal development in, for example, Drosophila melanogaster [8].

Metazoan genomes are currently predicted to contain thousands of these loci, from approximately 1119 in the fruit fly [9] to more than 8000 in the human genome [10,11]. Such loci can be described as genes since they show some of the transcriptional, chromatin and evolutionary features of protein-coding genes. Nevertheless, this should not be meant to imply that each is functional. For example, some transcripts (RNA molecules) may not themselves transact a function, even if the act of their transcription is functional, for example by transcriptional interference [7].

Like the majority of mRNAs, many lncRNAs are thought to be polyadenylated and transcribed by RNA Pol II (RNA polymerase II). Recent mouse and human lincRNA sets have been defined using chromatin immunoprecipitation experiments targeting H3K4me3 (histone H3 Lys4 trimethylation) and H3K36me3 (histone H3 Lys36 trimethylation) modifications [12,13] which are markers for RNA Pol II activity. Such lncRNAs may be spliced, and show a tendency to be expressed in a low and tissue-specific manner, with many thought not to be

Figure 1. Two examples of mouse lncRNA loci whose transcripts' sequences overlap protein-coding loci
UCSC genes are shown in blue, with supporting mRNA sequence evidence in black and a conservation track across 30 vertebrate species is shown below. lncRNA loci are highlighted using yellow boxes. (a) Airn, an imprinted lncRNA locus. (b) Evf2, also known as Dlx6os. An antisense transcript Dlx6as is also apparent.

R.C. Pink and D.R.F. Carter 107

© 2013 Biochemical Society

Interestingly, the siRNAs generated did not always come from the pairing of a pseudogene RNA with a coding gene mRNA. Sometimes they were generated from the pairing of two pseudogenes (one transcribed in the sense direction and the other in the antisense), but the siRNA then represses the coding parent gene, such as in the case of HDAC1 (encoding a histone deacetylase enzyme) (Figure 2B) [34]. In other instances the siRNAs were generated from the internal pairing of different regions within the same pseudogene transcript (i.e. from double-stranded regions formed by secondary structure folding). An example of the latter is the formation of hairpin loop structures in the Au76 pseudogene RNA, which are processed into siRNAs that repress expression of the homologous coding gene Rangap1 (encoding a protein that regulates G-coupled receptor signalling) (Figure 2C) [35]. Other organisms, including rice [36] and trypanosomes [37] have been shown to generate siRNAs from pseudogenes, which

Figure 2. Mechanisms of pseudogene functionality
(A) Pseudogene RNA transcribed in the reverse (antisense) direction can combine with forward (sense) transcripts from the coding gene to produce dsRNA. This can inhibit translation of the coding RNA, or produce siRNAs that go into the RNAi pathway and cause the coding RNA to be degraded. siRNAs that destroy the coding transcript can also be generated by (B) pairing between sense and antisense transcribed pseudogenes and (C) double-stranded regions formed by secondary structure within a single pseudogene transcript. (D) Pseudogene transcripts may share binding sites for miRNAs or trans-acting proteins that regulate the stability of the mRNA. Increased levels of pseudogene transcripts can compete for these factors and therefore shield the coding transcripts from their effects.


© The Authors Journal compilation © 2013 Biochemical Society
Essays Biochem. (2013) 54, 103–112: doi: 10.1042/BSE0540103

8 Pseudogenes as regulators of biological function

Ryan C. Pink and David R.F. Carter1

School of Life Sciences, Oxford Brookes University, Gipsy Lane, Headington, Oxford OX3 0BP, U.K.

Abstract
A pseudogene arises when a gene loses the ability to produce a protein, which can be due to mutation or inaccurate duplication. Previous dogma has dictated that because the pseudogene no longer produces a protein that it becomes functionless and evolutionarily inert, being neither conserved or removed. However, recent evidence has forced a re-evaluation of this view. Some pseudogenes, although not translated into protein, are at least transcribed into RNA. In some cases, these pseudogene transcripts are capable of influencing the activity of other genes that code for proteins, thereby influencing expression and in turn affecting the phenotype of the organism. In the present chapter, we will define pseudogenes, describe the evidence that they are transcribed into non-coding RNAs and outline the mechanisms by which they are able to influence the machinery of the eukaryotic cell.

Keywords:

non-coding RNA, pseudogene, RNA, transcription.

1To whom correspondence should be addressed (email [email protected]).

Introduction
A pseudogene is generally defined as a copy of a gene that has lost the capacity to produce a functional protein. They were first discovered in the 1970s when a copy of the 5S rRNA gene was found in Xenopus laevis with homology to the active gene, but with a clear truncation that rendered it non-functional [1]. Sporadic discovery and characterization of pseudogenes over the following 20 years has revealed a number of mechanisms for pseudogene formation [2]. Unitary pseudogenes are formed when spontaneous mutations occur in a coding gene that


Long non-coding RNAs

[Figure labels: CNS and PNS; mesoderm and germ cells; constitutive]

FANTOM3 Consortium - 100,000 ncRNA transcripts longer than 200 nt

- Most of them have:

* 5' cap * poly(A) tail * alternative splicing * transcription by RNA pol II

- ~1/5 overlap coding regions in the opposite orientation (NATs)

- Expression of intergenic or intronic ncRNAs tends to be cell- or tissue-specific and also responds to environmental signals

- In general, ncRNAs are expressed at much lower levels than mRNAs (consistent with a regulatory function)

- Lack of conservation of an ncRNA does not indicate absence of function


Am J Transl Res 2012;4(2):127-150
www.ajtr.org /ISSN:1943-8141/AJTR1202001

Review Article
Long non-coding RNAs: versatile master regulators of gene expression and crucial players in cancer

Lei Nie1, Hsing-Ju Wu2, Jung-Mao Hsu1, Shih-Shin Chang1, Adam M LaBaff1, Chia-Wei Li1, Yan Wang1, Jennifer L. Hsu1,2,4, Mien-Chie Hung1,2,3,4

1Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX 77030; 2Center for Molecular Medicine, China Medical University Hospital, Taichung, Taiwan; 3Graduate Institute of Cancer Biology, College of Medicine, China Medical University, Taichung, Taiwan; 4Asia University, Taichung, Taiwan

Received February 3, 2012; accepted February 16, 2012; Epub April 8, 2012; Published April 30, 2012

Abstract: With rapid development of sequencing technologies such as deep sequencing and whole genome high-density tiling array, we now know that most of the "junk" genomic sequences are transcribed as non-coding RNAs (ncRNAs). A large number of long ncRNA transcripts (> 200bp) have been identified, and these long ncRNAs (LncRNAs) are found to be crucial regulators for epigenetic modulation, transcription, and translation. In this review, we briefly summarize the regulatory function of LncRNAs with a particular focus on the underlying mechanisms of LncRNAs in oncogenesis, tumor metastasis and suppression.

Keywords: Long non-coding RNA (LncRNA), epigenetic regulation, competitive endogenous RNA (ceRNA), oncogenic lncRNA, pseudogene transcript, natural antisense RNA (NAT)

Introduction

When the whole human genome was determined, it was a surprise that there are only about 20,000-25,000 protein-coding genes, representing less than 2% of whole human genome. It was hard to imagine that most of the genomic sequences are junk DNA since simple organisms such as Drosophila melanogaster and Caenorhabditis elegans have a very close number of protein-coding genes. The limited number of protein-coding genes cannot explain the developmental and physiological complexity of humans. With the rapid development in high-throughput sequencing technologies such as deep sequencing and whole genome high-density tiling array, it is now known that about 98% of the "junk" DNAs are transcribed as non-coding RNAs (ncRNAs) including short and long ncRNAs [1, 2].

A large variety of ncRNAs can be divided into two classes: structural and regulatory ncRNAs (Table 1). Structural ncRNAs include tRNA, rRNA, and snoRNA. Based on ncRNA length, regulatory ncRNA can be further divided into at least three groups: (1) Short ncRNA including MicroRNA (miRNA) (22-23 nts) and piwi-interacting RNA (piRNA) (26-31 nts); (2) medium ncRNA (50-200 nts); (3) long ncRNA (>200 nts). This review will focus on medium and long ncRNAs, which are collectively referred to as large or long ncRNAs (LncRNAs). Based on large-scale sequencing and prediction from chromatin-state maps of full length cDNA libraries in FANTOM2 and 3 as well as human transcriptomes, more than 4,600 LncRNAs in mouse and over 3,300 LncRNAs in human have been identified with a total of approximately 23,000 LncRNAs in a mammalian genome [3-7].

Biogenesis of LncRNAs is quite complicated. In general, LncRNA transcription and processing is very similar to protein-coding RNA. Most of LncRNAs are transcribed by RNA polymerase (RNAP) II, but some LncRNAs have been reported to be transcribed by RNAP III, and the majority of LncRNAs are spliced, polyadenylated and 5′-capped. Some LncRNAs are evolutionarily conserved and can be expressed at low level. A large proportion of LncRNAs has highly conserved proximal promoter sequence, exonic sequences, intronic sequences or secondary RNA structures. LncRNAs originate from in-

Table 1. RNA Classifications

Class | Functions
mRNAs | Encoding proteins
ncRNAs
1. House-keeping or Structural ncRNAs
  tRNA (transfer RNA) | mRNA translation
  rRNA (ribosomal RNA) | mRNA translation
  snoRNA (small nucleolar RNAs) | rRNA modification
  snRNA (small nuclear RNA including spliceosomal RNAs) | RNA splicing, polyadenylation
2. Regulatory ncRNAs
  Short ncRNA
    miRNA (22-23nt) | Degradation of mRNA or repression of translation
    piRNA (26-31nt, piwi-interacting RNA) | Silencing of transposons
  Medium ncRNA (50-200nt)
    paRNA (promoter-associated ncRNA) | Gene repression in cis via interacting with PRC2
  Long (large) ncRNA (>200nt)
    Intergenic ncRNA | Epigenetic regulators of transcription in cis/in trans
    Intronic ncRNA | Ibid
    UTR LncRNA | Ibid
    Antisense transcript | mRNA stability of its homologous coding gene
    Pseudogene transcript | Generation of NATs or ceRNAs, stabilization of its coding transcript by competitive binding miRNAs
    Enhancer-like ncRNA (eRNA) | Activation of promoter activity by unknown mechanism
    Mitochondrial ncRNA | Cell cycle regulation and more unknown functions
    Repeat-associated ncRNA | Regulation of repeat silencing
    Satellite ncRNA | Involvement of formation and function of centromere-associated complexes

Figure 1. Origins of LncRNA. Arrows represent different types of LncRNA transcripts.
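The length-based grouping of regulatory ncRNAs given in the Nie et al. excerpt (short ncRNAs such as miRNA and piRNA, medium ncRNAs of 50-200 nt, long ncRNAs above 200 nt) can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and treating every transcript shorter than 50 nt as "short" is an assumption, since the excerpt gives example lengths for that group rather than an explicit cutoff.

```python
def classify_regulatory_ncrna(length_nt: int) -> str:
    """Return the length class of a regulatory ncRNA.

    Thresholds for "medium" (50-200 nt) and "long" (>200 nt) follow the
    excerpt; the "<50 nt means short" rule is an assumption made here.
    """
    if length_nt <= 0:
        raise ValueError("length_nt must be a positive number of nucleotides")
    if length_nt > 200:
        return "long"
    if length_nt >= 50:
        return "medium"
    return "short"


if __name__ == "__main__":
    # Example lengths: a miRNA, a piRNA, a medium ncRNA and a 19 kb transcript.
    for length in (22, 30, 120, 19_000):
        print(length, classify_regulatory_ncrna(length))
```

Because the excerpt groups medium and long ncRNAs together as LncRNAs, a caller interested in that combined class could simply test whether the result is not "short".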

MINI REVIEW ARTICLE
published: 09 January 2012

doi: 10.3389/fgene.2011.00107

The long non-coding RNAs: a new (p)layer in the “dark matter”

Thomas Derrien1*, Roderic Guigó1,2 and Rory Johnson1

1 Bioinformatics and Genomics, Centre for Genomic Regulation, Universitat Pompeu Fabra, Barcelona, Spain
2 Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain

Edited by: Philipp Kapranov, St. Laurent Institute, USA

Reviewed by: Yohei Kirino, Cedars-Sinai Medical Center, USA; Chris Ponting, MRC Functional Genomics Unit, UK

*Correspondence: Thomas Derrien, Bioinformatics and Genomics Group, Centre for Genomic Regulation, Biomedical Research Park of Barcelona, C. Dr. Aiguader 88, Barcelona 08003, Spain. e-mail: [email protected]

The transcriptome of a cell is represented by a myriad of different RNA molecules with and without protein-coding capacities. In recent years, advances in sequencing technologies have allowed researchers to more fully appreciate the complexity of whole transcriptomes, showing that the vast majority of the genome is transcribed, producing a diverse population of non-protein coding RNAs (ncRNAs). Thus, the biological significance of ncRNAs has been largely underestimated. Amongst these multiple classes of ncRNAs, the long non-coding RNAs (lncRNAs) are apparently the most numerous and functionally diverse. A small but growing number of lncRNAs have been experimentally studied, and a view is emerging that these are key players in epigenetic gene regulation in mammalian cells. LncRNAs have already been implicated in human diseases such as cancer and neurodegeneration, highlighting the importance of this emergent field. In this article, we review the catalogs of annotated lncRNAs and the latest advances in our understanding of lncRNAs.

Keywords: non-coding RNAs, regulation, long non-coding RNA, epigenetics

THE CELL, AN RNA-DEPENDENT MACHINERY

Some of the most fundamental cellular processes rely on anciently conserved non-coding RNAs (ncRNAs). These include, for instance, the ribosomal RNAs, which are assembled together to constitute ribosomes, the factories for translation of messenger RNAs (mRNAs) into proteins. Other ancient roles of ncRNAs include the transport of amino acids through ribosomes via the transfer RNAs (tRNAs) and the splicing of introns of pre-mRNA, which is mediated in part by the snRNAs (small nuclear RNAs). More recently, the crucial role of ncRNA in post-transcriptional gene regulation has been highlighted by the discovery of microRNAs (miRNAs), which repress gene expression by targeting semi-complementary motifs in target mRNAs (Lee et al., 1993). Many additional classes of ncRNAs have been discovered in the last decade, reinforcing the view that they are of central importance in the functioning of cells from all the branches of life (Amaral et al., 2008).

Amongst the various ncRNA classes, we probably know least about the long non-coding RNAs (lncRNAs). In particular, what is the total number of lncRNAs in mammalian genomes? Where are they localized? What is their significance in the context of evolution, and particularly in the evolution of complex processing in primate brains? Now that good catalogs of lncRNAs have become available, the most critical question is the functionality of these transcripts. This question is particularly acute given that we have no a priori methods for the prediction of lncRNA function based on sequence alone, in contrast to proteins, where confident inferences on function can be made by simple analysis of the amino acid sequence. Given the sheer number of new unexplored lncRNA transcripts (~15,000 at last count; Derrien et al., submitted), the field must move forward to address this question of function by using large-scale functional screens. Such moves are already underway, with groups such as Eric Lander’s carrying out siRNA screens (Guttman et al., 2011). Large-scale analysis of protein-binding partners will also add another layer of valuable information to such annotation of lncRNA catalogs. Hopefully, advances in bioinformatic annotation of RNA structures (Torarinsson et al., 2006; Parker et al., 2011), and methods to predict functions based on this, will be developed. In this way, we might build up a richly annotated catalog of lncRNAs with functional predictions that will enable us to integrate them into existing knowledge of the cell and infer possible roles in human diseases.

Cis AND trans FUNCTIONS FOR lncRNAs

Until recently, only a handful of lncRNAs had been described in the literature. One of the earliest examples was XIST, a 19 kb non-protein-coding transcript which is responsible for the inactivation of one of the two X chromosomes in placental females through DNA methylation (Brockdorff et al., 1992). Other examples of lncRNAs located in imprinted regions, such as Airn (Sleutels et al., 2002; Nagano et al., 2008), H19 (Gabory et al., 2009), NESPAS (Wroe et al., 2000), or Kcnq1ot1 (Mancini-Dinardo et al., 2006; Mohammad et al., 2010), are involved in the inactivation of gene expression via specific associations with chromatin-modifying complexes. More recently, the HOTAIR lncRNA was shown to epigenetically repress the HOXD locus via the recruitment of the PRC2 complex (Rinn et al., 2007). Strikingly, this study described a trans mechanism of action of a lncRNA located on human Chromosome 12 which modulates expression of multiple genes clustered on human Chromosome 2 (HOXD locus; Rinn et al., 2007). Supporting this hypothesis, two recent papers (Cabili et al.,

www.frontiersin.org January 2012 | Volume 2 | Article 107 | 1


Derrien et al. lncRNAs: a new player of the transcription

the genome. Approximately one third (Derrien et al., submitted) to one half (Jia et al., 2010) of lncRNAs overlap protein-coding loci in some way (“genic” lncRNAs). It therefore seems essential to annotate lncRNAs in both intergenic and coding regions, since (i) the exact boundaries of protein-coding genes are frequently subject to variations and reannotations (Denoeud et al., 2007; Gingeras, 2007) and thus could lead to the revision of a lincRNA into a bona fide lncRNA, (ii) thousands of protein-coding genes harbor natural antisense transcripts belonging to the lncRNA class (He et al., 2008), and (iii) numerous functional genic lncRNAs overlapping protein-coding genes have been experimentally validated, especially in disease states (Faghihi et al., 2008; Pasmant et al., 2011; Wapinski and Chang, 2011). A recent catalog of both genic and intergenic lncRNAs has been released, based on a genome-wide computational approach combined with intensive manual annotation. This led to the identification of 6,736 lncRNA genes in human, of which 63% are localized within or in close proximity (<10 kb) to known protein-coding genes (Jia et al., 2010).
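The distinction drawn above between genic lncRNAs, lncRNAs lying within 10 kb of a protein-coding gene, and intergenic lncRNAs can be sketched as an interval comparison. This is a minimal illustration rather than the method used by any of the cited catalogs: the function name, the tuple layout, the 0-based half-open coordinate convention, and the label "proximal" are all assumptions introduced here, and strand is ignored.

```python
from typing import Iterable, Tuple

# (chromosome, start, end) with 0-based, half-open coordinates (assumed convention)
Interval = Tuple[str, int, int]


def classify_lncrna(lnc: Interval, coding_genes: Iterable[Interval],
                    window_bp: int = 10_000) -> str:
    """Label a lncRNA locus relative to protein-coding genes.

    "genic"      : overlaps at least one coding gene
    "proximal"   : no overlap, but the nearest coding gene is < window_bp away
    "intergenic" : everything else
    """
    chrom, start, end = lnc
    nearest_gap = None
    for g_chrom, g_start, g_end in coding_genes:
        if g_chrom != chrom:
            continue
        if start < g_end and g_start < end:
            return "genic"
        # The intervals do not overlap, so the gene lies entirely on one side.
        gap = g_start - end if g_start >= end else start - g_end
        nearest_gap = gap if nearest_gap is None else min(nearest_gap, gap)
    if nearest_gap is not None and nearest_gap < window_bp:
        return "proximal"
    return "intergenic"


if __name__ == "__main__":
    genes = [("chr1", 1_000, 5_000), ("chr2", 100, 200)]  # made-up coordinates
    print(classify_lncrna(("chr1", 4_000, 6_000), genes))    # overlaps a gene
    print(classify_lncrna(("chr1", 8_000, 9_000), genes))    # 3 kb from a gene
    print(classify_lncrna(("chr1", 50_000, 60_000), genes))  # far from any gene
```

A real annotation pipeline would also need to consider strand, since natural antisense transcripts overlap coding genes on the opposite strand.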

THE GENCODE CATALOG OF HUMAN lncRNAs

Most recently, the GENCODE annotation group has produced the most comprehensive, high-quality human lncRNA annotation to date. In order to identify all evidence-based functional gene features in the human genome, the GENCODE group (Harrow et al., 2006), within the ENCODE framework (ENCyclopedia Of DNA Elements; ENCODE Project Consortium et al., 2007), provides a high-quality collection of lncRNAs. GENCODE annotation involves manual curation, multiple computational analyses, and targeted experimental approaches, which together represent complementary methodologies for the complete identification of all human functional elements (coding and non-coding genes). At present, the GENCODE collection (Version 7) comprises 14,880 lncRNA transcripts arising from 9,277 distinct gene loci (Derrien et al., submitted).

In a recent study, we investigated whether these lncRNAs are under negative evolutionary selection, indicative of functionality (Derrien et al., submitted). Evolutionary scores were computed based both on the phastCons program (Siepel et al., 2005) and on custom BLAST alignments within mammals, in order to measure the conservation profiles of GENCODE lncRNAs in comparison with protein-coding transcripts and ancestral repeats (ARs), the latter representing a good proxy for neutrally evolving sequences (Ponjavic et al., 2007). Overall, lncRNAs show moderate sequence conservation compared to coding transcripts. This lower sequence conservation may reflect the fact that functional RNA structures are more robust in the face of sequence mutations and insertions–deletions (indels), compared to the higher constraints inherent to protein-coding open reading frames. Nevertheless, lncRNAs, and more especially their promoters, showed statistically significant, non-random conservation, strongly suggesting a functional role for these ncRNAs. Interestingly, about one third of the 15,000 lncRNAs display a primate-specific pattern of conservation (Derrien et al., submitted).

Using whole transcriptome sequencing (RNAseq) of 16 human cell lines produced in the framework of the ENCODE consortium (ENCODE Project Consortium et al., 2007) and 16 tissues from the Human Body Map project (www.illumina.com), we showed that 94% of the GENCODE lncRNA transcripts are expressed in at least one of the tissues or cell lines studied. Strikingly, the level of expression of polyA+ lncRNAs is ~10–20 times lower than that of protein-coding transcripts, reinforcing the need to use deep sequencing based technologies to identify these lowly expressed non-coding loci (Figure 1). We also demonstrated that lncRNAs tend to be enriched in the nucleus in comparison with mRNAs, an observation consistent with the idea that many lncRNAs may be devoted to gene regulation in the nucleus. Finally, the question is raised as to whether lincRNAs could encode very small peptides, as shown by Ingolia et al. (2011). However, the evidence for this hypothesis is still conflicting, since a recent study which used comprehensive mass spectrometry (MS) data produced as part of the ENCODE project found only about a hundred GENCODE lncRNAs to be matched by small peptides (Banfai et al., submitted).

CONCLUSION

Over the past decade, the estimation of the proportion of “functional DNA” in the human genome has been constantly revised upward (Ponting and Hardison, 2011).

We now know that the human genome contains thousands of lncRNAs, both genic and intergenic. This new class of non-protein coding RNAs (ncRNAs) lacks functional ORFs, is modestly conserved and seems to regulate protein-coding gene expression both negatively and positively, in cis and in trans. Diverse mechanisms of action have been observed (for reviews see Ponting et al., 2009; Nagano and Fraser, 2011), suggesting that lncRNAs are fundamental regulators of transcription. The classification of lncRNAs remains difficult, and we presently have only a vague idea of what sub-categories exist and how we might use experimental or sequence information to distinguish between such categories. With the ongoing and increasing number of RNAseq experiments characterizing transcriptomes of multiple cell lines and human tissues (in particular within the ENCODE consortium), it is likely that the number of annotated lncRNAs will increase dramatically in the near future. Future studies will likely focus on identifying functional lncRNAs and those involved in human disease processes.

ACKNOWLEDGMENTS

We would like to thank reviewers for the helpful comments.

REFERENCES

Amaral, P. P., Dinger, M. E., Mercer, T. R., and Mattick, J. S. (2008). The eukaryotic genome as an RNA machine. Science 319, 1787–1789.

Amaral, P. P., Clark, M. B., Gascoigne, D. K., Dinger, M. E., and Mattick, J. S. (2011). lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res. 39, D146–D151.

Brockdorff, N., Ashworth, A., Kay, G. F., McCabe, V. M., Norris, D. P., Cooper, P. J., Swift, S., and Rastan, S. (1992). The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell 71, 515–526.

Cabili, M. N., Trapnell, C., Goff, L., Koziol, M., Tazon-Vega, B., Regev, A., and Rinn, J. L. (2011). Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. doi:10.1101/gad.17446611

Carninci, P., Kasukawa, T., Katayama, S., Gough, J., Frith, M. C., Maeda, N.,

Frontiers in Genetics | Non-Coding RNA January 2012 | Volume 2 | Article 107 | 4


Table 1 | Description of published catalogs of human lncRNAs.

References | Number of lncRNA elements | LncRNA classes considered | Type of annotation | PolyA type | Experimental evidence
Khalil et al. (2009) | ~3,300 | Intergenic | Bioinformatic predictions | PolyA+ | (ChIPSeq) + expression array
Jia et al. (2010) | 6,736 | Genic + intergenic | Bioinformatic predictions + manual curation | PolyA+ | Full-length cDNAs
Kapranov et al. (2010) | 580 | Intergenic | Bioinformatic predictions | PolyA+ PolyA− | Single-molecule sequencing (SMS) Helicos
Ørom et al. (2010) | 3,019 | Intergenic | Manual curation | Mainly polyA+ | cDNA/ESTs + RNAseq
Cabili et al. (2011) | 8,263 | Intergenic | Bioinformatic predictions + manual curation | PolyA+ | (ChIPSeq) + RNAseq
Derrien et al. (submitted) | 9,277 | Genic + intergenic | Manual curation | PolyA+ PolyA− | (ChIPSeq) + cDNA/ESTs + RNAseq + CAGE/diTAG

FIGURE 1 | Proportion of GENCODE polyA+ lncRNAs and protein-coding genes expressed at the gene level (n = 9,277 and 18,063, respectively) and at the transcript level with increasing thresholds of expression values (RPKM) in ENCODE RNAseq experiments.
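The RPKM thresholds mentioned in the Figure 1 caption can be made concrete with a short sketch. RPKM (reads per kilobase of transcript per million mapped reads) is computed here with its standard definition; the function names and the toy numbers are illustrative assumptions and are not taken from the ENCODE data.

```python
from typing import Iterable


def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def fraction_at_or_above(values: Iterable[float], threshold: float) -> float:
    """Share of transcripts whose expression value reaches a given threshold."""
    values = list(values)
    if not values:
        return 0.0
    return sum(v >= threshold for v in values) / len(values)


if __name__ == "__main__":
    # Toy example: 100 reads on a 2 kb transcript in a library of 10 million reads.
    print(rpkm(100, 2_000, 10_000_000))
    # Proportion of a made-up set of transcripts expressed at RPKM >= 1.
    print(fraction_at_or_above([0.0, 0.5, 2.0, 10.0], 1.0))
```

Plotting `fraction_at_or_above` for lncRNAs and for protein-coding transcripts over a range of thresholds would give curves of the kind the caption describes.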

studies revealed that “dark matter” transcription may represent the majority of the total (non-ribosomal and non-mitochondrial) RNA of a cell. In addition, it shed light on a new class of very long ncRNAs (min size ~50 kb), abundantly expressed and localized in intergenic regions of the genome, the so-called vlincRNAs (very long intergenic ncRNAs). Focusing on the total RNA of a cell rather than the highly selected polyA+ transcripts seems to complement the latest catalog of lincRNAs (Cabili et al., 2011) since only 40% of these vlincRNAs overlap the lincRNA genes. We also recently showed that the GENCODE lncRNA set tends to have higher PolyA− representation compared to protein-coding mRNAs (Derrien et al., submitted). Although many studies have concentrated on the intergenic lncRNAs (the lincRNAs), this seriously underestimates the true number of lncRNA transcripts in


Page 18

Because the rhesus PWS region sno-lncRNAs closely resemble their human counterparts in genomic context, we reasoned that they might function similarly as well. We thus scanned the rhesus sno-lncRNA sequence for Fox binding motifs and identified an enrichment of Fox binding sites (Figure 5B), further indicating that rhesus PWS region sno-lncRNAs might also interact with Fox family splicing regulators and play a similar role in splicing regulation. On the other hand, the absence of PWS region sno-lncRNAs in mouse indicated that a similar regulation mechanism is absent in mouse.

In sum, PWS region sno-lncRNAs are highly expressed in human and rhesus, but are absent in mouse. The absence of PWS region sno-lncRNAs in mouse also suggests

Figure 4 Unique expression of PWS region sno-lncRNAs across species. Normalized read densities of poly(A)−/ribo− RNA-seq (red) and poly(A)+ RNA-seq (black) from human (A), rhesus (B) and mouse (C) are shown from the UCSC genome browser with custom bigWig inputs. Confident sno-lncRNA signals were detected from human and rhesus ESCs, but not from mouse ESCs. Note that there is no rhesus RefSeq annotation in this region; transcripts from de novo assembly (black lines) and transposed annotations from other species (blue lines) are shown.

Zhang et al. BMC Genomics 2014, 15:287, page 7 of 15. http://www.biomedcentral.com/1471-2164/15/287

Human ESC H9

Long ncRNA

Zhang et al 2014 – BMC Genomics

serine/arginine-rich proteins which are involved in the regulation and selection of the splice sites in pre-mRNAs. By regulating the phosphorylation of SR proteins, MALAT1 may thus regulate the cellular levels of active SR proteins and subsequently the splicing of many pre-mRNAs (68). Finally, it remains to be determined if other lncRNAs may also play a role in alternative splicing. Performing biochemical assays such as RNA co-immunoprecipitation followed by deep sequencing of RNAs (RIP-seq) on proteins that are known to be involved in alternative splicing may lead to the identification of other lncRNAs with functions similar to MALAT1.

HOW DO LNCRNAS EXERT THEIR EFFECTS?

Thus far lncRNAs have been implicated in numerous biological functions and pathways, and their mechanisms of action are very diverse (Figure 2) (7,11). Here we will discuss several mechanisms by which lncRNAs exert their effects, although it is worth noting that in every case much remains to be learned about the detailed mechanism of action.

lncRNAs ‘guide’ chromatin-modifying complexes to specific genomic loci in cis and in trans

Cellular identity in an organism is determined by epigenetic factors that modulate specific gene expression programs (69). These epigenetic factors, such as chromatin-modifying complexes and DNA methyltransferases, activate and repress specific genes by enzymatically modifying chromatin and DNA (70). One of the most puzzling questions in biology is how these ubiquitous enzymes, which lack DNA binding capacity, recognize their target genes in the various cell types. Emerging evidence suggests that some lncRNAs ‘guide’ chromatin-modifying complexes as well as other nuclear proteins to specific genomic loci to exert their effects (14,71,72). In essence, some lncRNAs may function as ‘GPS’ devices to target other cellular components to their sites of action. Below, we will highlight several examples of lncRNAs that have been shown to possess such activity.

As discussed previously, the lncRNA HOTAIR directs the chromatin-modifying complexes PRC2 and LSD1 to numerous gene loci on a genome-wide scale in trans (4,52–54). By contrast, other lncRNAs, such as

Figure 2. LncRNAs exert their effects by diverse mechanisms. (A) lncRNAs can act as guides and tethers for chromatin-modifying complexes, and thus contribute to tissue-specific gene expression. (B) lncRNAs can act as molecular scaffolds for protein complexes that lack protein–protein interaction domains. (C) lncRNAs can bind to transcription factors and prevent them from binding to their target DNA sequence. (D) lncRNAs can interact directly with microRNAs (miRNAs) and prevent them from binding to mRNA, thus regulating protein synthesis.

Nucleic Acids Research, 2012, Vol. 40, No. 14 6395


SURVEY AND SUMMARY

Emerging functional and mechanistic paradigms of mammalian long non-coding RNAs

Victoria A. Moran1,2,3, Ranjan J. Perera4 and Ahmad M. Khalil1,2,3,5,*

1Center for RNA Molecular Biology, 2Department of Genetics and Genome Sciences, 3Department of Biochemistry, Case Western Reserve University School of Medicine, Cleveland, OH, 4Sanford-Burnham Medical Research Institute, Orlando, FL and 5Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA

Received February 14, 2012; Revised March 15, 2012; Accepted March 20, 2012

ABSTRACT

The recent discovery that the human and other mammalian genomes produce thousands of long non-coding RNAs (lncRNAs) raises many fascinating questions. These mRNA-like molecules, which lack significant protein-coding capacity, have been implicated in a wide range of biological functions through diverse and as yet poorly understood molecular mechanisms. Despite some recent insights into how lncRNAs function in such diverse cellular processes as regulation of gene expression and assembly of cellular structures, by and large, the key questions regarding lncRNA mechanisms remain to be answered. In this review, we discuss recent advances in understanding the biology of lncRNAs and propose avenues of investigation that may lead to fundamental new insights into their functions and mechanisms of action. Finally, as numerous lncRNAs are dysregulated in human diseases and disorders, we also discuss potential roles for these molecules in human health.

INTRODUCTION

Recent advances in technologies, such as tiling arrays and RNA deep sequencing (RNA-seq), have made it possible to survey the transcriptomes of many organisms to an unprecedented degree. Several studies utilizing these technologies have unequivocally demonstrated that the genomes of mammals, as well as other organisms, produce thousands of long transcripts that have no significant protein-coding capacity and thus are referred to as long (or large) non-coding RNAs (lncRNAs) (1–6). LncRNAs are strikingly similar to mRNAs: they are RNA polymerase II transcripts that are capped, spliced and polyadenylated, yet do not function as templates for protein synthesis (7).

Although a functional lncRNA known as Xist was discovered and characterized in the early 1990s (8–10), the prevailing view until recently was that such transcripts are rare and only a handful of functional lncRNAs are represented in the genome. However, numerous publications in the past several years have now documented important functions for lncRNAs, affecting many biological processes, including regulation of gene expression, dosage compensation, genomic imprinting, nuclear organization and compartmentalization, and nuclear-cytoplasmic trafficking (7,11–14). It is very likely that additional functions for lncRNAs will be discovered, as only a small percentage of lncRNAs have been studied in detail to date. Furthermore, there are a number of studies that have shown many lncRNAs are dysregulated in various human diseases and disorders, although it is not yet clear if these lncRNAs are causal or symptomatic of the disease state (15,16).

In this review, we will discuss a number of important topics regarding lncRNAs: (i) How many functional lncRNAs are transcribed in mammals?; (ii) Known biological functions of lncRNAs; (iii) How do lncRNAs exert their effects? And finally, (iv) What are the potential roles of lncRNAs in human disease? Although in this review we will focus on mammalian lncRNAs, it is important to point out that lncRNAs are being actively investigated in many other organisms (17,18).

HOW MANY FUNCTIONAL LNCRNAs ARE PRESENT IN MAMMALS?

Prior to advances in technologies that made it possible to survey transcriptomes in an unbiased manner and to a much greater depth than previously possible, lncRNAs were discovered and characterized using traditional gene

*To whom correspondence should be addressed. Tel: +1 216 368 0710; Fax: +1 216 368 2010; Email: [email protected]

Published online 5 April 2012. Nucleic Acids Research, 2012, Vol. 40, No. 14, 6391–6400. doi:10.1093/nar/gks296

© The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Page 19

[Figure (image not recovered from extraction). Panel titles: a | Nuclear functional lncRNAs; b | Transcription of lncRNAs; c | Nuclear and cytoplasmic functional lncRNAs. Sub-panel titles: lncRNA as miRNA decoy; lncRNA as translational regulator; lncRNA as splicing regulator. Labels include: Pol II, EZH2, PRC2, LSD1, REST, coREST, DNMT1, DNMT3A, RISC, epigenetically silenced, epigenetically modulated, lncRNA as protein decoy, enhancer tethering, unknown. Credit: Nature Reviews | Genetics]

ADVANCE ONLINE PUBLICATION www.nature.com/reviews/genetics

© 2014 Macmillan Publishers Limited. All rights reserved

PERSPECTIVES

The rise of regulatory RNA

Kevin V. Morris and John S. Mattick

Abstract | Discoveries over the past decade portend a paradigm shift in molecular biology. Evidence suggests that RNA is not only functional as a messenger between DNA and protein but also involved in the regulation of genome organization and gene expression, which is increasingly elaborate in complex organisms. Regulatory RNA seems to operate at many levels; in particular, it plays an important part in the epigenetic processes that control differentiation and development. These discoveries suggest a central role for RNA in human evolution and ontogeny. Here, we review the emergence of the previously unsuspected world of regulatory RNA from a historical perspective.

RNA has long been at the centre of molecular biology and was likely the primordial molecule of life, encompassing both informational and catalytic functions. Its informational functions are thought to have subsequently devolved to the more stable and easily replicable DNA, and its catalytic functions to the more chemically versatile polypeptides (REF. 1). The idea that the contemporary role of RNA is to function as the intermediary between the two had its roots in the early 1940s with the entry of chemists into the study of biology, notably Beadle and Tatum (REF. 2), whose work underpinned the one gene–one enzyme hypothesis (FIG. 1 (TIMELINE)). This idea later matured into the more familiar one gene–one protein concept and became widely accepted despite the prescient misgivings of experienced geneticists, notably McClintock (REF. 3). The concept that genes encode only the functional components of cells (that is, the ‘enzymes’) itself had deeper roots in the mechanical zeitgeist of the era, which was decades before the widespread understanding of the use of digital information for systems control.

Although the one gene–one protein hypothesis has long been abandoned owing to the discovery of alternative splicing in the 1970s, the protein-centric view of molecular biology has persisted. Such persistence was aided by phenotypic and ascertainment bias towards protein-coding mutations in genetic studies and by the assumption that these mutations affected cis-acting regulatory protein-binding sites (REF. 4). However, this view was challenged by the discovery of nuclear introns and RNA interference (RNAi), as well as by the advent of high-throughput sequencing, which led to the identification of large numbers and different types of large and small RNAs, the functions of which are still under investigation.

In this Timeline article, we examine the history of, and report the shift in thinking that is still underway about, the role of RNA in cell and developmental biology, especially in animals. The emerging evidence suggests that there are more genes encoding regulatory RNAs than those encoding proteins in the human genome, and that the amount and type of gene regulation in complex organisms have been substantially misunderstood for most of the past 50 years.

Early ideas for the role of RNA

RNA: the central dogma and gene regulation. After the elucidation of the double helical structure of DNA in 1953 (REF. 5), the following years were preoccupied with deciphering the ‘genetic code’ and establishing the mechanistic pathway between genes and proteins: the identification of a transitory template (mRNA), an adaptor (tRNA) and the ribosome ‘factory’ comprised of ribosomal RNAs and proteins for translating the code into a polypeptide. In 1958, Crick published the celebrated central dogma to describe the flow of genetic information from DNA to RNA to protein, which has proved remarkably accurate and durable, including the prediction of reverse transcription (REF. 6). Nonetheless, in conceptual terms, RNA was tacitly consigned to be the template and an infrastructural platform (with regard to rRNAs and tRNAs) for protein synthesis or has at least been interpreted in this way by most people since that time.

In the mid-1950s, the link was established between rRNA (which is highly expressed in essentially all cells) and the structures termed ribosomes as the platform for protein synthesis (REF. 7). The roles of tRNA and mRNA were experimentally confirmed in 1958 (REF. 8) and 1961 (REF. 9), respectively. The latter occurred in the same year that Jacob and Monod published their classic paper on the lac operon of Escherichia coli (REF. 10), which was the first locus to be characterized at the molecular genetic level. These studies confirmed that at least some, but presumably most, genes encoded proteins and supported the emerging idea that gene expression is controlled by regulating the transcription of the gene, as indicated by the locus encoding the lac repressor in the repressor–operator model. At the time, Jacob and Monod did not know the chemical identity of the repressor and speculated in passing that it “may be a polyribonucleotide” (that is, RNA) (REF. 10). However, Gilbert later showed that the repressor is a polypeptide that allosterically binds to the lactose substrate, and the brief idea faded (REF. 11).

These studies reinforced and extended the concept that proteins are not only enzymes but also the primary analogue components and control factors that constitute the cellular machinery. This, in turn, has led to the prevailing transcription factor

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION

Nature Reviews Genetics | AOP, published online 29 April 2014; doi:10.1038/nrg3722


Long non-coding RNAs and gene expression in cancer cells

132    Am J Transl Res 2012;4(2):127-150

RNA TRAP (tagging and recovery of associated proteins) technique, Air was found to envelop the Slc22a3 loci in correlation with H3K9 histone methyltransferase (G9a) and transcriptional repression [24]. Genetic ablation of G9a further leads to nonimprinted and biallelic expression of Slc22a3, suggesting that Air recruits G9a to the Slc22a3 promoter, remodels the chromatin structure, and silences monoallelic expression [20, 25, 27]. Although the expression and function of Air LncRNA have been identified in the last decade, the underlying mechanism through which the chromatin remodel complex mediates a direct contact with the Air LncRNA remains unclear.

HOTAIR (HOX Antisense Intergenic RNA) is located at the boundary of two diametrical chromatin domains in the HOXC locus. HOTAIR is transcribed antisense to the canonical HOXC genes with a size of 2158 nts. This transcript is spliced and polyadenylated. HOTAIR distally regulates the chromosomal domain in trans on the HOXD locus [28]. A recent study revealed that the 5′ domain of HOTAIR physically interacts with PRC2 methylase and increases its activity, which facilitates histone H3 lysine-27 trimethylation on the HOXD locus and results in silencing of the HOXD gene. RNAi-mediated depletion of HOTAIR dramatically induces transcriptional activation in the HOXD locus with increased transcripts of HOXD8, HOXD9, HOXD10, and HOXD11 [28]. The 3′ domain of HOTAIR has recently been found to bind the demethylase LSD1 and is required for its occupancy on chromatin and normal function [15]. Therefore, HOTAIR provides a platform for at least two distinct histone modification complexes: a 5′ domain that binds PRC2 and a 3′ domain that interacts with the LSD1/CoREST/REST complex. Tethering two distinct chromatin-remodeling complexes by the HOTAIR scaffold enables RNA-mediated assembly of PRC2 and LSD1, and therefore, coordinates recruitments of both

Figure 3. LncRNA in transcriptional regulation. A. Transcriptional activation. LncRNA transcribed from the ultraconserved region of the gene Evf2 functions as coactivator of a homeodomain protein Dlx2, facilitating the Dlx6 gene transcription. B. Transcriptional suppression. Top, LncRNA CCND1 transcribed from the upstream region of the Cyclin D1 gene recruits the RNA-binding protein, termed translocated-in-liposarcoma (TLS), allosterically to modulate HAT activities of CREB-binding protein (CBP) and p300, resulting in inhibition of the gene expression. Bottom, LncRNA transcribed from the dihydrofolate reductase (DHFR) minor promoter can form a triplex at its major promoter to prevent the binding of general transcription factors such as TFIID, and subsequently silence the expression of DHFR.


Review Article

Long non-coding RNAs: versatile master regulators of gene expression and crucial players in cancer

Lei Nie1, Hsing-Ju Wu2, Jung-Mao Hsu1, Shih-Shin Chang1, Adam M LaBaff1, Chia-Wei Li1, Yan Wang1, Jennifer L. Hsu1,2,4, Mien-Chie Hung1,2,3,4

1Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX 77030; 2Center for Molecular Medicine, China Medical University Hospital, Taichung, Taiwan; 3Graduate Institute of Cancer Biology, College of Medicine, China Medical University, Taichung, Taiwan; 4Asia University, Taichung, Taiwan

Received February 3, 2012; accepted February 16, 2012; Epub April 8, 2012; Published April 30, 2012

Abstract: With rapid development of sequencing technologies such as deep sequencing and whole genome high-density tiling array, we now know that most of the “junk” genomic sequences are transcribed as non-coding RNAs (ncRNAs). A large number of long ncRNA transcripts (> 200 bp) have been identified, and these long ncRNAs (LncRNAs) are found to be crucial regulators for epigenetic modulation, transcription, and translation. In this review, we briefly summarize the regulatory function of LncRNAs with a particular focus on the underlying mechanisms of LncRNAs in oncogenesis, tumor metastasis and suppression.

Keywords: Long non-coding RNA (LncRNA), epigenetic regulation, competitive endogenous RNA (ceRNA), oncogenic lncRNA, pseudogene transcript, natural antisense RNA (NAT)

Introduction

When the whole human genome was determined, it was a surprise that there are only about 20,000-25,000 protein-coding genes, representing less than 2% of the whole human genome. It was hard to imagine that most of the genomic sequences are junk DNA, since simple organisms such as Drosophila melanogaster and Caenorhabditis elegans have a very close number of protein-coding genes. The limited number of protein-coding genes cannot explain the developmental and physiological complexity of humans. With the rapid development in high-throughput sequencing technologies such as deep sequencing and whole genome high-density tiling array, it is now known that about 98% of the “junk” DNAs are transcribed as non-coding RNAs (ncRNAs) including short and long ncRNAs [1, 2].

A large variety of ncRNAs can be divided into two classes: structural and regulatory ncRNAs (Table 1). Structural ncRNAs include tRNA, rRNA, and snoRNA. Based on ncRNA length, regulatory ncRNA can be further divided into at least three groups: (1) short ncRNA including microRNA (miRNA) (22-23 nts) and piwi-interacting RNA (piRNA) (26-31 nts); (2) medium ncRNA (50-200 nts); (3) long ncRNA (>200 nts). This review will focus on medium and long ncRNAs, which are collectively referred to as large or long ncRNAs (LncRNAs). Based on large-scale sequencing and prediction from chromatin-state maps of full length cDNA libraries in FANTOM2 and 3 as well as human transcriptomes, more than 4,600 LncRNAs in mouse and over 3,300 LncRNAs in human have been identified with a total of approximately 23,000 LncRNAs in a mammalian genome [3-7].

Biogenesis of LncRNAs is quite complicated. In general, LncRNA transcription and processing is very similar to that of protein-coding RNA. Most LncRNAs are transcribed by RNA polymerase (RNAP) II, but some LncRNAs have been reported to be transcribed by RNAP III, and the majority of LncRNAs are spliced, polyadenylated

Am J Transl Res 2012;4(2):127-150 www.ajtr.org /ISSN:1943-8141/AJTR1202001


Long non-coding RNAs and gene expression in cancer cells

Am J Transl Res 2012;4(2):127-150 (page 131)

[…]duction and H3K27 trimethylation of the X chromosome, and knock-down of the PRC2 subunits Eed and Ezh2 compromised Xist upregulation [19]. In addition, Kaneko et al. showed that, like HOTAIR, RepA binds to the N-terminus of the Ezh2 protein containing Thr 345, which is defined as the ncRNA-binding domain for the 5'-end of RepA/Xist and HOTAIR [21]. The promoter of Xist is repressed by a 40 kb antisense LncRNA, Tsix, which also binds to PRC2 and thus inhibits the interaction between RepA and PRC by competing with RepA recruited to pre-XCI [21]. Tsix expression is turned on in both X chromosomes prior to initiation of XCI. At the onset of XCI, Tsix expression becomes monoallelic and is only associated with the future active X chromosome until Xist expression is off. Tsix does not exist on the inactive X chromosome once cells undergo the X-inactivation process [19, 21]. Taken together, both RepA and PRC2 are required for the initiation and spread of XCI.

The mechanism of how Xist is regulated remains unclear but likely involves negative and positive regulators. For the active X chromosome, Tsix RNA functions as the established Xist repressor, whereas for the inactive X chromosome another XCI-encoded LncRNA, Jpx, identified by Tian et al., functions as a Xist activator [23]. Depletion of Jpx by either a knock-out or a knockdown approach blocks Xist upregulation, suggesting that Jpx functions in trans and in cis [23]. In brief, the expression and function of Xist are controlled by two major ncRNA-based switches: Tsix for the active X and Jpx for the inactive X.

Air and HOTAIR

Antisense to Igf2r RNA (Air) is a 108 kb, polyadenylated, non-coding RNA that is transcribed from an antisense promoter located in intron 2 of Igf2r (insulin-like growth factor type 2 receptor) on mouse chromosome 17 [20, 24]. The Igf2r gene cluster contains 3 imprinted genes: Igf2r, Slc22a2, and Slc22a3. Unlike Igf2r, Slc22a2, and Slc22a3, which are transcribed from the maternal allele, the Air ncRNA is only expressed from the paternal allele [25]. Expression of the Air ncRNA results in a "cloud" nuclear pattern over the imprinted DNA locus during embryonic development in the placenta and the adult heart. Unspliced Air is known to be unstable and exclusively localized to the nucleus, while spliced isoforms are relatively stable and found in both nucleus and cytoplasm [25-27]. To date, the most characterized function of Air is regulation of genomic imprinting of the Igf2r gene cluster. This epigenetic process has recently been found to involve histone methylation to achieve monoallelic gene expression without altering the genetic sequence. Using the […]

Figure 2. LncRNA in chromatin remodeling. LncRNAs such as Xist/RepA, Air, HOTAIR, Tsix, ANRIL and Kcnq1ot1 can recruit Polycomb Repressive Complex (PRC) via direct interaction with Ezh2 or other components to the targeted locus, where they promote trimethylation of H3K27, leading to silencing of the specific genes.


Am J Transl Res 2012;4(2):127-150
www.ajtr.org /ISSN:1943-8141/AJTR1202001

Review Article
Long non-coding RNAs: versatile master regulators of gene expression and crucial players in cancer

Lei Nie1, Hsing-Ju Wu2, Jung-Mao Hsu1, Shih-Shin Chang1, Adam M LaBaff1, Chia-Wei Li1, Yan Wang1, Jennifer L. Hsu1,2,4, Mien-Chie Hung1,2,3,4

1Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX 77030; 2Center for Molecular Medicine, China Medical University Hospital, Taichung, Taiwan; 3Graduate Institute of Cancer Biology, College of Medicine, China Medical University, Taichung, Taiwan; 4Asia University, Taichung, Taiwan

Received February 3, 2012; accepted February 16, 2012; Epub April 8, 2012; Published April 30, 2012

Abstract: With rapid development of sequencing technologies such as deep sequencing and whole genome high-density tiling array, we now know that most of the "junk" genomic sequences are transcribed as non-coding RNAs (ncRNAs). A large number of long ncRNA transcripts (>200 bp) have been identified, and these long ncRNAs (LncRNAs) are found to be crucial regulators for epigenetic modulation, transcription, and translation. In this review, we briefly summarize the regulatory function of LncRNAs with a particular focus on the underlying mechanisms of LncRNAs in oncogenesis, tumor metastasis and suppression.

Keywords: Long non-coding RNA (LncRNA), epigenetic regulation, competitive endogenous RNA (ceRNA), oncogenic lncRNA, pseudogene transcript, natural antisense RNA (NAT)

Introduction

When the whole human genome was determined, it was a surprise that there are only about 20,000-25,000 protein-coding genes, representing less than 2% of whole human genome. It was hard to imagine that most of the genomic sequences are junk DNA since simple organisms such as Drosophila melanogaster and Caenorhabditis elegans have a very close number of protein-coding genes. The limited number of protein-coding genes cannot explain the developmental and physiological complexity of humans. With the rapid development in high-throughput sequencing technologies such as deep sequencing and whole genome high-density tiling array, it is now known that about 98% of the "junk" DNAs are transcribed as non-coding RNAs (ncRNAs) including short and long ncRNAs [1, 2].

A large variety of ncRNAs can be divided into two classes: structural and regulatory ncRNAs (Table 1). Structural ncRNAs include tRNA, rRNA, and snoRNA. Based on ncRNA length, regulatory ncRNA can be further divided into at least three groups: (1) short ncRNA including microRNA (miRNA) (22-23 nts) and piwi-interacting RNA (piRNA) (26-31 nts); (2) medium ncRNA (50-200 nts); (3) long ncRNA (>200 nts). This review will focus on medium and long ncRNAs, which are collectively referred to as large or long ncRNAs (LncRNAs). Based on large-scale sequencing and prediction from chromatin-state maps of full length cDNA libraries in FANTOM2 and 3 as well as human transcriptomes, more than 4,600 LncRNAs in mouse and over 3,300 LncRNAs in human have been identified with a total of approximately 23,000 LncRNAs in a mammalian genome [3-7].

Biogenesis of LncRNAs is quite complicated. In general, LncRNA transcription and processing is very similar to protein-coding RNA. Most of LncRNAs are transcribed by RNA polymerase (RNAP) II, but some LncRNAs have been reported to be transcribed by RNAP III, and the majority of LncRNAs are spliced, polyadenylated
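As a rough illustration of the length-based grouping given in the review's introduction (short ncRNAs such as miRNA at 22-23 nt and piRNA at 26-31 nt, medium ncRNAs at 50-200 nt, long ncRNAs above 200 nt), the thresholds can be written as a small Python function. This is a minimal sketch, not a tool from the review: the function names are hypothetical, and treating every length below 50 nt as "short" is an assumption, since the review only gives example ranges for that group.

```python
def classify_regulatory_ncrna(length_nt: int) -> str:
    """Return the length class of a regulatory ncRNA (hypothetical helper)."""
    if length_nt <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    if length_nt > 200:
        return "long"
    if length_nt >= 50:
        return "medium"
    # Assumption: anything shorter than 50 nt is grouped with the short ncRNAs.
    return "short"


def is_lncrna(length_nt: int) -> bool:
    """Medium and long ncRNAs are collectively called LncRNAs in the review."""
    return classify_regulatory_ncrna(length_nt) in ("medium", "long")


if __name__ == "__main__":
    # Illustrative lengths only; these are not measurements from the review.
    examples = [("miRNA-like", 22), ("piRNA-like", 28),
                ("medium ncRNA", 150), ("long ncRNA", 2300)]
    for name, length in examples:
        print(name, length, classify_regulatory_ncrna(length), is_lncrna(length))
```

Note that the abstract of the same review uses the stricter ">200 bp" cut-off for LncRNAs, so the boundary chosen in `is_lncrna` depends on which definition is followed.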

elife.elifesciences.org

Mattick. eLife 2013;2:e01968. DOI: 10.7554/eLife.01968 1 of 3

It has been known since the late 1970s that many DNA sequences are transcribed but not translated. Moreover, most protein-coding genes in mammals are fragmented, with only a small fraction of the primary RNA transcript being spliced together to form messenger RNA. For many years it was assumed that untranslated RNA molecules served no useful purpose but, starting in the mid-1990s, a small body of researchers, including the present author (Mattick, 1994), have been arguing that these RNAs transmit regulatory information, possibly associated with the emergence of multicellular organisms. This is supported by the observation that the proportion of noncoding genomic sequences broadly correlates with developmental complexity, reaching over 98% in mammals (Liu et al., 2013), although others have argued that the increase in genome size is due to the inefficiency of selection against non-functional elements as body size goes up and population size goes down (Lynch, 2007).

High-throughput sequencing analyses over the past decade have shown that the majority of the mammalian genome is transcribed, often from both strands, and have revealed an extraordinarily complex landscape of overlapping and interlacing sense and antisense, alternatively spliced, protein-coding and non-protein-coding RNAs, the latter generally referred to as long noncoding RNAs (lncRNAs). Moreover, the repertoire of these lncRNAs is different in different cells (Carninci et al., 2005; Cheng et al., 2005; Birney et al., 2007; Mercer et al., 2012). While some transcripts may encode previously unrecognized small proteins, the function or otherwise of the vast majority of lncRNAs remains to be determined.

Because many lncRNAs appear to be expressed at low levels, and many have lower sequence conservation than messenger RNAs, one interpretation has been that these RNAs represent transcriptional noise from complex genomes cluttered with evolutionary debris. However, assessments of sequence conservation rely on assumptions about the non-functionality and representative distribution of reference sequences, which are not verified and cannot be directly tested (Pheasant and Mattick, 2007). Nonetheless, many lncRNAs show patches of relative sequence conservation (Derrien et al., 2012), and even more do so at the secondary structural level (Smith et al., 2013).

Expression analyses have shown that lncRNAs originate from all over the genome and are expressed at different times during differentiation and development (Dinger et al., 2008), often exhibiting highly cell-specific patterns

Copyright Mattick. This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

INSIGHT

GENETICS

Probing the phenomics of noncoding RNA

Genetic knockout experiments on mice confirm that some long noncoding RNA molecules have developmental functions.

JOHN S MATTICK

Related research article Sauvageau M, Goff LA, Lodato S, Bonev B, Groff AF, Gerhardinger C, Sanchez-Gomez DB, Hacisuleyman E, Li E, Spence M, Liapis SC, Mallard W, Morse M, Swerdel MR, D’Ecclessis MF, Moore JC, Lai V, Gong G, Yancopoulos GD, Frendewey D, Kellis M, Hart RP, Valenzuela DM, Arlotta P, Rinn JL. 2013. Multiple knockout mouse models reveal lincRNAs are required for life and brain development. eLife 2:e01749. doi: 10.7554/eLife.01749

Image A wild-type (left) and mutant (right) mouse seven days after birth; this particular mutation (Mdgt−/−) is partially lethal with growth defects in survivors

elife.elifesciences.org

Sauvageau et al. eLife 2013;2:e01749. DOI: 10.7554/eLife.01749 1 of 24

Multiple knockout mouse models reveal lincRNAs are required for life and brain development

Martin Sauvageau1,2†, Loyal A Goff1,2,3†, Simona Lodato1,2†, Boyan Bonev1,2, Abigail F Groff1,2, Chiara Gerhardinger1,2, Diana B Sanchez-Gomez1, Ezgi Hacisuleyman1,2, Eric Li1, Matthew Spence1, Stephen C Liapis1,2, William Mallard1,2, Michael Morse1,2, Mavis R Swerdel4, Michael F D’Ecclessis4, Jennifer C Moore5, Venus Lai6, Guochun Gong6, George D Yancopoulos6, David Frendewey6, Manolis Kellis2,3, Ronald P Hart4, David M Valenzuela6, Paola Arlotta1,2*, John L Rinn1,2,7*

1Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, United States; 2Broad Institute of MIT and Harvard, Cambridge, United States; 3Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, United States; 4Department of Cell Biology and Neuroscience, Rutgers, The State University of New Jersey, New Brunswick, United States; 5Department of Genetics, Rutgers, The State University of New Jersey, New Brunswick, United States; 6VelociGene, Regeneron Pharmaceuticals Inc., Tarrytown, United States; 7Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, United States

Abstract Many studies are uncovering functional roles for long noncoding RNAs (lncRNAs), yet few have been tested for in vivo relevance through genetic ablation in animal models. To investigate the functional relevance of lncRNAs in various physiological conditions, we have developed a collection of 18 lncRNA knockout strains in which the locus is maintained transcriptionally active. Initial characterization revealed peri- and postnatal lethal phenotypes in three mutant strains (Fendrr, Peril, and Mdgt), the latter two exhibiting incomplete penetrance and growth defects in survivors. We also report growth defects for two additional mutant strains (linc–Brn1b and linc–Pint). Further analysis revealed defects in lung, gastrointestinal tract, and heart in Fendrr−/− neonates, whereas linc–Brn1b−/− mutants displayed distinct abnormalities in the generation of upper layer II–IV neurons in the neocortex. This study demonstrates that lncRNAs play critical roles in vivo and provides a framework and impetus for future larger-scale functional investigation into the roles of lncRNA molecules.

DOI: 10.7554/eLife.01749.001

Introduction

Mammalian genomes encode thousands of long noncoding RNAs (lncRNAs), which are emerging as key regulators of cellular processes (Rinn and Chang, 2012; Mercer and Mattick, 2013). Gain- and loss-of-function approaches in cell-based in vitro systems have been useful in uncovering important roles for lncRNAs, such as modulating chromatin states, maintaining cellular identity (i.e. pluripotency) and regulating cell cycle and translation (Tsai et al., 2010; Guttman et al., 2011; Wang et al., 2011; Yoon et al., 2012). Genome-wide association and expression profiling studies in humans have also identified correlations between lncRNA mutation, misregulation and disease states (Visel et al., 2010; Cabili et al., 2011; Brunner et al., 2012). Yet, direct in vivo, genetic evidence of the functional significance of lncRNAs as a class of transcripts remains elusive.

*For correspondence: [email protected] (PA); [email protected] (JLR)

†These authors contributed equally to this work

Competing interests: See page 20

Funding: See page 20

Received: 21 October 2013
Accepted: 21 November 2013
Published: 31 December 2013

Reviewing editor: Danny Reinberg, Howard Hughes Medical Institute, New York University School of Medicine, United States

Copyright Sauvageau et al. This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

RESEARCH ARTICLE


Developmental biology and stem cells | Genes and chromosomes


Figure 3. Fendrr−/− pups have multiple defects in lung, heart and gastrointestinal tract. (A) Fendrr locus and targeting strategy. Arrows indicate location of the primers used for genotyping. (B) Genotyping results from heterozygote intercrosses at embryonic stages E14.5, E18.5 and at birth (P0). The p value is based on χ² test. (*) All newborns died within 24 hr after birth. (C) Fendrr−/− E18.5 embryos and wild-type littermates. (D) RNA-Seq expression profile for Fendrr across a panel of mouse tissues and cell types. (E and F) lacZ reporter stained organs and sections showing expression of Fendrr in specific regions of the lung (Lu), trachea (Tr) and esophagus (Es), but not in heart (H) in E14.5 and E18.5 embryo (E) and in the gut and stomach (St) (F). Sm, smooth muscle; Ep, Epithelia; Me, Mesenchyme; Ly, Lymphoid aggregates. Scale bars = 1 mm whole organ, 200 µm sections. (G) Number of E18.5 embryos successfully breathing after surgical delivery. (H) Size difference of Fendrr−/− lungs at E14.5 compared to wild-type littermates (n = 3 each). (I–K) Representative hematoxylin and eosin (H&E) stained sections showing unstructured vessels (arrow) in E14.5 Fendrr mutant lungs compared to wild type littermates (n = 3) (I, upper panels), alveolar defects at E18.5 (I, lower panel), thinner mesenchymal layer of the mucosa and external smooth muscle layer of the oesophagus (J) and ventricular septal defects in the heart (K) of Fendrr−/− E18.5 embryos compared to wild type (n = 3). Scale bars = 500 µm, 100 µm for esophagus. (L) RNA-Seq expression levels of Fendrr and the neighboring coding gene Foxf1a in E18.5 lung of homozygote mutant and wild-type littermates (n = 2 each). (M) GSEA of Fendrr−/− vs wild-type E18.5 lung and total brain RNA. Each tile is a significant (q<0.001; Mann-Whitney, BH) gene set from the Reactome collection at mSigDB, based on the Fendrr−/−/wild-type ranking of test-statistic values from a Cuffdiff2 differential analysis. Tiles are shaded based on the z-score of the test-statistic for genes within the given gene set, relative to all genes for a given condition to show direction of expression relative to wild-type.

DOI: 10.7554/eLife.01749.008

The following figure supplements are available for figure 3:

Figure supplement 1. Fendrr−/− embryos don’t have an omphalocele. DOI: 10.7554/eLife.01749.009

Figure supplement 2. Fendrr, linc–Brn1b, and Peril do not act as cis-enhancer elements. DOI: 10.7554/eLife.01749.010
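Panel B of Figure 3 reports a p value from a χ² test on genotype counts from heterozygote intercrosses. A minimal sketch of such a goodness-of-fit test is given here, assuming the expected Mendelian 1:2:1 ratio of +/+, +/− and −/− offspring (the caption does not state the expected ratio) and using made-up counts, not data from the paper. With three genotype classes the test has 2 degrees of freedom, for which the χ² survival function has the closed form exp(−x/2), so no statistics library is needed. The function name is hypothetical.

```python
import math


def mendelian_chi_square(observed):
    """Chi-square goodness-of-fit of (+/+, +/-, -/-) counts against 1:2:1.

    Returns the test statistic and the p value for 2 degrees of freedom.
    """
    if len(observed) != 3:
        raise ValueError("expected three genotype counts")
    total = sum(observed)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Assumption: Mendelian segregation from a heterozygote intercross.
    expected = [total * ratio for ratio in (0.25, 0.5, 0.25)]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 2 d.o.f.
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical litter counts showing a deficit of homozygous mutants.
    counts = (30, 60, 10)
    stat, p = mendelian_chi_square(counts)
    print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

A small p value in such a test would indicate that the observed genotype distribution departs from the assumed ratio, which is the kind of evidence used to argue for lethality of a homozygous genotype.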



+,-#./01#2#304&&&&&&&&&&&&&&&&&15,&