Programa de Verão LNCC 2015
Jornada Ciência de Dados
Fabio Porto ([email protected]) LNCC – CCC - DEXL Lab http://dexl.lncc.br
Management of Large Data Volumes – Part III
Data Partitioning
Intuition
Queries and workflow processes over partitioned data exhibit a processing time determined by the size of the largest partition.
[Figure: a query Q split into sub-queries Q', Q'', Q''' running over partitions of different sizes]
Image of a region of the sky
[Figure: sky-survey image (LIneA – HQOOP)]
Uniform distribution
Non-uniform distribution
Data Partitioning
- Definition: given a dataset D = {e1, e2, …, en}, define a distribution function f(ei) -> v, where v is an integer identifying a data-allocation site (e.g., a machine in the network)
- Considerations: the function should be sensitive to the workload W = {w1, w2, …, wn}
Partitioning vs. Workload
[Figure: queries whose data accesses are local to a partition vs. non-local]
Goal
- Minimize the volume of data accessed by applications in a parallel execution context:
  - maximizes access locality
  - ideally, each application would access its data locally (i.e., with no data transfer over the network)
Distribution functions
- Round-robin
- Histogram
- By key
- Workload-based
Round-Robin
- Given D = {e1, e2, …, en} and a set of nodes {n1, n2, …, nk}, where nj is a data-allocation node:
  - f(ei) -> nj, with j = i if i ≤ k, and j = i mod k otherwise
- Each element is assigned to a node according to its reading order
- Suitable when the application places no restriction on which subset of the data is processed at each node (see the sketch below)
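A minimal Python sketch of this distribution function (illustrative names; indices are 0-based, so the mod covers both cases of the formula above):

```python
def round_robin(items, num_nodes):
    """Assign the i-th element (0-based) to node i mod num_nodes."""
    placement = {j: [] for j in range(num_nodes)}
    for i, e in enumerate(items):
        placement[i % num_nodes].append(e)
    return placement

# Example: 7 elements over 3 nodes -> node sizes 3, 2, 2 (balanced by count)
print(round_robin(list("abcdefg"), 3))
```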
By Value Range
- The distribution function maps a value of the object to a known interval of the domain
- Let A = {1, 2, 3, …, k}:
  - P1 = [1, …, 1000]
  - P2 = [1001, …, 5000]
  - …
  - Pn = […, k]
- If ei(v) = 2000 then f(ei) = P2 (see the sketch below)
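A small sketch of the lookup matching the slide's example, using Python's bisect over the partitions' upper bounds (boundary values are the ones from the slide):

```python
import bisect

# Upper bounds of P1, P2, ... : P1 = [1..1000], P2 = [1001..5000]
UPPER_BOUNDS = [1000, 5000]

def range_partition(v):
    """Return the 1-based index of the partition Pi whose interval contains v."""
    return bisect.bisect_left(UPPER_BOUNDS, v) + 1

assert range_partition(2000) == 2   # f(ei) = P2, as in the slide
```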
By Value Range
- Useful when applications process datasets by intervals:
  - start date / end date
  - product sales value ranges
  - name ranges
- It is a special case of histograms
Value ranges (examples)
[Figures: a two-dimensional range-partitioning example and further range examples]
Non-uniform distribution
Distribution by non-uniform intervals
Equal-width histogram
Histograms: width vs. height
Using the equal-height histogram (a sketch of both variants follows below)
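To make the width vs. height contrast concrete, a small illustrative sketch (not from the slides) computing the bucket boundaries of both histogram kinds:

```python
import numpy as np

def equal_width_boundaries(values, k):
    """k buckets of identical width; skewed data piles up in a few buckets."""
    lo, hi = min(values), max(values)
    return [lo + i * (hi - lo) / k for i in range(1, k)]

def equal_height_boundaries(values, k):
    """k buckets holding ~len(values)/k items each: boundaries at quantiles,
    which is what makes equal-height histograms robust to skew."""
    return list(np.quantile(values, [i / k for i in range(1, k)]))
```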
Workload-aware partitioning
[Figure: relation R(a1,…,a9) partitioned into blocks B1, B2, …, Bm]
Problem statement
- Given:
  - a single-relation database R(a1,…,an), n ≈ 1000
  - an initial workload: a set of k queries W0 = {q1,…,qk}
  - m empty fixed-size blocks
- Assumptions:
  - accessing a block ≈ accessing all its tuples
  - new tuples and queries arrive periodically
  - no attribute is privileged
- Goal: minimize the total number of block accesses during query execution by:
  - optimally partitioning R's data into blocks
  - optimal query execution
  - adapting to the arrival of new data and queries
Overview of the solution
- Data partitioning: graph-based algorithm (see the sketch below)
  - Nodes: each data item (e.g., a tuple) is a node in the graph
  - Edges: an edge connects two data items if they are accessed by a common query
  - Edge weight: the number of queries that access both data items
  - Goal: partition the graph into m equal-size sub-graphs with minimum edge cut, using a min-cut algorithm
- Block explanation
  - Blocks are described in terms of queries: each block is assigned an explaining query Bi = vi(R)
- Query processing
  - Incoming queries are compared against the explaining queries
  - Matching blocks are selected (we haven't worked on that yet)
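As an illustration of the graph construction just described (names are ours, not the authors'), a minimal Python sketch that builds the weighted co-access graph from a workload:

```python
from collections import defaultdict
from itertools import combinations

def coaccess_graph(workload):
    """workload: iterable of queries, each given as the set of tuple ids it
    accesses. Returns {(u, v): w} where w is the number of queries that
    access both tuples u and v (the edge weight of the slides)."""
    weight = defaultdict(int)
    for accessed in workload:
        for u, v in combinations(sorted(accessed), 2):
            weight[(u, v)] += 1
    return dict(weight)
```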
Partitioning strategy [Schism: VLDB 2010]
[Figure sequence over a table with rows 1–7, one step per slide:]
- For each vertical fragment, we create a graph node for each row.
- We increment the arc weight whenever two rows are accessed together by a query (q1, q2, …), replaying the whole workload W = {q1,…,qn}.
- We execute a min-cut algorithm over the weighted graph (a sketch follows below).
- Each resulting partition is assigned a block (B1, B2), recorded in the catalog.
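A minimal sketch of the min-cut step over a toy workload we made up; NetworkX's Kernighan–Lin bisection stands in for the hypergraph partitioner (PaToH) actually used in the experiments:

```python
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# Toy workload over rows 1..7 (illustrative, not the slides' exact queries)
workload = [{1, 2}, {1, 2, 4}, {3, 5}, {5, 6, 7}, {3, 7}, {2, 4}]

G = nx.Graph()
for accessed in workload:
    for u, v in combinations(sorted(accessed), 2):
        w = G[u][v]["weight"] + 1 if G.has_edge(u, v) else 1
        G.add_edge(u, v, weight=w)

# Balanced 2-way split minimizing the weighted edge cut: rows that are
# co-accessed by many queries tend to land in the same block
B1, B2 = kernighan_lin_bisection(G, weight="weight")
print(sorted(B1), sorted(B2))   # e.g. [1, 2, 4] and [3, 5, 6, 7]
```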
Partitioned data with queries
[Figure: block B1 holds rows {1, 2, 4} and block B2 holds rows {3, 5, 6, 7}; the blocks are annotated with the query sets {q1,q2,…, q14} and {q3,q5,…, q13} that touch them]
- Each block is associated with the queries that access some records of the block
- For a given query q, the number of accessed blocks is minimized
Adaptive Strategy
- New tuple arrival [DEXA 2012]: select the best block, i.e., the block to which the new tuple is most correlated
- Challenges: how to select the best block with minimum effort?
  - Initial approach: find it based on the correlation of queries to blocks
    - define the optimal allocation
    - compute the actual allocation efficiency
    - compute block affinity (a sketch follows below)
  - What if the best block is full? Initial approach: split the block
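A rough sketch of the affinity idea only; all names here (`blocks`, `matches`, the workload layout) are hypothetical stand-ins, not the [DEXA 2012] algorithm itself:

```python
def best_block(new_tuple, blocks, workload, matches):
    """Pick the block whose past queries are most correlated with the new
    tuple. `blocks` maps a block id to its set of tuple ids; `workload` is
    a list of (predicate, accessed_tuples) pairs; `matches(pred, t)` says
    whether a query predicate would also select tuple t."""
    def affinity(block_tuples):
        # count queries that would select the new tuple AND already touch
        # this block: placing the tuple here keeps those queries one-block
        return sum(1 for pred, accessed in workload
                   if matches(pred, new_tuple) and accessed & block_tuples)
    return max(blocks, key=lambda b: affinity(blocks[b]))
```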
Allocation based on affinity to blocks
[Figure: elapsed time of incrementing the DB as its size increases — execution time (s), log scale from 0.1 to 1000, vs. DB size from 2M to 20M tuples, comparing staticDynPart (|D′| = 500k) with DynPart (|D′| = 1M)]
Experiment:
- data: Sloan DR8 – 350 million tuples
- workload: 27,000 synthetic queries
- partitioner: PaToH (hypergraph)
By Key
- Partitioning is fine, but how do we avoid a skewed distribution of data across nodes?
- Are there strategies to partition data across hundreds or thousands of nodes with balance guarantees?
- Lessons learned from the key-distribution strategies of P2P systems
P-Grid [Aberer04]
- Completely decentralized
- Uses a randomized algorithm to create the search structure
- Scales gracefully in the total number of nodes and the total number of data items
System model
- A set of peer nodes linked through neighbourhood links
- Each peer stores a subset of the keys
- A virtual binary tree is built over the key-value space
- Each peer keeps a branch of the index tree pointing to the nodes storing the keys the peer manages
- Additionally, at each level of the tree, pointers indicate the nodes keeping the complementary key set
- A search may start at any peer
Key management
- Each peer is responsible for an interval of keys:
  - a key is a bit string k = p1 p2 … pn, pi ∈ {0,1}, with val(k) = Σ_{i=1..n} 2^(-i)·p_i
  - I(k) = [val(k), val(k) + 2^(-n))
- For each prefix of length l of the keys the peer is responsible for, the peer maintains references to other peers that share the same prefix but have a different value at position l+1
- A binary tree orders the key space and points to the peers managing keys whose prefixes appear at the tree nodes
- Managing the tree is independent of managing keys
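In Python, the key-to-interval mapping reads as a direct transcription of the two formulas above:

```python
def val(k):
    """val(k) = sum_{i=1..n} 2^(-i) * p_i for a bit-string key k = p1 p2 ... pn."""
    return sum(int(p) * 2.0 ** -(i + 1) for i, p in enumerate(k))

def interval(k):
    """I(k) = [val(k), val(k) + 2^(-n)): all keys extending the prefix k."""
    return val(k), val(k) + 2.0 ** -len(k)

assert interval("00") == (0.0, 0.25)   # a peer with path 00 covers [0, 1/4)
```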
Example
[Figure: binary trie with prefixes 0/1 at level 1 and 00/01/10/11 at level 2; peers 1–6 sit at the leaves. The peer indexing keys starting with 00… keeps the complement pointers "0 -> peers 3, 4, 1" and "00 -> peer 2".]
Each node in the tree keeps pointers to peers holding the complementary key set: (p1 p2 … pk) -> (p1 p2 … p̄k), i.e., the same prefix with the last bit complemented.
P-Grid Construction
[Figure, cases a) and b):]
a) Initially each peer points to the whole key space (the root); when peers a and b meet, they split it: a takes prefix 0 and b takes prefix 1.
b) When two peers managing the same key set p1 p2 … pk meet, they extend their paths: a takes p1 p2 … pk 0 and b takes p1 p2 … pk 1.
P-Grid Construction (cont.)
[Figure, cases c) and d):]
c) One peer manages a longer path: if a holds p1 p2 … pk and b holds p1 p2 … pk pk+1 … pn, then a extends its path by one bit at position k+1 (the complement of b's bit), ending up with a(p1 p2 … pk pk+1) against b(p1 p2 … pk pk+1 … pn).
d) The keys the peers manage intersect only up to position k (a holds p1 p2 … pk pk+1 … pn, b holds p1 p2 … pk qk+1 … qm): each peer recursively forwards the other to its references at the diverging level, with ref1 = refs(qk+1) \ {b} and meet(ref1, {b}), and ref2 = refs(pk+1) \ {a} and meet(ref2, {a}).
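To make routing over the constructed paths concrete, a minimal sketch of a P-Grid-style search; the class layout and attribute names (`path`, `refs`, `store`) are our assumptions, not the paper's API:

```python
import random

class Peer:
    def __init__(self, path):
        self.path = path    # bit-string prefix this peer is responsible for
        self.refs = {}      # level -> peers covering the complementary bit
        self.store = {}     # locally stored key/value pairs

def search(peer, key):
    """Prefix routing: forward at the first bit where the key leaves this
    peer's path, using the complement references kept at that level."""
    for level, bit in enumerate(peer.path):
        if key[level] != bit:
            return search(random.choice(peer.refs[level]), key)
    return peer.store.get(key)   # key lies within this peer's subtree
```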
Distributed by Range
- Pgpool-II
  - Implemented on top of PostgreSQL 9.1
  - A central node coordinates data/query distribution and replication
  - Requests are distributed across nodes
  - Parallel query processing:
    - data partitioning based on a table column range (e.g., id)
    - for short queries, may reduce the amount of accessed data
  - Load balance: concurrent requests are directed to different DB copies
Parallelism & Load Balance
[Figure: a parallel-query Pgpool-II node in front of replication Pgpool-II nodes, each backed by PostgreSQL instances]
Evaluation
- Strengths:
  - extends PostgreSQL
  - load-balances queries from concurrent workflows
  - scales up to 128 DB nodes
- Weaknesses:
  - no support for spatial functions
  - partitioning based on a single column
  - ingestion cannot use COPY
Qserv – LSST
- Developed by the LSST DM team for astronomy data management
- Horizontal partitioning based on declination zones (nodes), with the data on each node further distributed into chunks based on right ascension (RA)
- Approximately 1000 partitions
- Native support for spatio-temporal functions
- Built on top of MySQL
Evaluation
- Strengths:
  - designed to support astronomy data surveys
  - highly scalable: ~1000 nodes
  - first performance results are very promising
  - aligned with the LSST project
- Weaknesses:
  - the current user culture is based on PostgreSQL
Final Remarks
- Managing large data volumes requires:
  - scalable strategies
  - a design focus that favors data locality
  - handling uncertainty
  - handling heterogeneity
  - data partitioning
  - optimization processes
Final Remarks
- Current analytical processing frameworks:
  - MapReduce (Hadoop, PivotalHD, Cloudera, …)
  - Apache Spark
  - Sagitari (CEFET-RJ) – available soon
  - HaQoop – under development at DEXL
    - http://dexl.lncc.br