Transcript

Intrinsic Temporal Locality Properties of Web Request Streams

Rodrigo Fonseca (UC Berkeley)

Virgílio Almeida (UFMG)

Mark Crovella (Boston University)

Bruno Abrahao (UFMG)

IEEE INFOCOM 2003

2

• Network measurements in highly dynamic environments–Understand the behavior and find

intrinsic properties

–Online algorithms for driving self-organization of system agents

–Algorithms for dynamic adapting environments with high degree of uncertainty

Research Questions

3

Question

• Study temporal locality at different points of the Web

–Results: Framework + metrics:

• New insights on the causes of temporal locality across different points of the Web topology

4

Web Measurements

• Why?– Understand and improve current systems

– Design new systems

• Challenges– Complexity

– Rate of change

• Approach– Look for regularities, underlying principles

– Abstractions

5

Previous Studies

• Many advancements–Caching hierarchies

–Replacement Policies for caches

– Load distribution and balancing

• However...–Focus on individual components

–No structured view of the entire system

6

Stream centric view of the Web

• Focus on request streams

– Look for intrinsic properties

–How they are altered

–How their properties change in different points

–How their intrinsic properties can influence the designs of components and of collections of components

7

Tranformations on Request Streams

• Three types of transformations

–Aggregation

• Multiple sources

–Disaggregation

• Multiple destinations

–Filtering

• Resulting stream = subset of input stream

DAF

8

Transformations Abstraction

• Components may be abstracted by combinations of transformations

D

F

Clients

A

Server

D

F

A

Proxy Cache

9

Approach

• For a given intrinsic property of streams–Study the effects of the

transformations on the property

–Combine the effects to understand• Effects of components

• Effects of collections of components

• Properties at different points of the topology

10

Temporal Locality

• Need of intrinsic metrics for the streams– We use virtual time

• Two sources:– Popularity

...XAXBXCXDXE...

• Preserved with reordering

– Correlation...AABBCCDDEE...

• Lost with reordering

11

Measuring Popularity: Entropy

– Traditionally Zipf’s Law: pi i-

– We measure the deviation from Uniform popularity distribution

– Entropy:

• pi determined empirically from frequencies

• N is the number of distinct objects

• If uniform, tends to log2(N)

• If one object only is accessed, tends to 0

– Not the Entropy rate of the source

N

i

ii ppH1

2log

12

Entropy and Zipf’s Law

13

Entropy and Zip’f Law

14

Measuring Correlation: CV

• Look at IAT distribution per object– Number of references between 2 references

to the same object

• Behavior– Correlation tendency to shorter IATs

– No Correlation tends to geometric distribution

• We measure the deviation from the geometric distribution for IAT– Coefficient of Variation:

μ

σ

15

IAT Distribution

Original

Scrambled

16

Metrics Summary

• Entropy

–High concentration: Low Entropy

–Uniform distribution: High Entropy

• CV

–High Correlation: High CV

–No correlation: CV ~ 1

17

Entropy vs. Hit Ratio

• LRU cache simulation, size 5%

18

CV versus HR Difference

• Difference between scrambled and original streams

19

Putting it all together

• Effects of the transformations on the components of temporal locality

20

Effects of Filtering: Popularity

21

Effects of Filtering: Correlation

22

Effects of Aggregation and Disaggregation

CV

Entropy

23

Temporal Locality atDifferent Points of the topology

Servers

Proxies

Clients

24

Effects of

25

Conclusions

• Understanding of behavior

• Transformations Framework allows for structured study and understanding of Web workloads characteristics


Recommended