Papakonstantinou el farol



The El Farol Bar Problem for next-generation systems

    Athanasios Papakonstantinou

Dissertation submitted for the MSc in Mathematics with Modern Applications

    Department of Mathematics

    August 2006

Supervisor: Dr. Maziar Nekovee, BT Research


    Contents

1. Chaos, Complexity and Irish Music
   1.1. Overview
   1.2. The El Farol Bar Problem (EFBP)
   1.3. Modelling the original problem
   1.4. Game Theory Definitions

2. Previous Approaches to the El Farol
   2.1. Overview
   2.2. Approaches to EFBP
      2.2.1. Minority Game
      2.2.2. Evolutionary Learning
      2.2.3. Stochastic Adaptive Learning

3. Analysis and Extension of the Stochastic Algorithm
   3.1. Overview
   3.2. Taxing/Payoffs Algorithms
      3.2.1. Overview
      3.2.2. Fairness and Efficiency
   3.3. Budget Algorithms
      3.3.1. Overview
      3.3.2. Fairness and Efficiency

4. Case Study: Multiple Bars in Santa Fe

5. Conclusions

A. C Code for the El Farol Bar Problem


    List of Figures

1.1. Bar attendance in the first 100 weeks [1].
2.1. Behaviour of the average attendance (top) and of the fluctuations (bottom) in the El Farol problem with L = 60 seats, ā = 1/2 and m = 2, 3, 6 from left to right [2].
2.2. Mean weekly attendance for all 300 trials [3].
2.3. The attendance in a typical trial [3].
2.4. The normalised one-step transition matrix [3].
2.5. The overall attendance and the probabilities for each of the M agents for the partial-information, full-information and signs algorithms.
2.6. Histograms for the partial- and full-information algorithms.
2.7. Fairness and efficiency plots for the original algorithms.
2.8. Standard deviation of attendance for the original algorithms.
3.1. Histograms for the partial- and full-information taxing algorithms.
3.2. The overall attendance and the probabilities for each of the M agents for the partial-information tax, modified tax and full-information tax algorithms.
3.3. Fairness and efficiency plots for the taxing algorithms.
3.4. Standard deviation of attendance for the taxing algorithms.
3.5. Histograms for the budget algorithms (b = 10, w = 0/6).
3.6. The overall attendance and the probabilities for each of the M agents for the budget (budget = 10) and modified budget (budget = 10, wait = 3, 6) algorithms.
3.7. Standard deviation of attendance for the budget algorithms.
3.8. Fairness and efficiency plots for the budget algorithms.
4.1. The overall attendance and the probabilities for each of the M agents for each bar.
4.2. The cumulative attendance for all three bars.
4.3. Agent attendances for each bar.
4.4. Payoff probabilities for each bar and the cumulative payoff probability for the 3-bar version.


    List of Tables

1.1. The Prisoner's Dilemma in matrix form with utility pay-offs.
2.1. Behaviour of the partial-information algorithm.
2.2. Behaviour of the full-information algorithm.
2.3. Behaviour of the signs algorithm.
2.4. Standard deviation and mean of the payoff probabilities for the original algorithms.
3.1. Behaviour of the tax algorithm with ctax = 6.
3.2. Simulation results for various values of ctax, cp.
3.3. Standard deviation and mean of the payoff probabilities for the taxing algorithms.
3.4. Behaviour of the modified budget algorithm for budget = 10 and staying at home = 6.
3.5. Simulation results for the budget algorithms.
3.6. Standard deviation and mean of the payoff probabilities for the budget algorithms.
4.1. Mean and standard deviation for all bars.


    Acknowledgements

I would like to dedicate this dissertation to my parents, Costantinos and Valentini Papakonstantinou. I want to thank them for giving me the opportunity to study in the UK and for always supporting me in all my decisions.

Special thanks to my supervisor at BT Research, Maziar Nekovee, for his help and guidance. He was a very valuable source of information and was always available when I needed him. Keith Briggs at BT Research was also very helpful, and this is much appreciated.

Furthermore, I would like to thank all my friends in York and in Ipswich for this last year, as well as all my friends in Greece.


    Chapter 1

Chaos, Complexity and Irish Music

    1.1 Overview

There is a common misconception that complexity and chaos are synonymous. Apart from the nonlinearities that occur in both kinds of systems, the main properties these two areas of mathematics share are ambiguity in their definitions and a wealth of interesting and varied applications.

Although there is no definition of chaos that everyone would accept, almost everyone agrees on three properties a chaotic system should have: chaos is aperiodic long-term behaviour in a deterministic system that exhibits sensitive dependence on initial conditions [4]. Any time-evolving quantity, such as the motion of tectonic plates or planets, the temperature or any other weather characteristic, or even price movements in stock markets, or dreams and emotions [5], may display chaotic behaviour.

Chaos is often related to complexity, but does not follow from it in all cases. Chaos may appear when phenomena are studied as they progress in time; yet when the same phenomena are examined from a microscopic point of view, the interaction of the various parts of which the system consists creates patterns rather than erratic chaotic behaviour. This is where complexity enters.

It is rather difficult to define complexity in mathematical terms; although there is a measure of complexity, there is no other way to give it a mathematical definition. Dictionaries may be useful in this quest. According to an online dictionary by Oxford University Press, complex is an adjective describing something "consisting of different and connected parts"; an even more precise definition is "consisting of interconnected or interwoven parts" [6]. That means that, in order to understand the behaviour of a complex system, we should understand the behaviour of each part, as well as how the parts interact to form the behaviour of the whole. Our incapacity to describe the whole


without describing each part, combined with the necessity to relate the parts to one another, makes the study of complex systems very difficult.

Finally, based on [6], an attempt to formalise the above definitions can be made: a complex system is a system formed out of many components whose behaviour is emergent; that is, the behaviour of the system cannot simply be inferred from the behaviour of its components. The amount of information needed to describe the behaviour of such a system is a measure of its complexity. If the number of possible states the system could occupy is Ω, and it is necessary to specify which state it is in, then the number of binary digits I needed to specify this state is related to the number of possible states:

I = log2(Ω)    (1.1.1)

In order to identify which state the system is in, all the possible states must be distinguishable, so the representation must assign a unique label to each of the Ω states. For a string of N bits there are 2^N possible states, therefore:

Ω = 2^N  ⇒  N = log2(Ω) = I    (1.1.2)

There are many applications of complex systems: in statistical physics, meteorology, geology, biology, engineering, economics, even the social sciences and psychology. In all sciences we can find systems that can be dismantled into their

core components, so that each part and the system as a whole can be studied simultaneously. It is very interesting to try to examine and forecast the behaviour of systems that consist of human beings: systems like a family, a business, or even a government. Humans have the capacity to learn and to constantly evolve, and models that deal with them are unrealistic and of no use if this property is not taken into consideration. The need to model this adaptive human behaviour gave rise to complex adaptive systems (CAS). The most common and universally accepted definition is the one given by one of the field's founders, John H. Holland of the Santa Fe Institute: "A Complex Adaptive System (CAS) is a dynamic network of many agents (which may represent cells, species, individuals, firms, nations) acting in parallel, constantly acting and reacting to what the other agents are doing. The control of a CAS tends to be highly dispersed and decentralised. If there is to be any coherent behaviour in the system, it has to arise from competition and cooperation among the agents themselves. The overall behaviour of the system is the result of a huge number of decisions made every moment by many individual agents" [7]. It is even more intriguing that CAS do not appear only in human networks but wherever there is a system with interacting elements: it could be cells in a cellular automaton, ions in a spin glass, or cells in an immune system [8]. Besides complexity, a complex adaptive system has the properties of emergence and self-organisation.


Emergence occurs when agents that operate in the same environment start to interact with each other. The number of interactions grows as the number of agents increases, and this leads to the appearance of new types of behaviour. This process can result in an increase of complexity, and since it is an internal property of the system and not managed by an outside source, it is a self-organised process.

The subject of this dissertation is the study and extension of a famous complex adaptive system known as the El Farol Bar Problem, which was introduced by the economist W. B. Arthur in 1994 [1]. El Farol is a bar in Santa Fe, New Mexico, which hosts Irish music every Thursday. People enjoy visiting it and hearing some quality music, but when it becomes overcrowded they stop enjoying themselves. Each customer decides independently whether to attend or not, based on a set of predictors. This scenario provides a simplified mathematical model of a class of congestion and coordination problems that arise in modern Information and Communications Technology (ICT) systems.

One application of great interest is networks of cognitive radios, where agents compete with each other for the same resource (RF spectrum). Cognitive radios are autonomous agents that have the ability to sense the external environment, learn from history and make intelligent decisions in order to optimise their performance and adjust better to the environment [9]. Another application is the Internet, where a large number of people try to visit the same web page or access the same FTP server simultaneously and independently.

In this first chapter, Arthur's original EFBP is defined and analysed. At the end of the chapter, some basic game-theory concepts are explained and defined.

In chapter two, various approaches to the El Farol Bar Problem are reviewed. First it is viewed as a minority game and various techniques from statistical mechanics are applied. Strategies are also redefined using a binary approach, in an attempt to reduce complexity. The next approach tries to overcome the restrictive strategies by using an evolutionary learning algorithm and viewing the problem as a Markov stochastic process. The last approach suggests a very simple adaptive algorithm based on maximising each agent's probability of attendance. There are no specific strategies to guide agents during their decision process, only their intention to attend the bar.

In chapter three, the last algorithm is analysed in depth. In this original work, the stochastic adaptive learning algorithm is extended and several derivatives of it are examined, in an attempt to deal with the unfairness or low-efficiency issues that occurred in some cases with the original algorithm. Considerable effort was put into defining the stationary state of one variation. Fairness and efficiency are also defined and measured, both from the bar management's and from the agents' points of view.

In chapter four, it is examined how the presence of three bars in the same town would affect the agents' entertainment choices. The agents attempt to enter the bars in a


    random order, but their decision is made using the original algorithm.

    1.2 The El Farol Bar Problem (EFBP)

In his paper, Arthur tries to predict the bar's attendance. He assumed that the number of residents of Santa Fe, the prospective clients of the bar, is 100, and that 60 is the maximum number of clients the bar can hold without being overcrowded. He then proposed the following rule: if a person expects fewer than or equal to 60 people to show up, he attends the bar; otherwise he stays at home. The clients cannot communicate with each other, so no one has any information about the intentions of anybody else. They only have access to the attendance numbers of the previous weeks.

However, there is more than one model, based on the numbers of the previous weeks, that can be used to predict the current week's attendance. This makes a rational solution impossible and the problem, from the clients' point of view, ill-defined. But even if there were only one model, or if for mysterious reasons the clients managed to share common forecasts, the model would fail: if most people believe that the bar will be overcrowded, they will not attend, leaving the bar almost empty. The opposite happens if most of them think that the bar will have fewer than 60 customers.

In order to overcome such problems, Arthur introduced the use of predictors. A predictor is a function that maps the information of the d most recent attendances into a prediction for the next attendance. Arthur suggested that although there are many possible predictors, each individual has a set of k predictors at his disposal to guide him through the decision process. Each client decides whether to go to the bar or not according to the most accurate predictor in his set, which is called the active predictor. Inaccurate predictors do not affect the long-term behaviour of the system, since they rarely achieve the status of active predictor and are therefore rarely used by the clients. The predictors that were actually used in the original problem are described in section 1.3.

The results of the computational experiment are shown in figure 1.1. They indicate a tendency of the mean attendance to converge to 60: the predictors self-organise into an equilibrium pattern, or "ecology" [1]. The active predictors forecast above 60 with probability 0.4 and below 60 with probability 0.6. In terms of game theory, this mixed strategy is a Nash equilibrium. (Game-theory terms such as Nash equilibrium, strategies and repeated games are defined in section 1.4.)

The EFBP is a congested-resource problem, because the final decision of an agent (this is how the clients, customers or individuals who decide whether or not to attend El Farol will be referred to from now on) depends on the decisions of the other agents. In other words, they compete for a resource. Such congestion appears in many real-life systems, like the Internet, where the


    Figure 1.1: Bar attendance in the first 100 weeks [1].

users compete for bandwidth, or roads and highways. The source of congestion in a deterministic framework like the EFBP is the inability of the agents to coordinate their actions [10], since there is no centralised mechanism to guide them. In order to understand and analyse such systems better, traditional perfect, logical, deductive rationality has given its place to bounded rationality: the agents make their decisions based on incomplete knowledge, they know that they have access to limited information, and they do their best to fight this uncertainty using a combination of rational rules and empirical evidence.

    1.3 Modelling the original problem

Now that Arthur's original EFBP has been explained (section 1.2), the model can be summarised as follows.

Suppose that there are N agents who have to decide whether or not they will attend the bar, and L is the maximum number of clients the bar can accommodate before becoming overcrowded. Arthur wanted to predict the binary action of the ith customer, denoted by a_i ∈ {0, 1}, where 1 stands for going to the bar and 0 for not going. The total attendance is A = Σ_i a_i.

As mentioned above, the only information available to the agents is the attendance of the bar in the previous d weeks:

I_t = {A(t − d), . . . , A(t − 2), A(t − 1)}    (1.3.1)

A predictor is a function from [0, N]^d to [0, N]. There are (N + 1)^((N+1)^d) such predictors [2]. The number of possible predictors is therefore very large, which is why a selection of S predictors is used. Arthur used the following predictors:


1. The same attendance as k weeks ago (k-period cycle detectors):

   A(I_t) = A(t − k),

   where k in the original model is 1, 2 or 5.

2. A mirror image around 50% of last week's attendance:

   A(I_t) = N − A(t − 1)

3. A fixed predictor:

   A(I_t) = 67

4. A rounded average of the last four weeks:

   A(I_t) = round( (1/4) Σ_{r=1}^{4} A(t − r) )

5. The trend of the last 8 weeks, computed by the least-squares method, rounded and bounded by 0 and N:

   A(I_t) = min([trend{A_8}]^+, N)  [11]

Each predictor has a score associated with it, which evolves according to:

U_{i,s}(t + 1) = U_{i,s}(t) + θ{[A_{i,s}(I_t) − L][A(t) − L]}  [2]

where s ∈ [1, S], A_{i,s}(I_t) is the sth predictor of the ith customer, and θ is the Heaviside function (θ(x) = 0 for x < 0 and θ(x) = 1 for x ≥ 0). A predictor therefore gains a point whenever its forecast falls on the same side of L as the actual attendance. The predictor s used by the ith customer is given by:

s_i(t) = argmax_s U_{i,s}(t)

and the decision is

a_i(t) = θ[L − A_{i,s_i(t)}(I_t)]

where argmax_x f(x) is the value of x for which f(x) attains its maximum.
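In C, one scoring-and-decision step for a single agent might look as follows (a sketch with our own names; `pred[s]` holds the sth predictor's forecast A_{i,s}(I_t)):

```c
#define L_SEATS 60

/* Heaviside step: theta(x) = 1 for x >= 0, 0 otherwise */
static int heaviside(int x) { return x >= 0 ? 1 : 0; }

/* U_{i,s}(t+1) = U_{i,s}(t) + theta{[A_{i,s}(I_t) - L][A(t) - L]}:
   a predictor scores a point when its forecast was on the same side
   of L as the actual attendance */
void update_scores(int *U, const int *pred, int S, int actualA) {
    for (int s = 0; s < S; s++)
        U[s] += heaviside((pred[s] - L_SEATS) * (actualA - L_SEATS));
}

/* active predictor: s_i(t) = argmax_s U_{i,s}(t) */
int active_predictor(const int *U, int S) {
    int best = 0;
    for (int s = 1; s < S; s++)
        if (U[s] > U[best]) best = s;
    return best;
}

/* decision a_i(t) = theta[L - A_{i,s_i(t)}(I_t)]: attend if and only
   if the active predictor forecasts at most L customers */
int decide(const int *U, const int *pred, int S) {
    return heaviside(L_SEATS - pred[active_predictor(U, S)]);
}
```

For example, with forecasts of 70 and 55 and an actual attendance of 72, only the first predictor scores a point, becomes active, and the agent stays at home.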


    1.4 Game Theory Definitions

    Games and solutions

A game is a description of strategic interaction that includes the constraints on the actions the players can take and the players' interests, without actually specifying the actions that the players do take [12]. A solution describes how rational people play the game. A rational solution corresponds to a set of unique strategies (plans for the players' actions), one for each player, such that no player has a rational reason to regret his choice [13].

    Best reply strategy

A strategy R_i for player R is a best reply (or best response) to C's strategy C_j if it gives R the largest pay-off, provided that C plays C_j.

    Pure and mixed strategies

A pure strategy is the simplest kind of strategy: the player chooses one specific course of action. There may, however, be cases where there is uncertainty about which pure strategy is best, due to lack of information or some other reason. In such cases a pure strategy is chosen according to a probability distribution; this type of strategy is called a mixed strategy. A more formal definition follows:

If a player has N available pure strategies (S_1, S_2, . . . , S_N), a mixed strategy M is defined by the probabilities (p_1, p_2, . . . , p_N) with which each strategy is selected [13].

For M to be well defined, the sum of the probabilities must equal one.

Nash equilibrium

The outcome of strategies R_i for player R and C_j for player C is a Nash equilibrium of the game, and thus a potential solution, if R_i and C_j are best replies to each other. In a way, player R chooses R_i because he expects C to choose C_j (and vice versa). When players select Nash-equilibrium strategies there is no guarantee that they will be happy, but there is a guarantee that they will have no reason to change their strategies.

    Nash equilibrium in pure strategies

Let G be a game involving N players. Each player chooses among a finite set of strategies: that is, player i (i = 1, . . . , N) has access to a strategy set S_i from which he or she chooses a strategy σ_i ∈ S_i. A set of pure strategies σ = (σ_1, σ_2, . . . , σ_i, . . . , σ_N) constitutes a Nash equilibrium if and only if each pure strategy σ_i is a best reply to the combination of the strategies of all the other players in σ, for all i = 1, . . . , N [13].

    Nash equilibrium in mixed strategies

Mixed strategies are in Nash equilibrium when no player can switch to another available strategy and thereby improve his or her expected utility.


Example of a Nash equilibrium: the Prisoner's Dilemma

This is a well-known example in game theory, about two suspects arrested for a crime. They are put in separate cells, and each is promised that if he alone confesses, he will be freed and used as a witness against the other, who will be sentenced to four years. If both confess, each receives a three-year sentence, and if neither confesses each will be convicted for a year, due to lack of evidence. The problem is represented in matrix form in table 1.1, using utility pay-offs. For

               Cooperate   Defect
   Cooperate     3, 3       0, 4
   Defect        4, 0       1, 1

Table 1.1: The Prisoner's Dilemma in matrix form with utility pay-offs.

each player, action D (Defect) dominates action C (Cooperate). Comparing the first numbers in each column, or the second numbers in each row, shows that no matter what the opponent chooses, a player does better by choosing Defect, since the utility pay-off is higher. This demonstrates that (D, D) is the unique Nash equilibrium.
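The dominance argument is easy to verify mechanically; a small C sketch over the pay-off matrix of table 1.1 (the encoding and names are our own):

```c
/* pay-off to a player: payoff[my action][opponent's action],
   with actions 0 = Cooperate, 1 = Defect (table 1.1) */
static const int payoff[2][2] = { {3, 0}, {4, 1} };

/* is action a a best reply when the opponent plays b? */
int is_best_reply(int a, int b) {
    return payoff[a][b] >= payoff[1 - a][b];
}

/* (a, b) is a pure-strategy Nash equilibrium iff each action is a
   best reply to the other */
int is_nash(int a, int b) {
    return is_best_reply(a, b) && is_best_reply(b, a);
}
```

Checking all four action pairs confirms that (Defect, Defect) is the only Nash equilibrium of this game.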

    Pareto improvement and efficiency

A Pareto improvement is a change of strategy that makes some agent better off without having negative effects on any other. A system is Pareto efficient, or Pareto optimal, when no Pareto improvements can be made; in other words, in a Pareto-efficient system no individual can improve his position without worsening somebody else's.

    Repeated games

Features of one-shot games like the Prisoner's Dilemma are the total lack of cooperation and the inability to study how each player's actions affect the others as time progresses. The model of a repeated game is designed to examine long-term interaction, based on the idea that a player will take the other players' estimated future behaviour into account when planning his current strategy. The theory tries to isolate the types of strategies that support mutually desirable outcomes in every game and punish players with undesirable behaviour. Folk theorems [12] give the conditions under which the set of equilibrium payoffs consists of nearly all reasonable payoff profiles. Using these theorems it can be shown that acceptable results cannot be sustained if players are short-sighted and only look after their immediate interests.


    Chapter 2

Previous Approaches to the El Farol

    2.1 Overview

In this chapter, various approaches to the El Farol Bar Problem are reviewed, and three ways to extend the original model and analyse it from a different perspective are introduced.

    2.2 Approaches to EFBP

    2.2.1 Minority Game

In this approach, results known from minority games and tools from statistical mechanics are used to analyse the El Farol model. A minority game is a binary game in which N players (N odd) must each choose one of two sides independently, and those on the minority side win. Players use a set of strategies, based on the past, to make their selections [14]. The greatest difference between this model and Arthur's is the introduction of strategies instead of predictors. A strategy is a function a(I) defined on [0, N]^d, like a predictor, but taking values in {0, 1} instead of [0, N]. In other words, strategies decide whether or not an agent should visit the bar, based on the previous history of attendance expressed as ones (attendance below the comfort level) and zeros (above it). This is of great importance, since the number of such strategies is 2^((N+1)^d), significantly fewer than the number of predictors for large N. This can be denoted as:

a_{s,i} = θ[L − A_{s,i}(I_t)]    (2.2.1)

which will depend only on the information

Θ(t) = {θ[L − A(t − 1)], . . . , θ[L − A(t − d)]}    (2.2.2)


There are only 2^(2^d) strategies of this type [2], which is even fewer than 2^((N+1)^d) and independent of N. Each agent is assigned S strategies drawn from the pool with distribution P(a) ≡ Prob{a = a_{s,i}} = ā δ(a − 1) + (1 − ā) δ(a). For this model to work, it is assumed that on average the clients attend the bar with frequency L/N; this leads to ā ≈ L/N, where ā is the average of a_{s,i}.
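One way to realise this pool in C is to store each strategy as a lookup table indexed by the binary history, drawing entries according to P(a) (a sketch; the name and layout are our own):

```c
#include <stdlib.h>

/* A binary strategy with memory d is a lookup table with 2^d entries,
   one per possible recent below/above-comfort history, so there are
   2^(2^d) such strategies.  Each entry is drawn to be 1 ("go") with
   probability abar, so agents attend with average frequency
   abar ~ L/N. */
void draw_strategy(int *table, int d, double abar) {
    int entries = 1 << d;
    for (int h = 0; h < entries; h++)
        table[h] = ((double)rand() / RAND_MAX < abar) ? 1 : 0;
}
```

An agent holding S such tables simply looks up the entry indexed by the current d-bit history in its currently best-scoring table.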

Figure 2.1 illustrates what happens to the model when L, ā and m remain fixed while the number of agents N increases. The top graphs show that when Nā ≈ L the attendance converges to the comfort level, while as m increases the region where A ≈ L shrinks.

Another feature of this approach is that it measures wasted resources, which appear when the bar is under- or over-utilised. Although A(t) equals L on average, the amount of unexploited resources (A(t) < L) or over-exploited resources (A(t) > L) is equal to the distance |A(t) − L|. Thus the quality of the agents' cooperation is measured by:

σ² = ⟨(A − L)²⟩    (2.2.3)

where ⟨. . .⟩ denotes the average over the stationary state.
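Estimated from a recorded attendance series, this measure takes only a few lines of C (a sketch; names ours):

```c
/* sigma^2 = <(A - L)^2>: mean squared distance of the attendance
   from the comfort level L, over a recorded (stationary) series */
double fluctuation(const int *A, int T, int L) {
    double s = 0.0;
    for (int t = 0; t < T; t++) {
        double d = A[t] - L;
        s += d * d;
    }
    return s / T;
}
```

For instance, the series 58, 62, 60, 60 around L = 60 gives σ² = (4 + 4 + 0 + 0)/4 = 2.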

Figure 2.1: Behaviour of the average attendance (top) and of the fluctuations (bottom) in the El Farol problem with L = 60 seats, ā = 1/2 and m = 2, 3, 6, from left to right [2].

The bottom part of figure 2.1 shows that for small m, σ²/N has a peak at the point where āN = L, while as m increases the maximum becomes shallower until it disappears for m = 6. This leads to the conclusion that the efficiency increases for larger values of m. More information about this model can be found in [2].


    2.2.2 Evolutionary Learning

This approach starts from the observation that, since the EFBP is a setting of inductive reasoning and bounded rationality, models based on a closed set of strategies are inadequate. It introduces a stochastic element, since new models are created by randomly varying existing ones, as well as a selective process which eliminates the ineffectual models.

Suppose that, as in Arthur's original experiment, each agent is given k = 10 predictive models. For simplicity these models are autoregressive, with unsigned and rounded output. According to [3], the output of the jth predictor of the ith individual is given by:

x_j^i(n) = round | a_j^i(0) + Σ_{t=1}^{l_j^i} a_j^i(t) x(n − t) |    (2.2.4)

where x(n − t) is the attendance in week (n − t), l_j^i is the number of lag terms in the jth predictor of individual i, a_j^i(t) is the coefficient for the lag t steps in the past, and a_j^i(0) is the constant term of the AR model.

The absolute value and the rounding ensure that no negative values occur, and all predictions above 100 are set to 100, in accordance with the original model. For each individual the number of lag terms is chosen uniformly from the integers {1, . . . , 10} [3]. Before predicting the attendance of the current week, each individual evolves its set of models for ten generations. This procedure, analysed in [3], is summarised in the following five steps:

1. As mentioned above, each agent holds 10 models. In this stage an offspring is created for each of the agent's models. The offspring's number of lag terms differs from its parent's l_j^i by at most one and stays within {1, . . . , 10}: if l_j^i = 1 then l_j^i − 1 is not allowed, while if it is equal to ten, l_j^i + 1 is not allowed. The AR coefficients of the offspring are generated by adding a zero-mean Gaussian variable with standard deviation (std) 0.1; any newly created AR coefficients are drawn from N(0, 0.1). At the end of this stage there are ten parent and ten offspring AR models assigned to each individual.

2. In this stage, the 20 models assigned to each agent are evaluated based on the sum of squared errors made when predicting the bar's attendance over the last 12 weeks.

    3. For each agent, the ten models with the least error are selected and set asparents for the next generation.

4. If fewer than ten generations have been conducted, the procedure restarts from stage 1. Otherwise, each agent's best model is used to predict the current week's bar attendance.


5. If the maximum number of weeks has been reached the algorithm ends; otherwise, the predicted attendance is recorded and the simulation starts over again from stage 1.

    Figure 2.2: Mean weekly attendance for all 300 trials[3].

    Figure 2.3: The attendance in a typical trial[3].

The results of repeating this procedure 300 times are shown in figure 2.2. The mean weekly attendance for the first 12 weeks was 59.5, but over the next 50 weeks large oscillations appeared, lasting until week 100. From week 100 to week 982 (the end), the behaviour of the model could be described as transient. The mean attendance was 56.3155 with std 1.0456, which is statistically significantly different (p < 0.01) [3] from the results of Arthur's original paper. None of the 300 trials showed convergence to 60, and the results of each trial were similar to those illustrated in figure 2.3.

    The dynamics of this system do not provide useful results about the model's overall behaviour. That is why stochastic models based on Markov chains were


    Figure 2.4: The normalised one-step transition matrix[3].

    used. The weekly attendance is the system's state in a simple first-order random process. The week-to-week attendance transitions for all 300 trials were tabulated and the transition matrix in figure 2.4 was formed. It was also verified that the system has the Markov property, by executing 300 additional trials and recording each final weekly attendance at week 982. The cumulative distribution of these attendances is similar to the cumulative distribution function obtained by summing the limiting probability masses calculated from raising the transition matrix to a large power.

    2.2.3 Stochastic Adaptive Learning

    In Arthur's original paper each agent tries to predict how many others will attend the El Farol Bar. Each individual decides based on a set of strategies (predictors) which estimate the attendance of the Bar. Approaches 1 and 2, described in subsections 2.2.1 and 2.2.2, use similar methods, although they try to refine the decision process. In this approach it is shown that there is no need for the agents to use different strategies and switch between them trying to find which is the more accurate. The problem is considered in stochastic terms instead of deterministic ones, and a simple adaptive learning algorithm is implemented. The main advantage of this method is that the algorithm is simpler and the decision process is less complex[10], since the agents do not decide based on the decisions of all the others but only on their own recent experiences in the Bar.

    The agents have identical payoffs: b is the payoff for attending an uncrowded bar, g for attending a crowded bar, and h for staying at home. Without loss of generality h is taken to be zero. There are two strategies: either the agent attends the bar and receives payoff b or g according to the attendance


    of the bar, or he stays at home and receives no payoff. In a mixed strategy equilibrium the expected payoff of following one action is equal to the expected payoff of following the other. This can be written as:

    b·P(N_{−i} ≤ N̄ − 1) + g·P(N_{−i} > N̄ − 1) = 0

    ⟹ P(N_{−i} ≤ N̄ − 1) = g / (g − b)

    where b, g are the payoffs, M the total number of players, N the total observed attendance, N_{−i} the observed attendance without agent i, and N̄ the maximum capacity of an uncrowded bar.

    Using a deterministic setting where agents would only use pure strategies has the disadvantage that each agent must predict the attendance based on the predictions of the others. This generates results with a high level of noise and high deviation. In this boundedly rational model, the adaptive learning rule depends only on the history of the decisions of each single agent.

    To overcome this problem a common-sense concept is taken into consideration: people in general prefer to experience good times, tend to repeat the enjoyable and minimise the unpleasant[10]. According to this, if an agent initially attends the bar p percent of the time, he will increase this if the bar is uncrowded or decrease it if the bar is crowded. As time goes by, agents gather information about the attendance of the bar, in the form of the parameter p which differs for each ith agent. If k is the iteration counter (time), p_i the probability that the ith agent attends, and ε the parameter that defines the degree of change of p_i according to the attendance, then the number of agents attending at time k is given by:

    N(k) = Σ_{i=1}^{M} x_i(k)     (2.2.5)

    where the x_i(k) are independent Bernoulli random variables that are equal to one with probability p_i(k) and equal to zero otherwise[10]. The following simple algorithm describes the evolution of p_i(k):

    p_i(k + 1) =
        0,                                if p_i(k) − ε(N(k) − N̄)x_i(k) < 0
        1,                                if p_i(k) − ε(N(k) − N̄)x_i(k) > 1
        p_i(k) − ε(N(k) − N̄)x_i(k),      otherwise                         (2.2.6)

    At each timestep k the agent attends the Bar with probability p_i(k), after tossing a biased coin. If the bar is uncrowded, ε(N̄ − N(k)) is added to p_i(k), while if it is crowded, ε(N(k) − N̄) is subtracted. Also p_i(k) ∈ [0, 1], since it is a probability.

    If the agent does not attend the bar, x_i(k) = 0 leads to p_i(k + 1) = p_i(k). From now on, algorithm (2.2.6) will be referred to as partial information, because the agents make their decision relying only on their previous experience. But this


    algorithm could be modified in an attempt to generate results close to Arthur's original algorithm. In the following full information algorithm, the decisions are

    made after having a full record of attendance.

    p_i(k + 1) =
        0,                          if p_i(k) − ε(N(k) − N̄) < 0
        1,                          if p_i(k) − ε(N(k) − N̄) > 1
        p_i(k) − ε(N(k) − N̄),      otherwise                         (2.2.7)

    Both these algorithms rely solely on attendance and not on payoffs. One way to implement payoffs is to set the payoff for attending an uncrowded bar to ε and the payoff for attending a crowded bar to −ε. If the payoff for staying at home is set to 0, the following algorithm depends on the payoffs.

    p_i(k + 1) =
        0,                                      if p_i(k) − ε·sgn(N(k) − N̄)x_i(k) < 0
        1,                                      if p_i(k) − ε·sgn(N(k) − N̄)x_i(k) > 1
        p_i(k) − ε·sgn(N(k) − N̄)x_i(k),        otherwise                         (2.2.8)

    where sgn is the following function:

    sgn(N(k) − N̄) =
        −1,   N(k) − N̄ < 0
         0,   N(k) − N̄ = 0
         1,   N(k) − N̄ > 0

    The general behaviour of the partial information algorithm can be seen in the first two plots of figure 2.5. The simulation is run for M = 100, N̄ = 60, ε = 0.01, and the initial probabilities follow a uniform random distribution. After many iterations, the agents separate into two groups: those of the first group attend every day, while those of the other do not attend at all, or attend very rarely. The observed attendance after many iterations is slightly below the Nash equilibrium point, which is always equal to the maximum capacity of the uncrowded bar. The partial information algorithm converges to a value near that point, N̄ − 1, and never reaches N̄; this is a Pareto efficient point for the algorithm, since the only way for an agent to be in a better position is to worsen somebody else's[15].

    This algorithm is heavily dependent on the value of ε, which strongly affects its convergence: for ε = 0.1, M = 100 and N̄ = 60 it converges after 800 iterations, while for ε = 0.001, 13000 iterations were not enough. In order to explore the behaviour and the limitations of the algorithm, a lighter version of the original C program was used. It differs from the one in the Appendix in that it produces no files for plots, and its only outputs are the average and the standard deviation. A conclusion that can be drawn from the simulations alone (Table 2.1) is that the algorithm can handle relatively large numbers such as M = 10000, N̄ = 6000, provided that ε is very small, at the expense of


    convergence time. The conclusion, then, is that only hardware limitations affect its behaviour.

    N̄/M          60/100       600/1000      6000/10000
    iterations   2×10⁸        2×10⁸         2×10⁸
    ε            0.01         0.001         0.001
    average      59.00020     599.000142    5898.745966
    std          0.006524     0.052298      74.159661

    Table 2.1: Behaviour of the partial information algorithm.

    The comparison of the first two plots of figure 2.5 with the next two leads to the conclusion that agents coordinate successfully when they have access to partial rather than full information. With full information, congestion arises as a result of the similar responses of the agents, who all have the same data available.

    The results of the full information algorithm are very similar to Arthur's original simulation (figure 1.1). The mean attendance is 60, but the variation never settles down. Although the probabilities bounce around randomly, they increase or decrease simultaneously. This supports the assumption made above that agents with access to full information tend to behave in the same way. The behaviour of the algorithm is summarised in table 2.2.

    N̄/M          60/100       600/1000      6000/10000
    iterations   2×10⁸        2×10⁸         2×10⁸
    ε            0.01         0.001         0.001
    average      60.00000     599.99999     5999.999994
    std          6.454943     18.371018     57.254445

    Table 2.2: Behaviour of the full information algorithm.

    The first and third plots in figure 2.5 show that the partial information and signs algorithms have similar behaviour. A close look at the probabilities for each agent reveals that the new algorithm inherits all the properties of the original. The only difference is that the standard deviation is slightly greater, which can be explained by the fact that the new algorithm converges after more iterations: with the same constants, more than 3000 iterations are needed for it to converge. The behaviour of the algorithm is summarised in table 2.3. It can be seen that this variation has no problems with large numbers, probably because it is simpler: instead of using the quantity N(k) − N̄ at every iteration, it uses sgn(N(k) − N̄).

    The histograms of the attendances (figure 2.6) for the partial and full information algorithms show that they have completely different distributions, and indicate that the attendances in the full information algorithm may follow a normal distribution.


    Figure 2.5: The overall attendance and the probabilities for each of the M agents for the partial information, full information and signs algorithms.


    N̄/M          60/100       600/1000      6000/10000
    iterations   2×10⁸        2×10⁸         2×10⁸
    ε            0.01         0.001         0.001
    average      59.000049    599.059126    5998.999914
    std          0.011685     0.286160      1.078615

    Table 2.3: Behaviour of the signs algorithm.

    Figure 2.6: Histograms of attendance for the partial and full information algorithms.

    Apart from fast convergence and the ability to handle large numbers, it is important that some of the quantitative characteristics of the algorithms are examined too. It is necessary to develop tools that measure abstract terms like fairness and efficiency. A fair outcome requires that agents with similar utilities have similar probabilities of attending the bar. A stricter definition would demand identical probabilities of attendance for agents with identical payoffs[16]. The algorithm must also be efficient, both for the agents and for the bar management.

    Before proceeding to the tools used, utility payoff, a term first introduced with the Prisoner's Dilemma definition in section 1.4, must be defined. Utility payoffs are the rewards or penalties that each agent receives after following a pure strategy. In this case, agents who attend an uncrowded bar get a reward of 1, those who attend a crowded bar get a penalty of −1, while those who stay at home get nothing (reward = 0):

    U_i(k) =
        −1,   x_i(k) = 1, N(k) − N̄ > 0
         1,   x_i(k) = 1, N(k) − N̄ ≤ 0
         0,   x_i(k) = 0

    So, fairness and efficiency can be measured using the following methods based on utility functions:

    • efficiency is determined by the average payoff: the higher the average payoff probability, the higher the average reward for each player


    • fairness is determined by the distribution of payoffs. A histogram which shows the creation of groups is a clear indication that the algorithm is unfair; similar conclusions can be drawn from the standard deviation. High values of std indicate that a significant number of agents have payoffs below the average, while others have greater.

    There are also other ways to measure fairness and efficiency, which have nothing to do with utility functions:

    • the system's efficiency is determined by the std of the attendances: a system is considered efficient when the attendances stay very close to the system's capacity.

    • the system's fairness can also be determined by the distribution of attendances for each agent. At each iteration of the algorithm the decision of each agent is recorded, so at the end it is easy to calculate how many times each agent has attended the bar. A histogram or the std can be used in this case too.

    It is called the system's efficiency because most bar managements have an interest in keeping a stable attendance, without the fluctuations that would leave the bar underutilised on some days and overutilised on others. Although the system's fairness gives a good picture of the choices of each agent, the original fairness is more important, since it is calculated from the utility payoffs. After all, what matters is the consequences of each agent's actions, and the only way to measure them is in terms of reward or penalty.

    Considering the system's fairness, it can be seen in figure 2.7 that all three algorithms are unfair. Agents are divided into two categories: 41 of them rarely attend the bar, and the rest attend almost always. Although the partial information with signs algorithm was designed as an improvement on the original one, it inherits all of its properties. One of those properties is minimal attendance for 41 agents and maximal attendance for the rest. From the agents' point of view this algorithm seems more fair (Table 2.4, std_signs < std_partial), but in reality it is not: this is a result of the lower profits for the always-attending group, and not of a wider distribution of payoff probability.

                     std          mean
    partial          0.401591     0.4814233
    signs partial    0.3018023    0.36663
    full             0.02300045   −0.00500668

    Table 2.4: Standard deviation and mean of payoff probabilities for the original algorithms.

    The fairest algorithm with regard to the agents is full information: indeed, the standard deviation of the payoff probability is nearly zero, but so is the average payoff


    probability. Nobody makes a profit and nobody is happy, but everybody is content and seems to have no intention of changing their strategy; these are all properties of Nash equilibrium points. But in terms of attendance the full information algorithm is not absolutely fair, since again the agents separate into two groups, although this time the casual bargoers have a much wider distribution.

    Figure 2.7: Fairness and efficiency plots for the original algorithms.

    It is obvious that these results are open to different interpretations, which vary according to the objectives set each time. In general, partial information is the most appropriate algorithm if the objective is to maximise the average payoff of each agent. It might be the most unfair of all, but even full information is not fair enough to justify its minimal profits.

    Considering the system's efficiency, it can be seen in the cumulative plot of the running std for the original algorithms (figure 2.8) that partial information has the lowest standard deviation and a negative slope. On the contrary, full


    information has a relatively high std, with no signs of improvement.

    Figure 2.8: Standard deviation of attendance for the original algorithms.


    Chapter 3

    Analysis and Extension of the Stochastic Algorithm

    3.1 Overview

    None of the previous three variations was proved to be both fair and efficient; fairness and efficiency seem to be contradictory aims. In this chapter, new variations of the stochastic adaptive learning algorithms presented in subsection 2.2.3 are examined. The variations can be divided into two categories: the taxing/payoff algorithms, which involve an adaptive change of ε, and the budget algorithms, where the agents are forced not to attend the bar after having attended it consecutively for a number of weeks, as a result of inadequate resources.

    3.2 Taxing/Payoffs Algorithms

    3.2.1 Overview

    The basic idea behind this algorithm is that there are three types of agents in the system: the selfish, who attend the bar regardless of whether it is crowded or not; those who attend an uncrowded bar; and those who never attend the bar. Partial information maximises the probability of attendance by adding a small quantity to the original probability when the bar is uncrowded, or subtracting it when the bar is crowded. This quantity depends on ε, which is constant. In this version (equations 3.2.1), the parameter ε_tax is introduced so that selfish behaviour can be penalised. When a selfish agent attends a crowded bar, the quantity subtracted from the original probability is c_tax times greater than in the partial information algorithm.

    Simulations indicated that for rather large values of c_tax (c_tax ≥ 8) the bar is underutilised (mean ≈ 58). The behaviour of this algorithm is summarised in


    table 3.1 and in the first two plots of figure 3.2.

    N̄/M          60/100       600/1000
    iterations   2×10⁸        2×10⁸
    ε            0.01         0.001
    average      59.072334    598.99924
    std          0.259638     0.052933

    Table 3.1: Behaviour of the tax algorithm with c_tax = 6.

    This algorithm could be made fairer for the agents if those who do not attend the bar were encouraged to attend it. By the definition of the algorithm this cannot be done through the parameter ε, since the probabilities p_i of those who choose, or perhaps are forced, not to attend do not change.

    A rather aggressive way to change this is to multiply these probabilities by a number close to (but greater than) 1. This way, at every iteration the attendance probability of all those who do not attend increases slightly, until finally they decide to attend the bar. That leads to the following equations for the partial information algorithm with taxing:

    p_i(k + 1) =
        0,                                              if ε_p·p_i(k) − ε_tax(N(k) − N̄)x_i(k) < 0
        1,                                              if ε_p·p_i(k) − ε_tax(N(k) − N̄)x_i(k) > 1
        ε_p·p_i(k) − ε_tax(N(k) − N̄)x_i(k),            otherwise                         (3.2.1)

    ε_p and ε_tax are defined as follows:

    ε_p =
        1,     x_i(k) = 1
        c_p,   x_i(k) = 0                               (3.2.2)

    where i = 1 … M and c_p is a constant which affects the degree to which p_i changes for those agents that do not attend (x_i(k) = 0), and

    ε_tax =
        ε,          N(k) ≤ N̄
        c_tax·ε,    N(k) > N̄                            (3.2.3)

    where c_tax > 1 is a constant that defines the degree to which p_i changes for selfish agents who insist on attending even when the bar was crowded the previous time.

    The behaviour of this algorithm is summarised in the 3rd and 4th plots of figure 3.2. Simulations showed that in a reasonably taxed system (c_tax ≥ 3), large values of c_p result in an underutilised bar (Table 3.2). An interpretation of this is that, although people are encouraged to attend, taxation prevents them


    from doing so. Of course, as one would expect, minimal taxation results in an overcrowded bar. Another interesting feature of this algorithm is its rapid convergence, even when compared with the original.

    c_tax      6            8            6            1            6
    c_p        1            1.01         4            4            n/a (full info)
    average    58.929000    59.682000    49.002333    70.489000    58.165750
    std        1.382448     2.200574     0.437992     4.642286     4.851578

    Table 3.2: Simulation results for various values of c_tax, c_p.

    For the full information tax algorithm there is no need for the parameter ε_p:

    p_i(k + 1) =
        0,                              if p_i(k) − ε_tax(N(k) − N̄) < 0
        1,                              if p_i(k) − ε_tax(N(k) − N̄) > 1
        p_i(k) − ε_tax(N(k) − N̄),      otherwise                         (3.2.4)

    The histograms of the attendances (figure 3.1) for the partial and full information taxing algorithms have much in common with those of the original algorithms and very few differences. Both full information algorithms seem to follow the normal distribution, although in the partial information taxing algorithm the value 59 dominates.

    Figure 3.1: Histograms for the partial and full information taxing algorithms.

    In [10] the solutions and convergence properties of the original algorithms are examined. After the equilibrium points are defined, they are used to derive a set of deterministic ODEs. The results concerning the nature of p(k) in the steady state are used in the analysis of the convergence behaviour and the stability properties of those ODEs. Although the outputs of the simulations of the original algorithms can be reproduced and proved analytically, there are many reasons that make this procedure extremely difficult for the other variations.

    From the last plot in figure 3.2 there is an indication that the probabilities might converge to a stationary state around 0.6. Although the full information algorithms are simpler cases, the adaptive behaviour of ε_tax(k) is a source of complexity.


    Figure 3.2: The overall attendance and the probabilities for each of the M agents for the partial information tax and modified tax algorithms, and the full information tax algorithm.


    In the following, a set of equations describing the stationary-state behaviour of the probabilities p_i(k) will be derived.

    The number of agents attending at time k is N(k) = Σ_{i=1}^{M} x_i(k), where the x_i(k) are independent Bernoulli trials given by:

    x_i(k) =
        1,   with probability p_i(k)
        0,   with probability 1 − p_i(k)                (3.2.5)

    In the stationary state we have:

    p_i(k + 1) = p_i(k) − ε_tax·(N(k) − N̄)

    Taking expectation values results in:

    E(p_i(k + 1)) = E(p_i(k))  ⟹  E(p_i(k + 1)) − E(p_i(k)) = 0  ⟹  E(ε_tax·(N(k) − N̄)) = 0

    N(k), the number of agents attending at time k, equals Σ_{i=1}^{M} x_i(k), where the x_i(k) are independent Bernoulli trials.

    The expectation value is calculated using the following formula:

    E(XY) = Σ_x Σ_y x·y·f_{X,Y}(x, y)

    where X, Y are random variables and f_{X,Y}(x, y) is their joint mass function. In this case Y = ε_tax and X = N(k) − N̄. By definition, ε_tax is a function of N(k), so Y = g(X) and f_{X,Y}(x, g(x)) = f_X(x). Hence:

    E(ε_tax(N(k))·(N(k) − N̄)) = Σ_{N(k)} (N(k) − N̄)·ε_tax(N(k))·f_{N(k)}(N(k))

    = c·ε Σ_{N(k)>N̄} (N(k) − N̄)·f_{N(k)}(N(k)) + ε Σ_{N(k)≤N̄} (N(k) − N̄)·f_{N(k)}(N(k))

    = c·ε Σ_{N(k)>N̄} N(k)·f_{N(k)}(N(k)) − c·ε·N̄ Σ_{N(k)>N̄} f_{N(k)}(N(k))
      + ε Σ_{N(k)≤N̄} N(k)·f_{N(k)}(N(k)) − ε·N̄ Σ_{N(k)≤N̄} f_{N(k)}(N(k))     (3.2.6)

    where f_{N(k)} denotes the mass function of N(k) and c stands for c_tax.

    A Kolmogorov–Smirnov test indicated that z = N(k) might follow a Poisson distribution, f(z) = λ^z e^{−λ} / z!, with λ = E(z). Hence:

    E(z) = Σ_z z·f(z) = Σ_{z≤N̄} z·f(z) + Σ_{z>N̄} z·f(z) = λ, so that


    Σ_{z≤N̄} z·f(z) = λ − Σ_{z>N̄} z·f(z)                (3.2.7)

    and:

    Σ_z f(z) = 1  ⟹  Σ_{z≤N̄} f(z) = 1 − Σ_{z>N̄} f(z)   (3.2.8)

    Thus, after applying equations (3.2.7) and (3.2.8) in (3.2.6), we get the following:

    c·ε Σ_{z>N̄} z·f(z) − c·ε·N̄ Σ_{z>N̄} f(z) + ε(λ − Σ_{z>N̄} z·f(z)) − ε·N̄(1 − Σ_{z>N̄} f(z)) = 0     (3.2.9)

    Since ε ≠ 0, (3.2.9) becomes:

    (c − 1) Σ_{z>N̄} z·f(z) − (c − 1)·N̄ Σ_{z>N̄} f(z) = N̄ − λ     (3.2.10)

    Since f(z) follows the Poisson distribution, after replacing z with N(k) and setting:

    A(λ, M) = Σ_{N(k)=N̄+1}^{M} λ^{N(k)} e^{−λ} / N(k)!

    and

    B(λ, M) = Σ_{N(k)=N̄+1}^{M} N(k)·λ^{N(k)} e^{−λ} / N(k)!

    we get the final equation:

    B(λ, M)(c − 1) − N̄·A(λ, M)(c − 1) = N̄ − λ          (3.2.11)

    By solving equation (3.2.11) a λ can be found, which for M = 100 and c = 8 should be equal or close to the average attendance. With a numerical approximation of λ available, the values of p_i at the equilibrium point can be determined by a very simple procedure:

    λ = E(N(k)) = E(Σ_{j=1}^{M} x_j(k)) = Σ_{j=1}^{M} E(x_j(k))

    Since the x_i(k) are Bernoulli trials with parameter p_i(k), E(x_i(k)) = p_i(k). Hence:

    λ = Σ_{j=1}^{M} p_j(k)                              (3.2.12)

    It must be noted that c = 1 in equation (3.2.11) leads to Σ_{j=1}^{M} p_j = N̄, so for this value of c the original result mentioned in [10] is recovered.


    3.2.2 Fairness and Efficiency

    As in the original partial information algorithm, in the partial information with taxing algorithm two groups of agents are created here too. Again 41 of them hardly ever attend and 59 almost always; for those who do attend, the profits are very close to 1, while for those who do not, the profits are close to 0. In the modified version, although the non-attending agents are encouraged to attend through a slight increase of the probability p_i, the results are not significantly different: this time 40 stay at home almost always and 60 go to the bar. A larger c_p would lead to more agents attending the bar, but there is also the potential danger of overcrowding. The second algorithm behaves slightly better, since most agents have a marginal rise in profits, and the differences in std are of no significance (Table 3.3). Although these algorithms

                                         std           mean
    partial info with taxing             0.4748791     0.57201
    modified partial info with taxing    0.4863357     0.5930867
    full info with taxing                0.04937645    0.1766167

    Table 3.3: Standard deviation and mean of payoff probabilities for the taxing algorithms.

    seem to be as unfair as the originals, there is an indication that they might be slightly more efficient: the average values of the payoff probabilities are larger in the taxing variations.

    But the surprise comes from the full information with taxing algorithm. As can be seen in figure 3.3, one group has 99 agents with probabilities of attendance varying from 0.46 to 0.76, and only one agent never goes to the bar. Compared to the original full information algorithm, this one is also more efficient from the agents' point of view, since the payoff probability for those who attend varies from 0.08 to 0.27 with an average of 0.1766167. In the original full information algorithm, by contrast, the payoff probability varies from −0.05 to 0.04 with an average of −0.005 (figure 2.7, table 2.4). The improvement can be called quite significant.

    As expected, the first two algorithms are better than the third in terms of the system's efficiency, although they are worse than the original ones. That happens because of the relatively wide distribution. The first two have almost identical behaviour, although the effects of the slight increase of p_i in the 2nd algorithm are visible in figure 3.4. The behaviour of the third taxing algorithm is almost identical to the behaviour of the original.


    Figure 3.3: Fairness and efficiency plots for the taxing algorithms.


    Figure 3.4: Standard deviation of attendance for the taxing algorithms.

    3.3 Budget Algorithms

    3.3.1 Overview

    In this algorithm agents have a limited budget which does not allow them to attend the bar every night. In another version of this algorithm (the modified budget algorithm), they are forced to stay at home after attending the bar for several consecutive nights. This can be denoted as follows:

    ∀i ∈ [1, M], ∀k ∈ [1, n]:   if  Σ_{j=0}^{b−1} x_i(k − j) = b,  then

    x_i(k + 1) = … = x_i(k + w) = 0                     (3.3.1)

    where b N is the constant that determines how many nights i-agent can attendthe bar consecutively and w N the maximum nights an agent must stay indoorsafter having attended the bar for b nights

The algorithm settles down after almost 2000 iterations, but it is very CPU intensive. Table 3.4 shows some results as well as the time needed to obtain them. Although the algorithms so far have given results for 2×10^8 iterations in almost 3 days, this one has not given any result yet for 2×10^6 iterations after 2 weeks. The reason for this increase in CPU demand is that instead of using an array of attendances in terms of 1s and 0s, which is initialised in every iteration and whose size is equal to the number of agents, it uses a matrix which contains the attendances for every iteration, that is, a matrix with number of agents × iterations elements. Despite the fact that this algorithm is the fairest and most efficient of all (see subsection 3.3.2), it might not be suitable for networks of hundreds of nodes, or in cases of large numbers, because in every iteration the columns of this matrix are scanned for sequences of 1s. The next element or sequence of elements in the modified budget is set to 0, reflecting the inability of the agents to attend the bar.

N/M          60/100      60/100      60/100
iterations   5000        4×10^4      2×10^6
parameter    0.01        0.01        0.001
average      59.133400   58.161900   n/a yet
std          0.999281    1.673139    n/a yet
time         6 sec       5 min       2 weeks...

Table 3.4: Behaviour of the modified budget algorithm for budget=10 and staying at home=6.

As the ratio of budget to nights staying at home reaches 1, the bar becomes underutilised. It is as if the town is hit by recession: when people experience financial difficulties they do not attend bars. For values of this ratio much less than 1 the algorithm settles down almost instantly. If the budget is large the results are similar to the original partial information algorithm (Table 3.5 and Figure 3.6).

budget    10          100         10          10          10
waiting   0           0           3           10          6
average   59.385200   59.287200   59.292000   48.981200   58.165750
std       2.916301    1.132950    1.861538    7.423848    4.851578

Table 3.5: Simulation results for budget algorithms.

[Figure omitted: histogram of attendances in budget with b=10; histogram of attendances in modified budget with b=10, w=6.]

Figure 3.5: histograms for budget algorithms (b = 10, w = 0/6).

Although budget algorithms are a variation of the partial information algorithm, their histograms (Figure 3.5) are quite different, especially the first, which has a wider distribution than the original one.


[Figure omitted: attendance and per-agent probability time series for the three budget variants.]

Figure 3.6: The overall attendance and the probabilities for each of the M agents for budget (budget = 10) and modified budget (budget = 10, wait = 3, 6) algorithms.


    3.3.2 Fairness and Efficiency

In the first variation of the budget algorithms the formation of the two groups persists, although it is clear that this is the fairest partial information algorithm considered so far, since only 34 agents decided to stay at home. The modified version of this algorithm improves it by resolving most of the fairness and efficiency issues. Indeed, the more days agents have to stay at home in order to save resources, the more people have the opportunity to attend. The number of always non-attending agents goes from 23 in the 2nd example to 6 in the 3rd one.

The improved version is also more efficient for the agents (Figure 3.8). For the first time, an algorithm can become fairer without becoming less efficient and vice versa. As days at home increase, the standard deviation diminishes and the payoff probability rises (Table 3.6). As expected, forcing agents to stay at home after systematic visits gives a chance to those who would never have thought of attending and increases everybody's profits. The only setback seems to be that this increase of profits comes at the expense of the bar management, since the average attendance is around 58.

days waiting   std         mean
0              0.0953156   0.132056
3              0.232645    0.42336
6              0.1286092   0.471414

Table 3.6: standard deviation and mean of payoff probabilities for budget algorithms.

[Figure omitted: standard deviation vs. iterations, with curves for 0, 3 and 6 days at home.]

Figure 3.7: standard deviation of attendance for budget algorithms.

The last two algorithms are more efficient when the interests of the bar are considered. They have almost identical behaviour, although in Figure 3.7 it can be seen that the algorithm behaves slightly better for smaller values of the staying-at-home constant. The std of the attendance for the original budget algorithm seems to be stable, with no indication of improvement. The only problem is that, although the fluctuations in standard deviation are lower when agents stay many days at home, the bar is underutilised. Considering the system's fairness, and based on the histograms of Figure 3.5, the first case, where w = 0, is the fairest, since the distribution of the attendances is wider.

[Figure omitted: probability-of-attendance and payoff-probability histograms for the budget algorithm with 0, 3 and 6 days at home.]

Figure 3.8: fairness and efficiency plots for budget algorithms.


    Chapter 4

Case Study: Multiple Bars in Santa Fe

What if the citizens of El Farol had a chance to choose from a pool of more than one bar? Would the same algorithms behave differently? In this chapter, the case of three bars is studied. The main intention is to see if the extremely efficient partial information algorithm can become fairer when three bars instead of one are available. This problem also tends to be more realistic, since in most cases more than one choice exists. The rule used in the decision process by each agent is based on the rather simplistic assumption that everybody prefers going out to staying at home. The decision is similar to the process described in subsection 2.2.3. Only in this case, instead of using a biased coin to decide whether to attend the bar or not, the agent uses three distinct and slightly biased coins. Each coin corresponds to a bar: for example, if the first coin says he should not attend, he tosses the second; if it says not to attend again, he tosses the third; and if that also says not to attend, he stays at home.

Instead of using only one fixed sequence of the three bars, all 3! sequences were used. At each iteration of the algorithm, after the probabilities of attendance have been calculated, a sequence of bars is chosen randomly.

Simulations showed results similar to the one-bar version. In order to achieve consistency the 6/10 ratio is preserved and again the parameter is set to 0.01. Santa Fe now consists of M = 300 citizens and has three bars, each with a capacity of N = 60. Overall N_all = 180, which is 60% of 300. As seen in Table 4.1 and in Figures 4.1 and 4.2, the behaviour of the algorithm is almost identical.


[Figure omitted: attendance and per-agent probability time series for Bars 1-3.]

Figure 4.1: The overall attendance and the probabilities for each of the M agents for each bar.


[Figure omitted: overall attendance vs. iterations for the three bars combined.]

Figure 4.2: The cumulative attendance for all three bars.

(parameter = 0.01)   mean         std
All Bars             177.689333   3.052926
Bar 1                59.190000    2.360598
Bar 2                59.201667    2.207228
Bar 3                59.297667    2.524133

Table 4.1: mean and std for all bars.

The shared characteristics with the original algorithms continue when proceeding to the fairness and efficiency analysis. Indeed, Figure 4.3 indicates that there are no agents who would want to attend more than one bar regularly. Each bar has its set of very devoted customers, despite the randomised decision process. These plots (barplots) are used in [17] as a token of unfairness. This unfairness is even clearer in the payoff probability figures (Figure 4.4), where it is seen that 120 agents have a payoff probability close to 0, while 179 are very close to 0.8. It seems that even in a town with three bars, agents insist on behaving selfishly when using a partial information based algorithm.


[Figure omitted: per-agent attendance probability barplots for Bar 1, Bar 2 and Bar 3.]

Figure 4.3: Agent attendances for each bar.


[Figure omitted: payoff probability histograms for Bar 1, Bar 2, Bar 3 and for all bars.]

Figure 4.4: payoff probabilities for each bar and cumulative payoff probability for the 3-bar version.


    Chapter 5

    Conclusions

As was mentioned in the first chapter, the objective of this dissertation was to analyse the El Farol Bar Problem, view the differences between various approaches and try to extend some of them. It is a typical complex adaptive system, in which agents interact with each other competing for the same resource.

The first three solutions insist on Arthur's original concept of predictors with minor variations. A closer look reveals that the only difference between them is the definition of the predictors. Arthur uses a closed set of predictors, while Challet et al. [2] introduce the use of binary strategies in an attempt to simplify the problem, and Fogel et al. [3] use evolutionary learning algorithms for a more precise, although slightly more complicated, definition of the predictors.

Arthur's belief that any solution should require agents that pursue different strategies [18] is abandoned in the stochastic adaptive solution proposed by Bell et al. in [10]. In partial information algorithms and their variations, there is no need for the agents to make predictions about the attendance of the bar. They make their decisions based only on their own previous experience. In the full information algorithm, agents do know the full record of attendances. The behaviour of this algorithm is similar to the first three, although it does not require the use of predictors.

Partial information algorithms are more efficient and always converge with a very low standard deviation. Unfortunately this happens due to their unfair nature, which resembles the greedy concept behind this algorithm. Agents who take decisions based only on their previous experiences act extremely selfishly when competing for the same resource. Fairness and efficiency can be measured with the use of utility payoffs. The results showed that in partial information algorithms two groups of agents were formed: those who almost always attended, having an average payoff very close to 1.0, and those who almost never attended, who had a payoff near 0. It is clear that this behaviour is very unfair, especially when compared to the full information results, but it is also efficient, since the average utility payoff for all agents was higher than that of the full information algorithm.


Another negative aspect of these algorithms is that they converge to N − 1, always leaving the El Farol Bar slightly underutilised. But even in that case the losses for the bar management are less than with the full information algorithm, because of the almost zero standard deviation. Partial information algorithms after their convergence have almost zero std and a mean of 59, while full information has a mean of 60 but also a std of 6.5.

This algorithm can be divided into three parts: first comes the probability of attendance, which is affected by the parameter and which determines the attendances. In order to introduce fairness and efficiency, each one of those parts was altered.

Directly changing the probabilities of attendance which exceeded a threshold did not give better results. In this variation of the stochastic adaptive algorithms, after several iterations another probability value was assigned if the probabilities were larger than an upper bound or lower than a lower bound. Despite the fact that this algorithm had countless variations, none of the simulations showed significantly good results. This aggressive way of changing the probabilities created more unfairness and less efficiency, which is why the results are not included in the report, although the code is included in the Appendix.

The results were much better when taxation was implemented. Although the partial information algorithms did not become fairer with the use of an adaptive parameter, they became more effective. Full information algorithms were fairer but also significantly more effective. But probably the best results came when agents were forced to stay indoors after a succession of attendances.

The partial information with budget algorithm gave results that were almost perfectly fair and only slightly less efficient than the original one's. Although the results of the partial information with budget were more than acceptable, this algorithm is very CPU intensive and demanding. Also, the use of a budget meant that the system was no longer a decentralised one where agents made their own decisions. Although agents still do not need access to the full record, a central mechanism is necessary to ensure that they do not overspend their budget and that they stay indoors as many days as needed.

Finally, the problem was reformulated so that it could include three bars. Each agent was randomly assigned one sequence of bars, which changed in every iteration of the algorithm. Instead of deciding whether to go to El Farol or stay at home, he had to decide whether to go to the first bar of the sequence, the second, the third, or stay at home. Despite the randomised sequence, the results were almost identical to the previous results.

In all cases, it was shown that the original algorithms can be extended in many ways. There is much work left to be done in this field. Papers [10], [17], [18] and [15] have mentioned only the first three variations of the stochastic adaptive learning algorithms (partial, full, partial with signs) and examined their convergence properties. Most importantly, they could be used to fulfil a very versatile set of tasks. They could be combined or used to


calculate attendances in a network of many nodes (bars) which accepts a dynamically evolving population. The C code used to generate all the results is modular enough that even more variations can be included. Also, the mathematical formulation of the algorithms makes it possible to study their convergence properties, following the steps defined in the original paper [10] and further extended in this dissertation for the tax algorithms.


    Appendix A

C Code for the El Farol Bar Problem

    /***************************************************************************/

    /* Program that solves El Farol Bar problem based in paper: */

    /* Coordination Failure as a Source of Congestion in Information Networks */

    /* Ann M. Bell, William A. Sethares, James A. Bucklew */

    /* IEEE Trans. Signal Processing, 2003 */

    /***************************************************************************/

    /***************************************************************************/

    /* This source code is a part of MSc dissertation: */

    /* The El Farol Bar Problem for next generation systems */

/* by Athanasios Papakonstantinou */
/***************************************************************************/

    /* In order to compile and run this program, type the following from

    the directory the source is saved:

    gcc -Wall -lgsl -lgslcblas -lm demo3.c support.c modules.c debug.c */

    /* Bugs and known issues:

    i. Unfortunately the GSL C library is needed for the random generators.

You can install it from your distribution's package management system. This

remains to be fixed in the future, since this code is intended to show quick

    results of many algorithms. It is No 1 priority, because I want the program

    to be portable.

    ii. To generate plots you must run the ef.sh script provided, and install the

    plotutils package.

    iii. Some algorithms can be very demanding. Especially budget algorithms. You

    should not use more than 20000 iterations. This is more a feature than a bug

    iv. This is not optimised code, the results for 2*10^8 iterations were calculated

    from a variation of this code, which does not produce any files for plotting

v. There is a total lack of a substantial user interface. There was no time for


    that, see Bugs and known issues bullet "i". This will be fixed in the future but

only if the code is uploaded on the internet.

vi. No other bugs are known. This code was compiled with gcc 4.1.1 in a
Gentoo GNU/Linux box with a 2.6.17-gentoo-r4 kernel and with gcc 3.4.6 in

    a MandrakeLinux 10.1 box with 2.6.8.1-12mdksmp kernel. */

/* Angle-bracket header names were lost in extraction; plausible
   reconstructions based on the gcc flags above (-lgsl -lgslcblas -lm): */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

    #include "elfarol.h"

    /*** *** prototypes section *** ***/

/* function to write integers in files, used for plotting the attendances,
   mean and running average of them */

    void writei_mean (char[], int, int[]);

    /* function to write floats in files, used for plotting probabilities */

    void writed(char[], int,int, int, float[]);

    /* function that is used in fairness plots! */

    void write_fair(char[],int[], int, int);

    /* module used in partial and full algorithms */

float eq5 (float, float, int, int);

    /* module used in signed partial algorithm */

    float eq7 (float, float, float, int, int);

    /* module used in taxing algorithms */

    float tax (float, float, float, int, int);

    /* utility function - returns {-1 0 1} */

    int utility (int, int,int);

    /*** ***fi