Distribuição Beta

Embed Size (px)

Citation preview

  • 8/6/2019 Distribuio Beta

    1/14

    TECHNOMETRICS

    Maximum ikelihoodEstimationof theParameters f the Beta DistributionfromSmallestOrderStatisticsR. GNANADESIKAN, R. S. PINKHAM* AND LAURA P. HUGHES

    Bell Telephone Laboratories, Incorporated

    Numerical methods, useful with high-speed computers, are described for obtainingthe maximum likelihood estimates of the two (shape) parameters of a beta distribu-tion using the smallest M orderstatistics, 0 < ul < U2< *- < UM , in a randomsample of size K( >M). The maximum likelihood estimates are functions only of theratio, R = M/K, the Mth ordered observation, UM , and the two statistics,G1 = [II=l ui]l'm and G2 = [II (1 - ui)]liM. For the case of the complete sample(i.e., R = 1), however,the estimates arefunctionsonly of G, and G2, and hence, forthis case, explicit tables of the estimates are provided. When R < 1, the methodsdescribed depend crucially for their usefulness on the availability of a high-speedcomputer.Some examplesare given of the use of the proceduresdescribed or fitting beta dis-tributionsto sets of data. In one example,the fit is studied by using beta probabilityplots.

    1. INTRODUCTIONThe family of beta distributions is related to many of the common statisticaldistributions, including the t, F, binomial and negative binomial distributions.Also, the beta distribution has been used in certain Bayesian applicationsas a priordistribution for the binomial parameter, p. [See,for example, Anscombe(1961).] Chaddha has, in some unpublished reports, suggested the use of a

    special case of the beta distribution as a model in Queueing Theory and inreliability applications. The beta distribution may also serve as an appropriateapproximation to fit the distribution of the probability integral transformation,when estimates of parameters are used in the transformation so that the trans-formed variable may no longer have the uniform distribution. [See David andJohnson (1948).]The present paper is concerned with the maximum likelihood estimationof the two parameters of an underlying beta distribution using the smallestobservations in a random sample. The case of maximum likelihood estimationfrom the complete sample is included as a special case.The formulation of the problem in terms of smallest order statistics appearsto be natural for reliability applications where the data often arrive in anordered fashion starting with the smallest observation. Also, in many applicationsReceived June 1966; revised February 1967.* Professor R. S. Pinkham is now at Stevens Institute of Technology.

    607

    VOL.9, NO. 4 NOVEMBER1967

  • 8/6/2019 Distribuio Beta

    2/14

    R.GNANADESIKAN,. S. PINKHAM ND L.P. HUGHESof probability plotting, such as in the analysis of variance, when the data tobe plotted is also to provide the basis for estimating the necessary parameters,a reasonable and natural formulation of the estimation problem appears tobe one in terms of smallest order statistics. [cf., for example, Wilk andGnanadesikan (1961, 1964) and Wilk et al. (1963).] In other applications,one may wish to base the estimation on the "middle" or "larger"observationsin the sample. [cf. Wilk et al. (1966).]The results of the present paper are germane to an interest in fitting betadistributions. Section 2 of the paper gives a statement of the problem withnecessary notation. In Section 3 the likelihood equations are derived and inSection 4 numerical methods employed in solving these are described. Someexamples of application of the estimation procedure are given in Section 5.Section 6 consists of summary remarks and general discussion. An appendixcontains the numerical approximations used in solving the likelihood equations.A table of the maximum likelihood estimates for the complete sample case isincluded in Section 3.

    2. NOTATION AND STATEMENT OF THE PROBLEMConsider the ordered observations, 0 < U < 2 < * *..* < UK < 1, resultingfrom a random sample of size K from a beta distribution with density

    (U; a, f) = B( uA-1(1 - U)~-1 0 < u < 1, a > 0, F/ > 0. (1)It is desired to estimate a and f simultaneously, utilizing the M( 0,can be transformed to the above framework by setting u = v/(l + v).

    3. MAXIMUMLIKELIHOOD STIMATIONThe likelihood of a and B given the M smallest observations, u, u2, ***, UMis

    2(a, ) K B(f, ) u-(1 - u,)'- t- (1 - t)-1 dt (2)The likelihood equations, a log ?/aa = 0 and a log ?/0/o = 0, reduce to

    R In G,1= (a) - (a+ B) - (1 - R) I1(UM;, (3)?(uM ;d(a,3)

    608

  • 8/6/2019 Distribuio Beta

    3/14

    PARAMETERSFTHEBETADISTRIBUTIONROM MALLESTRDER TATISTICSand

    I(um ;a,f3)inG='J(G3) -I(a+13) -(1 -_R) I(uM ;a,/3) (4)where

    (M \1/M / M 1/MR =M/K, G,= u,) ,=2 I(1-- Ui))dx r) r(x)

    I(x; a, /3) = f t-(1 - t)-' dt, (0 < x < 1),Il(x; a, ) = ddI(x; , ) = t-'(1 t)'- In t dt, (0 < x < 1),

    andI2(x; a, A3) -(X a, ) = tl(f -t)o- in (1 -) dt, (0 < x < 1).

    Clearly the maximum likelihood estimates, a and /, depend on UM , G1, G2and R. In general, therefore, tabulation of the roots of equations (3) and (4)would involve four-way tables which would be too unwieldy for practicalpurposes. However, in the special case when R = 1, (i.e., the complete samplecase), the likelihood equations simplify to

    In G, = T(a)- ~T(a + ) (5)In G. = T03) - ~T(a + ). (6)

    A tabulation of the roots of these equations, in terms of (observed values)G( and G2 , is provided in Table 1. While the maximum likelihood estimatesfor specific values of G, and G2 , and some indication of the general patternof their values, may be gleaned from Table 1, yet the grid shown for G1 and G2is so coarse that linear interpolation in the table would yield accuracies of atmost only two significant digits and not even single-significant-digit accuracywhen G, and G2have extremely disparate values. More detailed tables, using agrid for G, and G2 which is fine enough for linear interpolation to be adequatefor three-significant-digit accuracy, are available but have not been includedhere. The problem of tabulating values of some simple single-valued functionsof the estimates, which are more nearly linear in G1 and G2 , instead of theestimates themselves is under continuing study.The singularities in the equations when either G1 or G, is zero, led to con-sidering 0.01 as the smallest value of G, and G2for tabulation purposes. Further,Gi + G2 < 1, with the equality holding if and only if the observations are allequal which is an unlikely and uninteresting occurrence in practice. Hence,the largest values of G1 (or G,) for a specified value of G2(G1)was chosen suchthat G, + G2 = 0.99. The condition, G, + G2 < 1, and the symmetry implied

    609

  • 8/6/2019 Distribuio Beta

    4/14

    610 R.GNANADESIKAN,.S. PINKHAM ND L.P. HUGHESTABLE 1

    Completeamplemaximum ikelihood stimatesof parameters f the betadistribution.Rangeof GI is .01, .1(.1).9. Rangeof G2 s .01, .1(.1).9. [Gi + G2 < 1.0]In eachcell thefirst entry s d and the secondentry s ~.G1G2 .01 .1 .2 .3 .4 .5 .6 .7 .8 .9 .98

    .01 .112 .192 .254 .318 .395 .495 .639 .877 1.376 3.162 42.128.112 .135 .147 .157 .168 .179 .192 .210 .237 .299 .850

    .1 .245 .337 .441 .576 .770 1.093 1.756 3.864 *.245 .278 .310 .345 .389 .451 .560 .846 *

    .2 .395 .537 .735 1.057 1.701 3.669 *.395 .456 .531 .640 .834 1.367 *

    .3 .647 .947 1.532 3.280 *.647 .804 1.086 1.869 *

    .4 1.320 2.832 *1.320 2.358 *.5 BY SYMMETRY** *

    *.6 *.7 * NOT POSSIBLE (Gi + G2 > 1).8 *.9 *.98

    * G1 + G2 = 1** For example, the estimates of a and 0 when G2 = 0.2 and G1 = 0.1 are respectively theestimates of /8and a when GI = 0.2 and G2 = 0.1, namely 0.278 and 0.337.by equations (5) and (6) with respect to G1 and G2 lead to Table 1 havingfewer entries than might be anticipated.

    4. METHODFORSOLVINGLIKELIHOOD QUATIONSThe expressions on the right-hand sides of the likelihood equations (3) and (4)may be denoted F1(a, ,0) and F (a, 0) respectively, and the likelihood equations

    rewritten asRln G, = F(a, ), (7)R In G2 = F2(a,/). (8)

    F1 and F2 involve R and UM in addition to being functions of a and 0. Giventhe sample of observations to be used in the estimation of a and 0, the quantities

  • 8/6/2019 Distribuio Beta

    5/14

  • 8/6/2019 Distribuio Beta

    6/14

    R.GNANADESIKAN,. S. PINKHAM ND L. P. HUGHESfor the quantile, x,, of a beta distribution

    In x, = -x2(p)/2N, (12)where x2(p) denotes the quantile corresponding to p, (0 < p < 1), of a x2distribution with 213degrees of freedom, and N = a + (13- 1)/2. Thus,corresponding to the orderstatistics, ul and UM one obtains from equation (12),the relationships

    In ul ~ 1/( +1), l UM_X(/K +) (MK + 1) (13)nu1i- ^2N UM 2NAs a further simplification, the well-known standard normal approximationto the distribution of (v2x' - V/2v - 1) was used with the relationshipsprovided by equation (13), and the following starting values, ao and /o , wereobtained:

    0o = 1(*2 + 1), ao = No - ?(/3o - 1), (14)where

    13=[{ln u} ZMK) - Z( ][1 -{KInIand

    No = _ x2o(1/K + 1)2 ln u,and z(D)denotes the standard normal quantile correspondingto the proportion p.Any of the available methods for computing the percentage points of a chi-square distribution (see for example, Wilk et al. (1962a)) might be used toobtain x2P (1/K + 1) and thence No . For R < 1, the starting values providedby equation (14) were used in the iterative scheme defined by equation (9)and the iterations repeated until JR In G? - F1(a? , 0n)| 10-4 andJRIn G2 F2(a,n 3)I < 10-4. The values, an, n , at the first stage of iterationwhen these criteria of convergence are met are then the desired maximum

    likelihood estimates, d and J.For the desired degree of convergence, the authors found, in some cases,that the number of iterations required was fairly large. In these cases, thepattern of convergence was one wherein within a few iterations (i.e., for small n),either JRIn G, - Fl(o, , n)I < 10-4 or JRIn G, - F2(aon /n) < 10-4, butnot both criteria were satisfied simultaneously for small n. To cut the numberof iterations (and the saving was found to be considerable in many cases),the authors used a scheme for adjusting the values of ai , 13,at certain stagesof the iteration. For instance, if F1is adequately close to R InG1and has remainedthus for a specified number of iterations (the value used was 5) including theith iteration, while F2 is not sufficiently close to R IlnG2 , then a,+, and /3,+iwere adjusted by the value of the ratio of R In G2to the computed value ofF2(ai,, O,). That is, ca+1= aiR In G2/F2(oa,, ,) and /i+1 = 13RIn G2/F2(ai, )13).Similarly, if F2 is adequately close to R In G2and has remained thus for fiveiterations including the ith one, while F1 is not sufficiently close to R In G ,then a,+1 = a,R In G,/F,(aci, I ) and i+1 = ,R In Gl/Fl(a;, ti).

    612

  • 8/6/2019 Distribuio Beta

    7/14

    PARAMETERSFTHEBETADISTRIBUTIONROM MALLESTRDER TATISTICS5. EXAMPLES OF APPLICATION

    The methods of Sections 3 and 4 are next illustrated by application to twosets of data.The first set consists of a computer-generated random sample of size K = 20from a beta distribution with parameters a = 1.5 and fi = 11.0. Table 2 gives

    TABLE *MaximumLikelihoodEstimationof Parameters f the Beta Distributionrom OrderStatisticsGeneratedData whereAlpha is 1.5 and Beta is 11.0Sample Size = 20 No Probability PlotsObservation Fraction Alpha BetaNumber OrderedObservation of Sample Estimate Estimate

    1 0.15396729E-012 0.32086748E-01 0.100 1.443 6.4723 0.40187541E-014 0.45033980E-01 0.200 1.472 8.1455 0.47815502E-016 0.52427629E-01 0.300 1.483 9.6027 0.79288867E-018 0.86755657E-01 0.400 1.564 9.7609 0.89401839E-0110 0.90071268E-01 0.500 1.672 11.76511 0.10152799E-0012 0.10534459E-00 0.600 1.774 13.17513 0.10610413E-0014 0.11928693E-00 0.700 1.902 14.99515 0.18714180E-0016 0.19774591E-00 0.800 1.732 12.20217 0.20310399E-0018 0.23729337E-00 0.900 1.800 12.92019 0.30387626E -0020 0.31532391E-00 1.000 1.793 12.781* Thetitles andformatof this tableare reproducedrom computer utput.

    the ordered sample and contains the maximum likelihood estimates of a and 8for R = 0.1(0.1)1. When R = 0.1, in this case, only two ordered observationsare being used to estimate the two parameters. The indications from thisexample, which are typical, are that the maximum likelihood estimates arestatistically reasonably behaved with biases being negative for small R, positivefor large R and with the estimates of the two parameters tending to be positivelycorrelated. In this example, the number of iterations involved in obtaining thesolutions of the likelihood equations ranged from 7 to 19, the former beingfor R = 1 and the latter for R = 0.4.

    The second set of data arose in a problem of talker identification (Beckeret al. (1965)) where there was an interest in selecting features whose variationsacross speakers relative to variations within speakers are large. The dataconsisted of ratios of "between-speakers" mean squares to "within-speakers"

    613

  • 8/6/2019 Distribuio Beta

    8/14

  • 8/6/2019 Distribuio Beta

    9/14

    PARAMETERSFTHEBETADISTRIBUTIONROM MALLESTRDER TATISTICSFor each set of estimates, a and 0, two types of probability plots were obtained.The first corresponds to plotting the ordered quantities with range 0 to 1

    against quantiles of the fitted beta type I distribution while the second isa plot of the ordered original ratios of mean squares against quantiles of thefitted beta type II distribution. Figure A, shows the beta type I plot based onthe estimates obtained when R = 0.5, and Figure A2 shows the correspondingbeta type II plot. In spite of the mildly ragged nature of the plots, the "largestpoint" (i.e., the one in the top right-hand corner) departs from the configurationindicated by the others. An interesting feature is that this departure is moreclearly indicated in the beta type II plot of Figure A2 than in the "equivalent"beta type I plot of Figure Al . Transforming to a distribution with infinitetails from one with finite tails may increase the "sensitivity" of the probabilityplot to departures in the tails.Figures A3 and A4 show beta types I and II plots for the same data usingthe maximum likelihood estimates when R = 1. In spite of the estimatesthemselves being very different from those when R = 0.5, the configurations ofthe probability plots are not very different. Again the largest point is indicated

    0.7

    0.6-0-> 0.5-w 00

    o 00.3-

    0.2* I0.2 .1 1 10.2 0.3 0.4 0.5BETA QUANTILES

    Data From Talker Identification ProblemBeta One Plot Alpha Estimate = 10.479 Beta Estimate = 19.378Number of Points on Graph = 20 Sample Size = 20 Parameters wereEstimated with 10 Smallest Order StatisticsFigure Al

    615

  • 8/6/2019 Distribuio Beta

    10/14

    6162.2

    1.8cnz0

    1.4

    a3oo 1.00::

    00.6

    V. L

    L R. SHENTONAND K. O. BOWMAN

    0

    0

    .0 *0

    .00 *9000

    .00

    0.2 0.4 0.6 0.8BETA QUANTILES1.0 1.2

    Data from Talker IdentificationProblemBeta Two Plot Alpha Estimate = 10.479 Beta Estimate = 19.376Number of Points on Graph = 20 Sample Size = 20 ParameterswereEstimated with 10 Smallest OrderStatisticsFigure A2

    as being overly large although, as one would expect, the indication of theaberrant point is not as clear in Figure A4 as it is in Figure A2.

    6. CONCLUDING REMARKSAnalytical methods based on the approach of Halperin (1952) could beused to study the large sample properties of the maximum likelihood estimatesobtained by the methods described in the preceding sections. Useful smallsample indications concerning bias, variance and covariance of these estimatescould also be obtained from a systematic Monte Carlo study using computer-generated random samples from beta distributions. It would also be of interest,in such a study, to assess the maximum likelihood estimates in terms of somecriterion of linearity of the configuration of the sample in the beta probabilityplot whose axis of quantiles is determined by using the estimates of a and ,Bfor that sample. Such a Monte Carlo study of various statistical properties

  • 8/6/2019 Distribuio Beta

    11/14

    PARAMETERSF THEBETADISTRIBUTIONROM MALLESTRDER TATISTICSof estimators obtained by different techniques for a wide class of distributions,including the gamma and beta distributions, was envisaged and initiated bythe present authors but has not been completed. The partial results, as typifiedby the first example of Section 5, indicated that the maximum likelihoodestimates obtained by the methods described in the present paper havereasonable statistical properties which justify publication of the methodsinvolved even though definitive information on the properties of the estimatesstill needs to be sought and provided.The techniques of the present paper are available for routine use on a high-speed computer and Tables 2 and 3 as well as the probability plots show typicaloutput from the programs.

    APPENDIXON NUMERICALMETHODS(1) The digamma function, T(x) = r'(x)/r(x), is evaluated using therelationship [See Wilk et al. (1962b)]0.7

    *

    0.6-V)z0orLd(/)00LJw00J0

    0

    0.5- .0 * 0

    00.4-00 *

    00.3

    n r

    00

    00 0.2 0.3 0.4 0.5 0

    BETA QUANTILESData from Talker Identification ProblemBeta One Plot Alpha Estimate = 6.543 Beta Estimate = 11.052Number of Points on Graph = 20 Sample Size = 20 Parameters wereEstimated with 20 Smallest Order Statistics

    Figure As

    .6. - I I

    617

  • 8/6/2019 Distribuio Beta

    12/14

    R.GNANADESIKAN,. S. PINKHAM ND L.P. HUGHES

    0

    S

    0S

    00 0~~0@0

    00 0

    I 1> 0.4 0.6 0.8 1.0 1.2 1.4BETA QUANTILES

    Data from Talker Identification ProblemBeta Two Plot Alpha Estimate = 6.543 Beta Estimate = 11.052Number of Points on Graph = 20 Sample size = 20 Parameters wereEstimated with 20 Smallest Order Statistics

    Figure A4

    1.6

    2 In [10 + xf)(11 + Xf)] + 610 + x +( l) ^ + x + 1+^10 +. 1 if 0 3).For x = 1.5, the absolute error is less than 8 X 10-6.(2) The incomplete beta function,I(x; a, f) = ta-1(1 - t)l-' dt,

    :?

    6182.2

    1.8-z0!-i> 1.4JcOm0

    L 1,0o0

    0.6

    0.210.I I I?2

    >

  • 8/6/2019 Distribuio Beta

    13/14

    PARAMETERSOF THEBETADISTRIBUTIONFROMSMALLESTORDERSTATISTICSis evaluated using the first few terms of a modified Laplacian expansion (Molina(1932), Gnanadesikan, Hughes and Pinkham (1966)),

    I(x; a, 13) 0 D(O + j, z)whereN =?+j--, z = -Nlnx,Ao = 1,A, = A3 = A5 = 0,

    A2~ 12 'A4 - (3 - 1)(53 - 7)240A =-(_ - 1)(3512 - 1121 + 93)4032

    andD(a, b) = f ta-e-b dt, a > 0, b > O.

    [NOTE: The integral, D(a, b), may be evaluated using the computationalprocedure described in Wilk et al. (1962a).] The above expression for I(x; a, 13)is used when N > 0, i.e., 2a + 13> 1. For values of N between - and 0, i.e.,0 < 2a + / < 1, the recurrencerelation,I(x; a, /) = I(x; a + 1, 3) + I(x; a, 3 + 1),

    may be used to obtain I(x; a, 13).(3) The partial derivatives, I,(x; a, 13)and I2(x; a, 13),are computed usingprocedures described in Gnanadesikan, Hughes and Pinkham (1966). Thefirst seven terms of the modified Laplacian expansion for Ii(x; a, 13) yield(x; a,) = a I(x; a, 3) - i-1 z iD+l + j + 1, ),

    where N, z, D and the coefficients Ai are the same as defined above.For small values of x, a and 13 he error in using the above series is worsethan for larger values. For instance, when x = 0.01, a = 1 and 13= 3, theabsolute error is 1 X 10-3 while the relative error is approximately 0.2%.The above expression for computing 11 is used when N > 0, and,if - 1 < N < 0, the recurrence relation,

    I(x; a, 13) = I(x; a + 1, 13) + Ii(x; a, + 1),The integral, I12(x; , 13), s obtained using the relationshipI2(x; a, 1) = a- I(x; a, 1) = B(a, 13)[,I(1) - T(a + 13)] - I( - x; , a).

    619

  • 8/6/2019 Distribuio Beta

    14/14

    R.GNANADESIKAN,. S. PINKHAM ND L. P. HUGHESACKNOWLEDGMENT

    It is a pleasure to thank M. B. Wilk for his comments on an earlier draft.REFERENCES

    1. ANSCOMBE, . J., 1961. Testing to establish a high degree of safety or reliability. Bull.Int. Statist. Inst., 39, 73-89.2. BECKER,MRS. M. H., GNANADESIKAN,., MATHEWS,M. V., PINKHAM,R. S., PRUZANSKY,Miss S., and WILK, M. B., 1965. Comparisonof some statistical distance measuresfortalker identification. UnpublishedBell TelephoneLaboratoriesMemo.3. DAVID, F. N., and JOHNSON,N. L., 1948. The probability integral transformation whenparametersare estimated from the sample. Biometrika,35, 182-90.4. GNANADESIKAN,., HUGHES,MRS. L. P. and PINKHAM,R. S., 1966. Numerical evaluationof partial derivatives of the incomplete beta integral. Unpublished Bell Telephone Lab-oratoriesMemo.5. HALPERIN,MAX, 1952. Maximumlikelihoodestimation in truncatedsamples.Ann. Math.Statist. 28, 226-38.6. MOLINA,. C., 1932. An expansionfor Laplacianintegralsin termsof incompletegammafunctions, and some applications. Bell Syst. Tech. J. 11, 563-75.7. WILK, M. B. and GNANADESIKAN, ., 1961. Graphical analysis of multiresponse experi-mental data using ordered distances. Proc. Nat. Acad. Sci. 47, 1209-12.8. WILK, M. B. and GNANADESIKAN,., 1964. Graphical methods for internal comparisonsin multiresponseexperiments.Ann. Math. Statist.35, 613-31.9. WILK, M. B., GNANADESIKAN, R. and FREENY,MRS. A. E., 1963. Estimation of errorvariancefrom smallest orderedcontrasts. J. Amer.Statist.Assoc., 58, 152-60.10. WILK, M. B., GNANADESIKAN, . and HUYETT,Miss M. J., 1962a. Probability plots forthe gamma distribution. Technometrics,, 1-20.11. WILK, M. B., GNANADESIKAN, R. and HUYETT,MARILYN ., 1962b.Estimation of param-eters of the gamma distribution using order statistics. Biometrika, 49, 525-45.12. WILK, M. B., GNANADESIKAN, R. and LAUH,ELIZABETH,966.Scale parameter estimationfrom the order statistics of unequal gamma components. Ann. Math. Statist., 37, 152-76.13. WISE, M. E., 1950. The incomplete beta function as a contour integral and a quicklyconvergingseries for its inverse. Biometrika,37, 208-18.

    620