When it comes to a discussion on how best to present material to students, teachers and lecturers tend to agree only to differ! Nevertheless, texts on Statistics are surprisingly uniform in their introduction to the concept of ‘mean’ or ‘expected value’. Having defined the sample mean in terms of observed frequencies, it is a natural and easy step to define E(X), the expected value of a discrete random variable X taking values x1, x2, … with probabilities p1, p2, … by the formula

$$E(X) = \sum_i x_i p_i. \qquad (1)$$

For a continuous random variable X with probability density function f(x), the corresponding definition is

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx. \qquad (2)$$
Texts resort to ‘waffle’ to justify (2) as a continuous version of (1). An alternative approach is available, which unifies (but does not replace) (1) and (2). It is elementary with a high pictorial content and yet completely general.
The alternative
Let the random variable X have cumulative distribution function (c.d.f.) F(x) = P(X ≤ x).
Diagram 1 sketches a graph of a typical c.d.f. for a continuous random variable.
Diagram 1
Take any point c on the x-axis, and let A be the area under F to the left of c, and B the area between F and the line y = 1 to the right of c. Provided that these two areas are finite, E(X) exists and

$$E(X) = c + B - A. \qquad (3)$$
Note that
(ii) E(X) is the unique choice of c for which A = B;
(iii) If X is a positive random variable then E(X) is the area between F and the line y = 1.
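The area formula E(X) = c + B − A can be checked numerically for a continuous c.d.f. The following is a minimal sketch, not part of the original article: the normal distribution with mean 1.5, the truncation points ±40 and the midpoint rule are all illustrative assumptions, as are the function names.

```python
import math

def normal_cdf(x, mu=1.5, sigma=2.0):
    """C.d.f. of a normal distribution (an illustrative choice, not from the article)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def midpoint_integral(g, lo, hi, n=20000):
    """Midpoint-rule approximation to the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def mean_from_cdf(F, c, lo, hi):
    """Approximate E(X) = c + B - A, with the infinite tails truncated at lo and hi."""
    A = midpoint_integral(F, lo, c)                      # area under F to the left of c
    B = midpoint_integral(lambda x: 1.0 - F(x), c, hi)   # area between F and y = 1 to the right of c
    return c + B - A

# Different choices of c should all give (approximately) the same mean, 1.5.
estimates = [mean_from_cdf(normal_cdf, c, -40.0, 40.0) for c in (-1.0, 0.0, 1.5, 4.0)]
print(estimates)
```

Moving c to the right increases A and decreases B by compensating amounts, which is why every choice of c gives the same estimate.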
Equivalence of (1) and (3)
Suppose a random variable takes values x1, x2, x3, x4 with probabilities p1, p2, p3, p4 respectively, with p1 + p2 + p3 + p4 = 1. The graph of F is now a step-function (see diagram 2) with a ‘step’ of height pi at xi. Applying (3) with c as shown (between x2 and x3), it is clear that

$$A = p_1(c - x_1) + p_2(c - x_2),$$

$$B = p_3(x_3 - c) + p_4(x_4 - c),$$

so that c + B − A = p1x1 + p2x2 + p3x3 + p4x4, in agreement with (1).
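The agreement between the area formula and the usual sum of xi·pi can also be checked by computing A and B literally as sums of rectangles under and over the step-function. This is a rough sketch in exact arithmetic; the function name `step_areas` and the four-point distribution are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def step_areas(xs, ps, c):
    """Areas A and B for the step c.d.f. with jumps ps at the sorted points xs."""
    A = Fraction(0)
    B = Fraction(0)
    if c < xs[0]:
        B += xs[0] - c                       # F = 0 between c and the first jump
    cum = Fraction(0)
    for i, (x, p) in enumerate(zip(xs, ps)):
        cum += p                             # F equals cum on [x, next jump)
        if i + 1 < len(xs):
            right = xs[i + 1]
            A += cum * max(0, min(right, c) - x)        # part of the rectangle left of c
            B += (1 - cum) * max(0, right - max(x, c))  # gap up to y = 1, right of c
        else:
            A += cum * max(0, c - x)         # F = 1 beyond the last jump
    return A, B

# Hypothetical four-point distribution (not from the article).
xs = [1, 3, 4, 7]
ps = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 10), Fraction(2, 5)]
direct = sum(x * p for x, p in zip(xs, ps))  # the usual sum of x_i * p_i

for c in (Fraction(0), Fraction(7, 2), Fraction(9)):
    A, B = step_areas(xs, ps, c)
    print(c, A, B, c + B - A, direct)
```

The loop deliberately includes values of c outside the range of the xi, to show that the identity does not depend on c lying between two particular values.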
Diagram 2
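Table 1 (c = 6; c.p. and c.c.p. are accumulated towards c from each end)

value | 2 | 4 | 6 | 8 | 10 |
prob. | .1 | .2 | .3 | .1 | .3 |
c.p. | .1 | .3 | | .4 | .3 |
c.c.p. | .1 | .4 | | .7 | .3 |

The c.c.p. entries adjacent to c (.4 on the left and .7 on the right), when multiplied by 2 (the distance between adjacent values), give the values A and B, and hence E(X) = 6 + 2(.7 − .4) = 6.6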
If instead of c = 6 we had chosen c = 8 then the last two rows of table 1 would have been

c.p. | .1 | .3 | .6 | | .3 |
c.c.p. | .1 | .4 | 1.0 | | .3 |

and hence E(X) = 8 + 2(.3 − 1) = 6.6
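The tabular calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tabular_mean` is hypothetical, the values are taken to be equally spaced, c is taken to be one of the values, and the probabilities are written as exact tenths to avoid rounding.

```python
from fractions import Fraction
from itertools import accumulate

def tabular_mean(values, probs, c):
    """Mean of equally spaced values via cumulated cumulative probabilities.

    c must be one of the values; the spacing is read off the first two values.
    """
    k = values.index(c)
    h = values[1] - values[0]
    left_cp = list(accumulate(probs[:k]))                  # c.p. row, left of c
    right_cp = list(accumulate(reversed(probs[k + 1:])))   # c.p. row, right of c, built from the far end
    a = sum(left_cp, Fraction(0))    # c.c.p. entry adjacent to c on the left:  A = h * a
    b = sum(right_cp, Fraction(0))   # c.c.p. entry adjacent to c on the right: B = h * b
    return c + h * (b - a)

values = [2, 4, 6, 8, 10]
probs = [Fraction(n, 10) for n in (1, 2, 3, 1, 3)]
for c in values:
    print(c, tabular_mean(values, probs, c))
```

Every choice of c among the five values returns 33/5, that is 6.6, matching the two hand calculations above.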
Grouped data in class intervals of equal width
Table 2 is a frequency table resulting from grouping data into five intervals of equal width. The class representatives and relative frequencies correspond exactly with the first two rows of table 1 and, with minor alterations (remembering to divide by the total frequency, and choosing the mid-point of an interval for c) the calculation of sample mean proceeds as in table 1.
class interval | 1–3 | 3–5 | 5–7 | 7–9 | 9–11 |
class rep. | 2 | 4 | 6 | 8 | 10 |
freq. | 10 | 20 | 30 | 10 | 30 |
c.f. | 10 | 30 | | 40 | 30 |
c.c.f. | 10 | 40 | | 70 | 30 |

With c = 6 and total frequency 100, the sample mean is 6 + 2(70 − 40)/100 = 6.6.
From a statistical point of view, however, since the raw data are not available, a sample mean can only be calculated after assumptions about the raw data are made. The calculation in table 2 assumes that all data in a class interval equal the class representative, so that the sample c.d.f. is a step-function (see diagram 3). A more realistic assumption is to take the known values of the sample c.d.f. (at the points 1, 3, 5, 7, 9, 11) and interpolate linearly (see also diagram 3). Our new definition (3) applied to these two sample c.d.f.s shows immediately that both give the same answer (look at the area above each c.d.f. bounded by the line y = 1). The computations using the step-function c.d.f. or its frequency equivalent (as in table 2) are easier than those using the linearly interpolated c.d.f., but the interested reader will have no difficulty in constructing the latter.
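The claim that the two sample c.d.f.s give the same mean can be verified directly from the table 2 frequencies. The sketch below is one possible check, not the author's own computation: it takes c at the left-hand boundary (so that A = 0) and compares the area above the step-function with the area above the linearly interpolated c.d.f., interval by interval. The variable names are illustrative.

```python
from fractions import Fraction

edges = [1, 3, 5, 7, 9, 11]      # class boundaries from table 2
freqs = [10, 20, 30, 10, 30]     # class frequencies from table 2
total = sum(freqs)

# Known values of the sample c.d.f. at the class boundaries.
cdf = [Fraction(0)]
for f in freqs:
    cdf.append(cdf[-1] + Fraction(f, total))

c = edges[0]                     # with c at the left end, A = 0 and the mean is c + B
B_step = Fraction(0)
B_linear = Fraction(0)
for i in range(len(freqs)):
    lo, hi = edges[i], edges[i + 1]
    mid = Fraction(lo + hi, 2)                        # class representative
    gap_lo, gap_hi = 1 - cdf[i], 1 - cdf[i + 1]       # distance from F up to y = 1
    # Step c.d.f.: F jumps at the class mid-point, giving two rectangles.
    B_step += (mid - lo) * gap_lo + (hi - mid) * gap_hi
    # Linear interpolation: the region above F is a trapezium.
    B_linear += (hi - lo) * (gap_lo + gap_hi) / 2

mean_step = c + B_step
mean_linear = c + B_linear
print(mean_step, mean_linear)
```

Within each class interval the two rectangles and the trapezium have equal area because the jump sits at the mid-point, which is the pictorial reason the two assumptions agree.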
Diagram 3
E(X) for a mixed-type random variable
Formulas (1) and (2) do not apply to a random variable which is neither discrete nor continuous. Such a random variable has a c.d.f. which is neither a step-function nor a continuous function. Nevertheless, the areas A and B in (3) can always be evaluated.
For example, suppose X has c.d.f. given by

$$F(x) = \begin{cases} x, & 0 \le x < 1/4, \\ (x+1)/2, & 1/4 \le x \le 1. \end{cases}$$

The graph of F has a jump discontinuity at x = 1/4, so taking c = 1/4 and applying (3), the areas A and B are triangles with areas 1/32 and 9/64 respectively, and E(X) = 1/4 + 9/64 − 1/32, which equals 23/64.
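The arithmetic in this example can be reproduced exactly, and cross-checked against a direct decomposition of X into its continuous part and its atom at 1/4. The sketch below is illustrative only: the helper names `G` and `mean_via_areas` are hypothetical, and the decomposition (density 1 on (0, 1/4), an atom equal to the jump of F at 1/4, density 1/2 on (1/4, 1)) is read off from the c.d.f. rather than stated in the article.

```python
from fractions import Fraction

Q = Fraction(1, 4)   # location of the jump in F

def G(t):
    """Exact integral of F from 0 to t, for 0 <= t <= 1."""
    if t <= Q:
        return t * t / 2                     # integral of x
    # For t > 1/4 add the integral of (x + 1)/2, whose antiderivative is x^2/4 + x/2.
    return Q * Q / 2 + (t * t / 4 + t / 2) - (Q * Q / 4 + Q / 2)

def mean_via_areas(c):
    """Return (A, B, c + B - A) for a point c in [0, 1]."""
    A = G(c)                                 # area under F to the left of c
    B = (1 - c) - (G(Fraction(1)) - G(c))    # area between F and y = 1 on [c, 1]
    return A, B, c + B - A

A, B, mean = mean_via_areas(Q)
print(A, B, mean)

# Cross-check: continuous part on (0, 1/4), atom at 1/4, continuous part on (1/4, 1).
jump = (Q + 1) / 2 - Q                       # size of the jump of F at 1/4
direct = Q * Q / 2 + Q * jump + (1 - Q * Q) / 4
print(direct)
```

Calling `mean_via_areas` with any other c in [0, 1] returns the same mean, although A and B are then no longer simple triangles.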
Markov’s inequality
Although this topic is rarely covered in school statistics, Markov’s inequality and its corollary, Chebyshev’s inequality, are used to prove the weak law of large numbers. For a positive random variable X with mean E(X), Markov’s inequality states that for any positive ε,

$$P(X \ge \varepsilon) \le \frac{E(X)}{\varepsilon}.$$
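One way to see the inequality pictorially uses note (iii): for a positive random variable, E(X) is the whole area between F and the line y = 1. Writing ε for the positive constant, a rectangle of width ε and height P(X ≥ ε) fits inside that area. The derivation below is a sketch along these lines and is not necessarily the argument given in the original article.

```latex
\begin{aligned}
E(X) &= \int_0^\infty \bigl(1 - F(x)\bigr)\,dx
      && \text{(note (iii), i.e. (3) with } c = 0 \text{ and } A = 0\text{)} \\
     &\ge \int_0^{\varepsilon} \bigl(1 - F(x)\bigr)\,dx
      && \text{(discard the area to the right of } \varepsilon\text{)} \\
     &\ge \varepsilon\, P(X \ge \varepsilon)
      && \text{(since } 1 - F(x) \ge P(X \ge \varepsilon) \text{ for } 0 \le x < \varepsilon\text{)} .
\end{aligned}
```

Dividing through by ε gives P(X ≥ ε) ≤ E(X)/ε.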
Summary
The limitations of the approach suggested here are not difficult to discover: calculating the mean of a Poisson distribution using (3) would be suicidal, and in general calculating E(X²) would require the evaluation of the c.d.f. of X². Nevertheless, (3) and its applications are elementary, pictorial, and succeed in bridging the schism between the formulae for the discrete and continuous cases. Worthy of a mention in elementary texts?
Reference
Feller, W. (1950). An Introduction to Probability Theory and Its Applications (vol. 2), page 148. Wiley.
University College, Swansea