An Alternative Approach to the Mean
ALAN W. SYKES

When it comes to a discussion on how best to present material to students, teachers and lecturers tend to agree only to differ! Nevertheless, texts on Statistics are surprisingly uniform in their introduction to the concept of ‘mean’ or ‘expected value’. Having defined the sample mean in terms of observed frequencies, it is a natural and easy step to define E(X), the expected value of a discrete random variable X taking values x1, x2, … with probabilities p1, p2, …, by the formula

E(X) = p1x1 + p2x2 + …    (1)
 
With the introduction of a continuous random variable X with a distribution specified by a probability density function f(x), a new definition of E(X) is required, namely
 
E(X) = ∫ x f(x) dx    (2)

Texts resort to ‘waffle’ to justify (2) as a continuous version of (1). An alternative approach is available which unifies (but does not replace) (1) and (2). It is elementary and highly pictorial, yet completely general.

The alternative

Let the random variable X have cumulative distribution function (c.d.f.) F(x) = P(X ≤ x).

Diagram 1 sketches a graph of a typical c.d.f. for a continuous random variable.

Diagram 1. A typical c.d.f. for a continuous random variable.

Take any point c on the x-axis, and let A be the area under F to the left of c, and B the area between F and the line y = 1 to the right of c. Provided that these two areas are finite, E(X) exists and

E(X) = c + B - A    (3)

Note that

(i) the definition is independent of the choice of c, as it must be;

(ii) E(X) is the unique choice of c for which A = B;

(iii) if X is a positive random variable then E(X) is simply the area between F and the line y = 1 (take c = 0, so that A = 0).
 
 

The definition (3) appears to be overlooked by statistics texts, although the special case (iii) can be found in Feller (1950). In the subsequent sections we examine briefly some aspects of this approach.
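
As a quick numerical sanity check of (3), here is a minimal Python sketch (my own illustration, assuming SciPy is available) that computes the areas A and B for an exponential distribution and confirms that c + B - A recovers the mean whatever c we choose:

    from scipy.integrate import quad
    from math import exp

    rate = 2.0                              # exponential distribution; true mean is 1/rate = 0.5
    F = lambda x: 1 - exp(-rate * x)        # its c.d.f. (zero for x < 0)

    c = 0.3                                 # any convenient point
    A, _ = quad(F, 0, c)                    # area under F to the left of c
    B, _ = quad(lambda x: 1 - F(x), c, 50)  # area between F and y = 1 to the right of c
    print(c + B - A)                        # 0.5, independent of the choice of c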

Equivalence of (1) and (3)

Suppose a random variable takes values x1, x2, x3, x4 with probabilities p1, p2, p3, p4 respectively, where p1 + p2 + p3 + p4 = 1. The graph of F is now a step-function (see diagram 2) with a ‘step’ of height pi at xi. Applying (3) with c as shown, it is clear that

A = p1(c - x1) + p2(c - x2)

B = p3(x3 - c) + p4(x4 - c)

and hence, using p1 + p2 + p3 + p4 = 1, rearrangement gives c + B - A = p1x1 + p2x2 + p3x3 + p4x4, which is of course (1) in this case.
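
The rearrangement is easily checked numerically; the following sketch (my own, with illustrative values) computes A and B directly as sums of rectangle areas under and above the step-function:

    # values and probabilities of a four-point distribution (illustrative)
    x = [1.0, 2.0, 4.0, 7.0]
    p = [0.1, 0.4, 0.3, 0.2]
    c = 3.0                     # any point between x2 and x3, as in diagram 2

    # between jumps F is constant, so A and B are sums of rectangle areas
    A = sum(pi * (c - xi) for pi, xi in zip(p, x) if xi < c)  # area under F left of c
    B = sum(pi * (xi - c) for pi, xi in zip(p, x) if xi > c)  # area above F right of c
    print(c + B - A, sum(pi * xi for pi, xi in zip(p, x)))    # both 3.5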

Diagram 2. The step-function c.d.f. of the discrete distribution, with c between x2 and x3.

Suppose X has the discrete distribution given by the first two rows of table 1. If we choose c = 6, one of the values that X can take, then to apply our new approach we must accumulate probabilities from the left and from the right, giving row 3 (c.p., the cumulated probabilities). Repeating this procedure on row 3 gives row 4 (c.c.p., the cumulated cumulative probabilities). In each case the term corresponding to the chosen value of c (6 in this case) is omitted.

The c.c.p. values adjacent to the gap at c, namely .4 and .7, when multiplied by 2 (the distance between adjacent values), give the areas A and B, and hence E(X) = 6 + 2(.7 - .4) = 6.6.

TABLE 1

value of X      2     4     6     8    10
probability    .1    .2    .3    .1    .3
c.p.           .1    .3          .4    .3
c.c.p.         .1    .4          .7    .3

If instead of c = 6 we had chosen c = 8, the last two rows of table 1 would have been

c.p.           .1    .3    .6          .3
c.c.p.         .1    .4   1.0          .3

and hence E(X) = 8 + 2(.3 - 1.0) = 6.6.
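
The whole tabular scheme is mechanical enough to automate. Here is a short Python sketch (my own rendering of the procedure, using the numbers of table 1) that builds the c.p. and c.c.p. rows and applies (3):

    from itertools import accumulate

    xs = [2, 4, 6, 8, 10]
    ps = [0.1, 0.2, 0.3, 0.1, 0.3]
    h  = 2                                    # distance between adjacent values
    k  = xs.index(6)                          # position of the chosen c

    left  = list(accumulate(ps[:k]))          # c.p. row, accumulated from the left
    right = list(accumulate(ps[k+1:][::-1]))  # c.p. row, accumulated from the right

    A = h * list(accumulate(left))[-1]        # 2 x .4 = .8, area under F left of c
    B = h * list(accumulate(right))[-1]       # 2 x .7 = 1.4, area above F right of c
    print(xs[k] + B - A)                      # 6.6 (up to floating-point rounding)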
 
 

Grouped data in class intervals of equal width

Table 2 is a frequency table resulting from grouping data into five intervals of equal width. The class representatives and relative frequencies correspond exactly with the first two rows of table 1 and, with minor alterations (remembering to divide by the total frequency, and choosing the mid-point of an interval for c), the calculation of the sample mean proceeds as in table 1.

TABLE 2

class interval    1–3   3–5   5–7   7–9   9–11
class rep.          2     4     6     8     10
freq.              10    20    30    10     30
c.f.               10    30          40     30
c.c.f.             10    40          70     30

 

sample mean = 6 + 2(70 - 40)/100 = 6.6

From a statistical point of view, however, since the raw data are not available, calculating a sample mean can only be done after assumptions about the raw data are made. The calculation in table 2 assumes that all data in a class interval equal the class representative, so that the sample c.d.f. is a step-function (see diagram 3). A more realistic assumption is to take the known values of the sample c.d.f. (at the points 1, 3, 5, 7, 9, 11) and interpolate linearly (see also diagram 3). Our new definition (3), applied to both these sample c.d.f.s, shows immediately that both give the same answer (look at the areas above the c.d.f.s bounded by the line y = 1). The computations using the step-function c.d.f. or its frequency equivalent (as in table 2) are easier than using the linearly interpolated c.d.f., but the interested reader will have no difficulty in constructing the latter.
 
 

Diagram 3. The step-function sample c.d.f. and its linearly interpolated version.
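
For the interested reader, here is one possible construction in Python (my own sketch, with c chosen at the class boundary 7) of the mean from the linearly interpolated sample c.d.f.; the trapezium rule is exact here because F is linear on each class interval:

    # knots of the interpolated sample c.d.f. at the class boundaries of table 2
    bounds = [1, 3, 5, 7, 9, 11]
    cum    = [0, 10, 30, 60, 70, 100]   # cumulative frequencies (divide by total for F)
    total  = cum[-1]

    c, A, B = 7.0, 0.0, 0.0             # c chosen at a class boundary
    for x0, x1, f0, f1 in zip(bounds, bounds[1:], cum, cum[1:]):
        seg = (x1 - x0) * (f0 + f1) / (2 * total)   # area under F on [x0, x1]
        if x1 <= c:
            A += seg                                # contributes to the area left of c
        else:
            B += (x1 - x0) - seg                    # area between F and y = 1
    print(c + B - A)                                # 6.6, agreeing with table 2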

E(X) for a mixed-type random variable

Formulae (1) and (2) do not apply to a random variable which is neither discrete nor continuous. Such a random variable has a c.d.f. which is neither a step-function nor a continuous function. Nevertheless the areas A and B in (3) can always be evaluated.

For example, suppose X has c.d.f. given by

F(x) = x            for 0 ≤ x < 1/4
F(x) = (x + 1)/2    for 1/4 ≤ x < 1

(so F(x) = 0 for x < 0 and F(x) = 1 for x ≥ 1).

The graph of F has a jump discontinuity at x = 1/4, so taking c = 1/4 and applying (3), the areas A and B are triangles with areas 1/32 and 9/64 respectively, so E(X) = 1/4 + 9/64 - 1/32 = 23/64.
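
A quick numerical check of this example (my own sketch, again assuming SciPy):

    from scipy.integrate import quad

    def F(x):                       # the mixed-type c.d.f. above
        if x < 0:    return 0.0
        if x < 0.25: return x
        if x < 1:    return (x + 1) / 2
        return 1.0

    c = 0.25
    A, _ = quad(F, 0, c)                   # triangle of area 1/32
    B, _ = quad(lambda x: 1 - F(x), c, 1)  # triangle of area 9/64
    print(c + B - A, 23 / 64)              # both 0.359375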

Markov’s inequality

Although this topic is rarely covered in school statistics, Markov's inequality and its corollary, Chebyshev's inequality, are used to prove the weak law of large numbers. For a positive random variable X with mean E(X), Markov's inequality states that, for any positive λ,

P(X > λ) ≤ E(X)/λ
Since by (3) E(X) is the area above F bounded by the line y = 1, and since λP(X > λ) is the area of the rectangle with base from 0 to λ and height 1 - F(λ), which is included in this area (because F(x) ≤ F(λ) for x ≤ λ), the inequality is obvious!
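
The inequality is easy to see numerically as well; a small illustration (my own, using an exponential distribution with mean 1, for which P(X > λ) = e^(-λ)):

    from math import exp

    EX = 1.0                           # mean of the exponential(1) distribution
    for lam in [0.5, 1.0, 2.0, 4.0]:
        tail = exp(-lam)               # P(X > lam) for the exponential(1)
        print(lam, tail, EX / lam, tail <= EX / lam)   # the bound always holds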
 
 

Summary

The limitations of the approach suggested here are not difficult to discover: calculating the mean of a Poisson distribution using (3) would be suicidal, and in general calculating E(X²) would require the evaluation of the c.d.f. of X². Nevertheless, (3) and its applications are elementary, pictorial, and succeed in bridging the schism between the formulae for the discrete and continuous cases. Worthy of a mention in elementary texts?

Reference

Feller, W. (1950). Introduction to Probability and its Applications, Vol. 2, p. 148. Wiley.

University College, Swansea
