A full understanding of correlation requires an appreciation of bivariate distributions, but rank correlation coefficients are increasingly being used, as a measure of agreement, with pupils for whom such an appreciation is not possible. How can we justify the formula used?
Although the formula for Spearman’s Coefficient of Rank Correlation is being increasingly used in school courses in Geography and other subjects, the justification for its use is rarely available. That the Spearman formula is the result of finding the product moment correlation for the ranks, although bestowing some credibility on the formula, is not helpful, since the product moment coefficient is not usually known at this level. The Schools Council publication Mathematics across the Curriculum (Blackie) remarks (p. 104): "Kendall’s coefficient has an advantage for teaching purposes over Spearman’s, in that it is more easily explained as a reasonable measure". Whatever the validity of that remark, Spearman’s coefficient is the one which is commonly used, and in this article I try to explain how its algebraic structure arises.
Assuming that readers understand the principle of ranking, I propose that it is desirable that any coefficient of correlation should both give an indication of the extent to which two sets of ranks differ (or agree) and also should be standardised so as to be consistent with other measures of correlation in that its range should be between -1 and +1.
An Example
Suppose we measure two characteristics A and B of eight towns. Let A be the density of public houses and B the density of places of worship, in each case given as the number per 10000 of the population.
Town | P | Q | R | S | T | U | V | W |
A | 41 | 36 | 26 | 45 | 48 | 35 | 51 | 43 |
B | 22 | 7 | 14 | 21 | 13 | 11 | 17 | 20 |
Town | P | Q | R | S | T | U | V | W |
Rank of A | 4 | 3 | 1 | 6 | 7 | 2 | 8 | 5 |
Rank of B | 8 | 1 | 4 | 7 | 3 | 2 | 5 | 6 |
Town | R | U | Q | P | W | S | T | V |
Rank of A | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Rank of B | 4 | 2 | 1 | 8 | 6 | 7 | 3 | 5 |
Thus, writing d for the difference between a town's two ranks,
$\sum d^2 = 3^2 + 0^2 + 2^2 + 4^2 + 1^2 + 1^2 + 4^2 + 3^2 = 56$
(It is reasonable to ask whether other treatments of the differences in ranks could provide a suitable coefficient, e.g. the sum of the absolute values $\sum |d|$, but that is another article.)
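The ranking and the sum of squared rank differences for the eight towns can be checked with a short script. This is only a sketch: the helper name `ranks` is introduced here for illustration, and it assumes there are no tied values, as is the case in this example.

```python
# Densities per 10000 of the population for towns P..W, from the table above.
towns = ["P", "Q", "R", "S", "T", "U", "V", "W"]
a = [41, 36, 26, 45, 48, 35, 51, 43]  # public houses
b = [22, 7, 14, 21, 13, 11, 17, 20]   # places of worship


def ranks(values):
    """Rank values from 1 (smallest) upwards. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank_a = ranks(a)
rank_b = ranks(b)

# d is the difference between the two ranks of each town.
d = [ra - rb for ra, rb in zip(rank_a, rank_b)]
sum_d2 = sum(x * x for x in d)

for town, ra, rb, diff in zip(towns, rank_a, rank_b, d):
    print(town, ra, rb, diff)
print("sum of d squared:", sum_d2)
```

The order in which the towns are listed does not affect the total, so there is no need to sort by rank of A before summing.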
In general this measure is small when there is a high agreement between the ranks and only for complete agreement does it take its minimum value (it is obvious that $\sum d^2$ cannot be negative and that 0 will be its smallest value).
Rank of A | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Rank of B | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
This measure is large when there is a high disagreement between the ranks and only for complete disagreement does it take its maximum value (this is not obvious although intuitively reasonable).
Rank of A | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Rank of B | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
Thus our coefficient does seem to discriminate between different degrees of agreement by taking values in the range 0 to 168.
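For eight items the claim about the extremes can be checked by brute force, since there are only 8! = 40320 possible rankings of B against a fixed ranking of A. The sketch below is a check for n = 8 only, not a proof for general n; the names `identity` and `sum_d2` are chosen here for illustration.

```python
from itertools import permutations

n = 8
identity = tuple(range(1, n + 1))  # rank of A fixed as 1, 2, ..., n


def sum_d2(rank_b):
    """Sum of squared differences between rank of A and rank of B."""
    return sum((ra - rb) ** 2 for ra, rb in zip(identity, rank_b))


# Evaluate the measure for every possible ranking of B.
totals = [(sum_d2(p), p) for p in permutations(identity)]

lowest = min(t for t, _ in totals)
highest = max(t for t, _ in totals)
at_lowest = [p for t, p in totals if t == lowest]
at_highest = [p for t, p in totals if t == highest]

print(lowest, at_lowest)    # minimum and the rankings that attain it
print(highest, at_highest)  # maximum and the rankings that attain it
```

Only one ranking attains each extreme: complete agreement gives the minimum and complete reversal gives the maximum.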
We can use this approach to generate Spearman’s formula for n pairs of values but first we need to calculate the maximum value of $\sum d^2$ which, as we have seen, occurs when there is complete disagreement.
Rank of A | 1 | 2 | 3 | ... | ... | ... | n-1 | n |
Rank of B | n | n-1 | n-2 | ... | ... | ... | 2 | 1 |
Maximum value of $\sum d^2 = (n^3 - n)/3$
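One way to reach this result is sketched below. It is a reconstruction rather than the author's own working, and it assumes the standard results $\sum i = n(n+1)/2$ and $\sum i^2 = n(n+1)(2n+1)/6$. For complete disagreement the item with rank $i$ for A has rank $n+1-i$ for B, so $d_i = 2i - (n+1)$.

$$
\begin{aligned}
\sum_{i=1}^{n} d_i^2 &= \sum_{i=1}^{n} \bigl(2i - (n+1)\bigr)^2 \\
&= 4\sum_{i=1}^{n} i^2 - 4(n+1)\sum_{i=1}^{n} i + n(n+1)^2 \\
&= \tfrac{2}{3}\,n(n+1)(2n+1) - 2n(n+1)^2 + n(n+1)^2 \\
&= n(n+1)\left[\tfrac{2(2n+1)}{3} - (n+1)\right] \\
&= \frac{n(n+1)(n-1)}{3} = \frac{n^3 - n}{3}
\end{aligned}
$$

For n = 8 this gives (512 - 8)/3 = 168, in agreement with the upper limit of the range found for the eight towns.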
Standardisation takes place as follows:
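The steps can be sketched as below. This is a reconstruction from the requirements stated earlier (a range of -1 to +1) and from the maximum value $(n^3 - n)/3$, not the author's own wording, and the symbol $r_s$ for the resulting coefficient is a label introduced here.

$$
\begin{aligned}
0 \;\le\; \sum d^2 \;&\le\; \frac{n^3 - n}{3} \\
0 \;\le\; \frac{3\sum d^2}{n^3 - n} \;&\le\; 1 \qquad \text{(divide by the maximum)} \\
0 \;\le\; \frac{6\sum d^2}{n^3 - n} \;&\le\; 2 \qquad \text{(double, to give a range of width 2)} \\
-1 \;\le\; 1 - \frac{6\sum d^2}{n^3 - n} \;&\le\; 1 \qquad \text{(subtract from 1)}
\end{aligned}
$$

$$
r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}
$$

Subtracting from 1, rather than subtracting 1, makes complete agreement ($\sum d^2 = 0$) correspond to +1 and complete disagreement to -1. For the eight towns, $r_s = 1 - (6 \times 56)/504 = 1/3$.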