The range is defined as the difference between the maximum and minimum data values.
Example: Farmer Green has measured the height (in meters) of his calves and has the following data set, arranged from smallest to biggest:
0,7; 0,8; 1,3; 1,5; 1,5
The range of heights is calculated by subtracting the lowest value from the highest:
1,5 m - 0,7 m = 0,8 m
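The same calculation can be sketched in a few lines of Python. This is only an illustration: the variable names are made up for the example, and the decimal commas are written as decimal points because Python requires it.

```python
# Heights of Farmer Green's calves in metres (decimal commas written as points).
heights = [0.7, 0.8, 1.3, 1.5, 1.5]

# Range = maximum value minus minimum value.
data_range = max(heights) - min(heights)

# Round to one decimal place to hide any binary floating-point noise.
print(round(data_range, 1))
```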
Measurements are spread across the range in which they were collected, but most of them cluster around some sort of “central tendency” or “average”. Three measures of central tendency are the mean, the median and the mode.
The Main Characteristics of the Mode, the Median, and the Mean
Fact no. | Mode | Median | Mean
1 | It is the value that appears most often. | It is the middle value of the ordered data. | It is the sum of all the values divided by the number of values.
2 | A distribution may have two or more modes, or no mode at all. | Each array has one and only one median. | Each array has one and only one mean.
3 | It cannot be manipulated algebraically. | It cannot be manipulated algebraically. | It can be manipulated algebraically.
4 | The individual values must be known to calculate the mode. | The individual values must be known to calculate the median. | It can be calculated without knowing the individual values; only the total and the sample size are needed.
5 | Values must be arranged from smallest to biggest. | Values must be arranged from smallest to biggest. | Values need not be ordered or grouped for this calculation.
6 | Tells you which score occurs most often. | Provides a better measure of location than the mean when there are some extremely large or small observations. Median income is used as the measure for SA household income. | Very easy to calculate, and the measure used most often when there is a list of numbers.
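As a rough illustration of the table, the three measures can be computed with Python's standard `statistics` module. The calf heights from the range example are reused here as sample data, and the small list used for the two-mode case is invented purely for the demonstration.

```python
import statistics

# Calf heights in metres from the range example (decimal commas written as points).
heights = [0.7, 0.8, 1.3, 1.5, 1.5]

mean = statistics.mean(heights)        # sum of the values divided by their count
median = statistics.median(heights)    # middle value of the ordered data
modes = statistics.multimode(heights)  # list of every value tied for most frequent

print(round(mean, 2), median, modes)

# Fact 4: the mean needs only the total and the sample size.
total, sample_size = 5.8, 5
print(round(total / sample_size, 2))

# Fact 2: a data set can have more than one mode (made-up data for illustration).
two_modes = statistics.multimode([1, 1, 2, 2, 3])
print(two_modes)
```

For the calf data this gives a mean of 1,16 m, a median of 1,3 m and a single mode of 1,5 m.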
Sometimes data are not spread evenly (i.e. they are skewed), which makes the mean misleading. In this case the median is a better representation of the central tendency. To get an idea of the spread of the data, we calculate quartiles and percentiles.
Example: Consider the following list of data: 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11
The median is the middle value.
The median = 6: 1; 2; 3; 4; 5; [6]; 7; 8; 9; 10; 11
The first quartile is the value that lies in the middle of the group of data below the median.
The first quartile = 3: 1; 2; [3]; 4; 5; 6; 7; 8; 9; 10; 11
The second quartile is the same as the median.
The second quartile = 6: 1; 2; 3; 4; 5; [6]; 7; 8; 9; 10; 11
The third quartile is the value that lies in the middle of the group of data above the median.
The third quartile = 9: 1; 2; 3; 4; 5; 6; 7; 8; [9]; 10; 11
The inter-quartile range is the 3rd quartile minus the 1st quartile.
Here it is 9 - 3 = 6.
The inter-quartile range tells us that the middle 50% of the data lie between the values of 3 and 9.
We can summarise a data set with a 5-number summary: the minimum, the 1st quartile, the median, the 3rd quartile and the maximum.
Example: Let us prepare a 5-number summary for the data set above:
The minimum value of the data set | 1
1st quartile | 3
2nd quartile = median | 6
3rd quartile | 9
The maximum value of the data set | 11
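The 5-number summary can be sketched in Python as follows. The function name `five_number_summary` is hypothetical. It follows the approach described in the text, taking each quartile as the median of the values below or above the median, which is only one of several quartile conventions in use.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values below the median position
    upper = data[half + n % 2:]  # values above it (the middle value is skipped if n is odd)
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1  # inter-quartile range: 3rd quartile minus 1st quartile

print(minimum, q1, median, q3, maximum)
print(iqr)
```

Other tools, including `statistics.quantiles`, may interpolate between neighbouring values and can therefore give slightly different quartiles for some data sets.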
You saw in the previous activity that calculations on large data sets are very cumbersome. There is a quicker way: using cumulative frequency tables. First, let us revise frequency tables.
Example: In a certain rural area, a survey was done to see how many dogs people had. The results are listed below. Now arrange these data in a frequency table:
3; 5; 1; 3; 7; 5; 6; 5; 9; 5; 2; 4; 4; 5; 5; 8
Step 1: Draw up a tally table, making one tally mark each time a value occurs.
Step 2: Count the tallies and record the totals in the frequency column.
Frequency table:
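The two steps above can be reproduced with a short Python sketch. It uses `collections.Counter` from the standard library to do the tallying. The column headings and variable names are chosen for this illustration only.

```python
from collections import Counter

# Number of dogs per household, as listed in the survey above.
dogs = [3, 5, 1, 3, 7, 5, 6, 5, 9, 5, 2, 4, 4, 5, 5, 8]

# Counter does the tallying: it maps each value to the number of times it occurs.
frequency = Counter(dogs)

print("Number of dogs | Tally | Frequency")
for value in sorted(frequency):
    count = frequency[value]
    # One "/" per occurrence stands in for a hand-drawn tally mark.
    print(f"{value} | {'/' * count} | {count}")
print(f"Total | | {sum(frequency.values())}")
```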
Below is an example of a cumulative frequency table.
The cumulative frequency is calculated by adding the previous cumulative frequency total to the current frequency. The very last cumulative total always equals the total of the frequency column.
Example: Cumulative frequency table:
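As a sketch, the cumulative frequencies for the dog survey can be built in Python with a running total from `itertools.accumulate`. The variable names and column headings are illustrative assumptions.

```python
from collections import Counter
from itertools import accumulate

# Number of dogs per household, from the survey example.
dogs = [3, 5, 1, 3, 7, 5, 6, 5, 9, 5, 2, 4, 4, 5, 5, 8]

frequency = Counter(dogs)
values = sorted(frequency)
counts = [frequency[v] for v in values]

# Each cumulative frequency = previous cumulative frequency + current frequency.
cumulative = list(accumulate(counts))

print("Number of dogs | Frequency | Cumulative frequency")
for value, count, running_total in zip(values, counts, cumulative):
    print(f"{value} | {count} | {running_total}")
```

The last running total is 16, which matches the total of the frequency column.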
To calculate the median, we use a simple formula. There is an even number of data values (16), so we know that the median lies between two values. To find out between which values it lies, we calculate (16+1)/2 = 8,5. This means that the median lies between item number 8 and item number 9 of the ordered data. Let us look at the table we constructed in more detail:
3; 5; 1; 3; 7; 5; 6; 5; 9; 5; 2; 4; 4; 5; 5; 8
1st quartile | The 1st quartile lies at position (16+1)/4 = 4,25, i.e. between items 4 and 5. From the table we can see that item 4 is 3 and item 5 is 4. Thus, the 1st quartile is (3+4)/2 = 3,5.
2nd quartile = median | 5
3rd quartile | The 3rd quartile lies at position 3(16+1)/4 = 12,75, i.e. between items 12 and 13. Item 12 is 5 and item 13 is 6. Thus, the 3rd quartile is (5+6)/2 = 5,5.
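The look-up of items in the cumulative frequency table can be sketched in Python. The helper names `item_at` and `value_between` are hypothetical. The sketch assumes the rule used above: take the position formula, then average the two items on either side of a fractional position.

```python
from bisect import bisect_left
from collections import Counter
from itertools import accumulate

dogs = [3, 5, 1, 3, 7, 5, 6, 5, 9, 5, 2, 4, 4, 5, 5, 8]
frequency = Counter(dogs)
values = sorted(frequency)
cumulative = list(accumulate(frequency[v] for v in values))
n = cumulative[-1]  # total number of items (16)

def item_at(position):
    """Value of the item at a 1-based position in the ordered data."""
    # The first cumulative frequency that reaches `position` identifies the value.
    return values[bisect_left(cumulative, position)]

def value_between(position):
    """Average of the items on either side of a (possibly fractional) position."""
    low = int(position)                          # e.g. 4.25 -> item 4
    high = low if position == low else low + 1   # e.g. 4.25 -> item 5
    return (item_at(low) + item_at(high)) / 2

q1 = value_between((n + 1) / 4)
median = value_between((n + 1) / 2)
q3 = value_between(3 * (n + 1) / 4)
print(q1, median, q3)
```

Reading positions off the cumulative frequencies avoids writing out and sorting the full list of values, which is what makes the approach attractive for large data sets.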
Let us plot the ogive of the data in the table above.
Ogive for Vet visits in the past year:
Why Ogives are useful:
Variance and standard deviation measure how widely the data are scattered around the mean.
Step 1: Calculate the mean.
Example: A farmer wanted to see which plot of land was better for growing trees, plot A or plot B. He measured the height of 7 young trees on each plot.
Plot A: 364cm, 372cm, 364cm, 368cm, 370cm, 368cm, 370cm
Plot B: 304cm, 388cm, 332cm, 432cm, 400cm, 352cm, 368cm
We calculate the mean of each group:
Plot A: 2576/7 = 368cm
Plot B: 2576/7 = 368cm
The means for both plots were the same, so we cannot gain much insight that way.
Step 2: We now calculate the deviation from the mean for each data point:
Add up all the deviations for Plot A: 4 – 4 + 4 + 0 – 2 + 0 – 2 = 0
Add up all the deviations for Plot B: 64 – 20 + 36 – 64 – 32 + 16 + 0 = 0
This does not help either, so we need to go one step further.
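The zero totals are not a coincidence. A short derivation shows that deviations from the mean always cancel out. The symbols are introduced here only for illustration: x_i stands for the individual heights, n for the number of trees and x̄ for the mean.

```latex
\[
\sum_{i=1}^{n} (\bar{x} - x_i)
  = n\bar{x} - \sum_{i=1}^{n} x_i
  = n \cdot \frac{1}{n}\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i
  = 0
\]
```

Squaring each deviation before adding removes the signs, so the positive and negative deviations can no longer cancel.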
Step 3: Square the deviations and calculate the variance.
We re-write the table and add a column in which we square the deviations from the mean.
The mean of these squared deviations is called the variance. Variance is defined as the mean of the squares of the deviations from the mean.
Step 4: Determine the standard deviation.
If we now take the square root of the variance, we get the standard deviation.
The standard deviation for Plot A is √8 = 2,83 and for Plot B it is √1595,43 = 39,94.
The standard deviation is defined as the square root of the variance.
Let us summarise the steps:
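The four steps can be sketched in Python as follows. The function name `population_variance` is hypothetical. The sketch divides by the number of trees, following the definition of variance as the mean of the squared deviations. Some textbooks and calculators divide by one less than the number of values, and that gives larger results.

```python
import math

def population_variance(values):
    """Mean of the squared deviations from the mean."""
    mean = sum(values) / len(values)             # Step 1: calculate the mean
    deviations = [mean - x for x in values]      # Step 2: deviation of each value
    squared = [d ** 2 for d in deviations]       # Step 3: square the deviations...
    return sum(squared) / len(values)            # ...and take their mean (variance)

# Tree heights in centimetres from the example.
plot_a = [364, 372, 364, 368, 370, 368, 370]
plot_b = [304, 388, 332, 432, 400, 352, 368]

for name, heights in (("Plot A", plot_a), ("Plot B", plot_b)):
    variance = population_variance(heights)
    std_dev = math.sqrt(variance)                # Step 4: square root of the variance
    print(f"{name}: variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```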
Let us take a closer look at the farmer’s results for his trees.
Both the variance and the standard deviation show us that there was very little variation in the trees from Plot A. The trees from Plot B, however, had hugely different growths. If the farmer needs to sell fairly uniform trees to logging companies, he would be better off planting on Plot A. He could do further analysis to see why the trees on Plot B are so very different to one another, or he could use the land for another purpose.