The Measures Of Dispersion In Statistics

Dispersion measures are important because they tell us about the variability that we find in a given sample or population. When we talk about a sample, this dispersion is important because it conditions the error that we are going to have when making inferences for measures of central tendency, such as the mean.

In a data distribution, measures of dispersion play a very important role. These measures complement those of central position, characterizing the variability of the data.

Measures of central tendency indicate the values around which the data appear to cluster. They are used to infer the behavior of variables in populations and samples. Some examples are the arithmetic mean, the mode and the median (1).

Measures of dispersion complement these measures of central tendency by characterizing the variability of the data. Their relevance in statistical training has been pointed out by Wild and Pfannkuch (1999).

Perceiving the variability of the data is one of the basic components of statistical thinking, because it tells us how the data are dispersed with respect to an average or mean.

The arithmetic mean is widely used in practice, but it can often be misinterpreted. This happens when the values of the variable are very dispersed. On these occasions it is necessary to accompany the mean with measures of dispersion (2).

In measures of dispersion, there are three important components related to random variability (2):

  • The perception of its ubiquity in the world around us.
  • The competence to explain it.
  • The ability to quantify it (which implies understanding and knowing how to apply the concept of dispersion).

What are dispersion measures for?

In a statistical study, when generalizing from a sample to a population, measures of dispersion are very important because they directly condition the error with which we work. The more dispersion we find in a sample, the larger the sample size we will need to work with the same error.
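The link between dispersion and sample size can be sketched with a small calculation. The article does not give a formula, so this example assumes the common textbook approximation for estimating a mean, n = (z · σ / E)², with a known standard deviation σ, a desired margin of error E, and z = 1.96 for roughly 95% confidence. The function name and the numbers are illustrative only.

```python
import math


def required_sample_size(sigma, margin_of_error, z=1.96):
    """Smallest n for which z * sigma / sqrt(n) <= margin_of_error.

    Assumes a normal approximation and a known standard deviation;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Same target error, two hypothetical levels of dispersion.
low_dispersion_n = required_sample_size(sigma=5.0, margin_of_error=1.0)
high_dispersion_n = required_sample_size(sigma=10.0, margin_of_error=1.0)

print(low_dispersion_n, high_dispersion_n)
```

Under these assumptions, doubling the standard deviation roughly quadruples the sample size needed to keep the same error.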

On the other hand, these measures help us determine whether our data are too far from the central value. They therefore tell us whether this central value is adequate to represent the study population. This is very useful for comparing distributions and understanding risks in decision making (1).

The greater the dispersion, the less representative the central value is. These are the most commonly used measures of dispersion:

  • The range.
  • The mean deviation.
  • The variance.
  • The standard deviation.
  • The coefficient of variation.

Functions of each of the measures of dispersion

Range

First, the range is recommended for a preliminary comparison. It considers only the two extreme observations, so it is recommended only for small samples (1). It is defined as the difference between the largest and the smallest value of the variable (3).

This measure is easy to calculate. However, it has the disadvantage that it does not really express the concentration of the data. It can produce an exaggerated interval when the series is in fact highly concentrated but its extreme values differ greatly from the rest.
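A short sketch of this weakness, using two made-up series that differ only in one extreme value (the helper name and the data are hypothetical):

```python
def value_range(values):
    """Range: difference between the largest and the smallest value."""
    return max(values) - min(values)


concentrated = [48, 49, 50, 51, 52]
with_outlier = [48, 49, 50, 51, 120]  # same series with one extreme value

print(value_range(concentrated))  # 4
print(value_range(with_outlier))  # 72
```

Four of the five values are identical in both series, yet the range grows from 4 to 72 because it depends only on the two extremes.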

Mean deviation

For its part, the mean deviation indicates where the data would be concentrated if all were at the same distance from the arithmetic mean (1). The deviation of a value is the absolute difference between that value and the arithmetic mean of the series. The mean deviation is then the arithmetic mean of these absolute deviations (3).
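As a minimal sketch of that definition, the mean deviation can be computed in two steps: take the absolute deviation of each value from the mean, then average those deviations. The data set below is invented for illustration.

```python
from statistics import mean


def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    center = mean(values)
    return mean([abs(x - center) for x in values])


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5

print(mean_deviation(data))  # 1.5
```

Here the absolute deviations are 3, 1, 1, 1, 0, 0, 2 and 4, which sum to 12 over 8 values.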

Variance

The variance is an algebraic function of all the values, appropriate for inferential statistics tasks (1). It can be defined as the mean of the squared deviations from the arithmetic mean (3).

Some remarks on variance

  • The variance, like the mean, is an index that is highly sensitive to extreme scores.
  • In cases where the mean cannot be found, it will not be possible to find the variance either.
  • The variance is not expressed in the same units as the data, since the deviations are squared.
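The definition and the remarks above can be checked with a small sketch. It computes the variance by hand as the mean of the squared deviations and compares it with `pvariance` from Python's standard `statistics` module, which uses the same population formula. Note that `statistics.variance` divides by n - 1 instead of n, the usual convention when estimating from a sample. The data set is invented.

```python
from statistics import mean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean is 5

# Population variance by hand: mean of the squared deviations.
center = mean(data)
manual_variance = mean([(x - center) ** 2 for x in data])

print(manual_variance)   # squared units of the original data
print(pvariance(data))   # same population formula (divides by n)
print(variance(data))    # sample variance (divides by n - 1)

# Sensitivity to extreme scores: one outlier inflates the variance.
print(pvariance(data + [40]))
```

If the data were in centimetres, the variance would be in square centimetres, which is why it is harder to read directly than the standard deviation.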

Standard deviation

For samples drawn from the same population, the standard deviation is one of the most widely used measures (1). It is the square root of the variance (3).

It is the dispersion measure that best conveys the variation of the data with respect to the arithmetic mean. Its value is directly related to the dispersion of the data: the greater the dispersion, the greater the standard deviation, and the lower the dispersion, the lower the standard deviation.

Observations on the standard deviation

  • The standard deviation, like the mean and the variance, is an index that is highly sensitive to extreme scores.
  • In cases where the mean cannot be found, it will not be possible to find the standard deviation either.
  • The smaller the standard deviation, the greater the concentration of data around the mean.
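A brief sketch of these observations, using `pstdev` and `pvariance` from Python's standard `statistics` module (population formulas) on two invented series with the same mean:

```python
import math
from statistics import mean, pstdev, pvariance

tight = [49, 50, 51]   # hypothetical values concentrated around 50
spread = [30, 50, 70]  # hypothetical values dispersed around 50

# Both series have the same mean but very different dispersion.
print(mean(tight), mean(spread))
print(pstdev(tight), pstdev(spread))

# The standard deviation is the square root of the variance.
print(math.isclose(pstdev(spread) ** 2, pvariance(spread)))
```

The two series share a mean of 50, but the smaller standard deviation of the first one reflects its greater concentration around that mean.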

Coefficient of variation

It is a measure used primarily to compare the variation between two sets of data measured in different units, for example the height and body weight of students in a sample. Thus, it is used to determine in which distribution the data are more clustered and the mean is more representative (1).

The coefficient of variation is better suited to such comparisons than the previous measures because it is a dimensionless number. That is, it is independent of the units in which the values of the variable are measured. This coefficient is usually expressed as a percentage (3).
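The article does not spell out the formula. The sketch below assumes the usual definition, the standard deviation divided by the arithmetic mean, multiplied by 100 to give a percentage. It uses the population standard deviation (`pstdev`), which is a choice made for this example, and the height and weight figures are invented.

```python
import math
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (mean must be non-zero)."""
    return pstdev(values) / mean(values) * 100


heights_cm = [160, 165, 170, 175, 180]  # hypothetical student heights
weights_kg = [50, 60, 70, 80, 90]       # hypothetical student weights

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(round(cv_height, 1), round(cv_weight, 1))

# Changing the unit (cm to m) leaves the coefficient unchanged.
heights_m = [h / 100 for h in heights_cm]
print(math.isclose(coefficient_of_variation(heights_m), cv_height))
```

In this made-up sample the weights vary more, relative to their mean, than the heights do, even though centimetres and kilograms cannot be compared directly.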

Thus, these measures of dispersion indicate, on the one hand, the degree of variability in the sample. On the other hand, they indicate how representative the central value is: a small value means that the data are concentrated around that center.

In that case there is little variability in the data and the center represents all the values well. If, instead, a large value is obtained, the data are not concentrated but dispersed, so there is a lot of variability and the center is not very representative. Finally, when making inferences, greater variability increases the error, so we will need a larger sample size if we want to reduce it.
