Quality Concepts Matter

Basic QA Statistics Series(Part 3)- Basic Measures of Dispersion and Statistical Notation

Gary Cox is a great Quality resource in addition to being very funny! gcox@barringtongrp.ca

REFLECTION: FOR STUDENTS: “It is not possible to know what you need to learn.” -Philip Crosby

FOR ACADEMICS: “Quality is the result of a carefully constructed cultural environment. It has to be the fabric of the organization, not part of the fabric.”-Philip Crosby

FOR PROFESSIONALS/PRACTITIONERS: “Quality has to be caused, not controlled.”-Philip Crosby

Foundation

Before we go further, this post will give you the basic notation for simple statistics so we can communicate more efficiently. It will also make understanding instructions from textbooks much less challenging. Please don’t give up here. These notations are just a secret code mathematicians use. If you learn it, you will begin to see that statistics is quite accessible. After the code is passed on, we will move on to the Measures of Dispersion.

Review: Part 1 and 2 covered the definition of Population, Sample, and how the terms Parameter and Statistic relate to Population and Sample, respectively. Also, we covered the concept of what data is, as well as the different kinds of data that exist, and the measurement scales used to analyze measurement data.

STATISTICAL NOTATION

Typically, capital letters and Greek letters are used to refer to population parameters, and lower-case or Roman letters are used to note sample statistics. 

I will be providing information in the table below specifically for this post. As posts are added in the series, more tables will be added to address any other notations referenced in the future. This post will become the notation reference page to allow any who are new to statistical notation an easy reference. 

(Kubiak, 2017)

MEASURES OF DISPERSION

There are three primary Measures of Dispersion- Range, Variance, and Standard Deviation. I will address each and explain them plainly. If you are new to statistics, I will avoid mathematics as much as possible, but alas, you will find it inescapable.

First comes RANGE.  Range is probably the most well known and most easily understood. Range is simply the difference between the largest (Maximum or MAX) value and the smallest (Minimum or MIN) value in a data set.

Example: 24, 36, 54, 89, 12, 14, 44, 55, 75, 86

Min 12, Max, 89

Range (R)= Max-Min = 77

Though Range is easy to use, it is not always as useful as the other measures of dispersion, because sometimes two separate data sets can have very similar ranges, with the other measures looking nothing alike. On that note, comes something a bit more complicated. 

At first, it sounds pretty simple: 

VARIANCE- This is the measure of how far off the data values are from the mean over-all. Obtaining this measurement by hand can be painful. You have to find the difference between the mean and each data point in the population or sample, square the differences, and then find the average of those squared differences.

Variance RoadMap

  1. Calculate the mean of all the data points Calculate the difference between the mean and each data point(Xi – μ or x ̅), Xi being a representation ith value of variable X. 
  2. Square the calculated differences for all data points
  3. Add these Squared values together
  4. Divide that number by N if the data set is a population (N), or divide by n-1 if the data is a sample

Follow the underlined statements above, and the formula for Variance below is achieved, but most stat software will calculate Variance with minimal effort.

Sample                

Population

Standard Deviation (SD)

A negative of Variance is though you can measure the relative spread of the data, it is not representative of the same scale because it has been squared. For example- data collected in inches or seconds and then checked for variance is effectively square inches or seconds squared. 

Standard Deviation is more useful because the units of Standard Deviation end up on the same scale and are directly comparable to the mean of the population or sample. Standard Deviation is the Square Root of the Variance and can be described as the average distance from each data point to the mean. The lower the SD, the less spread out the data is. The larger the spread of data, the higher the SD. Once again, most Stats programs and calculators will provide SD with no problem. The SD helps you understand how much your data is varying from the mean.

Two Examples: (using sample sets)

Set 1: 35, 61, 15, 14, 1

               Mean(Set 1): 25.2

Set 2: 45, 48, 50, 43, 40

                Mean(Set 2): 45.2

S=√(((45-45.2)²+(48-45.2)²+(50-45.2)²+(43-45.2)²+(40-45.2)²)/4)=3.96

When you first glance at the small sample of data, set one looks like it has a much larger spread from the average than set two. When you run the numbers, the SD results back up your “gut feeling.” An analysis is always better than a “gut feeling,” no matter how intuitive you are. The larger the sample set you are looking at, the more the initial appearance of the data can mislead you, so always run those numbers!

Conclusion

To recap, Range is the most well-known and straightforward Measure of Dispersion, but only describes the dispersion of the extremes of the data, and therefore may not always provide much new information. Range is usually most useful with smaller data sets. I should also mention a term known as the Interquartile Range (IQR). I will be dedicating a separate post to IQR next post.
Variance is an overall measure of the variation occurring around the mean using the Sum of Squares methodology. Remember, variance does not relate directly to the mean, so you cannot evaluate a variance number directly, so you should use variance to see how individual numbers relate to each other within a data set. Outliers (data points far from the mean) gain added significance with variance as well. Standard Deviation tends to be the most useful Measure of Dispersion, as it relates directly to the mean, and can be used to compare the spreads of various data sets. Remember, your stats programs will help you, and many online resources will walk you through any calculation. If you have any questions, shoot me a comment, and I will answer it for you. See you next time as we dig a bit deeper into IQR. 😊

Processing…
Success! You're on the list.
Try Amazon Prime 30-Day Free Trial
Create Amazon Business Account

Bibliography

Kubiak, T. a. (2017). The Certified Six Sigma Black Belt Handbook Third Edition. Milwaukee: ASQ Quality Press.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.