Basic QA Statistics Series (Part 5): Basic Histogram

REFLECTION: FOR STUDENTS: When a graph pops up showing you data in a histogram, pay close attention to everything the graph is conveying, because effective communication of data is the future.
FOR ACADEMICS: Teach your students how to use visual data graphics, and correct them when they slip up. From teachers to the boardroom, being able to construct a histogram for a presentation is a vital skill for information conveyance.
FOR PROFESSIONALS/PRACTITIONERS: Excel and Minitab do the job, but always remember the underlying theory behind the graphs for when the software goes down, or you need to do it quickly without a computer.
Foundation
As promised in the last post in this series, we will now delve a little into histograms. The primary purpose of a histogram is to provide a straightforward graphic representation of the distribution of data. I’m sure everyone has heard the saying “a picture is worth a thousand words.” To demonstrate this, I will show you three histograms, and I think you will see, before you read any caption, which histogram looks like useful data. Sample data should look roughly like a bell curve to be called “normal.”

When the “tail” is to the left, the data is left skewed. Note the clear outlier bin.

A histogram with an almost perfectly normal distribution

When the “tail” is to the right, the data is right skewed
The histogram is a quick communication of the state of the data. When you see a strong left or right skew, you must investigate the outliers and determine why you have so many.
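If you want a number to go with what your eyes are telling you, one option is a skewness coefficient. This is only a minimal sketch: the moment-based formula, the function names, the sample lists, and the 0.5 cutoff are all my own assumptions for illustration, not something taken from the references. A negative result means the tail is on the left, and a positive result means the tail is on the right.

```python
import statistics


def sample_skewness(data):
    """Moment-based skewness: negative = tail to the left, positive = tail to the right."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # population standard deviation (divides by n)
    if sd == 0:
        return 0.0  # all values identical, so there is no skew to measure
    return sum((x - mean) ** 3 for x in data) / (n * sd ** 3)


def describe_skew(data, tolerance=0.5):
    """Label the skew; the 0.5 tolerance is an assumed rule of thumb."""
    g = sample_skewness(data)
    if g < -tolerance:
        return "left skewed"
    if g > tolerance:
        return "right skewed"
    return "roughly symmetric"


# Hypothetical data sets, made up for illustration only
left_tail = [1, 6, 7, 7, 8, 8, 8, 9, 9, 9]
right_tail = [1, 1, 1, 2, 2, 2, 3, 3, 4, 9]
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

for name, data in [("left_tail", left_tail), ("right_tail", right_tail), ("symmetric", symmetric)]:
    print(name, round(sample_skewness(data), 2), describe_skew(data))
```

The number does not replace the picture. It just gives you something to write down next to the histogram when you report the skew.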
Constructing a Histogram from your Data
To construct a histogram from a continuous variable, you first need to determine how much data to use; fifty data points should be your minimum. If you were researching problems with a production line, cost would be your horizontal axis, split into bins with set boundaries, and the recommended number of bins is roughly √n (n being the number of samples). The vertical axis shows the frequency: each bin counts how many data points fall within its boundaries, so the histogram does not show you the raw data, only the frequency distribution. I would suggest familiarizing yourself with your company’s statistical software so that everyone performs the analysis the same way. Having the statistical guidelines per the software will save you in some auditing situations. Minitab, Excel, and many others provide straightforward access to histogram construction (Kubiak, 2017). Most software makes the bars equal in width, but the way I have most often seen the width determined by hand is:
- Determine the # of Bars to use based upon the sample size:
  - Sample size of 100 or less: 7-10 Bars
  - Sample size of 101-200: 11-15 Bars
  - Sample size of 201 or more: 13-20 Bars
- Choose the # of Bars (B) to use
- Width (W) = Overall Range of Data (R) / # of Bars (B), or W = R/B
- Starting from the lowest value in the data (or a convenient round number just below it), keep adding W to the lower edge of the previous bar to find the lower edge of the next bar
(Tague, 2005)
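The by-hand steps above can be sketched in a few lines of Python. Treat this as a rough illustration rather than the definitive method: the function names and the data are hypothetical, I have assumed that a sample of exactly 100 falls in the 7-10 group, and I have assumed the first bar starts at the lowest data value. The tiny data set is only there to keep the arithmetic easy to follow; a real histogram should use at least fifty points.

```python
import math


def bars_from_sqrt(n):
    """Square-root rule: the number of bins is about sqrt(n), rounded."""
    return round(math.sqrt(n))


def suggested_bar_range(n):
    """Bar-count guideline by sample size (assumes n = 100 belongs to the first group)."""
    if n <= 100:
        return (7, 10)
    if n <= 200:
        return (11, 15)
    return (13, 20)


def bar_edges(data, bars):
    """Return bars + 1 edges of equal width W = R / B, starting at the lowest value."""
    low, high = min(data), max(data)
    width = (high - low) / bars  # W = R / B
    if width == 0:
        raise ValueError("all values are identical, so the range is zero")
    return [low + i * width for i in range(bars + 1)]


def bar_counts(data, edges):
    """Count the values in each bar; the highest value goes in the last bar."""
    bars = len(edges) - 1
    width = edges[1] - edges[0]
    counts = [0] * bars
    for x in data:
        i = min(int((x - edges[0]) / width), bars - 1)
        counts[i] += 1
    return counts


# Hypothetical cost data, far fewer points than you would use in practice
costs = [2, 4, 4, 5, 6, 6, 6, 7, 8, 12]
edges = bar_edges(costs, bars=5)
counts = bar_counts(costs, edges)

# Print a quick text histogram, one row per bar
for i, c in enumerate(counts):
    print(f"{edges[i]:5.1f} to {edges[i + 1]:5.1f} | {'#' * c}")

print("sqrt rule for n = 50:", bars_from_sqrt(50))
print("guideline for n = 50:", suggested_bar_range(50))
```

Notice that the counts always add up to the sample size. That is a quick sanity check worth doing whenever you build a histogram by hand.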
Conclusion
A histogram is a way to count how often your data occurs within set boundaries and then show graphically how the data is distributed. Always remember that a histogram built with too many or too few bins can be misleading, whether by accident or by design. Always check the numbers yourself! This tool is one of the Seven Basic Quality Tools and is meant to help flag issues like outliers or non-normal data. It is not something that can solve a problem on its own, but a tool that enables you to understand what the data is telling you. The next post will cover another visual stat tool: the box and whisker diagram (for any cat lovers 😊).
Bibliography
Kubiak, T. M., & Benbow, D. W. (2017). The Certified Six Sigma Black Belt Handbook (3rd ed.). Milwaukee, WI: ASQ Quality Press.
Tague, N. R. (2005). The Quality Toolbox (2nd ed.). Milwaukee, WI: ASQ Quality Press.
Hypothesis Testing

REFLECTION: FOR STUDENTS: Always remember to use data and analysis to make decisions. Rely upon your own critical thinking skills, and do not allow your team to be derailed by groupthink.
FOR ACADEMICS: Teach critical thinking skills first, not expected behaviors. Without each individual able to independently and confidently voice objection to the group view, there can be no growth.
FOR PROFESSIONALS/PRACTITIONERS: Choosing the right hypothesis test can be daunting. It’s always best to understand, but if you are using statistical software like Minitab, never hesitate to use the software to choose your path based on the data and then double-check with a statistician to be certain you are taking the correct path.
Basic Terminology
(Don’t worry- not going to do a deep dive into statistics, just a basic application of hypothesis testing)
H0: This symbol represents the null hypothesis. A test of statistical significance is designed to assess the strength of the evidence against the null hypothesis. Usually, the null hypothesis is a statement of “no effect” or “no difference.” (It is usually the commonly accepted position, and the one we hope to disprove.)
Ha: This symbol represents the alternative hypothesis, the one for which we want to develop supporting evidence. It should usually be the opposite of the null hypothesis.
α-value: alpha level or “significance level.” By definition, the alpha level is the probability of rejecting the null hypothesis when the null hypothesis is correct. Translation: It’s the probability of making one particular kind of wrong decision, a false alarm.
Confidence Interval (also CI): A CI provides the boundaries for an unknown parameter of a population with a specified degree of confidence that the parameter falls within the interval. The confidence level is equal to 1-α, and the typical levels of confidence used to test a hypothesis are 0.99, 0.95, and 0.90.
Parameter: summary description of a fixed characteristic or measure of the target population. A Parameter denotes the actual value that would be obtained if a census rather than a sample were undertaken.
[Ex: Mean (μ), Variance (σ²), Standard Deviation (σ), Proportion (π)]
Population: Population is a collection of objects that we want to study/test. The collection of objects could be Cities, Students, Factories, Parts, etc. It depends on the study at hand.
In the real world, it is rarely easy to get complete information about a population. Therefore, we draw a sample from that population and compute the same kinds of measures from the sample. These measures are called Sample Statistics.
Statistic– a summary description of a characteristic or measure of the sample. The Sample Statistic is used as an estimate of the population parameter.
[Ex: Sample Mean (x̄), Sample Variance (S²), Sample Standard Deviation (S), Sample Proportion (p)]
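For anyone who wants to see the sample statistics listed above computed, here is a minimal sketch using Python's built-in statistics module. The measurements and the pass/fail inspection results are hypothetical values I made up for illustration.

```python
import math
import statistics

# Hypothetical sample of eight measurements drawn from a larger population
sample = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7]

x_bar = statistics.mean(sample)    # sample mean, estimates the population mean
s2 = statistics.variance(sample)   # sample variance (divides by n - 1)
s = statistics.stdev(sample)       # sample standard deviation, square root of s2

# Hypothetical inspection results: True marks a defective part
defective = [False, False, True, False, False, False, True, False, False, False]
p = sum(defective) / len(defective)  # sample proportion of defectives

print(f"mean={x_bar:.3f}  variance={s2:.4f}  std dev={s:.4f}  proportion={p:.2f}")
```

Each of these is an estimate of the matching population parameter, which is why the sample versions get their own symbols.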
p-value: The probability, assuming the null hypothesis is correct, of obtaining a result as extreme as, or more extreme than, the result actually observed. It ranges from 0 to 1 and is produced by many different types of hypothesis tests. (Kubiak, 2017) (Crossley, 2008) (Minitab Editor, 2012)
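To make the “as extreme as, or more extreme than” part of the p-value definition concrete, here is a small sketch that adds up exact binomial probabilities. The scenario (60 passing parts out of 100, against a null hypothesis of a 50% pass rate) and the function name are hypothetical, and I have assumed a one-sided test where only higher counts count as more extreme.

```python
import math


def binomial_upper_p_value(successes, n, p0):
    """One-sided p-value: P(X >= successes) when X ~ Binomial(n, p0) under H0."""
    return sum(
        math.comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical example: 60 of 100 parts pass, and H0 says the pass rate is 50%
p_value = binomial_upper_p_value(60, 100, 0.5)
print(f"p-value = {p_value:.4f}")
```

The p-value here is small (a few percent), meaning a result this high would be unusual if the null hypothesis were true. Whether it is small enough to act on depends on the alpha you chose before looking at the data.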
Testing
Fundamentally, hypothesis testing is a test of significance: it asks whether an observed result could plausibly have occurred by chance. Statistically, a sample is drawn from a population, and a statistic is computed from that sample. If that statistic is a mean, the hypothesis test asks whether a sample mean that far from the hypothesized value could have occurred by chance at some specified level of significance. There are many different testing methods available based on the data available. Still, there is always a chance that even with a flawless analysis of a sample, the conclusion will yield a false result relative to the population. There are two types of errors that can arise when testing a hypothesis:
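Type I Error: Occurs when we reject a null hypothesis that is true (the probability of a Type I Error is equal to α).
Type II Error: Occurs when we fail to reject a null hypothesis that is false (the probability of a Type II Error is β, which is not the same as 1-α; it depends on the sample size and on how far the truth is from the null hypothesis).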

When H0 is false but not rejected, the result is a Type II or β error
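To get a feel for how the Type II error probability behaves, here is a rough sketch for one specific case: a two-sided z-test on a mean with a known standard deviation. That choice of test, the function name, and all of the numbers are my own assumptions for illustration. The function returns the probability of failing to reject H0 for a given true mean.

```python
import math
from statistics import NormalDist


def prob_fail_to_reject(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability a two-sided z-test does NOT reject H0: mean = mu0
    when the process mean is really mu_true (sigma assumed known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)         # critical value, about 1.96 for alpha = 0.05
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))   # true shift in standard-error units
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


# Hypothetical process: H0 says the mean is 10.0, but it has really drifted to 10.3
beta_small_sample = prob_fail_to_reject(10.0, 10.3, sigma=1.0, n=50)
beta_large_sample = prob_fail_to_reject(10.0, 10.3, sigma=1.0, n=200)

# When H0 is actually true, failing to reject is the correct call, with probability 1 - alpha
no_shift = prob_fail_to_reject(10.0, 10.0, sigma=1.0, n=50)

print(f"beta with n=50:  {beta_small_sample:.3f}")
print(f"beta with n=200: {beta_large_sample:.3f}")
print(f"no shift (H0 true): {no_shift:.3f}")
```

The takeaway is that β changes with the sample size and with the size of the real shift, even though alpha stays fixed at 0.05 the whole time.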
Interpreting Hypothesis Test Statistics
Confidence level + alpha = 1
As you increase alpha, you both increase the probability of incorrectly rejecting the null hypothesis and decrease your confidence level.
If the p-value is low, the null must go.
If the p-value is below alpha (the risk you’re willing to take of making a wrong decision), then you reject the null hypothesis. If the p-value exceeds alpha, we fail to reject the null hypothesis. Another way to remember it is, “if the p-value is high, the null will fly.”
The confidence interval and p-value will lead you to the same conclusion, as long as both come from the same test and the same alpha:
If the p-value is less than alpha (it is significant), then the confidence interval will NOT contain the hypothesized mean/variance; however, if the p-value is greater than alpha (it is not significant), then the confidence interval will include the hypothesized mean/variance. (Kubiak, 2017) (Crossley, 2008) (Minitab Editor, 2012)
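Here is a minimal sketch of that agreement for one simple case, a two-sided z-test on a mean with a known standard deviation. The choice of test, the function name, and the sample numbers are assumptions made for illustration; your software will pick the appropriate test for your data.

```python
import math
from statistics import NormalDist


def z_test_summary(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mean = mu0, plus the matching (1 - alpha) confidence interval."""
    std_normal = NormalDist()
    se = sigma / math.sqrt(n)                        # standard error of the mean
    z = (sample_mean - mu0) / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))       # two-sided p-value
    margin = std_normal.inv_cdf(1 - alpha / 2) * se  # half-width of the interval
    ci = (sample_mean - margin, sample_mean + margin)
    reject = p_value < alpha                         # "if the p-value is low, the null must go"
    ci_contains_mu0 = ci[0] <= mu0 <= ci[1]
    return p_value, ci, reject, ci_contains_mu0


# Two hypothetical samples of n = 50 tested against a hypothesized mean of 10.0
for sample_mean in (10.3, 10.1):
    p_value, ci, reject, contains = z_test_summary(sample_mean, 10.0, sigma=1.0, n=50)
    print(f"mean={sample_mean}: p={p_value:.3f}, CI=({ci[0]:.3f}, {ci[1]:.3f}), "
          f"reject H0={reject}, CI contains 10.0={contains}")
```

In both cases the two answers line up: when the null is rejected the interval excludes the hypothesized mean, and when it is not rejected the interval includes it.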
Deciding upon the correct test method:

Choosing a hypothesis test can be daunting, and it is always best (and sometimes critical) to understand the choice you are making, but never fear. Most modern statistical software (even many Excel add-ons) will help guide you down the proper path as long as you have the data, know what kind of data you have, and have determined whether it is normal or non-normal.
Conclusion
As promised, this was not a deep dive. The more you know about statistics, the more likely you will draw the correct conclusion when you evaluate your test statistics against your hypothesis. It is critical to remember that while you are doing mathematical gymnastics or navigating Minitab, the hypotheses are not really about the data; instead, you should think about the processes producing the data. Always understand the implications of the hypothesis test on the associated process(es) in order to take the correct actions.
Bibliography
Crossley, M. L. (2008). The Desk Reference of Statistical Quality Methods (2nd Ed). Milwaukee, WI: ASQ Quality Press.
CSSBB Primer. (2014). West Terre Haute , Indiana: Quality Council of Indiana.
Kubiak, T. M., & Benbow, D. W. (2017). The Certified Six Sigma Black Belt Handbook (3rd ed.). Milwaukee, WI: ASQ Quality Press.
Minitab Editor. (2012, October 01). https://blog.minitab.com/blog/alphas-p-values-confidence-intervals-oh-my. Retrieved from Minitab Blog: https://blog.minitab.com/blog/alphas-p-values-confidence-intervals-oh-my