Monday, 25 March 2013

ANOVA





In 1920, Sir Ronald A. Fisher invented a statistical way to compare data sets. Fisher called his method the analysis of variance, which was later shortened to ANOVA. This method eventually evolved into the data set comparisons used in Six Sigma. 

An ANOVA can be, and ought to be, used to evaluate differences between data sets. It can be used with any number of data sets, recorded from any process, and the data sets need not be equal in size. Data sets suitable for an ANOVA can be as small as three or four numbers or arbitrarily large. 

The difficulty of calculating ANOVAs by hand prevented most people from using this Six Sigma tool until the 1990s. Now, using software like Microsoft Excel, anyone can quickly determine whether differences in a set of counts or measurements were most likely due to chance variation or should instead be attributed to a combination of factors. These variables are often labeled factor X, Y, or Z. 




In practice, Analysis of Variance is a statistical test used to determine whether the means of more than two populations are equal. 

The test uses the F-distribution (probability distribution) function and information about the variance within each population and the variance between the populations to help decide whether the variability between and within the populations differs significantly. So the ANOVA method tests the hypotheses that: 


H0: μ1=μ2=μ3=...=μk versus Ha: Not all the means are equal 
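This hypothesis test can be run in one line in Python. A minimal sketch, assuming SciPy is available; the three sample groups below are made-up numbers for illustration:

```python
# One-way ANOVA of three made-up sample groups using SciPy.
from scipy.stats import f_oneway

group_a = [3, 4, 5]
group_b = [6, 7, 8]
group_c = [9, 10, 11]

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# At alpha = 0.05, a p-value below alpha leads us to reject H0.
if p_value < 0.05:
    print("Reject H0: not all the means are equal.")
else:
    print("Fail to reject H0.")
```

The same function accepts any number of groups, matching the point above that an ANOVA works with any number of data sets of unequal sizes.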


a) Know the purpose of the analysis of variance test: 

The analysis of variance (ANOVA) test statistic is used to test whether more than 2 population means are equal. 

b) Know the difference between the within-sample estimate of the variance and the between-sample estimate of the variance and how to calculate them: 

When comparing two or more populations, the total variation can be split into two estimates of the variance: 
  • The between-sample (treatment) variance, which is associated with the explained variation of our experiment. 
  • The within-sample (error) variance, often called the unexplained variation. 
  • The F-Distribution is a continuous probability distribution and arises frequently as the null distribution of a test statistic. 
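The split between explained and unexplained variation can be computed directly. A minimal sketch in plain Python, using the same made-up sample groups as an illustration:

```python
# Split the total variation into explained (between-sample) and
# unexplained (within-sample) parts; the sample data are made up.
groups = [[3, 4, 5], [6, 7, 8], [9, 10, 11]]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# Between-sample (treatment) sum of squares: the explained variation.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Within-sample (error) sum of squares: the unexplained variation.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

ss_total = sum((x - grand_mean) ** 2 for x in all_values)
print(ss_between, ss_within, ss_total)  # 54.0 6.0 60.0
```

Note that the two parts add up exactly to the total variation (54 + 6 = 60), which is the identity the ANOVA table is built on.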

c) Know the properties of an F-Distribution: 

There is an infinite number of F-Distributions, one for each combination of the alpha significance level, the degrees of freedom (df1) of the between-sample variance, and the degrees of freedom (df2) of the within-sample variance. 

The F-statistic is the ratio of the between-sample estimate of σ2 and the within-sample estimate. If there are k populations and n data values across all the samples, then the degrees of freedom of the between-sample variance is df1=k-1 and the degrees of freedom of the within-sample variance is df2=n-k. The graph of an F probability distribution starts at 0 and extends indefinitely to the right. 
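The degrees of freedom and the corresponding critical value can be checked numerically. A short sketch, assuming SciPy is available, for k = 3 groups and n = 9 total values:

```python
# Degrees of freedom and the critical F value for k = 3 groups, n = 9 values.
from scipy.stats import f

k, n = 3, 9
df1 = k - 1        # between-sample degrees of freedom
df2 = n - k        # within-sample degrees of freedom

alpha = 0.05
f_critical = f.ppf(1 - alpha, df1, df2)  # same value an F-table would give
print(df1, df2, round(f_critical, 2))  # 2 6 5.14
```

This reproduces the familiar F-table entry F(0.05; 2, 6) ≈ 5.14.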

It is skewed to the right similar to the graph shown below: 




d) Know how to interpret the data in the ANOVA table against the null hypothesis: 
  • The ANOVA program computes the statistics needed for evaluating the null hypothesis that the means are equal: H0: μ1=μ2=μ3=...=μk 
  • Use the degrees of freedom and an alpha significance level to obtain the critical F-Distribution statistic from the lookup table or from the ANOVA program. 
  • Acceptance criteria for the null hypothesis: if the F-statistic computed in the ANOVA table is less than the F-table statistic, or the P-value is greater than the alpha level of significance, then there is no reason to reject the null hypothesis that all the means are the same. That is, accept H0 if: 
  • F-Statistic < F-table or P-value > α 


e) Know the procedure for testing the null hypothesis that the means of more than two populations are equal: 
  • Formulate hypotheses: H0: μ1=μ2=μ3=...=μk and Ha: Not all the means are equal. 
  • Select the F-statistic test for equality of more than two means. 
  • Obtain or decide on a significance level for alpha, say α=0.05. 
  • Compute the test statistic from the ANOVA table. 
  • Identify the critical region: the region of rejection of H0 is obtained from the F-table with alpha and degrees of freedom (k-1, n-k). 
  • Make a decision: that is, accept H0 if: 
  • F-Statistic < F-table or P-value > α 


— Sources: pindling.org, University of Utah, Isixsigma, ScienceDirect.


