Contingency tables are a good way to summarize discrete data. They are easy to construct and reasonably easy to understand. However, tables have many nuances, and care should be taken when drawing conclusions from the data. Here are just a few thoughts on the topic.
Dealing with sparse data
At one end of the spectrum, analysts struggle to make sense of massive amounts of data in the petabyte (and larger) range. At the other end of the spectrum, lacking sufficient data has its own problems.
Collapsing row or column levels
Sometimes when working with discrete data, certain factor levels lack adequate data, and it may be possible to combine those levels. This is fairly easy to do with 4- and 5-point Likert scales and, in fact, happens quite frequently. Taking this approach may provide sufficient data to make conclusions without violating underlying assumptions. The following tables of some random data show how cells can be collapsed so that the basic assumptions are met.
Original table:
Response | 18-29 | 30-45 | 46-59 | 60+ |
Very Strongly Agree | 1 | 1 | 12 | 10 |
Strongly Agree | 17 | 13 | 16 | 18 |
Undecided | 13 | 6 | 15 | 2 |
Strongly Disagree | 10 | 7 | 8 | 19 |
Very Strongly Disagree | 0 | 11 | 10 | 2 |

Collapsed table:
Response | 18-45 | 46+ |
Agree | 32 | 56 |
Undecided | 19 | 17 |
Disagree | 28 | 39 |
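As a rough sketch of the collapsing step in R, the block below rebuilds the two tables and compares their smallest expected cell counts. The grouping vectors and the `expected_counts` helper are names made up for this illustration, and the idea of checking expected counts against 5 is the usual chi square rule of thumb, which I am assuming is the "basic assumption" in question.

```r
# Original 5 x 4 table of random data (rows: responses, columns: age groups)
likert <- matrix(
  c( 1,  1, 12, 10,
    17, 13, 16, 18,
    13,  6, 15,  2,
    10,  7,  8, 19,
     0, 11, 10,  2),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Very Strongly Agree", "Strongly Agree", "Undecided",
      "Strongly Disagree", "Very Strongly Disagree"),
    c("18-29", "30-45", "46-59", "60+")))

# Hypothetical grouping vectors: one new label per original row / column
row_groups <- c("Agree", "Agree", "Undecided", "Disagree", "Disagree")
col_groups <- c("18-45", "18-45", "46+", "46+")

# rowsum() adds up rows that share a label; transpose to collapse columns too
collapsed <- rowsum(likert, group = row_groups, reorder = FALSE)
collapsed <- t(rowsum(t(collapsed), group = col_groups, reorder = FALSE))

# Expected counts under independence: (row total * column total) / n
expected_counts <- function(tab) outer(rowSums(tab), colSums(tab)) / sum(tab)

collapsed
min(expected_counts(likert))     # smallest expected count before collapsing
min(expected_counts(collapsed))  # smallest expected count after collapsing
```

Collapsing trades detail for stability, so the grouping should make sense for the question being asked and not be chosen just to rescue a test.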
Sanity Checks
Though not a formal test, sanity checks and other sensitivity analyses are a good way to check the stability of a result. If moving a single observation to a different cell changes the decision, then one should reevaluate the criteria for making a conclusion.
Using a basic chi square test (though other statistical tests would help with this problem, including the chi square correction for continuity) gives a p-value of .0363 for the first of the following tables of some made-up data, which would be considered significant at α = 0.05. However, simply moving one observation from the Cold/Slow group to the Cold/Fast group, as in the second table, gives a p-value of .1575, which is no longer significant at α = 0.05. Results this volatile are suspect, and decisions should be made with caution.
Original table:
Group | Fast | Slow | Total |
Hot | 7 | 2 | 9 |
Cold | 1 | 4 | 5 |
Total | 8 | 6 | 14 |

After moving one observation:
Group | Fast | Slow | Total |
Hot | 7 | 2 | 9 |
Cold | 2 | 3 | 5 |
Total | 9 | 5 | 14 |
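This kind of check is easy to script. The sketch below moves one observation between two cells of the 2 x 2 table and recomputes the uncorrected chi square p-value. The `move_one` and `p_uncorrected` helpers are hypothetical names introduced only for this illustration.

```r
# 2 x 2 table of made-up data (matrix() fills column by column)
x1 <- matrix(c(7, 1, 2, 4), nrow = 2,
             dimnames = list(c("Hot", "Cold"), c("Fast", "Slow")))

# Hypothetical helper: move a single observation from one cell to another
move_one <- function(tab, from, to) {
  tab[from[1], from[2]] <- tab[from[1], from[2]] - 1
  tab[to[1], to[2]] <- tab[to[1], to[2]] + 1
  tab
}

# Hypothetical helper: p-value of the chi square test without Yates correction.
# R warns that the approximation may be poor for counts this small, which is
# the point of the exercise, so the warning is silenced here.
p_uncorrected <- function(tab) {
  suppressWarnings(chisq.test(tab, correct = FALSE))$p.value
}

x2 <- move_one(x1, from = c("Cold", "Slow"), to = c("Cold", "Fast"))

round(c(original = p_uncorrected(x1), moved = p_uncorrected(x2)), 4)
```

The same helper could be looped over every pair of cells to see how many single-observation moves would flip the decision.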
Sparse Data Tests
There are many tests to handle many different categorical data situations. Listed here are a few of the common approaches.
Chi Square With Yates Correction
For the specific case of a 2 x 2 table, the Yates continuity correction decreases the chi square statistic (and so increases the p-value), and sometimes that is sufficient. In R, for example, chisq.test applies this correction to 2 x 2 tables by default.
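To make the correction concrete, here is a small sketch that compares the corrected and uncorrected statistics from `chisq.test` and recomputes the corrected statistic by hand. It reuses the Hot/Cold table of made-up data, and the by-hand formula assumes every |observed - expected| difference is at least 0.5, which holds for this table.

```r
x1 <- matrix(c(7, 1, 2, 4), nrow = 2,
             dimnames = list(c("Hot", "Cold"), c("Fast", "Slow")))

# Warnings about small expected counts are silenced for this illustration
uncorrected <- suppressWarnings(chisq.test(x1, correct = FALSE))
corrected   <- suppressWarnings(chisq.test(x1))  # correct = TRUE is the default

# Yates correction by hand: shrink each |observed - expected| by 0.5
expected   <- outer(rowSums(x1), colSums(x1)) / sum(x1)
yates_stat <- sum((abs(x1 - expected) - 0.5)^2 / expected)

c(uncorrected = unname(uncorrected$statistic),
  corrected   = unname(corrected$statistic),
  by_hand     = yates_stat)
c(uncorrected = uncorrected$p.value, corrected = corrected$p.value)
```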
Fisher’s Exact Test
This is often the immediate fallback for 2 x 2 tables. The test is based on the hypergeometric distribution. However, one important rule for this test is that it is conditioned on the marginal totals, which are treated as fixed. As a counterexample, take a random sample of 15 people and suppose 5 are male and 10 are female. With so few observations the chi square approximation starts to break down. But the other problem is that Fisher’s Exact Test calls for the marginals to be fixed, and that is not the case here: if another random sample of 15 people were selected, we could get a different number of males and females.
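The hypergeometric basis of the test can be sketched directly with `dhyper`. With both margins held fixed, the top-left cell determines the whole 2 x 2 table, so the two-sided p-value is the total probability of all tables that are no more likely than the observed one. The variable names below are my own, and the small tolerance factor is an assumption meant to mirror how `fisher.test` guards against rounding.

```r
x1 <- matrix(c(7, 1, 2, 4), nrow = 2,
             dimnames = list(c("Hot", "Cold"), c("Fast", "Slow")))

observed <- x1[1, 1]      # Hot/Fast count
n_hot    <- sum(x1[1, ])  # fixed row total for Hot
n_cold   <- sum(x1[2, ])  # fixed row total for Cold
n_fast   <- sum(x1[, 1])  # fixed column total for Fast

# Every value the Hot/Fast cell can take without changing the margins
support <- max(0, n_fast - n_cold):min(n_hot, n_fast)

# Probability of each possible table, conditional on the margins
probs <- dhyper(support, m = n_hot, n = n_cold, k = n_fast)
p_obs <- dhyper(observed, m = n_hot, n = n_cold, k = n_fast)

# Two-sided p-value: tables at least as unlikely as the observed one
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

c(by_hand = p_two_sided, fisher_test = fisher.test(x1)$p.value)
```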
Fisher’s Exact Test was developed at a time (1934) when computers were not available to work with really complex hypergeometric distributions. A 2 x 2 table was really the only feasible size. Consequently, Fisher’s Exact Test was designed for 2 x 2 tables, but it can be used on a table of any m x n size.
So why not always use Fisher’s Exact Test? At some point the exact test and the chi square test begin to converge, and using the exact test may just be too exact. Alan Agresti and Brent Coull wrote an article that discusses this topic in the context of interval estimation.
Barnard Test
Barnard’s test is similar to Fisher’s test, but it overcomes the problem of conditioning on the marginals. Like many tests, it applies only to a 2 x 2 table.
McNemar Exact Test
McNemar’s test is used when the data are correlated, for example in a matched pairs design with a before and after treatment, or when a question is asked on a repeat survey. Like Fisher’s test, this provides an option for smaller sample sizes.
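One common way to get an exact version is to note that only the discordant pairs carry information, and that under the null hypothesis each discordant pair is equally likely to fall in either off-diagonal cell. That turns the problem into an exact binomial test. The sketch below assumes that formulation and uses base R's `binom.test`. The counts are the same made-up matched pairs data used in the Examples code, and the variable names are mine.

```r
# Matched pairs: rows are the first replication, columns the second
x3 <- matrix(c(50, 5, 3, 15), nrow = 2,
             dimnames = list("Replication 1" = c("Hot", "Cold"),
                             "Replication 2" = c("Hot", "Cold")))

# Discordant pairs: those that changed category between replications
hot_to_cold <- x3["Hot", "Cold"]
cold_to_hot <- x3["Cold", "Hot"]

# Exact test: are the discordant pairs split 50/50 between the two directions?
exact <- binom.test(hot_to_cold, hot_to_cold + cold_to_hot, p = 0.5)
exact$p.value

# For comparison, the asymptotic version with continuity correction
mcnemar.test(x3, correct = TRUE)$p.value
```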
These are only a few of the options available. Great care should be taken in any analysis, but smaller sample sizes add a bit more complexity. There are many other options available for small tables, including logistic regression as well as non-parametric tests.
Examples
Here is some R code that shows some of the tests I described: Chi Square, Fisher’s Exact Test, Barnard’s Test, and McNemar’s Test (the continuity-corrected version).
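# 2 x 2 tables from the sanity check section (matrix() fills column by column)
x1 = matrix(c(7, 1, 2, 4), nrow = 2,
            dimnames = list(c("Hot", "Cold"), c("Fast", "Slow")))
x2 = matrix(c(7, 2, 2, 3), nrow = 2,
            dimnames = list(c("Hot", "Cold"), c("Fast", "Slow")))

# Chi square tests without the continuity correction
chisq.test(x1, correct=FALSE)
chisq.test(x2, correct=FALSE)

# Fisher's Exact Test
fisher.test(x1, alternative="two.sided", conf.int=TRUE, conf.level=0.95)
fisher.test(x2, alternative="two.sided", conf.int=TRUE, conf.level=0.95)

# Matched pairs data for McNemar's test; mcnemar.test() is the chi square
# version (here with continuity correction), not an exact test
x3 = matrix(c(50, 5, 3, 15), nrow = 2,
            dimnames = list("Replication 1" = c("Hot", "Cold"),
                            "Replication 2" = c("Hot", "Cold")))
mcnemar.test(x3, correct=TRUE)

# Barnard's test, from the Barnard package
library(Barnard)
barnardw.test(x1[1,1],x1[1,2],x1[2,1],x1[2,2])

# Plot the contingency table with ade4
library(ade4)
table.cont(x1, csi = 2, col.labels = colnames(x1), clabel.r = 1.5, clabel.c = 1.5)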