chisq.test {stats}    R Documentation
chisq.test performs chi-squared contingency table tests and goodness-of-fit tests.
chisq.test(x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)), simulate.p.value = FALSE, B = 2000)
x: a vector or matrix.
y: a vector; ignored if x is a matrix.
correct: a logical indicating whether to apply continuity correction when computing the test statistic.
p: a vector of probabilities of the same length as x.
simulate.p.value: a logical indicating whether to compute p-values by Monte Carlo simulation.
B: an integer specifying the number of replicates used in the Monte Carlo simulation.
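The x and y arguments give two routes to the same contingency table. The sketch below, which assumes the InsectSprays data set from R's datasets package (the same data used in the Examples), checks that passing two vectors of the same length gives the same test as tabulating them first with table() and passing the table as x.

```r
# InsectSprays is a data set from the datasets package, attached by default
high  <- InsectSprays$count > 7   # logical vector: insect count above 7?
spray <- InsectSprays$spray       # factor: type of spray (six levels)

# Route 1: pass two vectors/factors of the same length as x and y
r1 <- chisq.test(high, spray)

# Route 2: build the two-dimensional contingency table first, pass it as x
tab <- table(high, spray)
r2 <- chisq.test(tab)

# Both routes test the same table of counts
print(c(route1 = unname(r1$statistic), route2 = unname(r2$statistic)))
print(c(route1 = r1$p.value, route2 = r2$p.value))
```

Only data.name differs between the two results, since it records how the data were supplied.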
If x is a matrix with one row or column, or if x is a vector and y is not given, x is treated as a one-dimensional contingency table. In this case, the hypothesis tested is whether the population probabilities equal those in p, or are all equal if p is not given.
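For the one-dimensional case, the statistic is Pearson's sum of (observed - expected)^2 / expected, where the expected counts are sum(x) * p, referred to a chi-squared distribution with length(x) - 1 degrees of freedom. The sketch below recomputes this by hand and compares it with chisq.test(); gof_by_hand is a hypothetical helper written for illustration, not part of the stats package.

```r
# Hypothetical helper: Pearson goodness-of-fit test computed by hand
gof_by_hand <- function(x, p = rep(1/length(x), length(x))) {
  E <- sum(x) * p                 # expected counts under the null
  stat <- sum((x - E)^2 / E)      # Pearson's X-squared
  df <- length(x) - 1             # degrees of freedom
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

x <- c(A = 20, B = 15, C = 25)

# Default p: all categories equally likely
print(gof_by_hand(x))                         # statistic 2.5 on 2 df
print(chisq.test(x)$statistic)                # also 2.5

# A user-supplied p (must sum to 1)
print(gof_by_hand(x, p = c(0.3, 0.3, 0.4)))
print(chisq.test(x, p = c(0.3, 0.3, 0.4))$statistic)
```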
If x is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table, and hence its entries should be nonnegative integers. Otherwise, x and y must be vectors or factors of the same length; incomplete cases are removed, the objects are coerced into factors, and the contingency table is computed from these. Pearson's chi-squared test is then performed, with the null hypothesis that the joint distribution of the cell counts in the 2-dimensional contingency table is the product of the row and column marginals.

If simulate.p.value is FALSE, the p-value is computed from the asymptotic chi-squared distribution of the test statistic; continuity correction is applied only in the 2-by-2 case, and only if correct is TRUE. If simulate.p.value is TRUE, the p-value is instead computed by Monte Carlo simulation with B replicates. This is done by random sampling from the set of all contingency tables with the given marginals, and works only if the marginals are positive. (A C translation of the algorithm of Patefield (1981) is used.)
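The asymptotic 2-by-2 case can be reproduced by hand: the expected counts are the outer product of the row and column totals divided by the grand total, and the continuity (Yates) correction shrinks each |observed - expected| before squaring. This is a sketch assuming a recent version of R, which caps the correction at min(0.5, |observed - expected|); for the table below every difference exceeds 0.5, so the correction is 0.5 either way.

```r
# The 2-by-2 table from the Examples
x <- matrix(c(12, 5, 7, 7), ncol = 2)

n <- sum(x)
# Expected counts under independence: product of row and column marginals
E <- outer(rowSums(x), colSums(x)) / n
# Continuity correction (recent R uses min(0.5, |O - E|); here it is 0.5)
yates <- min(0.5, abs(x - E))
stat <- sum((abs(x - E) - yates)^2 / E)
df <- (nrow(x) - 1) * (ncol(x) - 1)          # 1 for a 2-by-2 table
pval <- pchisq(stat, df, lower.tail = FALSE)

res <- chisq.test(x)                          # correct = TRUE by default
print(c(by_hand = stat, chisq.test = unname(res$statistic)))
print(round(pval, 4))                         # 0.4233, as quoted in the Examples
```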
A list with class "htest" containing the following components:
statistic: the value of the chi-squared test statistic.
parameter: the degrees of freedom of the approximate chi-squared distribution of the test statistic; NA if the p-value is computed by Monte Carlo simulation.
p.value: the p-value for the test.
method: a character string indicating the type of test performed, and whether Monte Carlo simulation or continuity correction was used.
data.name: a character string giving the name(s) of the data.
observed: the observed counts.
expected: the expected counts under the null hypothesis.
residuals: the Pearson residuals, (observed - expected) / sqrt(expected).
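The expected and residuals components can be checked directly against their definitions. A minimal sketch, assuming a recent version of R: for a matrix input the expected counts are the products of the row and column totals divided by the grand total, and the residuals are the uncorrected Pearson residuals, so their squares sum to the statistic obtained with correct = FALSE.

```r
x <- matrix(c(12, 5, 7, 7), ncol = 2)
res <- chisq.test(x)

# Expected counts: (row total * column total) / grand total
E <- outer(rowSums(x), colSums(x)) / sum(x)
# Pearson residuals: (observed - expected) / sqrt(expected)
R <- (x - E) / sqrt(E)

print(all.equal(res$expected, E, check.attributes = FALSE))
print(all.equal(res$residuals, R, check.attributes = FALSE))

# The residuals ignore the continuity correction, so their squares
# add up to the uncorrected statistic
print(c(sum(R^2), unname(chisq.test(x, correct = FALSE)$statistic)))
```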
Patefield, W. M. (1981) Algorithm AS159. An efficient method of generating r x c tables with given row and column totals. Applied Statistics 30, 91–97.
## Not really a good example
chisq.test(InsectSprays$count > 7, InsectSprays$spray)             # Prints test summary
chisq.test(InsectSprays$count > 7, InsectSprays$spray)$observed    # Counts observed
chisq.test(InsectSprays$count > 7, InsectSprays$spray)$expected    # Counts expected under the null

## Effect of simulating p-values
x <- matrix(c(12, 5, 7, 7), ncol = 2)
chisq.test(x)$p.value                                      # 0.4233
chisq.test(x, simulate.p.value = TRUE, B = 10000)$p.value  # around 0.29!

## Testing for population probabilities
## Case A. Tabulated data
x <- c(A = 20, B = 15, C = 25)
chisq.test(x)
chisq.test(as.table(x))                                    # the same

## Case B. Raw data
x <- trunc(5 * runif(100))
chisq.test(table(x))                                       # NOT 'chisq.test(x)'!
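Because the Monte Carlo p-value depends on random sampling, repeated calls generally differ. A minimal sketch, assuming a recent version of R: fixing the random seed with set.seed() before each call makes the simulated p-value reproducible, and the parameter component is NA because no reference chi-squared distribution is used.

```r
x <- matrix(c(12, 5, 7, 7), ncol = 2)

set.seed(1)   # fix the random number generator state
sim1 <- chisq.test(x, simulate.p.value = TRUE, B = 2000)
set.seed(1)   # same state again, so the same tables are sampled
sim2 <- chisq.test(x, simulate.p.value = TRUE, B = 2000)

print(c(sim1$p.value, sim2$p.value))   # identical values
print(sim1$parameter)                  # NA: no degrees of freedom reported
```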