fanny {cluster} | R Documentation |
Computes a fuzzy clustering of the data into k
clusters.
fanny(x, k, diss = inherits(x, "dist"), metric = "euclidean", stand = FALSE)
x |
data matrix or data frame, or dissimilarity matrix, depending on the
value of the diss argument.
In case of a matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed. In case of a dissimilarity matrix, x is typically the output
of daisy or dist. Also a vector of
length n*(n-1)/2 is allowed (where n is the number of observations),
and will be interpreted in the same way as the output of the
above-mentioned functions. Missing values (NAs) are not allowed.
|
k |
integer giving the desired number of clusters. It is required that 0 < k < n/2 where n is the number of observations. |
diss |
logical flag: if TRUE (default for dist or
dissimilarity objects), then x is assumed to be a
dissimilarity matrix. If FALSE, then x is treated as
a matrix of observations by variables.
|
metric |
character string specifying the metric to be used for calculating
dissimilarities between observations.
The currently available options are "euclidean" and "manhattan".
Euclidean distances are root sum-of-squares of differences, and
manhattan distances are the sum of absolute differences.
If x is already a dissimilarity matrix, then this argument will
be ignored.
|
stand |
logical; if TRUE, the measurements in x are
standardized before calculating the dissimilarities. Measurements
are standardized for each variable (column), by subtracting the
variable's mean value and dividing by the variable's mean absolute
deviation. If x is already a dissimilarity matrix, then this
argument will be ignored. |
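The standardization and the two metrics described above can be sketched in base R. This is only an illustration of the definitions, not the code fanny runs internally; the helper name standardize_mad and the toy matrix are assumptions made for this example.

```r
## Hypothetical helper (not part of the cluster package): standardize each
## column by subtracting its mean and dividing by its mean absolute deviation,
## following the description of the `stand` argument.
standardize_mad <- function(x) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, colMeans(x, na.rm = TRUE))   # subtract column means
  mad_j <- colMeans(abs(centered), na.rm = TRUE)       # mean absolute deviation
  sweep(centered, 2, mad_j, "/")
}

## Toy data (assumed): column b is column a on a different scale.
x <- cbind(a = c(1, 2, 3, 6), b = c(10, 20, 30, 60))
z <- standardize_mad(x)

## Dissimilarity between observations 1 and 2 under each metric.
p <- z[1, ]
q <- z[2, ]
d_euclid    <- sqrt(sum((p - q)^2))   # root sum-of-squares of differences
d_manhattan <- sum(abs(p - q))        # sum of absolute differences

## The same values as computed by dist().
d_euclid_ref    <- unname(as.matrix(dist(z, method = "euclidean"))[1, 2])
d_manhattan_ref <- unname(as.matrix(dist(z, method = "manhattan"))[1, 2])
```

After standardization the two columns of z coincide, which shows why stand = TRUE removes the effect of measurement units.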
In a fuzzy clustering, each observation is "spread out" over the various
clusters. Denote by u(i,v) the membership of observation i to cluster v.
The memberships are nonnegative, and for a fixed observation i they sum to 1.
The particular method fanny stems from chapter 4 of Kaufman and Rousseeuw (1990).
Compared to other fuzzy clustering methods, fanny has the following
features: (a) it also accepts a dissimilarity matrix; (b) it is
more robust to the spherical cluster assumption; (c) it provides
a novel graphical display, the silhouette plot (see plot.partition).
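A short sketch of the membership properties and of feature (a), assuming the cluster package is installed. The seed and the simulated data are arbitrary choices for this illustration; the membership component is described in fanny.object.

```r
library(cluster)  # fanny() is provided by the cluster package

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))

fit_raw  <- fanny(x, 2)        # x given as observations by variables
fit_diss <- fanny(dist(x), 2)  # x given as a "dist" object, so diss is TRUE

## u[i, v] is the membership of observation i to cluster v.
u <- fit_raw$membership
all(u >= 0)          # memberships are nonnegative
range(rowSums(u))    # each row sums to 1, up to rounding error
```

Both calls should describe the same clustering here, since the default metric is Euclidean and dist() also defaults to Euclidean distances.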
Fanny aims to minimize the objective function
SUM_[v=1..k] (SUM_(i,j) u(i,v)^2 u(j,v)^2 d(i,j)) / (2 SUM_j u(j,v)^2)
where the sums over i and j run over the n observations, k is the number of clusters, and d(i,j) is the dissimilarity between observations i and j.
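The objective function can be evaluated directly for a given membership matrix. The following is a minimal sketch, assuming the inner sum runs over all ordered pairs (i,j); the function name fanny_objective is hypothetical and is not part of the cluster package.

```r
## Hypothetical helper: evaluate the objective for memberships u (n x k matrix)
## and dissimilarities d (n x n matrix or "dist" object).
fanny_objective <- function(u, d) {
  d  <- as.matrix(d)
  u2 <- u^2
  per_cluster <- vapply(seq_len(ncol(u2)), function(v) {
    w <- u2[, v]
    ## numerator: SUM_(i,j) u(i,v)^2 u(j,v)^2 d(i,j); denominator: 2 SUM_j u(j,v)^2
    sum(outer(w, w) * d) / (2 * sum(w))
  }, numeric(1))
  sum(per_cluster)
}

## Toy dissimilarities (assumed): d(1,2) = 1, d(1,3) = 2, d(2,3) = 3.
d <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), nrow = 3, byrow = TRUE)

## All three observations fully in a single cluster.
obj_one <- fanny_objective(matrix(1, nrow = 3, ncol = 1), d)   # 2

## Observations 1 and 2 in cluster 1, observation 3 alone in cluster 2.
u_two   <- rbind(c(1, 0), c(1, 0), c(0, 1))
obj_two <- fanny_objective(u_two, d)                           # 0.5
```

Splitting off the distant third observation lowers the objective from 2 to 0.5, which is the behaviour the minimization rewards.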
an object of class "fanny" representing the clustering.
See fanny.object for details.
agnes for background and references;
fanny.object, partition.object, plot.partition, daisy, dist.
library(cluster)

## generate 25 objects, divided into two clusters, and 3 objects lying
## between those clusters.
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)),
           cbind(rnorm( 3, 3.5, 0.5), rnorm( 3, 3.5, 0.5)))
fannyx <- fanny(x, 2)
fannyx
summary(fannyx)
plot(fannyx)

data(ruspini)
## Plot similar to Figure 6 in Struyf et al (1996)
plot(fanny(ruspini, 5))