sits_clustering {sits}    R Documentation

Find clusters in time series samples

Description

These functions support hierarchical agglomerative clustering in sits. They cover creating a dendrogram and using it to clean samples.

sits_cluster_dendro() takes a tibble containing time series and produces a sits tibble with an added "cluster" column. The function first calculates a dendrogram and then uses the adjusted Rand index as a validity index to find the best cut. After cutting the dendrogram at that point, it assigns a cluster to each sample.
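The dendrogram-and-cut idea can be sketched in base R. This is only an illustration, not the sits implementation: the toy data, the Euclidean distance (standing in for the DTW distance that sits obtains through dtwclust), the hand-written adjusted_rand() helper and the candidate range of k are all assumptions made for the example.

```r
# Toy data: six short "time series" (one per row) with known labels.
# Assumption: Euclidean distance replaces the DTW distance used by sits.
series <- rbind(
  c(0.1, 0.2, 0.8, 0.9),
  c(0.1, 0.3, 0.7, 0.9),
  c(0.2, 0.2, 0.8, 0.8),
  c(0.9, 0.8, 0.2, 0.1),
  c(0.8, 0.8, 0.3, 0.1),
  c(0.9, 0.7, 0.2, 0.2)
)
labels <- c("Cerrado", "Cerrado", "Cerrado", "Pasture", "Pasture", "Pasture")

# Hypothetical helper: adjusted Rand index between two partitions,
# computed from their contingency table.
adjusted_rand <- function(x, y) {
  tab <- table(x, y)
  sum_cells <- sum(choose(tab, 2))
  sum_rows <- sum(choose(rowSums(tab), 2))
  sum_cols <- sum(choose(colSums(tab), 2))
  expected <- sum_rows * sum_cols / choose(sum(tab), 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

# Build the dendrogram with the same default linkage as in Usage.
dendro <- hclust(dist(series), method = "ward.D2")

# Try several cuts and keep the one that agrees best with the labels.
candidate_k <- 2:4
ari <- sapply(candidate_k, function(k) {
  adjusted_rand(cutree(dendro, k = k), labels)
})
best_k <- candidate_k[which.max(ari)]

# Assign a cluster to each sample using the chosen cut.
cluster <- cutree(dendro, k = best_k)
```

In this sketch the cut is chosen by comparing each candidate partition with the sample labels, so the labels act as the reference partition for the validity index.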

sits_cluster_frequency() computes the contingency table between labels and clusters and returns it as a matrix. It needs as input a tibble produced by sits_cluster_dendro().
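A contingency table between labels and clusters can be illustrated with base R alone. The data frame below is a made-up stand-in for a clustered sits tibble, and adding margin totals with addmargins() is an assumption about what a convenient layout looks like, not a statement about the exact matrix sits returns.

```r
# Hypothetical clustered samples: one label and one cluster id per sample.
samples <- data.frame(
  label = c("Cerrado", "Cerrado", "Cerrado", "Pasture", "Pasture", "Cerrado"),
  cluster = c(1, 1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Rows are labels, columns are clusters, cells are sample counts.
freq <- table(samples$label, samples$cluster)

# Optionally append row and column totals (margins are named "Sum").
freq_total <- addmargins(freq)
print(freq_total)
```

Reading the table column by column shows how pure each cluster is, which is the information needed to decide which clusters or samples to discard.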

sits_cluster_clean() takes a tibble with time series that has an additional "cluster" column produced by sits_cluster_dendro() and removes, in each cluster, the samples whose labels are in the minority.
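The cleaning step can be pictured as keeping, in each cluster, only the samples that carry the most frequent label. The base R sketch below works under that assumption; the data frame is a made-up stand-in for a clustered sits tibble, and ties are broken by taking the first label, which may differ from what sits does.

```r
# Hypothetical clustered samples: one label and one cluster id per sample.
samples <- data.frame(
  label = c("Cerrado", "Cerrado", "Cerrado", "Pasture", "Pasture", "Cerrado"),
  cluster = c(1, 1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Most frequent label in each cluster (first one wins on ties).
majority <- tapply(samples$label, samples$cluster, function(l) {
  names(which.max(table(l)))
})

# Keep only the samples whose label matches their cluster's majority label.
keep <- samples$label == majority[as.character(samples$cluster)]
cleaned <- samples[keep, ]
```

Here the single "Cerrado" sample that fell into cluster 2 is dropped, while all other samples are kept.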

Usage

sits_cluster_dendro(
  samples = NULL,
  bands = NULL,
  dist_method = "dtw_basic",
  linkage = "ward.D2",
  k = NULL,
  palette = "RdYlGn",
  .plot = TRUE,
  ...
)

sits_cluster_frequency(samples)

sits_cluster_clean(samples)

Arguments

samples

A tibble with the input set of time series.

bands

Bands to be used in the clustering.

dist_method

String with one of the supported distances. Default is 'dtw_basic'.

linkage

String with the agglomeration method to be used. Can be any 'hclust' method (see 'hclust'). Default is 'ward.D2'.

k

Desired number of clusters (overrides the default value).

palette

Color palette, as per the 'grDevices::hcl.pals()' function.

.plot

Logical: should the dendrogram be plotted? Default is TRUE.

...

Additional parameters to be passed to the dtwclust::tsclust() function.

Value

sits_cluster_dendro() returns a sits tibble with an added "cluster" column.

sits_cluster_frequency() returns a matrix containing all frequencies of labels in clusters.

sits_cluster_clean() returns the input tibble with the samples whose labels are in the minority in each cluster removed.

Author(s)

Rolf Simoes, rolf.simoes@inpe.br

References

"dtwclust" package (https://CRAN.R-project.org/package=dtwclust)

Examples

## Not run: 
# load a simple data set with two classes
data(cerrado_2classes)
# calculate the dendrogram and the best clusters
clusters <- sits_cluster_dendro(cerrado_2classes, bands = c("NDVI", "EVI"))
# show the frequency of sample labels in each cluster
sits_cluster_frequency(clusters)
# remove cluster 3 from the samples
clusters_new <- dplyr::filter(clusters, cluster != 3)
# show the frequency of sample labels in each cluster for the new data set
sits_cluster_frequency(clusters_new)
# clean all remaining clusters
cleaned <- sits_cluster_clean(clusters_new)
# show the frequency of sample labels in each cluster after cleaning
sits_cluster_frequency(cleaned)

## End(Not run)

[Package sits version 0.16.2 Index]