seqdistmc {TraMineR}    R Documentation

Multichannel distances between sequences

Description

Compute pairwise optimal matching (OM) distances between multichannel sequences by deriving the multichannel substitution costs from the costs of the single channels. Works with OM and the following variants: distance based on the longest common subsequence (LCS), Hamming distance (HAM), and Dynamic Hamming distance (DHD).

Usage

seqdistmc(channels, method, norm="none", indel=1, sm=NULL,
     with.missing=FALSE, full.matrix=TRUE, link="sum", cval=2,
     miss.cost=2, cweight=NULL, what="diss") 

Arguments

channels

A list of state sequence objects defined with the seqdef function, each state sequence object corresponding to a "channel".

method

A character string indicating the metric to be used. One of "OM" (Optimal Matching), "LCS" (Longest Common Subsequence), "HAM" (Hamming distance), "DHD" (Dynamic Hamming distance).

norm

String. Default: "none". The normalization method to use. See seqdist.

indel

A vector with an insertion/deletion cost for each channel (OM method).

sm

A list with a substitution-cost matrix for each channel (OM, HAM and DHD methods) or a list of method names for generating the substitution costs (see seqsubm).

with.missing

Must be set to TRUE when sequences contain non-deleted gaps (missing values) or when channels are of different lengths. See Details.

full.matrix

If TRUE (default), the full distance matrix is returned. If FALSE, an object of class dist is returned.

link

One of "sum" or "mean". Method to compute the "link" between channels. Default is to sum the substitution costs.

cval

Substitution cost for "CONSTANT" matrix, see seqsubm.

miss.cost

Missing values substitution cost, see seqsubm.

cweight

A vector of channel weights, one per channel. Default is NULL, in which case each channel gets a weight of 1 (same weight for each channel).

what

Character string. What output should be returned? One of "diss", "sm", "seqmc".

Details

The seqdistmc function builds a state sequence by combining the channels, derives the multichannel indel and substitution costs from the indel and substitution costs of each channel (following the strategy proposed by Pollock, 2007), and computes the multichannel distances using these multichannel costs. The available metrics (see the method argument) are optimal matching ("OM"), longest common subsequence ("LCS"), Hamming distance ("HAM"), and Dynamic Hamming distance ("DHD"). See seqdist for more information about distances between sequences.
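
The derivation of multichannel costs can be illustrated with a small base R sketch. It is not the package's implementation: it assumes link="sum", two hypothetical channels with the states "FALSE" and "TRUE", and that each channel's cost is multiplied by its weight before summing. The object names and the "+" separator used to label combined states are illustrative only.

```r
# Toy per-channel substitution-cost matrices (constant cost of 2).
# All names here are hypothetical and chosen for illustration.
states   <- c("FALSE", "TRUE")
sm.child <- matrix(c(0, 2, 2, 0), nrow = 2, dimnames = list(states, states))
sm.marr  <- matrix(c(0, 2, 2, 0), nrow = 2, dimnames = list(states, states))
cweight  <- c(2, 1)  # the children channel counts double
indel    <- c(1, 1)  # one indel cost per channel

# Combined alphabet: every pairing of a child state with a marriage state.
combos <- expand.grid(child = states, marr = states, stringsAsFactors = FALSE)
labels <- paste(combos$child, combos$marr, sep = "+")

# Multichannel substitution cost = weighted sum of the per-channel costs
# (assumed behaviour for link = "sum").
n <- nrow(combos)
sm.multi <- matrix(0, n, n, dimnames = list(labels, labels))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sm.multi[i, j] <-
      cweight[1] * sm.child[combos$child[i], combos$child[j]] +
      cweight[2] * sm.marr[combos$marr[i], combos$marr[j]]
  }
}

# Multichannel indel cost, combined the same way.
indel.multi <- sum(cweight * indel)

print(sm.multi)
print(indel.multi)
```

In this toy setting, substituting between two combined states that differ on both channels costs 2*2 + 1*2 = 6, while a change on the marriage channel alone costs 2.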

Normalization may be useful when dealing with sequences that are not all of the same length. For details on the applied normalization, see seqdist.

Value

When what="diss", a matrix of pairwise distances between multichannel sequences, or an object of class dist if full.matrix=FALSE.
When what="sm", the matrix of multichannel substitution costs with three attributes: indel (the multichannel indel cost), alphabet (the alphabet of the combined state sequences), and cweight (the channel weights used).
When what="seqmc", the combined state sequence object.

Author(s)

Matthias Studer (with Gilbert Ritschard for the help page)

References

Pollock, Gary (2007) Holistic trajectories: a study of combined employment, housing and family careers by using multiple-sequence analysis. Journal of the Royal Statistical Society: Series A 170, Part 1, 167–183.

See Also

seqsubm, seqdef, seqdist.

Examples

data(biofam)

## Building one channel per type of event: left home, children, or married
bf <- as.matrix(biofam[, 10:25])
children <- bf == 4 | bf == 5 | bf == 6
married <- bf == 2 | bf == 3 | bf == 6
left <- bf == 1 | bf == 3 | bf == 5 | bf == 6

## Building sequence objects
child.seq <- seqdef(children)
marr.seq <- seqdef(married)
left.seq <- seqdef(left)

## Using transition rates to compute substitution costs on each channel
mcdist <- seqdistmc(channels=list(child.seq, marr.seq, left.seq),
	method="OM", sm=list("TRATE", "TRATE", "TRATE"))

## Using a weight of 2 for the children channel and specifying the substitution-cost matrices
smatrix <- list()
smatrix[[1]] <- seqsubm(child.seq, method="CONSTANT")
smatrix[[2]] <- seqsubm(marr.seq, method="CONSTANT")
smatrix[[3]] <- seqsubm(left.seq, method="TRATE")
mcdist2 <- seqdistmc(channels=list(child.seq, marr.seq, left.seq),
	method="OM", sm=smatrix, cweight=c(2,1,1))
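
The what and full.matrix arguments described above can be tried as follows. This is a minimal sketch, assuming the documented TraMineR version is installed; it uses only the first 100 biofam cases and two channels to keep the run short, and the object names (channels, costs, mc.sm, mc.seq, mc.dist) are illustrative.

```r
library(TraMineR)
data(biofam)

## Two channels built from the first 100 cases (subset chosen for speed only)
bf <- as.matrix(biofam[1:100, 10:25])
child.seq <- seqdef(bf == 4 | bf == 5 | bf == 6)
marr.seq <- seqdef(bf == 2 | bf == 3 | bf == 6)
channels <- list(child.seq, marr.seq)
costs <- list("CONSTANT", "CONSTANT")

## Multichannel substitution-cost matrix and its documented attributes
mc.sm <- seqdistmc(channels, method="OM", sm=costs, what="sm")
attr(mc.sm, "indel")
attr(mc.sm, "alphabet")

## Combined (multichannel) state sequence object
mc.seq <- seqdistmc(channels, method="OM", sm=costs, what="seqmc")

## Distances returned as an object of class dist instead of a full matrix
mc.dist <- seqdistmc(channels, method="OM", sm=costs, full.matrix=FALSE)
class(mc.dist)
```

Requesting what="sm" before computing distances is a convenient way to inspect the costs that the distance computation will actually use.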

[Package TraMineR version 2.2-2 Index]