ztnb.mincount {preseqR}    R Documentation

Estimating the expected number of species represented r or more times

Description

Estimates the expected number of species represented at least r times in a random sample, based on the initial sample and a zero-truncated negative binomial (ZTNB) model.

Usage

ztnb.mincount(n, r=1, size=SIZE.INIT, mu=MU.INIT)

Arguments

n

A two-column matrix. The first column is the frequency j = 1, 2, …; the second column is n_j, the number of species represented exactly j times in the initial sample. The first column must be sorted in ascending order.

r

A vector of positive integers giving the minimum number of times a species must be represented in order to be counted. Default is 1.

size

A positive double giving the initial value of the size parameter of the negative binomial distribution for the EM algorithm. Default is 1.

mu

A positive double giving the initial value of the mean parameter mu of the negative binomial distribution for the EM algorithm. Default is 0.5.
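The n argument is simply a tabulated frequency histogram. As a minimal sketch in base R, assuming you start from a vector of per-species counts (the counts below are hypothetical and purely illustrative, not package data), the matrix could be built like this:

```r
## Hypothetical per-species counts: how many times each observed species
## was seen in the initial sample (illustrative values only)
counts <- c(1, 1, 2, 1, 3, 2, 1, 5, 1, 2)

## Tabulate how many species were seen exactly j times
tab <- table(counts)

## Two-column matrix: frequency j in column 1, n_j in column 2
hist.mat <- cbind(j = as.integer(names(tab)), n_j = as.integer(tab))

## Make the ascending order of column 1 explicit
hist.mat <- hist.mat[order(hist.mat[, 1]), , drop = FALSE]
print(hist.mat)
```

The resulting hist.mat has the layout described for n and could then be passed as ztnb.mincount(hist.mat, r = 1:2).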

Details

The statistical assumption is that, for each species, the number of individuals in a sample follows a Poisson distribution whose rate lambda follows a latent gamma distribution. Consequently, the number of times X that an observed species is represented in the sample (X > 0) follows a zero-truncated negative binomial distribution. The unknown parameters are estimated by the function preseqR.ztnb.em. Based on the estimated distribution, the function calculates the expected number of species represented at least r times in a random sample. Details of the estimation procedure are given in the supplement of Daley and Smith (2013).
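The calculation described above can be sketched in base R with dnbinom and pnbinom. This sketch assumes the negative binomial parameters have already been fitted; the values of size, mu and the number of observed species are hypothetical, and the helper expected.mincount is our own illustration, not the package's internal code. Because a sample t times larger multiplies each species' Poisson rate by t, the negative binomial mean becomes mu * t while size stays fixed.

```r
## Hypothetical fitted NB parameters and observed species count
## (illustrative values only; in practice these come from the EM fit)
size <- 0.8
mu <- 1.5
distinct <- 100  # number of distinct species in the initial sample

## Probability that a species appears at least once in the initial sample
p.obs <- 1 - dnbinom(0, size = size, mu = mu)

## Estimated total number of species, observed or not
L <- distinct / p.obs

## Expected number of species represented at least r times in a sample
## t times the size of the initial one: P(X >= r) under NB(size, mu * t)
expected.mincount <- function(t, r) {
  L * pnbinom(r - 1, size = size, mu = mu * t, lower.tail = FALSE)
}

## At t = 1 and r = 1 this recovers distinct (up to floating-point error)
expected.mincount(t = 1, r = 1)
expected.mincount(t = c(2, 10), r = 2)
```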

Value

A function (the constructed estimator) for the expected number of species represented at least r times in a sample. Its input is a vector of sampling efforts t, i.e. sample sizes relative to the initial sample. For example, t = 2 means a sample twice the size of the initial sample.
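Since the estimator takes relative rather than absolute sample sizes, an absolute target size has to be divided by the size of the initial sample, i.e. the total number of individuals, sum(j * n_j). A minimal sketch, assuming a hypothetical histogram and hypothetical target sizes:

```r
## Hypothetical initial histogram: column 1 is j, column 2 is n_j
hist.mat <- cbind(j = c(1, 2, 3, 5), n_j = c(6, 3, 1, 1))

## Size of the initial sample: total individuals = sum of j * n_j
initial.size <- sum(hist.mat[, 1] * hist.mat[, 2])

## Hypothetical absolute sample sizes we want predictions for
target.sizes <- c(40, 200)

## Sampling efforts t are sizes relative to the initial sample
t <- target.sizes / initial.size
print(t)
```

The resulting vector t could then be passed to the estimator returned by ztnb.mincount, as in ztnb.estimator(t).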

Author(s)

Chao Deng

References

Daley, T., & Smith, A. D. (2013). Predicting the molecular complexity of sequencing libraries. Nature Methods, 10(4), 325-327.

See Also

preseqR.ztnb.em

Examples

## load library
library(preseqR)

## import data
data(FisherButterflyHist)

## construct the estimator for the number of species
## represented at least once, twice or three times in a sample
ztnb.estimator <- ztnb.mincount(FisherButterflyHist, r=1:3)

## The expected number of species represented at least once, twice or
## three times when the sample is 10 or 20 times the size of the initial sample
ztnb.estimator(c(10, 20))

[Package preseqR version 3.1.2 Index]