| set_na {sjmisc} | R Documentation |
set_na() replaces specific values of variables with NA.
set_na_if() is a scoped variant of set_na() that replaces values with NA
only in those variables for which predicate returns TRUE.
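The following is a minimal base-R sketch of these semantics, not sjmisc's implementation. The helpers set_na_sketch() and set_na_if_sketch() are hypothetical names. They only produce regular NA values, with no tagged NAs, no label handling, and no factor-level dropping. The scoped variant applies the replacement only to columns for which the predicate returns TRUE.

```r
# Hypothetical helper (not part of sjmisc): replace every element of x
# that appears in `na` with a regular NA.
set_na_sketch <- function(x, na) {
  x[x %in% na] <- NA
  x
}

# Hypothetical scoped variant: apply set_na_sketch() only to the
# data frame columns for which `predicate` returns TRUE.
set_na_if_sketch <- function(df, predicate, na) {
  cols <- vapply(df, predicate, logical(1))
  df[cols] <- lapply(df[cols], set_na_sketch, na = na)
  df
}

v <- set_na_sketch(c(1, 8, 3, 8), na = c(1, 8))
v  # NA NA 3 NA

df <- data.frame(num = c(1, 2, 3), chr = c("1", "2", "3"),
                 stringsAsFactors = FALSE)
# Only the numeric column `num` is processed
set_na_if_sketch(df, is.numeric, na = 2)
```

Because %in% compares values after coercion, the character value "2" would also match na = 2. Restricting the scope with is.numeric keeps the chr column untouched.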
set_na(x, ..., na, drop.levels = TRUE, as.tag = FALSE)
set_na_if(x, predicate, na, drop.levels = TRUE, as.tag = FALSE)
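| Argument | Description |
|---|---|
| x | A vector or data frame. |
| ... | Optional, unquoted names of variables that should be selected for further processing. Required if x is a data frame and only selected variables from x should be processed; if omitted, all variables are processed. |
| na | Numeric vector with values that should be replaced with NA values, or a character vector if values of factors or character vectors should be replaced. For labelled vectors, na may also be the name of a value label; in this case, the values associated with that label in each vector are replaced with NA (see 'Examples'). |
| drop.levels | Logical, if TRUE, factor levels of values that have been replaced with NA are dropped (see 'Examples'). |
| as.tag | Logical, if TRUE, values in x are replaced by tagged NA values (see tagged_na), else by regular NA values. Use a named vector for na to assign a value label to the tagged NA value (see 'Examples'). |
| predicate | A predicate function to be applied to the columns. The variables for which predicate returns TRUE are selected. |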
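To illustrate what drop.levels addresses, the base-R sketch below shows that assigning NA to a factor element keeps the now-unused level. It uses droplevels() as a stand-in for drop.levels = TRUE. This is an assumption for illustration: droplevels() removes every unused level, which may be broader than dropping only the levels of replaced values.

```r
f <- factor(c("a", "b", "c"))
f[f == "b"] <- NA   # replace "b" with NA
levels(f)           # "a" "b" "c": the level survives the replacement
f <- droplevels(f)  # base-R analogue of drop.levels = TRUE
levels(f)           # "a" "c"
```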
set_na() replaces all values defined in na with a regular NA or, if
as.tag = TRUE, with a tagged NA value (see tagged_na).
Tagged NAs work exactly like regular R missing values
except that they store one additional byte of information: a tag,
which is usually a letter ("a" to "z") or a digit ("0" to "9").
See also 'Details' in get_na.
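A short demonstration of tagged NAs using haven's tagged_na(), na_tag() and is_tagged_na(). This shows haven's behaviour directly, not how set_na() creates them internally. A tagged NA counts as missing for is.na(), yet still carries its tag.

```r
library(haven)

# One tagged NA ("a") and one regular NA
x <- c(1, tagged_na("a"), 3, NA)
is.na(x)              # FALSE TRUE FALSE TRUE: both NAs count as missing
na_tag(x)             # NA "a" NA NA: only the tagged NA carries a tag
is_tagged_na(x, "a")  # FALSE TRUE FALSE FALSE
```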
Returns x, with all values in na replaced by NA.
If x is a data frame, the complete data frame x is returned,
with NA's set for the variables specified in ...; if ... is not
specified, all variables in the data frame are processed.
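The column-selection rule can be sketched in base R as follows. set_na_cols_sketch() is a hypothetical helper, not sjmisc code. As a simplification, it takes column names as a character vector instead of unquoted names, and defaults to all columns when none are given.

```r
# Hypothetical helper: set values in `na` to NA in the selected columns
# only; with the default `cols`, every column is processed.
set_na_cols_sketch <- function(df, na, cols = names(df)) {
  df[cols] <- lapply(df[cols], function(col) {
    col[col %in% na] <- NA
    col
  })
  df
}

df <- data.frame(var1 = c(1, 3, 5), var2 = c(3, 3, 4), var3 = c(2, 3, 6))
set_na_cols_sketch(df, na = 3, cols = c("var1", "var3"))  # var2 unchanged
set_na_cols_sketch(df, na = 3)                            # all columns
```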
Labels of values that are replaced with NA and no longer used are
removed from x; all other value and variable label attributes are
preserved. For more details on labelled data, see the vignette
Labelled Data and the sjlabelled-Package.
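The following sketch shows two behaviours together: resolving a label name such as "No answer" to its value, and pruning labels whose values became NA. It is a simplified illustration with a hypothetical helper, set_na_by_label_sketch(). It assumes value labels live in a named "labels" attribute, as in haven-style labelled vectors. The real function also handles variable labels, factors and tagged NAs.

```r
# Hypothetical helper: `na` may hold values or value-label names.
set_na_by_label_sketch <- function(x, na) {
  labs <- attr(x, "labels")
  # Resolve label names (e.g. "No answer") to their underlying values
  if (is.character(na)) na <- unname(labs[names(labs) %in% na])
  x[x %in% na] <- NA
  # Keep only labels whose values were not replaced with NA
  attr(x, "labels") <- labs[!labs %in% na]
  x
}

x1 <- c(1, 3, 4, 2, 4)
attr(x1, "labels") <- c("Refused" = 3, "No answer" = 4)
res <- set_na_by_label_sketch(x1, na = "No answer")
as.vector(res)       # 1 3 NA 2 NA
attr(res, "labels")  # Refused = 3
```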
See also replace_na to replace NA's with specific values, rec for
general recoding of variables, and recode_to for re-shifting value
ranges. See get_na to get the missing values in labelled vectors.
```r
library(haven)
library(sjlabelled)
library(sjmisc)

# create random variable
dummy <- sample(1:8, 100, replace = TRUE)
# show value distribution
table(dummy)
# set values 1 and 8 to NA
dummy <- set_na(dummy, na = c(1, 8))
# show value distribution, including missings
table(dummy, useNA = "always")
# add a named vector as a further (tagged) missing value
set_na(dummy, na = c("Refused" = 5), as.tag = TRUE)
# see different missing types
print_tagged_na(set_na(dummy, na = c("Refused" = 5), as.tag = TRUE))

# create sample data frame
dummy <- data.frame(var1 = sample(1:8, 100, replace = TRUE),
                    var2 = sample(1:10, 100, replace = TRUE),
                    var3 = sample(1:6, 100, replace = TRUE))
# set values 2 and 4 to NA
dummy %>% set_na(na = c(2, 4)) %>% head()
dummy %>% set_na(na = c(2, 4), as.tag = TRUE) %>% get_na()
dummy %>% set_na(na = c(2, 4), as.tag = TRUE) %>% get_values()

data(efc)
dummy <- data.frame(
  var1 = efc$c82cop1,
  var2 = efc$c83cop2,
  var3 = efc$c84cop3
)
# check original distribution of categories
lapply(dummy, table, useNA = "always")
# set 3 to NA for two variables
lapply(set_na(dummy, var1, var3, na = 3), table, useNA = "always")

# drop unused factor levels when being set to NA
x <- factor(c("a", "b", "c"))
x
set_na(x, na = "b", as.tag = TRUE)
set_na(x, na = "b", drop.levels = FALSE, as.tag = TRUE)

# set_na() can also set values to NA by referring to their value label.
# This is particularly helpful if a certain category should be set to NA,
# but is coded with different values across variables.
x1 <- sample(1:4, 20, replace = TRUE)
x2 <- sample(1:7, 20, replace = TRUE)
x1 <- set_labels(x1, labels = c("Refused" = 3, "No answer" = 4))
x2 <- set_labels(x2, labels = c("Refused" = 6, "No answer" = 7))
tmp <- data.frame(x1, x2)
get_labels(tmp)
table(tmp, useNA = "always")
get_labels(set_na(tmp, na = "No answer"))
table(set_na(tmp, na = "No answer"), useNA = "always")
# show values
tmp
set_na(tmp, na = c("Refused", "No answer"))
```
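None of the examples exercise set_na_if(). Here is a small usage sketch based only on the signature in the usage section, with is.numeric as the predicate. The data frame is made up for illustration. Only numeric columns should be processed, so the character column is expected to stay unchanged.

```r
library(sjmisc)

df <- data.frame(num = c(1, 2, 3), chr = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
# Replace 2 with NA, but only in columns where is.numeric() is TRUE
res <- set_na_if(df, is.numeric, na = 2)
res
```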