spark.fpGrowth {SparkR} | R Documentation |
A parallel FP-growth algorithm to mine frequent itemsets.
spark.fpGrowth fits an FP-growth model on a SparkDataFrame. Users can call spark.freqItemsets to get frequent itemsets, spark.associationRules to get association rules, predict to make predictions on new data based on the generated association rules, and write.ml/read.ml to save/load fitted models.
For more details, see FP-growth.
spark.fpGrowth(data, ...)

spark.freqItemsets(object)

spark.associationRules(object)

## S4 method for signature 'SparkDataFrame'
spark.fpGrowth(data, minSupport = 0.3, minConfidence = 0.8,
  itemsCol = "items", numPartitions = NULL)

## S4 method for signature 'FPGrowthModel'
spark.freqItemsets(object)

## S4 method for signature 'FPGrowthModel'
spark.associationRules(object)

## S4 method for signature 'FPGrowthModel'
predict(object, newData)

## S4 method for signature 'FPGrowthModel,character'
write.ml(object, path, overwrite = FALSE)
data | A SparkDataFrame for training.
... | Additional argument(s) passed to the method.
object | A fitted FPGrowth model.
minSupport | Minimal support level.
minConfidence | Minimal confidence level.
itemsCol | Features column name.
numPartitions | Number of partitions used for fitting.
newData | A SparkDataFrame for testing.
path | The directory where the model is saved.
overwrite | Logical value indicating whether to overwrite if the output path already exists. Default is FALSE, which means an exception is thrown if the output path exists.
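To make minSupport and minConfidence concrete, the following base R sketch computes support and confidence by hand on a small set of transactions. It is only an illustration of the two quantities, not how SparkR computes them: the toy data and the helper names support and confidence are hypothetical and are not part of the SparkR API.

```r
# Toy transactions (hypothetical data, not taken from the SparkR examples).
transactions <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "c"),
  c("b", "c"),
  c("a", "b", "d")
)

# Support: fraction of transactions that contain every item in `itemset`.
# (Hypothetical helper, not a SparkR function.)
support <- function(itemset, transactions) {
  hits <- vapply(transactions, function(t) all(itemset %in% t), logical(1))
  mean(hits)
}

# Confidence of the rule antecedent => consequent:
# support(antecedent together with consequent) / support(antecedent).
# (Hypothetical helper, not a SparkR function.)
confidence <- function(antecedent, consequent, transactions) {
  support(union(antecedent, consequent), transactions) /
    support(antecedent, transactions)
}

# {a, b} appears in 3 of the 5 transactions.
support(c("a", "b"), transactions)

# Of the 4 transactions containing "a", 3 also contain "b".
confidence("a", "b", transactions)
```

In this toy data the itemset {a, b} has support 3/5, which is above the default minSupport of 0.3, while the rule a => b has confidence 3/4, which is below the default minConfidence of 0.8.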
spark.fpGrowth returns a fitted FPGrowth model.
spark.freqItemsets returns a SparkDataFrame with frequent itemsets. The SparkDataFrame contains two columns: items (an array of the same type as the input column) and freq (frequency of the itemset).
spark.associationRules returns a SparkDataFrame with association rules. The SparkDataFrame contains three columns: antecedent (an array of the same type as the input column), consequent (an array of the same type as the input column), and confidence (confidence).
predict returns a SparkDataFrame containing predicted values.
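As a rough mental model of prediction from association rules, the base R sketch below applies a set of rules to a single basket: a rule fires when all of its antecedent items are present, and its consequent items that are not already in the basket are collected. This assumes predict follows the rule-application behaviour described in the Spark ML FP-growth guide. The rules and the helper predict_basket are hypothetical and are not part of SparkR.

```r
# Hypothetical rules, each an antecedent => consequent pair.
rules <- list(
  list(antecedent = c("t"),      consequent = c("s")),
  list(antecedent = c("t", "s"), consequent = c("y")),
  list(antecedent = c("x"),      consequent = c("z"))
)

# Collect the consequents of every rule whose antecedent is fully contained
# in `items`, skipping items the basket already has.
# (Hypothetical helper, not a SparkR function.)
predict_basket <- function(items, rules) {
  predicted <- character(0)
  for (rule in rules) {
    if (all(rule$antecedent %in% items)) {
      predicted <- union(predicted, setdiff(rule$consequent, items))
    }
  }
  predicted
}

# Only the rule t => s applies.
predict_basket(c("t"), rules)

# Both t => s and {t, s} => y apply, but "s" is already in the basket.
predict_basket(c("t", "s"), rules)
```

With these rules, the first call yields "s" and the second yields "y".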
spark.fpGrowth since 2.2.0
spark.freqItemsets(FPGrowthModel) since 2.2.0
spark.associationRules(FPGrowthModel) since 2.2.0
predict(FPGrowthModel) since 2.2.0
write.ml(FPGrowthModel, character) since 2.2.0
## Not run: 
raw_data <- read.df(
  "data/mllib/sample_fpgrowth.txt",
  source = "csv",
  schema = structType(structField("raw_items", "string")))

data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
model <- spark.fpGrowth(data)

# Show frequent itemsets
frequent_itemsets <- spark.freqItemsets(model)
showDF(frequent_itemsets)

# Show association rules
association_rules <- spark.associationRules(model)
showDF(association_rules)

# Predict on new data
new_itemsets <- data.frame(items = c("t", "t,s"))
new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
predict(model, new_data)

# Save and load model
path <- "/path/to/model"
write.ml(model, path)
read.ml(path)

# Optional arguments (the items column is named "baskets" here)
baskets_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as baskets")
another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
                                itemsCol = "baskets", numPartitions = 10)

## End(Not run)