- Introduction
- Motivation
- Design
- Templates
- Getting started
July 22, 2015
In systemPipeR, workflow steps with input/output file operations are controlled by SYSargs objects.
Each SYSargs instance is constructed from a targets file and a param file.
The only input provided by the user is the initial targets file. Subsequent targets instances are created automatically.
Any number of predefined or custom workflow steps are supported.
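The idea that each step's outputs become the next step's targets can be sketched in base R. This is a minimal illustration only: `next_targets()` and the `.bam` output naming are hypothetical and not part of systemPipeR, which handles this internally through SYSargs objects.

```r
# Hypothetical helper, NOT part of systemPipeR.
# Builds the targets table for the next step from the output paths
# of the current step.
next_targets <- function(targets, outpaths) {
  stopifnot(nrow(targets) == length(outpaths))
  targets$FileName <- outpaths  # outputs of this step become the next inputs
  targets
}

# Initial targets table: the only input supplied by the user
targets <- data.frame(
  FileName   = c("./data/SRR446027_1.fastq", "./data/SRR446028_1.fastq"),
  SampleName = c("M1A", "M1B"),
  Factor     = c("M1", "M1"),
  stringsAsFactors = FALSE
)

# Assumption: an alignment step writes one BAM file per input FASTQ file
bam_paths <- file.path("results", paste0(basename(targets$FileName), ".bam"))
targets_step2 <- next_targets(targets, bam_paths)
print(targets_step2)
```

Sample names and factors are carried along unchanged, so every downstream step still knows which sample each file belongs to.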
systemPipeRdata: template workflows for systemPipeR.
Workflow templates are provided for the following tool sets, grouped by workflow:
- rsubread, Bowtie2/Tophat2; edgeR or DESeq2
- gsnap, bwa; VariantTools, GATK, BCFtools; VariantTools and VariantAnnotation; VariantAnnotation
- rsubread, Bowtie2; MACS2, BayesPeak
- Tophat2 (or any other RNA-Seq aligner)
Install required packages
source("http://bioconductor.org/biocLite.R")
biocLite("systemPipeR")
biocLite("tgirke/systemPipeRdata", build_vignettes=TRUE, dependencies=TRUE) # From github
Load packages and access help
library("systemPipeR")
library("systemPipeRdata")
Access help
library(help="systemPipeR")
vignette("systemPipeR")
Targets file organizes samples
Structure of targets file for single-end (SE) library
targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
read.delim(targetspath, comment.char = "#")[1:3,1:5]
##                   FileName SampleName Factor SampleLong Experiment
## 1 ./data/SRR446027_1.fastq        M1A     M1  Mock.1h.A          1
## 2 ./data/SRR446028_1.fastq        M1B     M1  Mock.1h.B          1
## 3 ./data/SRR446029_1.fastq        A1A     A1   Avr.1h.A          1
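The targets structure shown above can be reproduced with base R alone, which is handy for testing before real data is in place. The following is a minimal sketch: the temporary file, the comment line, and the sanity checks are illustrative assumptions, not requirements stated by systemPipeR.

```r
# Write a minimal single-end targets file to a temporary location.
# The line starting with "#" is a comment and is skipped when reading.
targets_file <- tempfile(fileext = ".txt")
writeLines(c(
  "# Example targets file (illustrative comment line)",
  "FileName\tSampleName\tFactor",
  "./data/SRR446027_1.fastq\tM1A\tM1",
  "./data/SRR446028_1.fastq\tM1B\tM1",
  "./data/SRR446029_1.fastq\tA1A\tA1"
), targets_file)

# Same read call as used for the packaged example file
targets <- read.delim(targets_file, comment.char = "#", stringsAsFactors = FALSE)

# Assumed sanity checks before starting a workflow
stopifnot(all(c("FileName", "SampleName", "Factor") %in% colnames(targets)))
stopifnot(!any(duplicated(targets$SampleName)))

# Number of samples per experimental factor
table(targets$Factor)
```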
Structure of targets file for paired-end (PE) library
targetspath <- system.file("extdata", "targetsPE.txt", package="systemPipeR")
read.delim(targetspath, comment.char = "#")[1:3,1:4]
##                  FileName1                FileName2 SampleName Factor
## 1 ./data/SRR446027_1.fastq ./data/SRR446027_2.fastq        M1A     M1
## 2 ./data/SRR446028_1.fastq ./data/SRR446028_2.fastq        M1B     M1
## 3 ./data/SRR446029_1.fastq ./data/SRR446029_2.fastq        A1A     A1
SYSargs: targets & param
SYSargs instances are constructed from a targets file and a param file. The param file contains the settings for running command-line software.
parampath <- system.file("extdata", "tophat.param", package="systemPipeR")
(args <- suppressWarnings(systemArgs(sysma=parampath, mytargets=targetspath)))
## An instance of 'SYSargs' for running 'tophat' on 18 samples
Slots and accessor functions have the same names
names(args)[c(5,8,13)]
## [1] "software" "reference" "sysargs"
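The convention that accessor functions share their slot names can be illustrated with a toy S4 class. This is a sketch under stated assumptions: `MiniArgs` is hypothetical and much smaller than the real SYSargs class, whose definition is not reproduced here.

```r
library(methods)

# Hypothetical toy class, NOT the real SYSargs definition
setClass("MiniArgs",
         representation(software = "character", reference = "character"))

# Accessor generics named after the slots they return
setGeneric("software", function(x) standardGeneric("software"))
setMethod("software", "MiniArgs", function(x) x@software)
setGeneric("reference", function(x) standardGeneric("reference"))
setMethod("reference", "MiniArgs", function(x) x@reference)

# names() lists the slot names, so it also lists the accessor names
setMethod("names", "MiniArgs", function(x) slotNames(x))

mini <- new("MiniArgs", software = "tophat", reference = "tair10.fasta")
names(mini)
software(mini)
```

With this pattern, the output of `names()` doubles as a list of the functions available for querying the object.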
Return the command-line arguments for the given software, here Tophat2 for the first sample.
sysargs(args)[1]
## tophat -p 4 -o SRR446027_1.fastq.tophat tair10.fasta SRR446027_1.fastq .SRR446027_2.fastq
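How fixed settings and per-sample paths combine into one command line per sample can be sketched in base R. The helper `build_commands()` and its argument names are hypothetical; this is not the systemPipeR implementation, only an illustration of the principle behind the param and targets files.

```r
# Hypothetical helper, NOT the systemPipeR implementation.
# Combines fixed settings (as a param file would hold) with per-sample
# paths (as a targets file would hold) into one command string per sample.
build_commands <- function(software, flags, reference, targets) {
  # One output directory per sample, named after the first read file
  outdir <- paste0(basename(targets$FileName1), ".", software)
  cmds <- paste(software, flags, "-o", outdir, reference,
                targets$FileName1, targets$FileName2)
  names(cmds) <- targets$SampleName
  cmds
}

# Paired-end targets table with the columns shown earlier
targets <- data.frame(
  FileName1  = c("./data/SRR446027_1.fastq", "./data/SRR446028_1.fastq"),
  FileName2  = c("./data/SRR446027_2.fastq", "./data/SRR446028_2.fastq"),
  SampleName = c("M1A", "M1B"),
  stringsAsFactors = FALSE
)

cmds <- build_commands("tophat", "-p 4", "tair10.fasta", targets)
cat(cmds, sep = "\n")
```

Because the commands are plain strings, they can be inspected before anything is executed, which mirrors the role of `sysargs(args)`.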
Run the command-line tool, here Tophat2, on a single machine. The command-line tool needs to be installed for this.
runCommandline(args)
Submit command-line or R processes to a computer cluster with a queueing system.
clusterRun(args, ...)
The last step requires additional resource allocation arguments. For details, please see the main systemPipeR manual.
Generate a workflow template, e.g. "rnaseq", "varseq" or "chipseq"
genWorkenvir(workflow="varseq", mydirname=NULL)
setwd("varseq")
Command-line alternative for generating workflow environments
$ echo 'library(systemPipeRdata);
genWorkenvir(workflow="varseq", mydirname=NULL)' | R --slave
The workflow templates generated by genWorkenvir contain the following preconfigured directory structure:
workflow_name/   # *.Rnw/*.Rmd scripts, targets file, etc.
    param/       # parameter files for command-line software
    data/        # inputs e.g. FASTQ, reference, annotations
    results/     # analysis result files
The above structure can be customized as needed, but for first-time users it is easier to keep changes to a minimum.
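For readers who want to see the layout on disk without installing anything, the empty directory skeleton can be recreated in base R. This is an illustration only: `make_skeleton()` is a hypothetical helper, and unlike genWorkenvir it does not copy any scripts, targets files, or parameter files.

```r
# Illustration only: genWorkenvir() creates the real template, including
# scripts and parameter files. This sketch just builds the empty layout.
make_skeleton <- function(root) {
  subdirs <- file.path(root, c("param", "data", "results"))
  for (d in subdirs) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(subdirs)
}

# Build the skeleton in a temporary directory to avoid touching real data
root <- file.path(tempdir(), "workflow_name")
make_skeleton(root)
list.files(root)
```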
Workflows are run from the *.Rnw template file (or *.Rmd or *.R versions).
$ make -B
Analysis reports in PDF or HTML format are autogenerated when running a workflow using standard R resources for scientific report generation including knitr and rmarkdown, respectively.
Integration of ReportingTools is also straightforward.