Watchdog
1 Introduction
2 Install and Configure Watchdog
2.1 Requirements
2.2 Installation
2.3 E-Mail Server Configuration
2.4 SSH Configuration
2.5 Cluster Configuration
3 Watchdog Overview
3.1 Modules
3.2 Basic XML structure
4 Detailed XML format explanation
4.1 Process blocks
4.2 Dependencies
4.3 Execution environments
4.4 Global constants
4.5 Environment variables
4.6 Mail notification
4.7 Standard streams and working directory
4.8 Task actions
4.9 Simple calculations
4.10 Multiple module search folders
4.11 Custom success and error checker
5 Creating custom modules
5.1 Input parameter definition
5.2 Output parameter definition
5.3 Binary call command and other settings
5.4 Assign a name to the new module
5.5 Putting it all together
5.6 Other matters
6 Extend Watchdog's functionality
6.1 Virtual file systems for task actions
6.2 XML Plugins
7 Docker
7.1 Install the Watchdog Docker image
7.2 Sharing of files
7.3 Port forwarding
7.4 How to use the Docker Watchdog image
7.5 Use Docker in modules
1 Introduction
Here, we present Watchdog, a workflow management system (WMS) for the automated and distributed analysis of large-scale experimental data. Watchdog is implemented in Java and is thus platform-independent.
Main features include:
- straightforward processing of replicate data
- support for distributed computer systems
- remote storage support
- customizable error detection
- manual intervention into workflow execution
- a GUI for workflow construction using pre-defined modules
- a helper script for creating new module definitions
- no restriction to specific programming languages
- a flexible plugin system for extending Watchdog without modifying the original sources
helper_scripts/dependencyTest.sh as described below.
2.2 Installation
The installation of Watchdog is straightforward. Extract the provided archive into a folder of your choice using
tar xfvz watchdog.tar.gz
The folder must be accessible to remote or cluster executors if you plan to use any.
Alternatively, Watchdog can be installed automatically via conda using
conda install -c bioconda watchdog-wms
In that case the binaries are named watchdog-cmd and watchdog-gui, while the remaining files are located in ${PREFIX}/share/watchdog-wms-${VERSION}. If you want to use Watchdog with Docker, read section 7.
In the next few lines the content of each folder is explained:
- core_lib: some core functions that can be used in bash module scripts
- doc: contains Watchdog's documentation in html-format
- examples: contains the examples that are also presented in the documentation
- helper_scripts: scripts for generating new modules, configuring the examples, or testing all modules
- modules: must contain all modules that should be used in workflows
- test_data: contains some test data that is used by multiple modules
- tmp: is used for Watchdog's temporary files
- webserver_data: data which is accessed by the internal webserver
- xsd: definition of the module and workflow in xsd format
helper_scripts/dependencyTest.sh can be executed. It checks whether all requirements that are also enforced by the module itself are installed. Only dependencies that are defined in the bash script itself, stored in the variable named $USED_TOOLS, are detected by the script. The system must be able to locate the dependencies via the PATH variable. Moreover, all R and Perl packages that are used in scripts are checked.
In order to test whether all modules that provide tests work as expected on your system, you can run
helper_scripts/moduleTest.sh
If you want to test the examples that are discussed in this manual, you can configure them by running:
helper_scripts/configureExamples.sh -i /path/to/install/folder/of/watchdog [-m your@mail-adress.com]
(the mail option -m is optional)
Afterwards the configured examples will be located in /path/to/install/folder/of/watchdog/examples/ and can be executed (from the Watchdog installation directory) using the following command:
./watchdog.sh -x examples/filename.xml
or alternatively
java -jar jars/watchdog.jar -x examples/filename.xml
For instance:
./watchdog.sh -x examples/workflow1_basic_information_extraction.xml
If you want to use the workflow designer (GUI), you can start it (from the Watchdog installation directory) using:
./workflowDesigner.sh
or alternatively
java -jar jars/WatchdogDesigner.jar
2.3 E-Mail Server Configuration
As Watchdog sends e-mails, it needs a working mail configuration. If you don't want Watchdog to send e-mails, simply omit the mail attribute of the tasks tag. In that case the content of the mails will be printed to the standard output stream.
By default, a server listening on SMTP port 25 that accepts mails without authentication is expected. To use another configuration, the parameter -mailConfig of Watchdog can be used. It expects a tab-separated file that contains information on how to connect to the mail server using the SMTP protocol. If the mail server requires authentication, we strongly suggest using a mail account that was created explicitly for use with Watchdog, as the password is stored unencrypted.
Example 1: Example mail config for a gmail account
mail.smtp.auth true
mail.smtp.host smtp.googlemail.com
mail.smtp.port 587
mail.smtp.user johns_watchdog@gmail.com
mail.smtp.pw r9x74l(klsab
mail.smtp.from johns_watchdog@gmail.com
mail.smtp.starttls.enable true
examples/mail_config once the examples are configured as described above.
2.4 SSH Configuration
Watchdog supports execution of tasks via ssh on remote hosts. In order to use that feature, a private ssh key must be provided. It is strongly recommended that the private key is protected by a passphrase. In that case the passphrase must be entered after Watchdog is started and will be held encrypted in memory until it is needed.
A key pair that can be used for ssh authentication can be generated using the tool ssh-keygen, which is part of openssh. If you need further information, many online tutorials explain how to use a private key for ssh authentication, e.g. How To Set Up SSH Keys and SSH/OpenSSH/Keys.
2.5 Cluster Configuration
Watchdog supports cluster solutions which provide a DRMAA Java binding. By default it is bundled with a DRMAA binding for the Sun Grid Engine (SGE 6.1).
The following environment variables must be set correctly in order to communicate with the SGE:
- SGE_ROOT: path to the installation folder of the SGE
- LD_LIBRARY_PATH: path to the library path of the SGE; in most cases it will be $SGE_ROOT/lib/lx24-amd64 or $SGE_ROOT/lib/lx24-x86
A different DRMAA binding can be used in one of two ways:
- dynamically by adding arguments to the jar invocation:
  - set class name of DRMAA SessionFactory via -Dorg.ggf.drmaa.SessionFactory=classname
  - add DRMAA Java binding to class path via -cp /path/to/drmaaImplementation.jar
- permanently by changing Watchdog's jar file:
  - jar files can be opened and edited with every tool that supports zip files
  - replace name of DRMAA SessionFactory stored in /META-INF/services/org.ggf.drmaa.SessionFactory
  - add class files of the DRMAA Java binding to Watchdog's jar file
modules directory located in the root folder of the Watchdog installation. Each module is stored in its own folder and consists of at least an XSD file with the name of the module. The XSD file contains a definition of the parameters which can be set in the XML format and the tools which are executed in the background when the module is used.
Example 2: XSD definition of the sleep module
1 <?xml version="1.0" encoding="UTF-8" ?>
2 <x:schema xmlns:x="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" vc:minVersion="1.1" xmlns:xerces="http://xerces.apache.org">
3
4 <!-- definition of the task parameters -->
5 <x:complexType name="sleepTaskParameterType">
6 <x:all>
7 <x:element name="wait" type="paramWait_sleep" minOccurs="1" maxOccurs="1" />
8 </x:all>
9 </x:complexType>
10
11 ...
12
13 <!-- make task definition available via substitution group -->
14 <x:element name="sleepTask" type="sleepTaskType" substitutionGroup="abstractTask" />
15
16 <!-- module specific parameter types -->
17 <x:complexType name="paramWait_sleep">
18 <x:simpleContent>
19 <x:restriction base="paramString">
20 <x:assertion test="matches($value, '(${[A-Za-z_]+})|($(.+))|([[({]($[A-Za-z_]+(,s*){0,1}){0,1}([0-9]+(,S*){0,1}){0,1}[])}])') or matches($value, '^[0-9]+[smhd]{0,1}$')" xerces:message="Parameter with name '{$tag}' must match [0-9]+[smhd]{0,1}." />
21 </x:restriction>
22 </x:simpleContent>
23 </x:complexType>
24
25 </x:schema>
sleepTaskParameterType (5-9).
wait is defined that must occur exactly
matches() function whereby the user must take care that the replaced value is valid with regard to the second part of the
The attribute name of the element in line 14 defines how the module can be referenced in the XML file. In this example the module can be called using the name sleepTask.
3.2 Basic XML structure
Tasks which should be executed by Watchdog must be defined in an XML file. In the following, the structure of the XML file is presented. The expression ?Task is used to refer to a task which is not further specified. In general, this syntax is used if some attributes are valid for all classes that inherit from that class type. Within the following examples these variables are user-specific and therefore contain no concrete values:
{%INSTALL%} - path to the root installation directory of Watchdog
{%MAIL%} - email address of the user
{%EXAMPLE_DATA%} - path to the folder in which the example data is located
You already have configured your examples by calling the script helper_scripts/configureExamples.sh as described in 2.
Example 3: Most basic XML input for Watchdog
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <!-- begin task block and use that mail to inform the user on success or failure -->
5 <tasks mail="{%MAIL%}">
6
7 <!-- definition of a simple sleep task -->
8 <sleepTask id="1" name="sleep">
9 <parameter>
10 <wait>30s</wait>
11 </parameter>
12 </sleepTask>
13 </tasks>
14 </watchdog>
sleep. Every XML file that should be parsed by Watchdog must contain a watchdog element as root element
watchdogBase of it must refer to the folder in which Watchdog was installed. The attribute isTemplate prevents Watchdog from executing workflows that contain variables that must be set by the user and is removed automatically by the configure script. Afterwards, the tasks which should be executed must be defined as children of tasks
id and name attribute
parameter element, values can be assigned to the parameters of the task, which have to be specified in the XSD file of the module. Flags are activated by using <flagName>true</flagName> or <flagName>1</flagName>, while parameter values are set using <paramName>value</paramName> (10)
Table 1: Attributes in the context of Watchdog
element | attribute | type | function |
---|---|---|---|
watchdog | watchdogBase | string | path to the install path of watchdog |
watchdog | isTemplate | boolean | prevents Watchdog from executing unconfigured workflow templates; default: false |
tasks | [mail] | string | mail address which is used for notification; if not set, the content of the mails will be printed to the standard output stream; default: not set |
tasks | [projectName] | string | name of the complete process; default: not set |
?Task | [id] | integer | numeric id of the task; if not set all id's will be automatically generated; default: not set |
?Task | name | string | name of the task |
?Task | processBlock | string | processBlock as source of varying parameters (see 4.1) |
?Task | executor | string | execution environment on which the task is executed (see 4.3) |
?Task | environment | string | use globally defined environment variables (see 4.5) |
?Task | maxRunning | integer | maximal number of simultaneously running tasks; default: not restricted |
?Task | notify | enum | notification of the user via mail on success; enabled: notify once when all subtasks are finished; subtask: notify for every subtask separately; default: disabled (see 4.6) |
?Task | checkpoint | enum | does not schedule tasks which depend on this task until manually released by the user; enabled: release complete task at once when all subtasks are finished; subtask: release every subtask separately; default: disabled |
?Task | confirmParam | enum | allows the user to modify the parameters before the task is scheduled; enabled: task will not be scheduled until the user checks the parameter; default: disabled |
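As a rough illustration of Table 1, the following fragment combines several of the optional attributes on a single task. It is only a sketch: the attribute names are taken from the table, while the project name, the task name, and the chosen values are made up for this example.

```xml
<!-- sketch: attribute names come from Table 1, all values are illustrative -->
<tasks mail="{%MAIL%}" projectName="demoProject">
  <!-- id is omitted, so it is generated automatically -->
  <sleepTask name="guarded sleep" notify="enabled" checkpoint="enabled" confirmParam="enabled">
    <parameter>
      <wait>30s</wait>
    </parameter>
  </sleepTask>
</tasks>
```

With these settings the parameters would have to be confirmed before the task is scheduled, and tasks depending on it would wait until it is released manually.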
settings element before the tasks element begins and are valid within the complete XML file:
- processBlock: process a task with varying parameters (see 4.1)
- executors: define different executor environments (see 4.3)
- constants: define constants that substitute placeholders (see 4.4)
- environments: define or update environment variables (see 4.5)
- modules: define multiple module include directories (see 4.10)
parameter element, the following elements are allowed in ?Task elements:
- environment: define or update environment variables (see 4.5)
- dependencies: define dependencies between tasks (see 4.2)
- streams: define location of standard streams and set a working directory (see 4.7)
- checkers: usage of custom success or error checkers (see 4.11)
- actions: define task actions that are performed before or after task execution (see 4.8)
processBlock element:
- processSequence: argument is numeric
- processFolder: argument is a path to a file
- processInput: multiple arguments obtained from dependencies
- processTable: multiple arguments stored in a tab-separated file with names of variables stored in the first line
If the processBlock attribute of a task is set, the argument of the process folder or sequence is substituted at run time within parameter, streams, checkers, actions and environment elements in the following manner:
- processSequence: []/{}/() -> number
- processFolder: {} -> absolute path to the file
- processFolder: () -> absolute path to the parent folder of the file
- processFolder: [] -> name of the file
- processFolder: [n]/{n} -> n suffixes of the filename are truncated using . as separator
- processFolder: (n) -> n suffixes of the parent folder are truncated using / as separator
- processFolder: ([{n,sep}]) -> suffixes of the value are truncated using sep as separator (might also be a regex)
- processTable: ([{$COL_NAME}]) -> value stored in the column named $COL_NAME
- processTable: ([{$COL_NAME,n,sep}]) -> value stored in the column named $COL_NAME but with suffix truncation as described above
- processInput: ([{$RET_NAME}]) -> return value of a dependency with the name $RET_NAME
- processInput: ([{$RET_NAME,n,sep}]) -> return value of a dependency with the name $RET_NAME but with suffix truncation as described above
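To make the processFolder substitution rules above more concrete, the sketch below passes the placeholders into local environment variables of a task. It assumes a process block named txtFiles that currently yields the hypothetical file /data/txt/txtFile1.txt. The variable names and paths are invented for illustration, and the values in the comments are what the rules above suggest, not verified output.

```xml
<!-- sketch: assumes processBlock "txtFiles" yields /data/txt/txtFile1.txt (hypothetical) -->
<gzipTask name="compress files" processBlock="txtFiles">
  <parameter>
    <input>{}</input>
    <!-- [1] drops one suffix of the file name: /data/txt_zipped/txtFile1.gz -->
    <output>/data/txt_zipped/[1].gz</output>
  </parameter>
  <environment>
    <var name="FILE_PATH">{}</var>   <!-- absolute path: /data/txt/txtFile1.txt -->
    <var name="FILE_DIR">()</var>    <!-- parent folder: /data/txt -->
    <var name="FILE_NAME">[]</var>   <!-- file name: txtFile1.txt -->
    <var name="FILE_BASE">[1]</var>  <!-- one suffix truncated: txtFile1 -->
  </environment>
</gzipTask>
```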
Example 4: Definition of different process blocks
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <settings>
5 <!-- definition of different process blocks -->
6 <processBlock>
7 <processSequence name="qualities" start="1" end="9" step="2" />
8 <processFolder name="specialFiles" folder="{%EXAMPLE_DATA%}/spec/" pattern="*.spec" />
9 <baseFolder folder="{%EXAMPLE_DATA%}/">
10 <processFolder name="txtFiles" folder="txt/" pattern="*.txt" />
11 <processFolder name="txtFiles" folder="other_txt/" pattern="*.txt" append="true" maxDepth="1" />
12 <processFolder name="gzFiles" folder="txt_zipped/" pattern="*.gz" disableExistenceCheck="true" />
13 <processTable name="sleepTable" table="processTable.input.txt" />
14 </baseFolder>
15 </processBlock>
16 </settings>
17
18 <tasks mail="{%MAIL%}">
19 <!-- compress all files with *.txt ending in /some/base/folder/TXT -->
20 <gzipTask id="1" name="compress files" processBlock="txtFiles" checkpoint="enabled">
21 <parameter>
22 <input>{}</input>
23 <output>{%EXAMPLE_DATA%}/txt_zipped/[1].gz</output>
24 </parameter>
25 </gzipTask>
26
27 <!-- test quality values 1,3,5,7 and 9 -->
28 <gzipTask id="2" name="quality test" processBlock="qualities" checkpoint="subtask">
29 <dependencies>
30 <depends>1</depends>
31 </dependencies>
32 <parameter>
33 <input>{%EXAMPLE_DATA%}/txt/txtFile1.txt</input>
34 <output>{%EXAMPLE_DATA%}/qualityTest/txtFile1_q[].gz</output>
35 <quality>[]</quality>
36 </parameter>
37 <environment>
38 <var name="QUALITY">{}</var>
39 </environment>
40 </gzipTask>
41
42 <!-- sleep tasks which are created based on a process table -->
43 <sleepTask id="3" name="table sleep" processBlock="sleepTable">
44 <dependencies>
45 <depends>2</depends>
46 </dependencies>
47 <streams>
48 <stdout>{$OUT, 1}</stdout>
49 </streams>
50 <parameter>
51 <wait>{$DURATION}</wait>
52 </parameter>
53 <environment>
54 <var name="IMPORTANT_ID_RAW">[$IMPORTANT_ID]</var>
55 <var name="IMPORTANT_ID_CALC">$([$IMPORTANT_ID]*3)</var>
56 </environment>
57 </sleepTask>
58 </tasks>
59 </watchdog>
processSequence named qualities is defined that creates the
processFolder is defined that will process all files stored in {%EXAMPLE_DATA%}/spec that end in .spec (8).
pattern attribute is the same as in bash. If a processFolder is a child element of a baseFolder, the folder attribute of the processFolder will be prefixed with the folder attribute of the baseFolder (9-14).
disableExistenceCheck that is enabled for the processFolder with the name gzFiles causes Watchdog not to enforce the existence of the folder when it is started
The task with id 1 will compress all .txt files in the folders {%EXAMPLE_DATA%}/txt and {%EXAMPLE_DATA%}/other_txt and store them in {%EXAMPLE_DATA%}/txt_zipped (20-25).
id 2 will compress a file with different quality
txtFile1_q as prefix and the used quality as suffix
{%EXAMPLE_DATA%}/qualityTest (34).
QUALITY is set which also contains the set
processTable can be used as
Table 2: Attributes in the context of process blocks
element | attribute | type | function |
---|---|---|---|
?ProcessBlock | name | string | is used as reference in the processBlock attribute of a task |
?Task | processBlock | string | name of a ?ProcessBlock element |
?ProcessBlock | [append] | boolean | if set to true, two or more process blocks of the same type can be merged; supported by processSequence and processFolder; default: false |
processSequence | start | double | inclusive start of the numeric series |
processSequence | [step] | double | number that is added until the value is greater than end; default: 1 |
processSequence | end | double | break condition, might be inclusive |
processFolder | folder | string | path to a folder, either absolute or relative to a baseFolder |
processFolder | pattern | string | pattern selecting files that should be substituted; syntax as in bash |
processFolder | [ignore] | string | files matching that pattern will be ignored; syntax as in bash; default: not set |
processFolder | [disableExistenceCheck] | boolean | folder does not have to exist when Watchdog is started; default: false |
processFolder | [maxDepth] | integer | a positive integer causes maxDepth levels of subdirectories to be traversed, while by default only the parent folder is processed; default: 0 |
baseFolder | folder | string | absolute path which is used as prefix before the path of the processFolder is added |
baseFolder | [maxDepth] | integer | see description of processFolder [maxDepth]; if both are set, the value of the processFolder element is used; default: 0 |
processTable | table | string | path to a tab-separated file with header; the column names must consist of [A-Za-z_] |
processTable | [disable | boolean | table file does not have to exist when Watchdog is started; default: false |
processTable | [compareName] | column | name that should be used to compare names of separate dependencies; default: complete line |
processInput | sep | string | separator which is used to join multiple values of global dependencies together; default: : |
processInput | [compareName] | string | name of return value that should be used to compare names of separate dependencies; default: name of precursor node |
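Table 2 lists the ignore attribute and a baseFolder-level maxDepth, neither of which appears in Example 4. A minimal sketch of how they might be combined inside the settings element follows. Only attributes from the table are used, while the folder names and patterns are hypothetical.

```xml
<!-- sketch: folder names and patterns are hypothetical -->
<processBlock>
  <!-- maxDepth on baseFolder applies to contained processFolder elements that do not set their own -->
  <baseFolder folder="/data/project/" maxDepth="1">
    <!-- *.txt files in raw/ and one level of subdirectories, skipping backup copies -->
    <processFolder name="reads" folder="raw/" pattern="*.txt" ignore="*.bak.txt" />
    <!-- merged into the same block via append; its own maxDepth is used instead of the baseFolder value -->
    <processFolder name="reads" folder="extra/" pattern="*.txt" append="true" maxDepth="2" />
  </baseFolder>
</processBlock>
```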
depends element that expects as value the id or name of an already defined task. The element must be a child of a dependencies element. Without any arguments the task will not be scheduled until all (sub)tasks of the dependencies have finished successfully. By setting the separate argument to true, a subtask can depend only on the corresponding subtask of the task it depends on. This option is only meaningful if both tasks are process block tasks and work on the same input set or a transformed version of it.
Example 5: Definition of dependencies
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <settings>
5 <!-- definition of two process folders -->
6 <processBlock>
7 <baseFolder folder="{%EXAMPLE_DATA%}/">
8 <processFolder name="txtFiles" folder="txt/" pattern="*.txt" />
9 <processFolder name="gzFiles" folder="txt_zipped/" pattern="*.gz" disableExistenceCheck="true" />
10 </baseFolder>
11 </processBlock>
12 </settings>
13
14 <tasks mail="{%MAIL%}">
15 <!-- definition of a simple sleep task -->
16 <sleepTask id="1" name="sleep">
17 <parameter>
18 <wait>30s</wait>
19 </parameter>
20 </sleepTask>
21
22 <!-- compress all files with *.txt ending in /some/base/folder/TXT -->
23 <gzipTask id="2" name="compress" processBlock="txtFiles">
24 <parameter>
25 <input>{}</input>
26 <output>{%EXAMPLE_DATA%}/txt_zipped/[1].gz</output>
27 </parameter>
28 <!-- dependency definition -->
29 <dependencies>
30 <depends>1</depends>
31 </dependencies>
32 </gzipTask>
33
34 <!-- decompress all files with *.gz ending in /some/base/folder/TXT_ZIPPED -->
35 <gzipTask id="3" name="decompress" processBlock="gzFiles">
36 <parameter>
37 <input>{}</input>
38 <output>{%EXAMPLE_DATA%}/txt_decompressed/[1].txt</output>
39 <decompress>true</decompress>
40 </parameter>
41 <!-- dependency definition -->
42 <dependencies>
43 <depends separate="true" prefixName="[1]">2</depends>
44 </dependencies>
45 </gzipTask>
46 </tasks>
47 </watchdog>
compress task with id 2 is defined which depends on the previously defined sleep
separate attribute is set true (43)
.txt ending of the original filename was cropped and a .gz ending was added, only the first part of the filename is considered as specified in the prefixName attribute (26,43).
Table 3: Attributes in the context of dependencies
element | attribute | type | function |
---|---|---|---|
dependencies | | | parent of depends elements and child of ?Task |
depends | | integer | id of an already defined task on which the task depends |
depends | [separate] | boolean | if set to true, each subtask depends only on its corresponding subtask; default: false |
depends | [keep4Slave] | boolean | if set to true, an executor in slave mode will wait until all tasks with that id, which are running on that slave, are finished; only valid for separate dependencies; default: false |
depends | [prefixName] | [[0-9]*] | only meaningful if separate is set to true; defines in which manner the variables of the two process blocks must be equal to each other: []/[0]: complete variables of the subtasks are compared; [n]: it is checked if the variable of a subtask begins with the prefix of the finished subtask this task depends on; the first n parts are taken as prefix whereby '.' is used as separator; default: [] |
depends | [sep] | string | separator which is used together with prefixName; default: . |
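The prefixName and sep attributes from Table 3 can presumably be combined when file names are structured with a separator other than '.'. The fragment below is a sketch under that assumption: the task with id 2 is imagined to process files such as sampleA_raw.txt, and the dependent task files such as sampleA_filtered.txt, so that only the first part before the underscore has to match. The file names are hypothetical.

```xml
<!-- sketch: assumes both tasks use process blocks whose file names start with a shared sample id -->
<dependencies>
  <!-- depend only on the corresponding subtask of task 2; compare the first part, split at "_" -->
  <depends separate="true" prefixName="[1]" sep="_">2</depends>
</dependencies>
```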
executors element. Possible environments:
- local: task is executed on the local host
- remote: task is executed on a remote host using ssh
- cluster: task is executed on a computer cluster
Example 6: Definition of different execution environments
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <settings>
5 <!-- examples of different execution environments -->
6 <executors>
7 <local name="localhost" maxRunning="2" />
8 <cluster name="defaultCluster" default="true" memory="1G" queue="short.q" />
9 <cluster name="highPerformanceCluster" slots="4" memory="3G" maxRunning="4" queue="short.q" />
10 <remote name="superComputer" user="mustermann" host="superComputer" privateKey="/path/to/private/auth/key" port="22" disableStrictHostCheck="false" />
11 </executors>
12 </settings>
13
14 <tasks mail="{%MAIL%}">
15 <!-- execute this task on the localhost -->
16 <sleepTask id="1" name="sleep" executor="localhost">
17 <parameter>
18 <wait>30s</wait>
19 </parameter>
20 </sleepTask>
21 </tasks>
22 </watchdog>
executors
defaultCluster is used by default and runs on the short.q queue of the computer
maxRunning is set to four, which means that a maximum of four tasks will run simultaneously on that execution environment. In line 10 an example for a remote executor is given which executes tasks via ssh using a host named superComputer.
Afterwards the same sleep task is defined as in the first example and will run on the local executor (16). The other executors can be tested once you have adapted them to your local infrastructure (see 2.4 and 2.5).
Table 4: Attributes in the context of execution environments
element | attribute | type | function |
---|---|---|---|
?Executor | name | string | is used as reference in the executor attribute of a task |
?Task | executor | string | name of a ?Executor element |
?Executor | [environment] | string | environment with that name is used as default environment; default: not set |
?Executor | [default] | boolean | defines which execution environment is taken as default; default: false |
?Executor | [maxRunning] | integer | number of tasks that can run at the same time; default: not restricted |
?Executor | [workingDir] | string | working directory to which the executor switches before task execution; default: /usr/local/storage/ |
?Executor | [stickToHost] | boolean | activates slave mode for that executor which means that tasks that depend on each other are executed on the same execution host; default: false |
?Executor | [maxSlaveRunning] | integer | number of tasks that can run at the same time on a slave if stickToHost is enabled; default: 1 |
?Executor | [pathToJava] | string | path to java binary which is used for slave mode execution; default: /usr/bin/java |
remote | user | string | name of the user on the remote host system |
remote | host | string | name of the host which should be used for execution; multiple hostnames must be separated by ';' - in that case the maxRunning argument is applied on each host separately |
remote | privateKey | string | path to the private ssh auth key; should be protected by a passphrase! |
remote | [port] | integer | port which is used for the ssh connection; default: 22 |
remote | [disableStrictHostCheck] | boolean | disables the validation of the public key of the host; not recommended!; default: false |
cluster | [slots] | integer | number of cores which are reserved on the computer cluster; default: 1 |
cluster | [memory] | string | memory per slot suffixed with M (megabyte) or G (gigabyte); default: 3000M |
cluster | [queue] | string | queue on which the tasks should run on the computer cluster; default: not set |
cluster | [disableDefault] | boolean | default parameters (slots, memory and queue) are ignored; default: false |
cluster | [customParameters] | string | additional parameters that are directly passed to the DRMAA system without further processing; default: not set |
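Example 6 does not cover multiple remote hosts or slave mode. The following sketch shows how these attributes from Table 4 might look in an executors block. The executor names, host names, and working directory are placeholders that would need to be adapted to the local infrastructure.

```xml
<!-- sketch: names, hosts and paths are placeholders -->
<executors>
  <!-- two hosts separated by ';'; maxRunning is applied to each host separately -->
  <remote name="workers" user="mustermann" host="node1;node2" privateKey="/path/to/private/auth/key" maxRunning="2" workingDir="/tmp/" />
  <!-- slave mode: tasks that depend on each other run on the same execution host, at most two at a time -->
  <cluster name="stickyCluster" queue="short.q" slots="2" memory="2G" stickToHost="true" maxSlaveRunning="2" />
</executors>
```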
const elements which must be a child of a constants element. The parent element itself must be a child of the settings element. Every const element must have a unique name, which is set with the name attribute. The value of the constant is stored between the opening and closing element tag.
${NAME_OF_CONSTANT} is substituted with the corresponding constant in every attribute or text content. Only the watchdogBase attribute of watchdog, the default attribute of ?Executor, the id attribute of ?Task, and values within depends elements cannot be substituted.
Currently, there is one pre-defined constant named ${TMP}, which is substituted within ?Task tags with the working directory of the executor that will execute the task.
Example 7: Definition and use of global constants
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <settings>
5 <!-- definition of a constant named WAIT_TIME -->
6 <constants>
7 <const name="WAIT_TIME">30s</const>
8 <const name="FILE_NAME">sleep</const>
9 <const name="LOG_BASE">/tmp</const>
10 </constants>
11 </settings>
12
13 <tasks mail="{%MAIL%}">
14
15 <!-- definition of a simple sleep task with constant replacement -->
16 <sleepTask id="1" name="sleep test">
17 <streams>
18 <stdout>${LOG_BASE}/${FILE_NAME}.out</stdout>
19 </streams>
20 <parameter>
21 <wait>${WAIT_TIME}</wait>
22 </parameter>
23 </sleepTask>
24 </tasks>
25 </watchdog>
${WAIT_TIME} is used as wait time in the sleep
Table 5: Attributes in the context of global constants
element | attribute | type | function |
---|---|---|---|
constants | | | parent of const elements and child of settings |
const | name | string | name of the variable that is replaced with ${name} in attributes and text content; the first character must be out of [A-Za-z_], followed by characters out of [A-Za-z_0-9]; apart from a few exceptions it is allowed everywhere |
const | | string | replacement value |
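Since constants are substituted in attributes as well as in text content, and ${TMP} is pre-defined, a task might use both as sketched below. The fragment assumes that a constant named WAIT_TIME is defined in the settings element as in Example 7; the task name and the output file name are illustrative only.

```xml
<!-- sketch: assumes <const name="WAIT_TIME">30s</const> is defined in <settings> as in Example 7 -->
<sleepTask id="1" name="sleep for ${WAIT_TIME}">
  <streams>
    <!-- ${TMP} is pre-defined and stands for the working directory of the executor running the task -->
    <stdout>${TMP}/sleep.out</stdout>
  </streams>
  <parameter>
    <wait>${WAIT_TIME}</wait>
  </parameter>
</sleepTask>
```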
var element new variables can be defined or updated. The name of the variable must be defined with the name attribute while the value is stored between the opening and closing element tag. The parent element of each var element must be an environment element which also has a name attribute. This name attribute is used to link the environment with a task using the environment attribute all tasks possess. It is also possible to define environment variables locally within task definitions. If local and global variables with the same name are set, the local ones override the global variables.
The following environment variables are set by Watchdog by default:
- IS_WATCHDOG_JOB: set to 1 if the module was executed by Watchdog
- WATCHDOG_CORES: number of reserved cores if the task runs on a cluster environment
- WATCHDOG_MEMORY: total reserved memory in megabytes if the task runs on a cluster environment
Example 8: Definition of environment variables
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <settings>
5 <!-- definition of an environment -->
6 <environments>
7 <environment name="pathEnv">
8 <var name="PATH" update="true">~/software/bin</var>
9 </environment>
10 </environments>
11 </settings>
12
13 <!-- begin task block and use that mail to inform the user on success or failure -->
14 <tasks mail="{%MAIL%}">
15
16 <!-- definition of a simple sleep task using custom environment variables -->
17 <envTask id="1" name="env" environment="pathEnv">
18 <streams>
19 <stdout>/tmp/env.test</stdout>
20 </streams>
21
22 <!-- definition of a local environment with two variables -->
23 <environment>
24 <var name="SHELL">/bin/sh</var>
25 <var name="TEST" update="true" sep="@">separator test</var>
26 </environment>
27 </envTask>
28 </tasks>
29 </watchdog>
pathEnv is defined in which the variable PATH is
~/software/bin is added at the beginning of the PATH variable and after the default separator character the previous value is kept. The environment attribute of the env task is set to the name of the previously defined
/bin/sh while the second one updates a variable called TEST using an alternative separator.
Table 6: Attributes in the context of environment variables
element | attribute | type | function |
---|---|---|---|
environment | name | string | is used as reference in the environment attribute of a task |
environment | [copyLocalValue] | boolean | copies all environment variables that are set on the host running Watchdog; variables already set on the remote system are not deleted; bash functions whose names end with () are not copied as this might cause problems; default: false |
environment | [useExternalExport] | boolean | uses an external command to set the variables; is necessary to update variables on remote or cluster executors and might also be necessary to set environment variables on remote hosts because of SSH security policies; default: true |
environment | [shebang] | string | shebang that is used for the script which first executes the export commands and afterwards the real commands; default: #!/bin/bash |
environment | [exportCommand] | string | custom command to set an environment variable; {$NAME} and {$VALUE} are substituted and must be part of the command; default: export {$NAME}="{$VALUE}" |
?Task | environment | string | name of an environment element |
var | | string | value of the environment variable |
var | name | string | name of the environment variable |
var | [update] | boolean | if true, the value is added at the beginning of the variable and the old value follows, separated by the value stored in the sep attribute; default: false |
var | [sep] | string | separator which is used when the value of the variable should be updated; default: : |
var | [copyLocalValue] | boolean | copies the environment variable with the name name that is set on the host running Watchdog; default: false |
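As a minimal sketch of the update and sep attributes listed in Table 6, the fragment below defines three variables. The environment name toolEnv, the variable MY_LIST and all paths are hypothetical; only the attributes themselves are taken from the table.

```xml
<!-- sketch only: toolEnv, MY_LIST and all paths are hypothetical -->
<environments>
    <environment name="toolEnv">
        <!-- update="true": new value first, then the separator, then the old value -->
        <!-- e.g. an old PATH of /usr/bin would become /opt/tools/bin:/usr/bin -->
        <var name="PATH" update="true">/opt/tools/bin</var>

        <!-- same rule with a custom separator: "b" prepended to an old value "a" gives b;a -->
        <var name="MY_LIST" update="true" sep=";">b</var>

        <!-- no update attribute (default: false): the variable is simply set to this value -->
        <var name="LANG">C</var>
    </environment>
</environments>
```

A task would reference this environment through its environment attribute, as shown for pathEnv in Example 8.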
Whether and when Watchdog sends mail notifications is controlled by the notify attribute of tasks.
Example 9: Different mail notification options
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <settings>
5 <!-- definition of different process blocks -->
6 <processBlock>
7 <processSequence name="sleepTime" start="5" end="15" step="5" />
8 </processBlock>
9 </settings>
10
11 <!-- begin task block and use that mail to inform the user on success or failure -->
12 <tasks mail="{%MAIL%}">
13 <!-- definition a simple sleep task -->
14 <sleepTask id="1" name="sleep simple" notify="enabled">
15 <parameter>
16 <wait>10s</wait>
17 </parameter>
18 </sleepTask>
19
20 <!-- definition of process sequence sleep tasks -->
21 <sleepTask id="2" name="sleep process sequence" notify="subtask" processBlock="sleepTime">
22 <parameter>
23 <wait>[]s</wait>
24 </parameter>
25 </sleepTask>
26 </tasks>
27 </watchdog>
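In Example 9, the first task informs the user once the complete task was executed because its notify attribute is set to enabled (14). The second task creates subtasks based on the process block sleepTime and causes Watchdog to inform the user as soon as a subtask is finished because the notify attribute is set to subtask (7, 21).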
Table 7: Attributes in the context of mail notification
element | attribute | type | function |
---|---|---|---|
tasks | mail | string | mail address which is used for notification |
?Task | notify | enum | enabled: inform when the complete task was executed; subtask: inform when a subtask was executed; disabled: notification only in case of an error |
?Task | processBlock | string | reference to a process block when notify is set to subtask |
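Table 7 lists a third value, disabled, which Example 9 does not use. A task that should only trigger a mail in case of an error might look like the following sketch; the task id, name and wait time are arbitrary choices for illustration.

```xml
<!-- sketch only: id, name and wait time are arbitrary -->
<sleepTask id="3" name="sleep quiet" notify="disabled">
    <parameter>
        <wait>5s</wait>
    </parameter>
</sleepTask>
```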
The standard output and standard error of a task can be written to a file using the stdout or stderr element. It is also possible to use a file as input via the stdin element. Additionally, a working directory can be set by using the workingDir element. When this is done, relative paths for stdout, stderr and stdin are also allowed. Unlike elsewhere, the elements must occur in the order in which they are listed here: workingDir, stdout, stderr and stdin (but each of them is optional).
Example 10: Definition of standard streams and working directory
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <!-- begin task block and use that mail to inform the user on success or failure -->
5 <tasks mail="{%MAIL%}">
6
7 <!-- definition a simple sleep task -->
8 <sleepTask id="1" name="sleep">
9 <parameter>
10 <wait>30s</wait>
11 </parameter>
12 <!-- definition of a standard output location and switch of the working directory -->
13 <streams>
14 <workingDir>/tmp/</workingDir>
15 <stdout>{%EXAMPLE_DATA%}/sleepTest.out</stdout>
16 <stderr append="true">sleepTest.err</stderr>
17 </streams>
18 </sleepTask>
19 </tasks>
20 </watchdog>
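In Example 10, the working directory is set to /tmp (14). The standard output of the sleep task is written to the file {%EXAMPLE_DATA%}/sleepTest.out (15). The standard error is appended to /tmp/sleepTest.err, as the relative path is resolved against the working directory (16).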
Table 8: Attributes in the context of standard streams and working directory
element | attribute | type | function |
---|---|---|---|
streams | boolean | saves used resources to false | |
workingDir | | string | sets a custom working directory before the tool is executed; default: /usr/local/storage |
stdout | | string | writes standard output stream into file; default: not saved |
stderr | | string | writes standard error stream into file; default: not saved |
stdin | | string | file is used as standard input; default: not set |
stdin | [disableExistenceCheck] | boolean | if true, the file does not need to exist when Watchdog is started; default: false |
stdout/ stderr | [append] | boolean | appends the stream at the end of the file; default: false |
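The following sketch combines all four elements in the required order and uses the attributes from Table 8. The file names are hypothetical; relative paths are used on the assumption that they are resolved against the working directory, as in Example 10.

```xml
<!-- sketch only: file names are hypothetical -->
<streams>
    <!-- must come first; allows the relative paths below -->
    <workingDir>/tmp/</workingDir>
    <!-- overwritten on each run (append defaults to false) -->
    <stdout>run.out</stdout>
    <!-- appended to an existing file -->
    <stderr append="true">run.err</stderr>
    <!-- input file that may not exist yet when Watchdog is started -->
    <stdin disableExistenceCheck="true">input.txt</stdin>
</streams>
```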
Moreover, files stored on remote file systems can be uploaded or downloaded by Watchdog. By default, virtual file systems based on the protocols File, HTTP, HTTPS, FTP, FTPS and SFTP as well as the main memory (RAM) are supported. These virtual file systems are provided by the Commons Virtual File System project of the Apache Software Foundation. Examples of valid URIs for these file systems can be found in the documentation of that project. However, any file system with an implementation of the FileProvider interface can also be included by the user as described in 6.1.
Task actions are defined in an actions tag as child of ?Task. Slave mode is automatically activated if a task action is used. Currently six different IO operations are implemented:
- createFile: creates an empty file
- createFolder: creates an empty folder
- copyFile: copies a file
- copyFolder: copies a folder (with content)
- deleteFile: deletes a file
- deleteFolder: deletes a folder (with content)
The point in time at which task actions are executed is defined by the time attribute each actions tag owns. The following values are available:
- beforeTask: before the task is executed
- afterTask: after the task is executed
- onSuccess: when the task was successfully executed
- onFailure: when task execution failed
- beforeTerminate: before Watchdog or a slave terminates itself
Example 11: Definition of task actions
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <tasks mail="{%MAIL%}">
5 <gzipTask id="1" name="gzip task">
6 <parameter>
7 <!-- path to a file that does not exist yet -->
8 <input>/tmp/watchdog_file_to_compress.tmp</input>
9 </parameter>
10 <!-- action that copies a to the input location -->
11 <actions time="beforeTask">
12 <copyFile file="{%INSTALL%}examples/example_task_actions.xml" destination="/tmp/watchdog_file_to_compress.tmp" override="true" />
13 </actions>
14 </gzipTask>
15 </tasks>
16 </watchdog>
In Example 11, the file /tmp/watchdog_file_to_compress.tmp is compressed using gzip (5-14). Before the compression task is executed, the task action defined within the actions tag is executed because the time attribute is set to beforeTask (11-13). The task action copies the file stored in {%INSTALL%}/examples/example_task_actions.xml to /tmp/watchdog_file_to_compress.tmp (12).
Table 9: Attributes in the context of actions
element | attribute | type | function |
---|---|---|---|
actions | time | enum | defines when the task action block is executed; beforeTask: before the task is executed; afterTask: after the task is executed; onSuccess: when the task was successfully executed; onFailure: when task execution failed; beforeTerminate: before Watchdog or a slave terminates itself |
actions | [uncoupleFromExecutor] | boolean | if enabled, task actions are executed on the host running Watchdog instead of the execution host; default: false |
createFile | file | string | path to the file that should be created |
createFile | [override] | boolean | defines if an existing file should be overwritten; default: false |
createFile | [createParent] | boolean | defines if the parent directories should be created if nonexistent; default: true |
createFolder | folder | string | path to the folder that should be created and will be empty if action succeeds |
createFolder | [override] | boolean | defines if an existing folder should be deleted; default: false |
createFolder | [createParent] | boolean | defines if the parent directories should be created if nonexistent; default: true |
copyFile | file | string | path to the file that should be copied |
copyFile | destination | string | path to the destination of the new file |
copyFile | [override] | boolean | defines if an existing file should be overwritten; default: false |
copyFile | [deleteSource] | boolean | deletes the source file after the copy operation; default: false |
copyFile | [createParent] | boolean | defines if the parent directories should be created if nonexistent; default: true |
copyFolder | folder | string | path to the folder that should be copied |
copyFolder | destination | string | path to the destination folder |
copyFolder | [pattern] | string | pattern selecting files that should be copied in that folder; syntax as in bash |
copyFolder | [override] | boolean | defines if an existing folder should be deleted; default: false |
copyFolder | [deleteSource] | boolean | deletes the source folder after the copy operation; default: false |
copyFolder | [createParent] | boolean | defines if the parent directories should be created if nonexistent; default: true |
deleteFile | file | string | path to the file that should be deleted |
deleteFolder | folder | string | path to the folder that should be deleted |
deleteFolder | [pattern] | string | pattern selecting files that should be deleted in that folder; syntax as in bash |
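To illustrate how the time values and the operations of Table 9 can be combined, the sketch below attaches three action blocks to a single task. All paths are hypothetical, and using several actions tags within one task is an assumption based on the statement that each actions tag owns its own time attribute.

```xml
<!-- sketch only: all paths are hypothetical -->
<sleepTask id="1" name="sleep with actions">
    <parameter>
        <wait>10s</wait>
    </parameter>
    <!-- create an empty scratch folder before the task starts -->
    <actions time="beforeTask">
        <createFolder folder="/tmp/watchdog_scratch/" override="true" />
    </actions>
    <!-- keep the log files if the task succeeded -->
    <actions time="onSuccess">
        <copyFolder folder="/tmp/watchdog_scratch/" destination="/tmp/watchdog_results/" pattern="*.log" />
    </actions>
    <!-- remove the scratch folder if the task failed -->
    <actions time="onFailure">
        <deleteFolder folder="/tmp/watchdog_scratch/" />
    </actions>
</sleepTask>
```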
Within a ?Task element, simple calculations can be performed using the $(expr) construct, where expr must be a numerical expression. The following operators are supported: +, -, *, /, ^, ² and ³. Additionally, the brackets () are provided. Moreover, in case of a processSequence, i is replaced by the current value of the process sequence. In the more general case of a processBlock, x is substituted by an increasing number starting at 1. The result of all calculations is rounded to five decimal places or converted to an integer if it is a whole number.
Example 12: Definition of simple calculations
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <settings>
5 <!-- definition of two process blocks -->
6 <processBlock>
7 <processSequence name="sleepTime" start="1" end="5" step="1.5" />
8 <processFolder name="txtFiles" folder="{%EXAMPLE_DATA%}/txt/" pattern="*.txt" />
9 </processBlock>
10 </settings>
11
12 <tasks mail="{%MAIL%}">
13 <!-- sleep task with a simple calculation -->
14 <sleepTask id="1" name="sleep" processBlock="sleepTime">
15 <parameter>
16 <wait>$((i+1)^2-1)s</wait>
17 </parameter>
18 </sleepTask>
19
20 <!-- compress txt files and write log files to ()/log/* -->
21 <gzipTask id="2" name="quality test" processBlock="txtFiles">
22 <streams>
23 <stdout>()/log/$(x).out</stdout>
24 </streams>
25 <parameter>
26 <input>{}</input>
27 <output>{}.gz</output>
28 <quality>3</quality>
29 </parameter>
30 </gzipTask>
31 </tasks>
32 </watchdog>
In Example 12, the usage of the $(expr) construct is shown. The wait time for the sleep task is calculated based on the input numbers of the processSequence (7, 16).
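The rounding rule can be made concrete with a small sketch. The process block name steps and the output path are hypothetical, and it is assumed that the sequence yields the values i = 1, 2, 3 and that i may also be used inside the streams element. The values given in the comments follow from the rounding rule stated above; they were not obtained by running Watchdog.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">

    <settings>
        <processBlock>
            <!-- assumed to yield i = 1, 2, 3 -->
            <processSequence name="steps" start="1" end="3" step="1" />
        </processBlock>
    </settings>

    <tasks mail="{%MAIL%}">
        <sleepTask id="1" name="sleep calc" processBlock="steps">
            <streams>
                <!-- i/3: expected 0.33333 and 0.66667 (five decimal places), then 1 (integer) -->
                <stdout>/tmp/calc_$(i/3).out</stdout>
            </streams>
            <parameter>
                <!-- i^2: expected 1s, 4s and 9s -->
                <wait>$(i^2)s</wait>
            </parameter>
        </sleepTask>
    </tasks>
</watchdog>
```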
By default, modules are loaded from the folder modules/ stored in the installation directory of Watchdog. By using the modules element as child of settings, additional folders can be added.
Example 13: Definition of multiple module include folders
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <settings>
5 <!-- TODO: modify one of these folders {%INSTALL%}myCustomFolder/ or -->
6 <!-- /home/TODO/additionalModules/ to match a folder that contains the 'sleep' module -->
7 <modules defaultFolder="myCustomFolder/">
8 <folder>/home/TODO/additionalModules/</folder>
9 </modules>
10 </settings>
11
12 <!-- begin task block and use that mail to inform the user on success or failure -->
13 <tasks mail="{%MAIL%}">
14 <!-- definition a simple sleep task -->
15 <sleepTask id="1" name="sleep">
16 <parameter>
17 <wait>30s</wait>
18 </parameter>
19 </sleepTask>
20 </tasks>
21 </watchdog>
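In Example 13, the default module folder is changed to myCustomFolder/ with the defaultFolder attribute (7) and one additional folder is added (8). Modules are therefore searched in {%INSTALL%}/myCustomFolder/ and /home/TODO/additionalModules/. In order to test this example, you must create a new folder, copy the sleep module from Watchdog's module folder and adapt the path in line 7 or 8 to match that folder.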
Table 10: Attributes in the context of module include folders
element | attribute | type | function |
---|---|---|---|
modules | [defaultFolder] | string | changes the default search folder; an absolute path or a path relative to Watchdog's installation directory is allowed; default: modules/ |
folder | | string | adds a new directory that is searched for modules |
Interfaces for checkers are stored in the package de.lmu.ifi.bio.watchdog.interfaces. Basically, a function returning a boolean value that indicates whether the task succeeded or failed must be implemented. The constructor must accept as its first argument an object of type Task that contains information about the finished task. Additional arguments of type Boolean, Integer, Double or String can be passed via the XML definition.
Example 14: Load custom checkers
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <!-- begin task block and use that mail to inform the user on success or failure -->
5 <tasks mail="{%MAIL%}">
6
7 <!-- definition a simple sleep task -->
8 <sleepTask id="1" name="sleep">
9 <parameter>
10 <wait>30s</wait>
11 </parameter>
12 <checkers>
13 <!-- load a success checker with one additional constructor argument -->
14 <!-- it will check, if the file {%INSTALL%}examples/mail_config exists and is not empty -->
15 <checker classPath="{%EXAMPLE_DATA%}/OutputFileExistsSuccessChecker.class" className="de.lmu.ifi.bio.watchdog.successChecker.OutputFileExistsSuccessChecker" type="success">
16 <cArg type="string">{%INSTALL%}examples/mail_config</cArg>
17 </checker>
18 </checkers>
19 </sleepTask>
20 </tasks>
21 </watchdog>
Example 14 shows how a success checker can be added to a task by using the checkers element as a child of ?Task (12-18). In addition to the location of the compiled Java class and the full class name, arguments can be passed to the constructor of the class. In this example one variable of type string is passed to the constructor of the success checker using the cArg element (16). Once the task is finished, the checkers are evaluated in the same order as they were added in the XML workflow. If success and error are detected simultaneously, the task is treated as failed. In this example the success checker ensures that the file {%INSTALL%}/examples/mail_config exists and is not empty (15-16).
Table 11: Attributes in the context of custom checkers
element | attribute | type | function |
---|---|---|---|
checker | type | enum | type of the checker; success: checker should be used as success checker; error: checker is used as error checker |
checker | className | string | complete class name including the package the class is located in |
checker | classPath | string | absolute path to the compiled java class file |
cArg | type | enum | type to which the argument should be parsed in java; possible values: boolean, integer, double and string |
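An error checker is attached in the same way by setting type to error. The sketch below passes two additional constructor arguments of different types. The class org.example.MyErrorChecker and its class file path are hypothetical and not part of Watchdog; such a class would have to accept a Task object followed by an Integer and a Boolean in its constructor.

```xml
<!-- sketch only: MyErrorChecker and its path are hypothetical -->
<checkers>
    <checker classPath="/path/to/MyErrorChecker.class" className="org.example.MyErrorChecker" type="error">
        <!-- arguments are assumed to be passed to the constructor in this order, after the Task object -->
        <cArg type="integer">3</cArg>
        <cArg type="boolean">true</cArg>
    </checker>
</checkers>
```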
In this section, the steps required to create a new module named nameOfModule are explained. Basic XSD skills are needed to understand how things work together. Modules are defined in XSD format and should have the basic structure shown in Example 15. To actually create modules, the script helper_scripts/createNewModule.sh can be used; the generated definition can then be modified by hand, as not all settings can be configured by the script.
Example 15: Basic XSD structure
1 <?xml version="1.0" encoding="UTF-8" ?>
2 <x:schema xmlns:x="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" vc:minVersion="1.1" xmlns:xerces="http://xerces.apache.org">
3
4 ...
5
6 </x:schema>
First, a folder named nameOfModule must be created in the modules folder. That folder must contain a file named nameOfModule.xsd which will hold the actual module definition. 5.1 Input parameter definition Example 16 shows how input parameters and flags can be defined.
Example 16: Input parameter
1 <!-- definition of the task parameters -->
2 <x:complexType name="nameOfModuleTaskParameterType">
3 <x:all>
4 <x:element name="parameter1" type="paramAbsoluteFilePath" minOccurs="1" maxOccurs="1" />
5 <x:element name="parameter2" type="paramString" minOccurs="0" maxOccurs="unbounded" />
6 <x:element name="flag1" type="paramBoolean" minOccurs="0" maxOccurs="1" />
7 </x:all>
8 </x:complexType>
With the minOccurs and maxOccurs attributes it can be specified how often a parameter can be used. Parameters can also have different types, which are enforced during the validation of the XML workflows. Some pre-defined types are:
- paramBoolean (for flags)
- paramString
- paramInteger
- paramDouble
Example 17: Input parameter
1 <!-- module specific parameter types -->
2 <x:complexType name="paramWait_sleep">
3 <x:simpleContent>
4 <x:restriction base="paramString">
5 <x:assertion test="matches($value, '(\$\{[A-Za-z_]+\})|(\$\(.+\))|([\[\(\{](\$[A-Za-z_]+(,\s*){0,1}){0,1}([0-9]+(,\S*){0,1}){0,1}[\]\)\}])') or matches($value, '^[0-9]+[smhd]{0,1}$')" xerces:message="Parameter with name '{$tag}' must match [0-9]+[smhd]{0,1}." />
6 </x:restriction>
7 </x:simpleContent>
8 </x:complexType>
returnFilePathParameter parameter. If you want to change this default parameter name, see 5.3.
Example 18: Output parameter
1 <!-- define output parameters which must be written to a file -->
2 <x:complexType name="nameOfModuleTaskReturnType">
3 <x:complexContent>
4 <x:extension base="taskReturnType">
5 <x:all>
6 <x:element name="outputParam1" type="x:string" />
7 </x:all>
8 </x:extension>
9 </x:complexContent>
10 </x:complexType>
Example 18 defines one output parameter named outputParam1 of type x:string. The module itself must ensure that the parameters are physically written before the module exits; otherwise Watchdog will terminate itself. If a bash script is executed, the two functions writeParam2File and blockUntilFileIsWritten defined in core_lib/functions.sh can be used. 5.3 Binary call command and other settings Now the command that will be executed can be defined. Example 19 specifies that a script named nameOfModule.sh that is stored in modules/nameOfModule will be called.
Example 19: Binary call command
1 <!-- set command and other settings -->
2 <x:complexType name="nameOfModuleTaskOverrideType">
3 <x:complexContent>
4 <x:restriction base="baseAttributeTaskType">
5 <x:attribute name="binName" type="x:string" fixed="nameOfModule.sh" />
6 </x:restriction>
7 </x:complexContent>
8 </x:complexType>
- binName: name of the command which should be called
- preBinCommand: command that is added before the binName; e.g. an interpreter
- isWatchdogModule: by default the command must be located in modules/binName; if this parameter is false, the command must point to an absolute binary or be reachable via the PATH environment variable
- returnFilePathParameter: name of the parameter that is used to store the return values
- paramFormat: defines how names of parameters are prefixed (do not print parameter name, - or --); default: --
- spacingFormat: defines how names of parameters and values are spaced; default: blank
- quoteFormat: defines how values are quoted; default: single quoting
- separateFormat: defines the separator string between multiple occurrences of the same parameter; default: ,
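As a sketch of how further settings from this list might be fixed in the same way as binName in Example 19, the fragment below calls a binary that is expected to be reachable via PATH instead of a script in the module folder. The attribute types (x:string, x:boolean) and the use of gzip as binary are assumptions made for illustration; the actual definitions in baseAttributeTaskType should be checked before use.

```xml
<!-- sketch only: attribute types and the chosen binary are assumptions -->
<x:complexType name="nameOfModuleTaskOverrideType">
    <x:complexContent>
        <x:restriction base="baseAttributeTaskType">
            <!-- binary that must be reachable via the PATH environment variable -->
            <x:attribute name="binName" type="x:string" fixed="gzip" />
            <!-- do not look for the command in the module folder -->
            <x:attribute name="isWatchdogModule" type="x:boolean" fixed="false" />
        </x:restriction>
    </x:complexContent>
</x:complexType>
```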
Finally, the module is made available by defining an element with the attribute name, the type nameOfModuleTaskType and substitutionGroup set to abstractTask. Example 20 shows the needed line for the example module (2). Afterwards the type of the task is defined (5-15). If no output parameters are used, line 10 can be omitted.
Example 20: Assign a name to the module
1 <!-- make task definition available via substitution group -->
2 <x:element name="nameOfModuleTask" type="nameOfModuleTaskType" substitutionGroup="abstractTask" />
3
4 <!-- definition of final task -->
5 <x:complexType name="nameOfModuleTaskType">
6 <x:complexContent>
7 <x:extension base="nameOfModuleTaskOverrideType">
8 <x:all>
9 <x:element name="parameter" type="nameOfModuleTaskParameterType" minOccurs="1" maxOccurs="1" />
10 <x:element name="return" type="nameOfModuleTaskReturnType" minOccurs="0" maxOccurs="0" />
11 <x:group ref="defaultTaskElements" />
12 </x:all>
13 </x:extension>
14 </x:complexContent>
15 </x:complexType>
Example 21: Putting it all together
1 <?xml version="1.0" encoding="UTF-8" ?>
2 <x:schema xmlns:x="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" vc:minVersion="1.1" xmlns:xerces="http://xerces.apache.org">
3
4 <!-- definition of the task parameters -->
5 <x:complexType name="nameOfModuleTaskParameterType">
6 <x:all>
7 <x:element name="parameter1" type="paramAbsoluteFilePath" minOccurs="1" maxOccurs="1" />
8 <x:element name="parameter2" type="paramString" minOccurs="0" maxOccurs="unbounded" />
9 <x:element name="flag1" type="paramBoolean" minOccurs="0" maxOccurs="1" />
10 </x:all>
11 </x:complexType>
12
13 <!-- define output parameters which must be written to a file -->
14 <x:complexType name="nameOfModuleTaskReturnType">
15 <x:complexContent>
16 <x:extension base="taskReturnType">
17 <x:all>
18 <x:element name="outputParam1" type="x:string" />
19 </x:all>
20 </x:extension>
21 </x:complexContent>
22 </x:complexType>
23
24 <!-- set command and other settings -->
25 <x:complexType name="nameOfModuleTaskOverrideType">
26 <x:complexContent>
27 <x:restriction base="baseAttributeTaskType">
28 <x:attribute name="binName" type="x:string" fixed="nameOfModule.sh" />
29 </x:restriction>
30 </x:complexContent>
31 </x:complexType>
32
33 <!-- make task definition available via substitution group -->
34 <x:element name="nameOfModuleTask" type="nameOfModuleTaskType" substitutionGroup="abstractTask" />
35
36 <!-- definition of final task -->
37 <x:complexType name="nameOfModuleTaskType">
38 <x:complexContent>
39 <x:extension base="nameOfModuleTaskOverrideType">
40 <x:all>
41 <x:element name="parameter" type="nameOfModuleTaskParameterType" minOccurs="1" maxOccurs="1" />
42 <x:element name="return" type="nameOfModuleTaskReturnType" minOccurs="0" maxOccurs="0" />
43 <x:group ref="defaultTaskElements" />
44 </x:all>
45 </x:extension>
46 </x:complexContent>
47 </x:complexType>
48
49 </x:schema>
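Once such a module definition is in place, the new task could be used in a workflow roughly as sketched below. The parameter values are hypothetical; passing true to the flag and repeating parameter2 are assumptions derived from the paramBoolean type and the maxOccurs="unbounded" setting in Example 21.

```xml
<!-- sketch only: parameter values are hypothetical -->
<nameOfModuleTask id="1" name="custom module call">
    <parameter>
        <!-- required exactly once (minOccurs="1", maxOccurs="1") -->
        <parameter1>/tmp/input.txt</parameter1>
        <!-- optional and repeatable (maxOccurs="unbounded") -->
        <parameter2>first</parameter2>
        <parameter2>second</parameter2>
        <!-- optional flag -->
        <flag1>true</flag1>
    </parameter>
</nameOfModuleTask>
```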
Exit codes: The file core_lib/exitCodes.sh contains some exit codes whose names are also included in mail notifications if they are used. Custom exit codes can easily be added.
Error messages: By default, Watchdog can detect error messages in the standard output and standard error streams if they begin with [ERROR]. The errors are only stored if standard output and standard error are saved to disk using the streams tag. If an error was detected but the exit code was 0, the command will also fail.
Module test: A script named test_nameOfModule.sh can also be part of the module. It is called automatically if the user calls helper_scripts/moduleTest.sh. The module folder might also contain some test data in the folder test_data. For simple test cases the bash function testExitCode can be used to test whether an input leads to the expected output. 6 Extend Watchdog's functionality In the following sections two different ways to extend Watchdog's functionality are described.
- Virtual file systems that can be used within task actions (see 6.1)
- XML plugins that add new ?Executor and ?ProcessBlock elements (see 6.2)
In order to add a new virtual file system, a class that implements the VFSRegister interface can be added to the jar file. The class will be loaded automatically by Watchdog and the new virtual file system will be usable without other modifications. The following four methods must be implemented for the interface:
- getFileProvider: must return an instance of the FileProvider interface as defined in the Commons Virtual File System project
- getURLSchemes: returns the URL schemes that should be used in combination with that FileProvider (e.g. ftp)
- getMimeTypes: sets schemes that should be used for specific MIME types
- getExtensions: sets schemes that should be used for specific file extensions
Alternatively, SimpleVFSRegister can be extended if an instance of the FileProvider class can be created without arguments. Then only the name of the FileProvider class and the URL schemes that should be used must be defined. Example 22 shows how the virtual FTPS file system is integrated in Watchdog by using the FtpsFileProvider class of the Commons Virtual File System project.
Example 22: Simple implementation of the VFSRegister interface
1 package de.lmu.ifi.bio.watchdog.task.actions.vfs.impl;
2
3 public class FTPSVFSRegister extends SimpleVFSRegister {
4
5 private static final String CLASS_NAME = "org.apache.commons.vfs2.provider.ftps.FtpsFileProvider";
6 private static final String[] SCHEME = new String[]{"ftps"};
7
8 public FTPSVFSRegister() throws Exception {
9 super(CLASS_NAME, SCHEME);
10 }
11 }
- Create an XSD file describing the new element and its parent element for use in Watchdog workflows
- Extend a few abstract classes
- Add class files for the new classes to the Watchdog jar file and copy the new XSD file to a sub-directory of the Watchdog installation directory
Classes extending the XMLParserPlugin abstract class are loaded dynamically during workflow execution. Currently, this is restricted to XML parsers for the generic type ProcessBlock or ExecutorInfo. The XML parser for a new XML element provides the functionality to parse this element in a workflow (i.e. a new executor or process block type) and to create a new object representing the corresponding element type. The four most important functions of the XMLParserPlugin abstract class that have to be implemented are:
- getNameOfParseableTag: returns the name of the element the class can parse
- getNameOfParentTag: returns the name of the parent element of this element
- getXSDDefinition: returns the path to the XSD file describing this element (relative to the xsd sub-directory of the Watchdog directory)
- parseElement: implements the actual parsing process
XMLDataStore and XMLPlugin, for instance by extending one of the abstract classes ProcessBlock or ExecutorInfo or one of their subclasses.
For use in the Workflow designer GUI of Watchdog, two additional requirements have to be met:
- An FXML file has to be provided describing how the attributes of the new element type are represented graphically. FXML is an XML-based markup language for describing the layout of a user interface in a JavaFX application.
- Classes extending PluginView and PluginViewController have to be implemented for testing whether the input is valid and for loading and saving data to and from XML.
de.lmu.ifi.bio.watchdog.xmlParser.plugins of the Java source code. 7 Docker In order to run a Docker image, Docker must be installed and configured correctly as described in the Docker documentation. 7.1 Install the Watchdog Docker image A Watchdog image for Docker can be obtained from hub.docker.com. The image is rebuilt automatically by the Bioconda project once a new version is released on Bioconda.
You can download the latest version of the image with docker pull klugem/watchdog-wms. Within the Docker image the environment variable WATCHDOG_HOME is set automatically to the installation directory of Watchdog (required for the watchdogBase attribute). The -useEnvBase flag of the command line version can be used to override the watchdogBase attribute of the XML workflow with the value stored in WATCHDOG_HOME. Moreover, the installation directory of Watchdog is mounted under /watchdog within the Docker image. 7.2 Sharing of files In order to exchange files with the host system, the -v or --mount option of Docker can be used. These options can be used multiple times.
docker run -v source_folder_or_file_on_host:destination_folder_or_file[:ro] image command
More information can be found in the documentation of Docker. 7.3 Port forwarding In order to use the built-in webserver of Watchdog, the port used by the webserver must be forwarded to the host running the Docker container.
The command docker run -p 8090:8080 image command maps the container port 8080 (TCP) to the port 8090 (TCP) on the Docker host. More information can be found in the documentation of Docker. 7.4 How to use the Docker Watchdog image The examples within the Docker image are configured automatically when watchdog-cmd is started for the first time and are stored in /watchdog/examples. The command
docker run -h localhost -p 8080:8080 klugem/watchdog-wms watchdog-cmd -useEnvBase -x /watchdog/examples/example_basic_sleep.xml
executes the example described in 3.2 and forwards the webserver port to the host port 8080.
Alternatively, it is possible to run a workflow that is stored on the host system as described in 7.2. Ensure that all files used in the workflow are made accessible within the Docker image. 7.5 Use Docker in modules A Docker image can also be used in a module. The module
bowtie2Docker implements an example module that uses the Docker image of Bowtie 2 that is provided by Bioconda and hosted on quay.io. The Docker image will be downloaded automatically if it is not found locally.
Make sure that the Docker daemon is installed and running before you test this example.
Example 23: Example usage of the Bowtie 2 Docker module
1 <?xml version="1.0" encoding="UTF-8"?>
2 <watchdog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="watchdog.xsd" watchdogBase="{%INSTALL%}" isTemplate="true">
3
4 <settings>
5 <constants>
6 <const name="BASE">{%INSTALL%}/modules/bowtie2Docker/example_data</const>
7 </constants>
8 </settings>
9
10 <tasks mail="{%MAIL%}">
11 <bowtie2DockerTask id="1" name="bowtie2_in_docker">
12 <streams>
13 <stdout>/tmp/bowtie2.docker.test.out</stdout>
14 <stderr>/tmp/bowtie2.docker.test.err</stderr>
15 </streams>
16 <parameter>
17 <genome>${BASE}/index/lambda_virus</genome>
18 <reads>${BASE}/reads/reads_1.fq</reads>
19 <reads>${BASE}/reads/reads_1.fq</reads>
20 <outfile>/tmp/bowtie2.docker.test.sam</outfile>
21 </parameter>
22 </bowtie2DockerTask>
23 </tasks>
24 </watchdog>
Example 23 shows how the bowtie2Docker module can be used with the provided example data. The test data that is shipped with Bowtie 2 is stored in the folder example_data of the module (6). Log files are written to /tmp/bowtie2.docker.test.[out|err] (13, 14) while the mapped reads are stored in SAM format in /tmp/bowtie2.docker.test.sam (20).