ruffus
Installation
The easy way
The most up-to-date code:
Prerequisites
Installing easy_install
Installing pip
Graphical flowcharts
The most up-to-date code:
Ruffus Manual: List of Chapters and Example code
Chapter 1: An introduction to basic Ruffus syntax
Overview
Importing Ruffus
Ruffus decorators
Your first Ruffus pipeline
Chapter 2: Transforming data in a pipeline with @transform
Review
Task functions as recipes
@transform is a 1 to 1 operation
Input and Output parameters
Chapter 3: More on @transform-ing data
Review
Running pipelines in parallel
Up-to-date jobs are not re-run unnecessarily
Defining pipeline tasks out of order
Multiple dependencies
@follows
Making directories automatically with @follows and mkdir
Globs in the Input parameter
Mixing Tasks and Globs in the Input parameter
Chapter 4: Creating files with @originate
Simplifying our example with @originate
Chapter 5: Understanding how your pipeline works with pipeline_printout(...)
Printing out which jobs will be run
Determining which jobs are out-of-date or not
Verbosity levels
Abbreviating long file paths with verbose_abbreviated_path
Getting a list of all tasks in a pipeline
Chapter 6: Running Ruffus from the command line with ruffus.cmdline
Template for argparse
Command Line Arguments
1) Logging
2) Tracing pipeline progress
3) Printing a flowchart
4) Running in parallel on multiple processors
5) Setup checkpointing so that Ruffus knows which files are out of date
6) Skipping specified options
7) Specifying verbosity and abbreviating long paths
8) Displaying the version
Template for optparse
Chapter 7: Displaying the pipeline visually with pipeline_printout_graph(...)
Printing out a flowchart of our pipeline
Command line options made easier with ruffus.cmdline
Horribly complicated pipelines!
Circular dependency errors in pipelines!
@graphviz: Customising the appearance of each task
Chapter 8: Specifying output file names with formatter() and regex()
Review
A different file name suffix() for each pipeline stage
formatter() manipulates pathnames and regular expression
regex() manipulates via regular expressions
Chapter 9: Preparing directories for output with @mkdir()
Overview
Creating directories after string substitution in a zoo...
Chapter 10: Checkpointing: Interrupted Pipelines and Exceptions
Overview
Interrupting tasks
Checkpointing: only log completed jobs
Do not share the same checkpoint file across multiple pipelines!
Setting checkpoint file names
Useful checkpoint file name policies
DEFAULT_RUFFUS_HISTORY_FILE
Regenerating the checkpoint file
Rules for determining if files are up to date
Missing files generate exceptions
Caveats: Coarse Timestamp resolution
Flag files: Checkpointing for the paranoid
Chapter 11: Pipeline topologies and a compendium of Ruffus decorators
Overview
@transform
A bestiary of Ruffus decorators
@originate
@merge
@split
@subdivide
@collate
Combinatorics
@product
@combinations
@combinations_with_replacement
@permutations
Chapter 12: Splitting up large tasks / files with @split
Overview
Example: Calculate variance for a large list of numbers in parallel
Output files for @split
Be careful in specifying Output globs
Clean up previous pipeline runs
1 to many
Nothing to many
Chapter 13: @merge multiple input into a single result
Overview of @merge
@merge is a many to one operator
Example: Combining partial solutions: Calculating variances
Chapter 14: Multiprocessing, drmaa and Computation Clusters
Overview
Restricting parallelism with @jobs_limit
Using drmaa to dispatch work to Computational Clusters or Grid engines from Ruffus jobs
Forcing a pipeline to appear up to date
Chapter 15: Logging progress through a pipeline
Overview
Logging task/job completion
Use ruffus.cmdline
Customising logging
Log your own messages
Chapter 16: @subdivide tasks to run efficiently and regroup with @collate
Overview
@subdivide in parallel
Grouping using @collate
Chapter 17: @combinations, @permutations and all versus all @product
Overview
Generating output with formatter()
All vs all comparisons with @product
Permute all k-tuple orderings of inputs without repeats using @permutations
Select unordered k-tuples within inputs excluding repeated elements using @combinations
Select unordered k-tuples within inputs including repeated elements with @combinations_with_replacement
Chapter 18: Turning parts of the pipeline on and off at runtime with @active_if
Overview
@active_if controls the state of tasks
Chapter 19: Signal the completion of each stage of our pipeline with @posttask
Overview
Chapter 20: Manipulating task inputs via string substitution using inputs() and add_inputs()
Overview
Adding additional input prerequisites per job with add_inputs()
Replacing all input parameters with inputs()
Chapter 21: Esoteric: Generating parameters on the fly with @files
Overview
@files syntax
A Cartesian Product, all vs all example
Chapter 22: Esoteric: Running jobs in parallel without files using @parallel
@parallel
Chapter 23: Esoteric: Writing custom functions to decide which jobs are up to date with @check_if_uptodate
@check_if_uptodate: Manual dependency checking
Appendix 1: Flow Chart Colours with pipeline_printout_graph(...)
Flowchart colours
Appendix 2: How dependency is checked
Overview
Appendix 3: Exceptions thrown inside pipelines
Overview
Pipelines running in parallel accumulate Exceptions
Terminate pipeline immediately upon Exceptions
Display exceptions as they occur
Appendix 4: Names exported from Ruffus
Ruffus Names
Appendix 5: @files: Deprecated syntax
Overview
@files
Running the same code on different parameters in parallel
Appendix 6: @files_re: Deprecated syntax using regular expressions
Overview
Chapter 1: Python Code for An introduction to basic Ruffus syntax
Your first Ruffus script
Resulting Output
Chapter 2: Python Code for Transforming data in a pipeline with @transform
Your first Ruffus script
Resulting Output
Chapter 3: Python Code for More on @transform-ing data
Producing several items / files per job
Defining task functions out of order
Multiple dependencies
Multiple dependencies after @follows
Chapter 4: Python Code for Creating files with @originate
Using @originate
Resulting Output
Chapter 5: Python Code for Understanding how your pipeline works with pipeline_printout(...)
Display the initial state of the pipeline
Normal Output
High Verbosity Output
Display the partially up-to-date pipeline
Chapter 7: Python Code for Displaying the pipeline visually with pipeline_printout_graph(...)
Code
Resulting Flowcharts
Chapter 8: Python Code for Specifying output file names with formatter() and regex()
Example Code for suffix()
Example Code for formatter()
Example Code for formatter() with replacements in extra arguments
Example Code for formatter() in Zoos
Example Code for regex() in zoos
Chapter 9: Python Code for Preparing directories for output with @mkdir()
Code for formatter() Zoo example
Code for regex() Zoo example
Chapter 10: Python Code for Checkpointing: Interrupted Pipelines and Exceptions
Code for the “Interrupting tasks” example
Chapter 12: Python Code for Splitting up large tasks / files with @split
Splitting large jobs
Resulting Output
Chapter 13: Python Code for @merge multiple input into a single result
Splitting large jobs
Resulting Output
Chapter 14: Python Code for Multiprocessing, drmaa and Computation Clusters
@jobs_limit
Using ruffus.drmaa_wrapper
Chapter 15: Python Code for Logging progress through a pipeline
Rotating set of file logs
Chapter 16: Python Code for @subdivide tasks to run efficiently and regroup with @collate
@subdivide and regroup with @collate example
Chapter 17: Python Code for @combinations, @permutations and all versus all @product
Example code for @product
Example code for @permutations
Example code for @combinations
Example code for @combinations_with_replacement
Chapter 20: Python Code for Manipulating task inputs via string substitution using inputs() and add_inputs()
Example code for adding additional input prerequisites per job with add_inputs()
Example code for replacing all input parameters with inputs()
Chapter 21: Esoteric: Python Code for Generating parameters on the fly with @files
Introduction
Code
Resulting Output
Appendix 1: Python code for Flow Chart Colours with pipeline_printout_graph(...)
Code
Cheat Sheet
1. Annotate functions with Ruffus decorators
2. Print dependency graph if necessary
3. Run the pipeline
Pipeline functions
pipeline_run
pipeline_printout
pipeline_printout_graph
pipeline_get_task_names
drmaa functions
run_job
Design & Architecture
GNU Make
Scons, Rake and other Make alternatives
Managing pipelines stage-by-stage using Ruffus
Alternatives to Ruffus
Major Features added to Ruffus
version 2.6
version 2.5
version 2.4.1
version 2.4
version 2.3
version 2.2
version 2.1.1
version 2.1.0
version 2.0.10
version 2.0.9
version 2.0.8
version 2.0.2
version 2.0
version 1.1.4
version 1.0.7
version 1.0
Fixed Bugs
New Object orientated syntax for Ruffus in Version 2.6
Syntax
Advantages
Compatibility
Class methods
Call chaining
Referring to Tasks
Worked Example for New Object orientated syntax for Ruffus in Version 2.6
Worked example
Python Code for: New Object orientated syntax for Ruffus in Version 2.6
Where I see Ruffus going
In upcoming release:
Todo: document output_from()
Todo: document new syntax
Todo: Log the progress through the pipeline in a machine parsable format
Todo: either_or: Prevent failed jobs from propagating further
Todo: (bug fix) pipeline_printout_graph should print inactive tasks
Todo: Mark input strings as non-file names, and add support for dynamically returned parameters
Future Changes to Ruffus
Todo: Replacements for formatter(), suffix(), regex()
Todo: Allow “extra” parameters to be used in output substitution
Todo: Extra signalling before and after each task and job
Todo: @split / @subdivide returns the actual output created
Todo: New decorators
Todo: Bioinformatics example to end all examples
Todo: Allow the next task to start before all jobs in the previous task have finished
Todo: Allow checkpoint files to be moved
Todo: Remove intermediate files
Planned Improvements to Ruffus
Planned: Running python code (task functions) transparently on remote cluster nodes
Planned: Custom parameter generator
Planned: Ruffus GUI interface.
Planned: Non-decorator / Function interface to Ruffus
Planned: @retry_on_error(NUM_OF_RETRIES)
Planned: Clean up
Implementation Tips
Items remaining for current release
Release
blogger
dbdict.py
how to write new decorators
Implementation notes
Ctrl-C handling
Python3 compatibility
Refactoring: parameter handling
formatter
@product()
@permutations(...), @combinations(...), @combinations_with_replacement(...)
drmaa alternatives
Task completion monitoring
@mkdir(...),
Parameter handling
Add Object Orientated interface
FAQ
Citations
Good practices
General
Windows
Sun Grid Engine / PBS / SLURM etc
Sharing python objects between Ruffus processes running concurrently
Glossary
Hall of Fame: User contributed flowcharts
RNASeq pipeline
non-coding evolutionary constraints
SNP annotation
Chip-Seq analysis
Why Ruffus?
Construction of a simple pipeline to run BLAST jobs
Overview
Prerequisites
Code
Step 1. Splitting up the query sequences
Step 2. Run BLAST jobs in parallel
Step 3. Combining BLAST results
Step 4. Running the pipeline
Step 5. Testing dependencies
What is next?
Part 2: A slightly more practical pipeline to run blasts jobs
Overview
Step 1. Cleaning up any leftover junk from previous pipeline runs
Step 2. Adding a “flag” file to mark successful completion
Step 3. Allowing the script to be invoked on the command line
Step 4. Printing out a flowchart for the pipeline
Step 5. Errors
Step 6. Will it run?
Ruffus code
Ruffus code
Example code for FAQ Good practices: "What is the best way of handling data in file pairs (or triplets etc.)?"
Ruffus Decorators
Core
Combinatorics
Advanced
Esoteric!
Indicator Objects
formatter
suffix
regex
add_inputs
inputs
mkdir
touch_file
output_from
combine
@originate ( output, [extras,...] )
@split ( input, output, [extras,...] )
@transform( input, filter, output, [extras,...] )
@merge ( input, output, [extras,...] )
@subdivide
@subdivide ( input, regex(matching_regex) | formatter(matching_formatter), [ inputs(input_pattern_or_glob) | add_inputs(input_pattern_or_glob) ], output, [extras,...] )
@transform( input, filter, replace_inputs | add_inputs, output, [extras,...] )
@collate( input, filter, output, [extras,...] )
@collate( input, filter, replace_inputs | add_inputs, output, [extras,...] )
@graphviz
@graphviz ( graphviz_parameters, ... )
@mkdir( input, filter, output )
@jobs_limit
@jobs_limit ( maximum_num_of_jobs, [name] )
@posttask
@posttask ( function | touch_file(file_name) )
@active_if
@active_if (on_or_off1, [on_or_off2,...])
@follows
@follows ( task | “task_name” | mkdir(directory_name), [more_tasks, ...] )
@product( input, filter, [input2, filter2, ...], output, [extras,...] )
@permutations( input, filter, tuple_size, output, [extras,...] )
@combinations( input, filter, tuple_size, output, [extras,...] )
@combinations_with_replacement( input, filter, tuple_size, output, [extras,...] )
Generating parameters on the fly for @files
@files ( custom_function )
@check_if_uptodate
@check_if_uptodate ( dependency_checking_function )
@parallel
@parallel ( [ [job_params, ...], [job_params, ...]...] | parameter_generating_function )
@files
@files ( input1, output1, [extra_parameters1, ...] )
@files ( ((input, output, [extra_parameters,...]), (...), ...) )
@files_re
@files_re ( tasks_or_file_names, matching_regex, [input_pattern], output_pattern, [extra_parameters,...] )
ruffus.Task
Decorators
Pipeline functions
Logging
Implementation:
Exceptions and Errors
ruffus.proxy_logger
Create proxy for logging for use with multiprocessing
Proxies for a log:
Create a logging object
Index
Symbols | A | B | C | D | E | F | G | I | J | L | M | N | O | P | R | S | T | U
Symbols
@active_if
Syntax
Tutorial
@check_if_uptodate
Syntax
@collate
Syntax
Tutorial
@collate (Advanced Usage)
Syntax
@collate, add_inputs(...)
Syntax
@collate, inputs(...)
Syntax
@combinations
Syntax
@combinations_with_replacement
Syntax
@files
Manual
Syntax
Tutorial on-the-fly parameter generation
check if up to date
in parallel
@files (on-the-fly parameter generation)
Syntax
@files_re
Syntax
combine (Deprecated Syntax)
@follow
imposing order with
@follows
Syntax
mkdir (Manual)
mkdir (Syntax)
@graphviz
Syntax
@jobs_limit
Syntax
Tutorial
@merge
Syntax
@mkdir
Syntax
@originate
Syntax
@parallel
Syntax
Tutorial
@permutations
Syntax
@posttask
Syntax
touch_file (Syntax)
touchfile (Manual)
@product
Syntax
@split
Syntax
@subdivide
Syntax
Tutorial
@transform
Syntax
multiple dependencies
@transform, add_inputs(...)
Syntax
@transform, inputs(...)
Syntax
A
Acknowledgements
add_inputs
Indicator Object (Adding additional input parameters)
Tutorial
args_param_factory() (in module ruffus.task)
B
break
C
check if up to date
@files
check_if_uptodate
Tutorial
Checking dependencies
Tutorial
collate_param_factory() (in module ruffus.task)
combinatorics
Tutorial
combine
@follows (Deprecated Syntax)
Manual
command line
Tutorial
Comparison of Ruffus with alternatives
Design
D
data sharing across processes
Tutorial
decorator
decorators_compendium
Tutorial
defining tasks out of order
output_from
deprecated @files
Tutorial
deprecated @files_re
Tutorial
Design
Comparison of Ruffus with alternatives
Ruffus
drmaa
run_job
E
errors
Etymology
Ruffus
Exception
Missing input files
Exceptions
Tutorial
exceptions
Tutorial
F
files_param_factory() (in module ruffus.task)
flag files
Manual
flowchart colours
Tutorial, [1]
for rerunning jobs
rules
formatter
Indicator Object (Disambiguating parameters)
Tutorial
G
generator
globs
inputs parameters
globs in input parameters
Tutorial
I
importing ruffus
imposing order with
@follow
in parallel
@files
Indicator Object (Adding additional input parameters)
add_inputs
Indicator Object (Disambiguating parameters)
combine
formatter
mkdir
output_from
regex
suffix
touch_file
Indicator Object (Replacing input parameters)
inputs
input / output parameters
Tutorial
inputs
Indicator Object (Replacing input parameters)
Tutorial
inputs parameters
globs
Interrupted Pipeline
Tutorial
interrupting tasks
Tutorial
interrupts
J
job
job_wrapper_generic() (in module ruffus.task)
job_wrapper_io_files() (in module ruffus.task)
job_wrapper_mkdir() (in module ruffus.task)
L
logging
Tutorial
logging customising
Tutorial
logging with ruffus.cmdline
Tutorial
logging your own message
Tutorial
M
make_shared_logger_and_proxy() (in module ruffus.proxy_logger)
Manual
@files
Timestamp resolution
combine
flag files
merge
Tutorial
merge_param_factory() (in module ruffus.task)
Missing input files
Exception
Mixing tasks, globs and file names
Tutorial
mkdir
@follows (Manual)
@follows (Syntax)
Tutorial
multiple dependencies
@transform
multiple errors
multiprocessing
Tutorial
N
Name origins
Ruffus
needs_update_check_directory_missing() (in module ruffus.task)
needs_update_check_modify_time() (in module ruffus.task)
O
on_the_fly
Tutorial
one to one @transform
Tutorial
originate
Tutorial
output file names
Tutorial
output_from
Indicator Object (Disambiguating parameters)
defining tasks out of order
referring to functions before they are defined
overview
Tutorial
P
pipeline functions
pipeline_get_task_names
pipeline_printout_graph
pipeline_run, [1]
pipeline_get_task_names
print list of task names without running the pipeline
pipeline_printout
Printout simulated run of the pipeline
Tutorial
pipeline_printout() (in module ruffus.task)
pipeline_printout_graph
Tutorial
print flowchart representation of pipeline functions
pipeline_printout_graph() (in module ruffus.task)
pipeline_run
Run pipeline
Tutorial
pipeline_run touch mode
Tutorial
pipeline_run verbosity
Tutorial
pipeline_run() (in module ruffus.task)
pipeline_run(multiprocess)
Tutorial
posttask
Tutorial
print flowchart representation of pipeline functions
pipeline_printout_graph
print list of task names without running the pipeline
pipeline_get_task_names
Printout simulated run of the pipeline
pipeline_printout
R
referring to functions before they are defined
output_from
Regenerating the checkpoint file
Tutorial
regex
Indicator Object (Disambiguating parameters)
Tutorial
Ruffus
Design
Etymology
Name origins
Ruffus names list
Tutorial
ruffus.proxy_logger (module)
rules
for rerunning jobs
Run drmaa
run_job
Run pipeline
pipeline_run
run_job
Run drmaa
S
setup_std_shared_logger() (in module ruffus.proxy_logger)
signalling
split
Tutorial
split_param_factory() (in module ruffus.task)
string substitution for inputs
Tutorial
suffix
Indicator Object (Disambiguating parameters)
Tutorial
Syntax
@active_if
@check_if_uptodate
@collate
@collate (Advanced Usage)
@collate, add_inputs(...)
@collate, inputs(...)
@combinations
@combinations_with_replacement
@files
@files (on-the-fly parameter generation)
@files_re
@follows
@graphviz
@jobs_limit
@merge
@mkdir
@originate
@parallel
@permutations
@posttask
@product
@split
@subdivide
@transform
@transform, add_inputs(...)
@transform, inputs(...)
T
t_black_hole_logger (class in ruffus.task)
t_stderr_logger (class in ruffus.task)
task
Task completion
Tutorial
Timestamp resolution
Manual
touch mode pipeline_run
Tutorial
touch_file
@posttask (Syntax)
touchfile
@posttask (Manual)
transform
Tutorial
transform_param_factory() (in module ruffus.task)
transforming in parallel
Tutorial
Tutorial
@active_if
@collate
@jobs_limit
@parallel
@subdivide
Checking dependencies
Exceptions
Interrupted Pipeline
Mixing tasks, globs and file names
Regenerating the checkpoint file
Ruffus names list
Task completion
Up to date
add_inputs
check_if_uptodate
combinatorics
command line
data sharing across processes
decorators_compendium
deprecated @files
deprecated @files_re
exceptions
flowchart colours, [1]
formatter
globs in input parameters
input / output parameters
inputs
interrupting tasks
logging
logging customising
logging with ruffus.cmdline
logging your own message
merge
mkdir
multiprocessing
on_the_fly
one to one @transform
originate
output file names
overview
pipeline_printout
pipeline_printout_graph
pipeline_run
pipeline_run touch mode
pipeline_run verbosity
pipeline_run(multiprocess)
posttask
regex
split
string substitution for inputs
suffix
touch mode pipeline_run
transform
transforming in parallel
Tutorial on-the-fly parameter generation
@files
U
Up to date
Tutorial