ruffus
Installation
The easy way
The most up-to-date code:
Prerequisites
Installing easy_install
Installing pip
Graphical flowcharts
Ruffus Manual: List of Chapters and Example code
Chapter 1: An introduction to basic Ruffus syntax
Overview
Importing Ruffus
Ruffus decorators
Your first Ruffus pipeline
Chapter 2: Transforming data in a pipeline with @transform
Review
Task functions as recipes
@transform is a 1 to 1 operation
Input and Output parameters
Chapter 3: More on @transform-ing data
Review
Running pipelines in parallel
Up-to-date jobs are not re-run unnecessarily
Defining pipeline tasks out of order
Multiple dependencies
@follows
Making directories automatically with @follows and mkdir
Globs in the Input parameter
Mixing Tasks and Globs in the Input parameter
Chapter 4: Creating files with @originate
Simplifying our example with @originate
Chapter 5: Understanding how your pipeline works with pipeline_printout(...)
Printing out which jobs will be run
Determining which jobs are out-of-date or not
Verbosity levels
Abbreviating long file paths with verbose_abbreviated_path
Getting a list of all tasks in a pipeline
Chapter 6: Running Ruffus from the command line with ruffus.cmdline
Template for argparse
Command Line Arguments
1) Logging
2) Tracing pipeline progress
3) Printing a flowchart
4) Running in parallel on multiple processors
5) Setup checkpointing so that Ruffus knows which files are out of date
6) Skipping specified options
7) Specifying verbosity and abbreviating long paths
8) Displaying the version
Template for optparse
Chapter 7: Displaying the pipeline visually with pipeline_printout_graph(...)
Printing out a flowchart of our pipeline
Command line options made easier with ruffus.cmdline
Horribly complicated pipelines!
Circular dependency errors in pipelines!
@graphviz: Customising the appearance of each task
Chapter 8: Specifying output file names with formatter() and regex()
Review
A different file name suffix() for each pipeline stage
formatter() manipulates pathnames and regular expression
regex() manipulates via regular expressions
Chapter 9: Preparing directories for output with @mkdir()
Overview
Creating directories after string substitution in a zoo...
Chapter 10: Checkpointing: Interrupted Pipelines and Exceptions
Overview
Interrupting tasks
Checkpointing: only log completed jobs
Do not share the same checkpoint file across multiple pipelines!
Setting checkpoint file names
Useful checkpoint file name policies
DEFAULT_RUFFUS_HISTORY_FILE
Regenerating the checkpoint file
Rules for determining if files are up to date
Missing files generate exceptions
Caveats: Coarse Timestamp resolution
Flag files: Checkpointing for the paranoid
Chapter 11: Pipeline topologies and a compendium of Ruffus decorators
Overview
@transform
A bestiary of Ruffus decorators
@originate
@merge
@split
@subdivide
@collate
Combinatorics
@product
@combinations
@combinations_with_replacement
@permutations
Chapter 12: Splitting up large tasks / files with @split
Overview
Example: Calculate variance for a large list of numbers in parallel
Output files for @split
Be careful in specifying Output globs
Clean up previous pipeline runs
1 to many
Nothing to many
Chapter 13: @merge multiple input into a single result
Overview of @merge
@merge is a many to one operator
Example: Combining partial solutions: Calculating variances
Chapter 14: Multiprocessing, drmaa and Computation Clusters
Overview
Restricting parallelism with @jobs_limit
Using drmaa to dispatch work to Computational Clusters or Grid engines from Ruffus jobs
Forcing a pipeline to appear up to date
Chapter 15: Logging progress through a pipeline
Overview
Logging task/job completion
Use ruffus.cmdline
Customising logging
Log your own messages
Chapter 16: @subdivide tasks to run efficiently and regroup with @collate
Overview
@subdivide in parallel
Grouping using @collate
Chapter 17: @combinations, @permutations and all versus all @product
Overview
Generating output with formatter()
All vs all comparisons with @product
Permute all k-tuple orderings of inputs without repeats using @permutations
Select unordered k-tuples within inputs excluding repeated elements using @combinations
Select unordered k-tuples within inputs including repeated elements with @combinations_with_replacement
Chapter 18: Turning parts of the pipeline on and off at runtime with @active_if
Overview
@active_if controls the state of tasks
Chapter 19: Signal the completion of each stage of our pipeline with @posttask
Overview
Chapter 20: Manipulating task inputs via string substitution using inputs() and add_inputs()
Overview
Adding additional input prerequisites per job with add_inputs()
Replacing all input parameters with inputs()
Chapter 21: Esoteric: Generating parameters on the fly with @files
Overview
@files syntax
A Cartesian Product, all vs all example
Chapter 22: Esoteric: Running jobs in parallel without files using @parallel
@parallel
Chapter 23: Esoteric: Writing custom functions to decide which jobs are up to date with @check_if_uptodate
@check_if_uptodate: Manual dependency checking
Appendix 1: Flow Chart Colours with pipeline_printout_graph(...)
Flowchart colours
Appendix 2: How dependency is checked
Overview
Appendix 3: Exceptions thrown inside pipelines
Overview
Pipelines running in parallel accumulate Exceptions
Terminate pipeline immediately upon Exceptions
Display exceptions as they occur
Appendix 4: Names exported from Ruffus
Ruffus Names
Appendix 5: @files: Deprecated syntax
Overview
@files
Running the same code on different parameters in parallel
Appendix 6: @files_re: Deprecated syntax using regular expressions
Overview
Chapter 1: Python Code for An introduction to basic Ruffus syntax
Your first Ruffus script
Resulting Output
Chapter 2: Python Code for Transforming data in a pipeline with @transform
Your first Ruffus script
Resulting Output
Chapter 3: Python Code for More on @transform-ing data
Producing several items / files per job
Defining task functions out of order
Multiple dependencies
Multiple dependencies after @follows
Chapter 4: Python Code for Creating files with @originate
Using @originate
Resulting Output
Chapter 5: Python Code for Understanding how your pipeline works with pipeline_printout(...)
Display the initial state of the pipeline
Normal Output
High Verbosity Output
Display the partially up-to-date pipeline
Chapter 7: Python Code for Displaying the pipeline visually with pipeline_printout_graph(...)
Code
Resulting Flowcharts
Chapter 8: Python Code for Specifying output file names with formatter() and regex()
Example Code for suffix()
Example Code for formatter()
Example Code for formatter() with replacements in extra arguments
Example Code for formatter() in Zoos
Example Code for regex() in zoos
Chapter 9: Python Code for Preparing directories for output with @mkdir()
Code for formatter() Zoo example
Code for regex() Zoo example
Chapter 10: Python Code for Checkpointing: Interrupted Pipelines and Exceptions
Code for the “Interrupting tasks” example
Chapter 12: Python Code for Splitting up large tasks / files with @split
Splitting large jobs
Resulting Output
Chapter 13: Python Code for @merge multiple input into a single result
Splitting large jobs
Resulting Output
Chapter 14: Python Code for Multiprocessing, drmaa and Computation Clusters
@jobs_limit
Using ruffus.drmaa_wrapper
Chapter 15: Python Code for Logging progress through a pipeline
Rotating set of file logs
Chapter 16: Python Code for @subdivide tasks to run efficiently and regroup with @collate
@subdivide and regroup with @collate example
Chapter 17: Python Code for @combinations, @permutations and all versus all @product
Example code for @product
Example code for @permutations
Example code for @combinations
Example code for @combinations_with_replacement
Chapter 20: Python Code for Manipulating task inputs via string substitution using inputs() and add_inputs()
Example code for adding additional input prerequisites per job with add_inputs()
Example code for replacing all input parameters with inputs()
Chapter 21: Esoteric: Python Code for Generating parameters on the fly with @files
Introduction
Code
Resulting Output
Appendix 1: Python code for Flow Chart Colours with pipeline_printout_graph(...)
Code
Cheat Sheet
1. Annotate functions with Ruffus decorators
2. Print dependency graph if necessary
3. Run the pipeline
Pipeline functions
pipeline_run
pipeline_printout
pipeline_printout_graph
pipeline_get_task_names
drmaa functions
run_job
Design & Architecture
GNU Make
Scons, Rake and other Make alternatives
Managing pipelines stage-by-stage using Ruffus
Alternatives to Ruffus
Major Features added to Ruffus
version 2.6
version 2.5
version 2.4.1
version 2.4
version 2.3
version 2.2
version 2.1.1
version 2.1.0
version 2.0.10
version 2.0.9
version 2.0.8
version 2.0.2
version 2.0
version 1.1.4
version 1.0.7
version 1.0
Fixed Bugs
New Object orientated syntax for Ruffus in Version 2.6
Syntax
Advantages
Compatibility
Class methods
Call chaining
Referring to Tasks
Worked Example for New Object orientated syntax for Ruffus in Version 2.6
Worked example
Python Code for: New Object orientated syntax for Ruffus in Version 2.6
Where I see Ruffus going
In upcoming release:
Todo: document output_from()
Todo: document new syntax
Todo: Log the progress through the pipeline in a machine parsable format
Todo: either_or: Prevent failed jobs from propagating further
Todo: (bug fix) pipeline_printout_graph should print inactive tasks
Todo: Mark input strings as non-file names, and add support for dynamically returned parameters
Future Changes to Ruffus
Todo: Replacements for formatter(), suffix(), regex()
Todo: Allow “extra” parameters to be used in output substitution
Todo: Extra signalling before and after each task and job
Todo: @split / @subdivide returns the actual output created
Todo: New decorators
Todo: Bioinformatics example to end all examples
Todo: Allow the next task to start before all jobs in the previous task have finished
Todo: Allow checkpoint files to be moved
Todo: Remove intermediate files
Planned Improvements to Ruffus
Planned: Running python code (task functions) transparently on remote cluster nodes
Planned: Custom parameter generator
Planned: Ruffus GUI interface.
Planned: Non-decorator / Function interface to Ruffus
Planned: @retry_on_error(NUM_OF_RETRIES)
Planned: Clean up
Implementation Tips
Items remaining for current release
Release
blogger
dbdict.py
how to write new decorators
Implementation notes
Ctrl-C handling
Python3 compatibility
Refactoring: parameter handling
formatter
@product()
@permutations(...), @combinations(...), @combinations_with_replacement(...)
drmaa alternatives
Task completion monitoring
@mkdir(...)
Parameter handling
Add Object Orientated interface
FAQ
Citations
Good practices
General
Windows
Sun Grid Engine / PBS / SLURM etc
Sharing python objects between Ruffus processes running concurrently
Glossary
Hall of Fame: User contributed flowcharts
RNASeq pipeline
non-coding evolutionary constraints
SNP annotation
Chip-Seq analysis
Why Ruffus?
Construction of a simple pipeline to run BLAST jobs
Overview
Prerequisites
Code
Step 1. Splitting up the query sequences
Step 2. Run BLAST jobs in parallel
Step 3. Combining BLAST results
Step 4. Running the pipeline
Step 5. Testing dependencies
What is next?
Part 2: A slightly more practical pipeline to run BLAST jobs
Overview
Step 1. Cleaning up any leftover junk from previous pipeline runs
Step 2. Adding a “flag” file to mark successful completion
Step 3. Allowing the script to be invoked on the command line
Step 4. Printing out a flowchart for the pipeline
Step 5. Errors
Step 6. Will it run?
Ruffus code
Ruffus code
Example code for FAQ Good practices: "What is the best way of handling data in file pairs (or triplets etc.)?"
Ruffus Decorators
Core
Combinatorics
Advanced
Esoteric!
Indicator Objects
formatter
suffix
regex
add_inputs
inputs
mkdir
touch_file
output_from
combine
@originate( output, [extras,...] )
@split( input, output, [extras,...] )
@transform( input, filter, output, [extras,...] )
@merge( input, output, [extras,...] )
@subdivide
@subdivide( input, regex(matching_regex) | formatter(matching_formatter), [ inputs(input_pattern_or_glob) | add_inputs(input_pattern_or_glob) ], output, [extras,...] )
@transform( input, filter, replace_inputs | add_inputs, output, [extras,...] )
@collate( input, filter, output, [extras,...] )
@collate( input, filter, replace_inputs | add_inputs, output, [extras,...] )
@graphviz
@graphviz( [graphviz_parameters,...] )
@mkdir( input, filter, output )
@jobs_limit
@jobs_limit( maximum_num_of_jobs, [name] )
@posttask
@posttask( function | touch_file(file_name) )
@active_if
@active_if( on_or_off1, [on_or_off2,...] )
@follows
@follows( task | “task_name” | mkdir(directory_name), [more_tasks, ...] )
@product( input, filter, [input2, filter2, ...], output, [extras,...] )
@permutations( input, filter, tuple_size, output, [extras,...] )
@combinations( input, filter, tuple_size, output, [extras,...] )
@combinations_with_replacement( input, filter, tuple_size, output, [extras,...] )
Generating parameters on the fly for @files
@files( custom_function )
@check_if_uptodate
@check_if_uptodate( dependency_checking_function )
@parallel
@parallel( [ [job_params, ...], [job_params, ...]...] | parameter_generating_function )
@files
@files( input1, output1, [extra_parameters1, ...] )
@files( ((input, output, [extra_parameters,...]), (...), ...) )
@files_re
@files_re( tasks_or_file_names, matching_regex, [input_pattern], output_pattern, [extra_parameters,...] )
ruffus.Task
Decorators
Pipeline functions
Logging
Implementation:
Exceptions and Errors
ruffus.proxy_logger
Create proxy for logging for use with multiprocessing
Proxies for a log:
Create a logging object
Overview: module code
All modules for which code is available
ruffus.proxy_logger
ruffus.task