Ruffus documentation

Start Here:

Installation
- The easy way
- The most up-to-date code:
- Prerequisites
- Installing easy_install
- Installing pip
- Graphical flowcharts
Ruffus Manual: List of Chapters and Example code
Chapter 1: An introduction to basic Ruffus syntax
- Overview
- Importing Ruffus
- Ruffus decorators
- Your first Ruffus pipeline
Chapter 2: Transforming data in a pipeline with @transform
- Review
- Task functions as recipes
- @transform is a 1 to 1 operation
- Input and Output parameters
Chapter 3: More on @transform-ing data
- Review
- Running pipelines in parallel
- Up-to-date jobs are not re-run unnecessarily
- Defining pipeline tasks out of order
- Multiple dependencies
- @follows
- Making directories automatically with @follows and mkdir
- Globs in the Input parameter
- Mixing Tasks and Globs in the Input parameter
Chapter 4: Creating files with @originate
- Simplifying our example with @originate
Chapter 5: Understanding how your pipeline works with pipeline_printout(...)
- Printing out which jobs will be run
- Determining which jobs are out-of-date or not
- Verbosity levels
- Abbreviating long file paths with verbose_abbreviated_path
- Getting a list of all tasks in a pipeline
Chapter 6: Running Ruffus from the command line with ruffus.cmdline
- Template for argparse
- Command Line Arguments
- 1) Logging
- 2) Tracing pipeline progress
- 3) Printing a flowchart
- 4) Running in parallel on multiple processors
- 5) Set up checkpointing so that Ruffus knows which files are out of date
- 6) Skipping specified options
- 7) Specifying verbosity and abbreviating long paths
- 8) Displaying the version
- Template for optparse
Chapter 7: Displaying the pipeline visually with pipeline_printout_graph(...)
- Printing out a flowchart of our pipeline
- Command line options made easier with ruffus.cmdline
- Horribly complicated pipelines!
- Circular dependency errors in pipelines!
- @graphviz: Customising the appearance of each task
Chapter 8: Specifying output file names with formatter() and regex()
- Review
- A different file name suffix() for each pipeline stage
- formatter() manipulates pathnames and regular expressions
- regex() manipulates via regular expressions
Chapter 9: Preparing directories for output with @mkdir()
- Overview
- Creating directories after string substitution in a zoo...
Chapter 10: Checkpointing: Interrupted Pipelines and Exceptions
- Overview
- Interrupting tasks
- Checkpointing: only log completed jobs
- Do not share the same checkpoint file across multiple pipelines!
- Setting checkpoint file names
- Useful checkpoint file name policies DEFAULT_RUFFUS_HISTORY_FILE
- Regenerating the checkpoint file
- Rules for determining if files are up to date
- Missing files generate exceptions
- Caveats: Coarse Timestamp resolution
- Flag files: Checkpointing for the paranoid
Chapter 11: Pipeline topologies and a compendium of Ruffus decorators
- Overview
- @transform
- A bestiary of Ruffus decorators
- @originate
- @merge
- @split
- @subdivide
- @collate
- Combinatorics
- @product
- @combinations
- @combinations_with_replacement
- @permutations
Chapter 12: Splitting up large tasks / files with @split
- Overview
- Example: Calculate variance for a large list of numbers in parallel
- Output files for @split
- Be careful in specifying Output globs
- Clean up previous pipeline runs
- 1 to many
- Nothing to many
Chapter 13: @merge multiple inputs into a single result
- Overview of @merge
- @merge is a many to one operator
- Example: Combining partial solutions: Calculating variances
Chapter 14: Multiprocessing, drmaa and Computation Clusters
- Overview
- Restricting parallelism with @jobs_limit
- Using drmaa to dispatch work to Computational Clusters or Grid engines from Ruffus jobs
- Forcing a pipeline to appear up to date
Chapter 15: Logging progress through a pipeline
- Overview
- Logging task/job completion
- Use ruffus.cmdline
- Customising logging
- Log your own messages
Chapter 16: @subdivide tasks to run efficiently and regroup with @collate
- Overview
- @subdivide in parallel
- Grouping using @collate
Chapter 17: @combinations, @permutations and all versus all @product
- Overview
- Generating output with formatter()
- All vs all comparisons with @product
- Permute all k-tuple orderings of inputs without repeats using @permutations
- Select unordered k-tuples within inputs excluding repeated elements using @combinations
- Select unordered k-tuples within inputs including repeated elements with @combinations_with_replacement
Chapter 18: Turning parts of the pipeline on and off at runtime with @active_if
- Overview
- @active_if controls the state of tasks
Chapter 19: Signal the completion of each stage of our pipeline with @posttask
- Overview
Chapter 20: Manipulating task inputs via string substitution using inputs() and add_inputs()
- Overview
- Adding additional input prerequisites per job with add_inputs()
- Replacing all input parameters with inputs()
Chapter 21: Esoteric: Generating parameters on the fly with @files
- Overview
- @files syntax
- A Cartesian Product, all vs all example
Chapter 22: Esoteric: Running jobs in parallel without files using @parallel
- @parallel
Chapter 23: Esoteric: Writing custom functions to decide which jobs are up to date with @check_if_uptodate
- @check_if_uptodate : Manual dependency checking
Appendix 1: Flow Chart Colours with pipeline_printout_graph(...)
- Flowchart colours
Appendix 2: How dependency is checked
- Overview
Appendix 3: Exceptions thrown inside pipelines
- Overview
- Pipelines running in parallel accumulate Exceptions
- Terminate pipeline immediately upon Exceptions
- Display exceptions as they occur
Appendix 4: Names exported from Ruffus
- Ruffus Names
Appendix 5: @files: Deprecated syntax
- Overview
- @files
- Running the same code on different parameters in parallel
Appendix 6: @files_re: Deprecated syntax using regular expressions
- Overview

Example code for:

Overview:

Reference:

Decorators

Core

For advanced users

Combinatorics

Esoteric

Deprecated

Modules:

Indices and tables