Ruffus documentation
Start Here:
- Installation
- Ruffus Manual: List of Chapters and Example code
- Chapter 1: An introduction to basic Ruffus syntax
- Chapter 2: Transforming data in a pipeline with @transform
- Chapter 3: More on @transform-ing data
- Chapter 4: Creating files with @originate
- Chapter 5: Understanding how your pipeline works with pipeline_printout(...)
- Chapter 6: Running Ruffus from the command line with ruffus.cmdline
  - Template for argparse
  - Command Line Arguments
    - 1) Logging
    - 2) Tracing pipeline progress
    - 3) Printing a flowchart
    - 4) Running in parallel on multiple processors
    - 5) Setup checkpointing so that Ruffus knows which files are out of date
    - 6) Skipping specified options
    - 7) Specifying verbosity and abbreviating long paths
    - 8) Displaying the version
  - Template for optparse
- Chapter 7: Displaying the pipeline visually with pipeline_printout_graph(...)
- Chapter 8: Specifying output file names with formatter() and regex()
- Chapter 9: Preparing directories for output with @mkdir()
- Chapter 10: Checkpointing: Interrupted Pipelines and Exceptions
  - Overview
  - Interrupting tasks
  - Checkpointing: only log completed jobs
  - Do not share the same checkpoint file across multiple pipelines!
  - Setting checkpoint file names
  - Useful checkpoint file name policies DEFAULT_RUFFUS_HISTORY_FILE
  - Regenerating the checkpoint file
  - Rules for determining if files are up to date
  - Missing files generate exceptions
  - Caveats: Coarse Timestamp resolution
  - Flag files: Checkpointing for the paranoid
- Chapter 11: Pipeline topologies and a compendium of Ruffus decorators
- Chapter 12: Splitting up large tasks / files with @split
- Chapter 13: @merge multiple input into a single result
- Chapter 14: Multiprocessing, drmaa and Computation Clusters
- Chapter 15: Logging progress through a pipeline
- Chapter 16: @subdivide tasks to run efficiently and regroup with @collate
- Chapter 17: @combinations, @permutations and all versus all @product
  - Overview
  - Generating output with formatter()
  - All vs all comparisons with @product
  - Permute all k-tuple orderings of inputs without repeats using @permutations
  - Select unordered k-tuples within inputs excluding repeated elements using @combinations
  - Select unordered k-tuples within inputs including repeated elements with @combinations_with_replacement
- Chapter 18: Turning parts of the pipeline on and off at runtime with @active_if
- Chapter 19: Signal the completion of each stage of our pipeline with @posttask
- Chapter 20: Manipulating task inputs via string substitution using inputs() and add_inputs()
- Chapter 21: Esoteric: Generating parameters on the fly with @files
- Chapter 22: Esoteric: Running jobs in parallel without files using @parallel
- Chapter 23: Esoteric: Writing custom functions to decide which jobs are up to date with @check_if_uptodate
- Appendix 1: Flow Chart Colours with pipeline_printout_graph(...)
- Appendix 2: How dependency is checked
- Appendix 3: Exceptions thrown inside pipelines
- Appendix 4: Names exported from Ruffus
- Appendix 5: @files: Deprecated syntax
- Appendix 6: @files_re: Deprecated syntax using regular expressions
Example code for:
- Chapter 1: Python Code for An introduction to basic Ruffus syntax
- Chapter 2: Python Code for Transforming data in a pipeline with @transform
- Chapter 3: Python Code for More on @transform-ing data
- Chapter 4: Python Code for Creating files with @originate
- Chapter 5: Python Code for Understanding how your pipeline works with pipeline_printout(...)
- Chapter 7: Python Code for Displaying the pipeline visually with pipeline_printout_graph(...)
- Chapter 8: Python Code for Specifying output file names with formatter() and regex()
- Chapter 9: Python Code for Preparing directories for output with @mkdir()
- Chapter 10: Python Code for Checkpointing: Interrupted Pipelines and Exceptions
- Chapter 12: Python Code for Splitting up large tasks / files with @split
- Chapter 13: Python Code for @merge multiple input into a single result
- Chapter 14: Python Code for Multiprocessing, drmaa and Computation Clusters
- Chapter 15: Python Code for Logging progress through a pipeline
- Chapter 16: Python Code for @subdivide tasks to run efficiently and regroup with @collate
- Chapter 17: Python Code for @combinations, @permutations and all versus all @product
- Chapter 20: Python Code for Manipulating task inputs via string substitution using inputs() and add_inputs()
- Chapter 21: Esoteric: Python Code for Generating parameters on the fly with @files
- Appendix 1: Python code for Flow Chart Colours with pipeline_printout_graph(...)
Overview:
- Cheat Sheet
- Pipeline functions
- drmaa functions
- Installation
- Design & Architecture
- Major Features added to Ruffus
- Fixed Bugs
- New Object orientated syntax for Ruffus in Version 2.6
- Worked Example for New Object orientated syntax for Ruffus in Version 2.6
- Python Code for: New Object orientated syntax for Ruffus in Version 2.6
- Where I see Ruffus going
- In the upcoming release:
- Todo: document output_from()
- Todo: document new syntax
- Todo: Log the progress through the pipeline in a machine parsable format
- Todo: either_or: Prevent failed jobs from propagating further
- Todo: (bug fix) pipeline_printout_graph should print inactive tasks
- Todo: Mark input strings as non-file names, and add support for dynamically returned parameters
- Future Changes to Ruffus
- Todo: Replacements for formatter(), suffix(), regex()
- Todo: Allow “extra” parameters to be used in output substitution
- Todo: Extra signalling before and after each task and job
- Todo: @split / @subdivide returns the actual output created
- Todo: New decorators
- Todo: Bioinformatics example to end all examples
- Todo: Allow the next task to start before all jobs in the previous task have finished
- Todo: Allow checkpoint files to be moved
- Todo: Remove intermediate files
- Planned Improvements to Ruffus
- Implementation Tips
- Implementation notes
- FAQ
- Glossary
- Hall of Fame: User contributed flowcharts
- Why Ruffus?