ruffus
  • Installation
    • The easy way
    • The most up-to-date code:
    • Prerequisites
    • Installing easy_install
    • Installing pip
    • Graphical flowcharts
  • Ruffus Manual: List of Chapters and Example code
  • Chapter 1: An introduction to basic Ruffus syntax
    • Overview
    • Importing Ruffus
    • Ruffus decorators
    • Your first Ruffus pipeline
  • Chapter 2: Transforming data in a pipeline with @transform
    • Review
    • Task functions as recipes
    • @transform is a 1 to 1 operation
    • Input and Output parameters
  • Chapter 3: More on @transform-ing data
    • Review
    • Running pipelines in parallel
    • Up-to-date jobs are not re-run unnecessarily
    • Defining pipeline tasks out of order
    • Multiple dependencies
    • @follows
    • Making directories automatically with @follows and mkdir
    • Globs in the Input parameter
    • Mixing Tasks and Globs in the Input parameter
  • Chapter 4: Creating files with @originate
    • Simplifying our example with @originate
  • Chapter 5: Understanding how your pipeline works with pipeline_printout(...)
    • Printing out which jobs will be run
    • Determining which jobs are out-of-date or not
    • Verbosity levels
    • Abbreviating long file paths with verbose_abbreviated_path
    • Getting a list of all tasks in a pipeline
  • Chapter 6: Running Ruffus from the command line with ruffus.cmdline
    • Template for argparse
    • Command Line Arguments
    • 1) Logging
    • 2) Tracing pipeline progress
    • 3) Printing a flowchart
    • 4) Running in parallel on multiple processors
    • 5) Set up checkpointing so that Ruffus knows which files are out of date
    • 6) Skipping specified options
    • 7) Specifying verbosity and abbreviating long paths
    • 8) Displaying the version
    • Template for optparse
  • Chapter 7: Displaying the pipeline visually with pipeline_printout_graph(...)
    • Printing out a flowchart of our pipeline
    • Command line options made easier with ruffus.cmdline
    • Horribly complicated pipelines!
    • Circular dependency errors in pipelines!
    • @graphviz: Customising the appearance of each task
  • Chapter 8: Specifying output file names with formatter() and regex()
    • Review
    • A different file name suffix() for each pipeline stage
    • formatter() manipulates pathnames and regular expressions
    • regex() manipulates via regular expressions
  • Chapter 9: Preparing directories for output with @mkdir()
    • Overview
    • Creating directories after string substitution in a zoo...
  • Chapter 10: Checkpointing: Interrupted Pipelines and Exceptions
    • Overview
    • Interrupting tasks
    • Checkpointing: only log completed jobs
    • Do not share the same checkpoint file across multiple pipelines!
    • Setting checkpoint file names
    • Useful checkpoint file name policies DEFAULT_RUFFUS_HISTORY_FILE
    • Regenerating the checkpoint file
    • Rules for determining if files are up to date
    • Missing files generate exceptions
    • Caveats: Coarse Timestamp resolution
    • Flag files: Checkpointing for the paranoid
  • Chapter 11: Pipeline topologies and a compendium of Ruffus decorators
    • Overview
    • @transform
    • A bestiary of Ruffus decorators
    • @originate
    • @merge
    • @split
    • @subdivide
    • @collate
    • Combinatorics
    • @product
    • @combinations
    • @combinations_with_replacement
    • @permutations
  • Chapter 12: Splitting up large tasks / files with @split
    • Overview
    • Example: Calculate variance for a large list of numbers in parallel
    • Output files for @split
    • Be careful in specifying Output globs
    • Clean up previous pipeline runs
    • 1 to many
    • Nothing to many
  • Chapter 13: @merge multiple input into a single result
    • Overview of @merge
    • @merge is a many to one operator
    • Example: Combining partial solutions: Calculating variances
  • Chapter 14: Multiprocessing, drmaa and Computation Clusters
    • Overview
    • Restricting parallelism with @jobs_limit
    • Using drmaa to dispatch work to Computational Clusters or Grid engines from Ruffus jobs
    • Forcing a pipeline to appear up to date
  • Chapter 15: Logging progress through a pipeline
    • Overview
    • Logging task/job completion
    • Use ruffus.cmdline
    • Customising logging
    • Log your own messages
  • Chapter 16: @subdivide tasks to run efficiently and regroup with @collate
    • Overview
    • @subdivide in parallel
    • Grouping using @collate
  • Chapter 17: @combinations, @permutations and all versus all @product
    • Overview
    • Generating output with formatter()
    • All vs all comparisons with @product
    • Permute all k-tuple orderings of inputs without repeats using @permutations
    • Select unordered k-tuples within inputs excluding repeated elements using @combinations
    • Select unordered k-tuples within inputs including repeated elements with @combinations_with_replacement
  • Chapter 18: Turning parts of the pipeline on and off at runtime with @active_if
    • Overview
    • @active_if controls the state of tasks
  • Chapter 19: Signal the completion of each stage of our pipeline with @posttask
    • Overview
  • Chapter 20: Manipulating task inputs via string substitution using inputs() and add_inputs()
    • Overview
    • Adding additional input prerequisites per job with add_inputs()
    • Replacing all input parameters with inputs()
  • Chapter 21: Esoteric: Generating parameters on the fly with @files
    • Overview
    • @files syntax
    • A Cartesian Product, all vs all example
  • Chapter 22: Esoteric: Running jobs in parallel without files using @parallel
    • @parallel
  • Chapter 23: Esoteric: Writing custom functions to decide which jobs are up to date with @check_if_uptodate
    • @check_if_uptodate : Manual dependency checking
  • Appendix 1: Flow Chart Colours with pipeline_printout_graph(...)
    • Flowchart colours
  • Appendix 2: How dependency is checked
    • Overview
  • Appendix 3: Exceptions thrown inside pipelines
    • Overview
    • Pipelines running in parallel accumulate Exceptions
    • Terminate pipeline immediately upon Exceptions
    • Display exceptions as they occur
  • Appendix 4: Names exported from Ruffus
    • Ruffus Names
  • Appendix 5: @files: Deprecated syntax
    • Overview
    • @files
    • Running the same code on different parameters in parallel
  • Appendix 6: @files_re: Deprecated syntax using regular expressions
    • Overview
  • Chapter 1: Python Code for An introduction to basic Ruffus syntax
    • Your first Ruffus script
    • Resulting Output
  • Chapter 2: Python Code for Transforming data in a pipeline with @transform
    • Your first Ruffus script
    • Resulting Output
  • Chapter 3: Python Code for More on @transform-ing data
    • Producing several items / files per job
    • Defining task functions out of order
    • Multiple dependencies
    • Multiple dependencies after @follows
  • Chapter 4: Python Code for Creating files with @originate
    • Using @originate
    • Resulting Output
  • Chapter 5: Python Code for Understanding how your pipeline works with pipeline_printout(...)
    • Display the initial state of the pipeline
    • Normal Output
    • High Verbosity Output
    • Display the partially up-to-date pipeline
  • Chapter 7: Python Code for Displaying the pipeline visually with pipeline_printout_graph(...)
    • Code
    • Resulting Flowcharts
  • Chapter 8: Python Code for Specifying output file names with formatter() and regex()
    • Example Code for suffix()
    • Example Code for formatter()
    • Example Code for formatter() with replacements in extra arguments
    • Example Code for formatter() in Zoos
    • Example Code for regex() in zoos
  • Chapter 9: Python Code for Preparing directories for output with @mkdir()
    • Code for formatter() Zoo example
    • Code for regex() Zoo example
  • Chapter 10: Python Code for Checkpointing: Interrupted Pipelines and Exceptions
    • Code for the “Interrupting tasks” example
  • Chapter 12: Python Code for Splitting up large tasks / files with @split
    • Splitting large jobs
    • Resulting Output
  • Chapter 13: Python Code for @merge multiple input into a single result
    • Splitting large jobs
    • Resulting Output
  • Chapter 14: Python Code for Multiprocessing, drmaa and Computation Clusters
    • @jobs_limit
    • Using ruffus.drmaa_wrapper
  • Chapter 15: Python Code for Logging progress through a pipeline
    • Rotating set of file logs
  • Chapter 16: Python Code for @subdivide tasks to run efficiently and regroup with @collate
    • @subdivide and regroup with @collate example
  • Chapter 17: Python Code for @combinations, @permutations and all versus all @product
    • Example code for @product
    • Example code for @permutations
    • Example code for @combinations
    • Example code for @combinations_with_replacement
  • Chapter 20: Python Code for Manipulating task inputs via string substitution using inputs() and add_inputs()
    • Example code for adding additional input prerequisites per job with add_inputs()
    • Example code for replacing all input parameters with inputs()
  • Chapter 21: Esoteric: Python Code for Generating parameters on the fly with @files
    • Introduction
    • Code
    • Resulting Output
  • Appendix 1: Python code for Flow Chart Colours with pipeline_printout_graph(...)
    • Code
  • Cheat Sheet
    • 1. Annotate functions with Ruffus decorators
    • 2. Print dependency graph if necessary
    • 3. Run the pipeline
  • Pipeline functions
    • pipeline_run
    • pipeline_printout
    • pipeline_printout_graph
    • pipeline_get_task_names
  • drmaa functions
    • run_job
  • Design & Architecture
    • GNU Make
    • Scons, Rake and other Make alternatives
    • Managing pipelines stage-by-stage using Ruffus
    • Alternatives to Ruffus
  • Major Features added to Ruffus
    • version 2.6
    • version 2.5
    • version 2.4.1
    • version 2.4
    • version 2.3
    • version 2.2
    • version 2.1.1
    • version 2.1.0
    • version 2.0.10
    • version 2.0.9
    • version 2.0.8
    • version 2.0.2
    • version 2.0
    • version 1.1.4
    • version 1.0.7
    • version 1.0
  • Fixed Bugs
  • New Object orientated syntax for Ruffus in Version 2.6
    • Syntax
    • Advantages
    • Compatibility
    • Class methods
    • Call chaining
    • Referring to Tasks
  • Worked Example for New Object orientated syntax for Ruffus in Version 2.6
    • Worked example
  • Python Code for: New Object orientated syntax for Ruffus in Version 2.6
  • Where I see Ruffus going
  • In upcoming release:
    • Todo: document output_from()
    • Todo: document new syntax
    • Todo: Log the progress through the pipeline in a machine parsable format
    • Todo: either_or: Prevent failed jobs from propagating further
    • Todo: (bug fix) pipeline_printout_graph should print inactive tasks
    • Todo: Mark input strings as non-file names, and add support for dynamically returned parameters
  • Future Changes to Ruffus
    • Todo: Replacements for formatter(), suffix(), regex()
    • Todo: Allow “extra” parameters to be used in output substitution
    • Todo: Extra signalling before and after each task and job
    • Todo: @split / @subdivide returns the actual output created
    • Todo: New decorators
    • Todo: Bioinformatics example to end all examples
    • Todo: Allow the next task to start before all jobs in the previous task have finished
    • Todo: Allow checkpoint files to be moved
    • Todo: Remove intermediate files
  • Planned Improvements to Ruffus
    • Planned: Running python code (task functions) transparently on remote cluster nodes
    • Planned: Custom parameter generator
    • Planned: Ruffus GUI interface.
    • Planned: Non-decorator / Function interface to Ruffus
    • Planned: @retry_on_error(NUM_OF_RETRIES)
    • Planned: Clean up
  • Implementation Tips
    • Items remaining for current release
    • Release
    • blogger
    • dbdict.py
    • how to write new decorators
  • Implementation notes
    • Ctrl-C handling
    • Python 3 compatibility
    • Refactoring: parameter handling
    • formatter
    • @product()
    • @permutations(...), @combinations(...), @combinations_with_replacement(...)
    • drmaa alternatives
    • Task completion monitoring
    • @mkdir(...)
    • Parameter handling
    • Add Object Orientated interface
  • FAQ
    • Citations
    • Good practices
    • General
    • Windows
    • Sun Grid Engine / PBS / SLURM etc
    • Sharing python objects between Ruffus processes running concurrently
  • Glossary
  • Hall of Fame: User contributed flowcharts
    • RNASeq pipeline
    • non-coding evolutionary constraints
    • SNP annotation
    • Chip-Seq analysis
  • Why Ruffus?
  • Construction of a simple pipeline to run BLAST jobs
    • Overview
    • Prerequisites
    • Code
    • Step 1. Splitting up the query sequences
    • Step 2. Run BLAST jobs in parallel
    • Step 3. Combining BLAST results
    • Step 4. Running the pipeline
    • Step 5. Testing dependencies
    • What is next?
  • Part 2: A slightly more practical pipeline to run blasts jobs
    • Overview
    • Step 1. Cleaning up any leftover junk from previous pipeline runs
    • Step 2. Adding a “flag” file to mark successful completion
    • Step 3. Allowing the script to be invoked on the command line
    • Step 4. Printing out a flowchart for the pipeline
    • Step 5. Errors
    • Step 6. Will it run?
  • Ruffus code
  • Ruffus code
  • Example code for FAQ Good practices: "What is the best way of handling data in file pairs (or triplets etc.)?"
  • Ruffus Decorators
    • Core
    • Combinatorics
    • Advanced
    • Esoteric!
  • Indicator Objects
    • formatter
    • suffix
    • regex
    • add_inputs
    • inputs
    • mkdir
    • touch_file
    • output_from
    • combine
  • @originate ( output, [extras,...] )
  • @split ( input, output, [extras,...] )
  • @transform( input, filter, output, [extras,...] )
  • @merge ( input, output, [extras,...] )
  • @subdivide
    • @subdivide ( input, regex(matching_regex) | formatter(matching_formatter), [ inputs (input_pattern_or_glob) | add_inputs (input_pattern_or_glob) ], output, [extras,...] )
  • @transform( input, filter, replace_inputs | add_inputs, output, [extras,...] )
  • @collate( input, filter, output, [extras,...] )
  • @collate( input, filter, replace_inputs | add_inputs, output, [extras,...] )
  • @graphviz
    • @graphviz ( graphviz_parameters, ... )
  • @mkdir( input, filter, output )
  • @jobs_limit
    • @jobs_limit ( maximum_num_of_jobs, [ name ])
  • @posttask
    • @posttask (function | touch_file(file_name))
  • @active_if
    • @active_if(on_or_off1, [on_or_off2,...])
  • @follows
    • @follows(task | “task_name” | mkdir (directory_name), [more_tasks, ...])
  • @product( input, filter, [input2, filter2, ...], output, [extras,...] )
  • @permutations( input, filter, tuple_size, output, [extras,...] )
  • @combinations( input, filter, tuple_size, output, [extras,...] )
  • @combinations_with_replacement( input, filter, tuple_size, output, [extras,...] )
  • Generating parameters on the fly for @files
    • @files (custom_function)
  • @check_if_uptodate
    • @check_if_uptodate (dependency_checking_function)
  • @parallel
    • @parallel ( [ [job_params, ...], [job_params, ...]...] | parameter_generating_function)
  • @files
    • @files (input1, output1, [extra_parameters1, ...])
    • @files ( (( input, output, [extra_parameters,...] ), (...), ...) )
  • @files_re
    • @files_re (tasks_or_file_names, matching_regex, [input_pattern], output_pattern, [extra_parameters,...])
  • ruffus.Task
    • Decorators
    • Pipeline functions
    • Logging
    • Implementation:
    • Exceptions and Errors
  • ruffus.proxy_logger
    • Create proxy for logging for use with multiprocessing
    • Proxies for a log:
    • Create a logging object

© Copyright 2009-2013 Leo Goodstadt.