ruffus.Task

Decorators

Basic Task decorators are:

Task decorators include:

More advanced users may require:

Pipeline functions

pipeline_run

ruffus.task.pipeline_run(target_tasks, forcedtorun_tasks=[], multiprocess=1, logger=stderr_logger, gnu_make_maximal_rebuild_mode=True)

Run pipelines.

Parameters:
  • target_tasks – target task functions which will be run if they are out-of-date
  • forcedtorun_tasks – task functions which will be run whether or not they are out-of-date
  • multiprocess – The number of concurrent jobs running on different processes.
  • multithread – The number of concurrent jobs running as different threads. If > 1, ruffus will use multithreading instead of multiprocessing (and ignore the multiprocess parameter). Multithreading is particularly useful for managing high-performance clusters, which are otherwise prone to “processor storms” when large numbers of cores finish jobs at the same time. (Thanks Andreas Heger)
  • logger (logging objects) – Where progress will be logged. Defaults to stderr output.
  • verbose
    • level 0 : nothing
    • level 1 : Out-of-date Task names
    • level 2 : All Tasks (including any task function docstrings)
    • level 3 : Out-of-date Jobs in Out-of-date Tasks, no explanation
    • level 4 : Out-of-date Jobs in Out-of-date Tasks, with explanations and warnings
    • level 5 : All Jobs in Out-of-date Tasks (includes only a list of up-to-date tasks)
    • level 6 : All jobs in All Tasks whether out of date or not
    • level 10: logs messages useful only for debugging ruffus pipeline code
  • touch_files_only – Create or update input/output files only to simulate running the pipeline. Do not run jobs. If set to CHECKSUM_REGENERATE, will regenerate the checksum history file to reflect the existing i/o files on disk.
  • exceptions_terminate_immediately – Exceptions cause immediate termination rather than waiting for N jobs to finish where N = multiprocess
  • log_exceptions – Print exceptions to logger as soon as they occur.
  • checksum_level

    Several options for checking up-to-dateness are available: Default is level 1.

    • level 0 : Use only file timestamps
    • level 1 : above, plus timestamp of successful job completion
    • level 2 : above, plus a checksum of the pipeline function body
    • level 3 : above, plus a checksum of the pipeline function default arguments and the additional arguments passed in by task decorators
  • one_second_per_job – Works around poor file timestamp resolution on some file systems by forcing Tasks to take a minimum of 1 second to complete. Defaults to True if checksum_level is 0.
  • runtime_data – Experimental feature: pass data to tasks at run time
  • gnu_make_maximal_rebuild_mode – Defaults to re-running all out-of-date tasks. Runs only the minimal set needed to build the targets if set to False. Use with caution.
  • history_file – Database file storing checksums and file timestamps for input/output files.
  • verbose_abbreviated_path

    Whether input and output paths are abbreviated.

    • level 0: The full (expanded, abspath) input or output path
    • level > 1: The number of subdirectories to include. Abbreviated paths are prefixed with [,,,]/
    • level < 0: Input / Output parameters are truncated to MMM letters where verbose_abbreviated_path == -MMM. Subdirectories are first removed to see if this allows the paths to fit in the specified limit. Otherwise, abbreviated paths are prefixed by <???>
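As a rough illustration of the positive verbose_abbreviated_path levels, the sketch below keeps only the last few directories of a path and marks the dropped prefix with [,,,]/. The helper abbreviate_path is hypothetical, not part of ruffus, and it assumes that the level counts parent directories kept in addition to the file name; ruffus's own rules may differ in detail.

```python
def abbreviate_path(path: str, level: int) -> str:
    """Hypothetical sketch (not ruffus code): keep the file name plus its last
    `level` parent directories, prefixing "[,,,]/" when anything was dropped.

    Assumes POSIX-style "/" separators and level >= 1.
    """
    # Drop empty components produced by a leading "/" or doubled slashes.
    parts = [p for p in path.split("/") if p]
    kept = parts[-(level + 1):]  # last `level` directories + the file name
    if len(kept) == len(parts):
        return path  # already short enough: nothing to abbreviate
    return "[,,,]/" + "/".join(kept)


print(abbreviate_path("/home/user/project/data/sample.fastq", 2))
# prints "[,,,]/project/data/sample.fastq"
```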

pipeline_printout

ruffus.task.pipeline_printout(output_stream=None, target_tasks=[], forcedtorun_tasks=[], verbose=None, indent=4, gnu_make_maximal_rebuild_mode=True, wrap_width=100, runtime_data=None, checksum_level=None, history_file=None, verbose_abbreviated_path=None, pipeline=None)

Prints out the parts of the pipeline that will be run.

Because the parameters of some jobs depend on the results of previous tasks, this function produces only a snapshot of the current task jobs. In particular, tasks that generate a variable number of inputs for the following tasks will not produce the full range of jobs.

Parameters:
  • output_stream (file-like object with write() function) – where to print to
  • target_tasks – target task functions which will be run if they are out-of-date
  • forcedtorun_tasks – task functions which will be run whether or not they are out-of-date
  • verbose
    • level 0 : nothing
    • level 1 : Out-of-date Task names
    • level 2 : All Tasks (including any task function docstrings)
    • level 3 : Out-of-date Jobs in Out-of-date Tasks, no explanation
    • level 4 : Out-of-date Jobs in Out-of-date Tasks, with explanations and warnings
    • level 5 : All Jobs in Out-of-date Tasks (includes only a list of up-to-date tasks)
    • level 6 : All jobs in All Tasks whether out of date or not
    • level 10: logs messages useful only for debugging ruffus pipeline code
  • indent – Amount of indentation used for pretty-printing.
  • gnu_make_maximal_rebuild_mode – Defaults to re-running all out-of-date tasks. Runs only the minimal set needed to build the targets if set to False. Use with caution.
  • wrap_width – The maximum length of each line
  • runtime_data – Experimental feature: pass data to tasks at run time
  • checksum_level

    Several options for checking up-to-dateness are available: Default is level 1.

    • level 0 : Use only file timestamps
    • level 1 : above, plus timestamp of successful job completion
    • level 2 : above, plus a checksum of the pipeline function body
    • level 3 : above, plus a checksum of the pipeline function default arguments and the additional arguments passed in by task decorators
  • history_file – Database file storing checksums and file timestamps for input/output files.
  • verbose_abbreviated_path

    Whether input and output paths are abbreviated.

    • level 0: The full (expanded, abspath) input or output path
    • level > 1: The number of subdirectories to include. Abbreviated paths are prefixed with [,,,]/
    • level < 0: Input / Output parameters are truncated to MMM letters where verbose_abbreviated_path == -MMM. Subdirectories are first removed to see if this allows the paths to fit in the specified limit. Otherwise, abbreviated paths are prefixed by <???>

pipeline_printout_graph

ruffus.task.pipeline_printout_graph(stream, output_format=None, target_tasks=[], forcedtorun_tasks=[], draw_vertically=True, ignore_upstream_of_target=False, skip_uptodate_tasks=False, gnu_make_maximal_rebuild_mode=True, test_all_task_for_update=True, no_key_legend=False, minimal_key_legend=True, user_colour_scheme=None, pipeline_name='Pipeline:', size=(11, 8), dpi=120, runtime_data=None, checksum_level=None, history_file=None, pipeline=None)

Prints out pipeline dependencies in various formats.

Parameters:
  • stream (file-like object with write() function) – where to print to
  • output_format – [“dot”, “jpg”, “svg”, “ps”, “png”]. All but the first depend on the dot program (Graphviz).
  • target_tasks – target task functions which will be run if they are out-of-date.
  • forcedtorun_tasks – task functions which will be run whether or not they are out-of-date.
  • draw_vertically – Top to bottom instead of left to right.
  • ignore_upstream_of_target – Don’t draw upstream tasks of targets.
  • skip_uptodate_tasks – Don’t draw up-to-date tasks if possible.
  • gnu_make_maximal_rebuild_mode – Defaults to re-running all out-of-date tasks. Runs only the minimal set needed to build the targets if set to False. Use with caution.
  • test_all_task_for_update – Ask all task functions if they are up-to-date.
  • no_key_legend – Don’t draw key/legend for graph.
  • minimal_key_legend – Only include legend entries for task types that are actually used.
  • user_colour_scheme – Dictionary specifying flowchart colour scheme
  • pipeline_name – Pipeline title
  • size – tuple of x and y dimensions
  • dpi – print resolution
  • runtime_data – Experimental feature: pass data to tasks at run time
  • history_file – Database file storing checksums and file timestamps for input/output files.
  • checksum_level

    Several options for checking up-to-dateness are available: Default is level 1.

    • level 0 : Use only file timestamps
    • level 1 : above, plus timestamp of successful job completion
    • level 2 : above, plus a checksum of the pipeline function body
    • level 3 : above, plus a checksum of the pipeline function default arguments and the additional arguments passed in by task decorators
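For the "dot" output format, the result is Graphviz source text in which tasks are nodes and dependencies are edges. The sketch below produces comparable text from a plain dictionary of the form {task: [upstream tasks]}; to_dot and that dictionary input are hypothetical, and the real output of pipeline_printout_graph also includes styling such as colours and, optionally, a key/legend, which this sketch omits.

```python
def to_dot(dependencies, pipeline_name="Pipeline:", vertical=True):
    """Hypothetical sketch: render {task: [upstream tasks]} as Graphviz dot source."""
    lines = [f'digraph "{pipeline_name}" {{']
    # Top-to-bottom vs. left-to-right layout, in the spirit of draw_vertically.
    lines.append(f"  rankdir={'TB' if vertical else 'LR'};")
    for task, upstream in dependencies.items():
        lines.append(f'  "{task}";')  # declare every task as a node
        for parent in upstream:
            lines.append(f'  "{parent}" -> "{task}";')  # edge: upstream -> downstream
    lines.append("}")
    return "\n".join(lines)


print(to_dot({"split_input": [], "align": ["split_input"], "merge": ["align"]}))
```

The text can be piped into the dot program (for example `dot -Tsvg`) to obtain the image formats listed above.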

Logging

class ruffus.task.t_black_hole_logger

Does nothing!

class ruffus.task.t_stderr_logger

Logs everything to stderr.
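The logger parameter of pipeline_run accepts logging objects, so a standard-library logging.Logger can play either of the roles above. This sketch uses only Python's logging module; the helper names are hypothetical and it does not reproduce the internals of t_stderr_logger or t_black_hole_logger.

```python
import logging
import sys


def make_stderr_logger(name: str = "my_pipeline") -> logging.Logger:
    """Return a standard-library logger that writes every message to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let all levels through
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger


def make_silent_logger(name: str = "silent_pipeline") -> logging.Logger:
    """Return a logger that discards everything, like a black hole."""
    logger = logging.getLogger(name)
    logger.addHandler(logging.NullHandler())
    logger.propagate = False  # do not forward records to the root logger
    return logger
```

Assuming ruffus is installed and final_task is one of your decorated task functions (a hypothetical name), the first helper would be used as `pipeline_run([final_task], logger=make_stderr_logger())`.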

Implementation:

Parameter factories:

ruffus.task.merge_param_factory(input_files_task_globs, output_param, *extra_params)

Factory for task_merge

ruffus.task.collate_param_factory(input_files_task_globs, file_names_transform, extra_input_files_task_globs, replace_inputs, output_pattern, *extra_specs)

Factory for task_collate

Behaves like @transform, except that all [input] which lead to the same [output / extra] are combined together.
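The grouping rule behind collate can be sketched in plain Python: apply a regular expression to each input name, expand it into an output name, and gather the inputs that map to the same output. This illustration assumes a simple regex-based naming scheme; collate_by_regex is a hypothetical helper, not the collate_param_factory implementation.

```python
import re
from collections import defaultdict


def collate_by_regex(input_files, pattern, output_pattern):
    """Hypothetical sketch: group inputs whose substitution yields the same output."""
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for name in input_files:
        match = regex.search(name)
        if match is None:
            continue  # inputs that do not match the pattern are skipped
        output = match.expand(output_pattern)  # e.g. r"\1.combined"
        groups[output].append(name)
    return dict(groups)


print(collate_by_regex(["a.1.txt", "a.2.txt", "b.1.txt"],
                       r"(\w+)\.\d+\.txt$", r"\1.combined"))
# prints {'a.combined': ['a.1.txt', 'a.2.txt'], 'b.combined': ['b.1.txt']}
```

Each key corresponds to one collate job, and its list holds that job's combined inputs.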

ruffus.task.transform_param_factory(input_files_task_globs, file_names_transform, extra_input_files_task_globs, replace_inputs, output_pattern, *extra_specs)

Factory for task_transform

ruffus.task.files_param_factory(input_files_task_globs, do_not_expand_single_job_tasks, output_extras)
Factory for functions which yield tuples of inputs, outputs / extras.

Note:

1. Each job requires input/output file names.
2. Input/output file names can be a string or an arbitrarily nested sequence.
3. Non-string types are ignored.
4. Either the input or the output file names must contain at least one string.

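The rules in the note above can be checked with a small recursive walk. get_file_names and check_job_params are hypothetical helpers that illustrate rules 2 to 4; they are not part of ruffus, and this sketch only recurses into lists and tuples.

```python
def get_file_names(obj):
    """Recursively collect strings from an arbitrarily nested list/tuple.

    Non-string, non-sequence values (numbers, None, ...) are ignored.
    """
    if isinstance(obj, str):
        return [obj]
    if isinstance(obj, (list, tuple)):
        names = []
        for item in obj:
            names.extend(get_file_names(item))
        return names
    return []


def check_job_params(inputs, outputs):
    """Raise ValueError unless inputs or outputs hold at least one file name."""
    if not get_file_names(inputs) and not get_file_names(outputs):
        raise ValueError("Either input or output must contain at least one file name")


check_job_params(["a.txt", ("b.txt", 3)], None)  # passes: strings found in inputs
```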
ruffus.task.args_param_factory(orig_args)
Factory for functions which yield tuples of inputs, outputs / extras.

Note:

1. Each job requires input/output file names.
2. Input/output file names can be a string or an arbitrarily nested sequence.
3. Non-string types are ignored.
4. Either the input or the output file names must contain at least one string.

ruffus.task.split_param_factory(input_files_task_globs, output_files_task_globs, *extra_params)

Factory for task_split

Wrappers around jobs:

ruffus.task.job_wrapper_generic(params, user_defined_work_func, register_cleanup, touch_files_only)

Runs user_defined_work_func.

ruffus.task.job_wrapper_io_files(params, user_defined_work_func, register_cleanup, touch_files_only, output_files_only=False)

Runs user_defined_work_func if any of its input/output files are not up to date.

ruffus.task.job_wrapper_mkdir(params, user_defined_work_func, register_cleanup, touch_files_only)

Make missing directories including any intermediate directories on the specified path(s)
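In plain Python the same effect is available from os.makedirs, which creates any intermediate directories as needed. make_missing_dirs below is a hypothetical helper that also reports which paths it actually created; it is a sketch, not the job_wrapper_mkdir implementation.

```python
import os


def make_missing_dirs(*paths):
    """Hypothetical sketch: create each directory (and its parents) if missing.

    Returns the list of paths that had to be created.
    """
    created = []
    for path in paths:
        if not os.path.isdir(path):
            # exist_ok=True tolerates a directory appearing concurrently, but
            # still raises FileExistsError if `path` is an ordinary file.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```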

Checking if a job is up to date:

ruffus.task.needs_update_check_modify_time(*params, **kwargs)

Given input and output files, checks whether they all exist and whether the output files are newer than the input files. Each can be:

  1. string: assumed to be a filename “file1”
  2. any other type
  3. arbitrary nested sequence of (1) and (2)
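A minimal timestamp-only version of this check (roughly what checksum_level 0 relies on) might look like the following. needs_update and _file_names are hypothetical helpers; the sketch ignores the **kwargs of the real function and any history-file checks, and its handling of jobs without outputs is an assumption.

```python
import os


def _file_names(obj):
    """Collect strings from an arbitrarily nested list/tuple; ignore other types."""
    if isinstance(obj, str):
        return [obj]
    if isinstance(obj, (list, tuple)):
        return [name for item in obj for name in _file_names(item)]
    return []


def needs_update(inputs, outputs):
    """Hypothetical sketch: return (needs_running, reason) from file timestamps."""
    in_files = _file_names(inputs)
    out_files = _file_names(outputs)
    # Assumption: a job with no output files to check is always re-run.
    if not out_files:
        return True, "No output files to check"
    missing = [f for f in in_files + out_files if not os.path.exists(f)]
    if missing:
        return True, "Missing file(s): " + ", ".join(missing)
    if not in_files:
        return False, "Outputs exist and there are no inputs"
    newest_input = max(os.path.getmtime(f) for f in in_files)
    oldest_output = min(os.path.getmtime(f) for f in out_files)
    if newest_input > oldest_output:
        return True, "An input file is newer than an output file"
    return False, "Up to date"
```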

ruffus.task.needs_update_check_directory_missing(*params, **kwargs)

Called per directory: Does it exist? Is it an ordinary file rather than a directory? (If so, an exception is thrown.)
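The per-directory check can be sketched with os.path. check_directory_missing is a hypothetical helper that returns True when the directory still needs to be created and raises when the path is an ordinary file; the exception type chosen here is an assumption, not necessarily what ruffus raises.

```python
import os


def check_directory_missing(path):
    """Hypothetical sketch: True if `path` must be created, False if it exists.

    Raises FileExistsError if `path` exists but is not a directory.
    """
    if os.path.isdir(path):
        return False  # directory already present: nothing to do
    if os.path.exists(path):
        raise FileExistsError(f"{path!r} exists but is not a directory")
    return True
```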

Exceptions and Errors