@collate( input, filter, replace_inputs | add_inputs, output, [extras,...] )

Purpose:

Use filter to identify common sets of inputs which are to be grouped or collated together:

Each set of inputs that generates identical output and extras under the formatter or regex (regular expression) filter is collated into one job.

This variant of @collate allows additional inputs or dependencies to be added dynamically to the task, with optional string substitution.

add_inputs nests the original input parameters in a list before adding additional dependencies.

inputs replaces the original input parameters wholesale.

This is a many-to-fewer operation.

Only out of date jobs (comparing input and output files) will be re-run.
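
The out-of-date rule above can be illustrated with a minimal sketch. It assumes a plain comparison of file modification times; sketch_needs_update is a hypothetical helper written for this illustration, not part of Ruffus, and Ruffus's own check may use additional information.

```python
import os
import tempfile


def sketch_needs_update(input_files, output_file):
    """Hypothetical helper: True if the output is missing or older than any input."""
    if not os.path.exists(output_file):
        return True
    output_mtime = os.path.getmtime(output_file)
    return any(os.path.getmtime(name) > output_mtime for name in input_files)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "tuna.fish")
    out = os.path.join(tmp, "fish.summary")
    open(src, "w").close()

    # No output file yet, so the job would have to run.
    missing_output = sketch_needs_update([src], out)

    # Output is newer than its input, so the job would be skipped.
    open(out, "w").close()
    os.utime(src, (1000, 1000))
    os.utime(out, (2000, 2000))
    fresh_output = sketch_needs_update([src], out)

    # Input modified after the output was written, so the job would re-run.
    os.utime(src, (3000, 3000))
    stale_output = sketch_needs_update([src], out)
```

Setting the timestamps explicitly with os.utime keeps the demonstration independent of file system clock resolution.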

Example of add_inputs

regex(r".+\.(.+)$") with r"\1.summary" creates a separate summary file for each suffix. Here we also add date of birth data for each species group:

from ruffus import collate, regex, add_inputs

animal_files = "tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"
# summarise by file suffix, adding the matching date of birth file to each input:
@collate(animal_files, regex(r".+\.(.+)$"),  add_inputs(r"\1.date_of_birth"), r'\1.summary')
def summarize(infiles, summary_file):
    pass

This results in the following equivalent function calls:

summarize([ ["shark.fish",  "fish.date_of_birth"   ],
            ["tuna.fish",   "fish.date_of_birth"   ] ], "fish.summary")
summarize([ ["cat.mammals", "mammals.date_of_birth"],
            ["dog.mammals", "mammals.date_of_birth"] ], "mammals.summary")
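
To make the grouping and nesting concrete, here is a rough standard-library sketch that reproduces the calls listed above. It is not how Ruffus is implemented: sketch_collate_add_inputs is a hypothetical helper, and the use of re.search and sorted groups are assumptions chosen to match the documented result.

```python
import re
from collections import defaultdict


def sketch_collate_add_inputs(file_names, pattern, add_template, output_template):
    """Hypothetical helper: group inputs by substituted output name,
    pairing each original input with its substituted extra dependency."""
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for name in file_names:
        match = regex.search(name)
        if match is None:
            continue  # inputs that do not match the filter are ignored
        output = match.expand(output_template)
        extra = match.expand(add_template)
        # add_inputs-style nesting: [original input, added dependency]
        groups[output].append([name, extra])
    return {output: sorted(inputs) for output, inputs in sorted(groups.items())}


animal_files = ["tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"]
jobs = sketch_collate_add_inputs(
    animal_files, r".+\.(.+)$", r"\1.date_of_birth", r"\1.summary"
)
for summary_file, infiles in jobs.items():
    print(infiles, summary_file)
```

Each key of jobs corresponds to one collated job, and each value is the nested input list that the task function would receive as infiles.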

Example of inputs

Using inputs(...) will summarise only the dates of birth for each species group:

from ruffus import collate, regex, inputs

animal_files = "tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"
# summarise by file suffix, replacing the original inputs with the date of birth file:
@collate(animal_files, regex(r".+\.(.+)$"),  inputs(r"\1.date_of_birth"), r'\1.summary')
def summarize(infiles, summary_file):
    pass

This results in the following equivalent function calls:

summarize(["fish.date_of_birth"   ], "fish.summary")
summarize(["mammals.date_of_birth"], "mammals.summary")
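
The replacement behaviour can be sketched in the same spirit using only the standard library. sketch_collate_replace_inputs is a hypothetical helper, and collapsing identical replacement names within a group is an assumption made so that the sketch matches the calls shown above; it is not a description of Ruffus internals.

```python
import re


def sketch_collate_replace_inputs(file_names, pattern, replace_template, output_template):
    """Hypothetical helper: group by substituted output name, discarding the
    original inputs in favour of the substituted replacement."""
    regex = re.compile(pattern)
    jobs = {}
    for name in file_names:
        match = regex.search(name)
        if match is None:
            continue  # inputs that do not match the filter are ignored
        output = match.expand(output_template)
        replacement = match.expand(replace_template)
        inputs = jobs.setdefault(output, [])
        # Assumption: identical replacements within a group appear only once.
        if replacement not in inputs:
            inputs.append(replacement)
    return jobs


animal_files = ["tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"]
jobs = sketch_collate_replace_inputs(
    animal_files, r".+\.(.+)$", r"\1.date_of_birth", r"\1.summary"
)
for summary_file, infiles in jobs.items():
    print(infiles, summary_file)
```

The original animal files still decide which groups exist, but none of them reaches the task function.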

Parameters:

  • input = tasks_or_file_names

    can be a:

    1. Task / list of tasks.

      File names are taken from the output of the specified task(s)

    2. (Nested) list of file name strings (as in the example above).
      File names containing *[]? will be expanded as a glob.

      E.g. "a.*" => "a.1", "a.2"

  • filter = matching_regex

    is a Python regular expression string, which must be wrapped in a regex indicator object. See the Python regular expression (re) documentation for details of regular expression syntax.

  • add_inputs = add_inputs(...) or replace_inputs = inputs(...)

    Specifies the resulting input(s) to each job.

    Positional parameters must be disambiguated by wrapping the values in inputs(...) or add_inputs(...).

    Named parameters can be passed the values directly.

    Takes:

    1. Task / list of tasks.

      File names are taken from the output of the specified task(s)

    2. (Nested) list of file name strings.

      Strings will be subject to substitution. File names containing *[]? will be expanded as a glob. E.g. "a.*" => "a.1", "a.2"

  • output = output

    Specifies the resulting output file name(s).

  • extras = extras

    Any extra parameters are passed verbatim to the task function.

    If you are using named parameters, these can be passed as a list, i.e. extras=[...]
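
The glob expansion mentioned for input and add_inputs / inputs file names can be illustrated with the standard glob module. This is only a sketch under assumptions: sketch_expand_globs is a hypothetical helper, and sorting the matches is done here for reproducibility rather than because Ruffus is known to do so.

```python
import glob
import os
import tempfile


def sketch_expand_globs(file_names):
    """Hypothetical helper: expand names containing *, [, ] or ? as globs,
    leaving plain file names untouched."""
    expanded = []
    for name in file_names:
        if any(ch in name for ch in "*[]?"):
            expanded.extend(sorted(glob.glob(name)))
        else:
            expanded.append(name)
    return expanded


with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty files to match against.
    for name in ("a.1", "a.2", "b.1"):
        open(os.path.join(tmp, name), "w").close()

    result = sketch_expand_globs([os.path.join(tmp, "a.*"), "plain.txt"])
    base_names = [os.path.basename(path) for path in result]
    print(base_names)
```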

See @collate for more straightforward ways to use collate.