@collate( input, filter, replace_inputs | add_inputs, output, [extras,...] )

Purpose:

Use filter to identify common sets of inputs which are to be grouped or collated together:

Each set of inputs that generates identical output and extras under the formatter or regex (regular expression) filter is collated into one job.

This variant of @collate allows additional inputs or dependencies to be added dynamically to the task, with optional string substitution.

add_inputs nests the original input parameters in a list before adding additional dependencies.

inputs replaces the original input parameters wholesale.

This is a many-to-fewer operation.

Only out of date jobs (comparing input and output files) will be re-run.
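The idea of an out-of-date job can be illustrated with a simplified timestamp comparison. This is only a sketch under stated assumptions, not Ruffus code: needs_update is a hypothetical helper, and it assumes a job must re-run when an output file is missing or older than the newest input. The real check may take other information into account.

```python
import os
import tempfile

def needs_update(input_files, output_files):
    """Hypothetical helper: True if the job looks out of date by timestamps alone."""
    # A missing output always forces a re-run.
    if not all(os.path.exists(f) for f in output_files):
        return True
    newest_input = max(os.path.getmtime(f) for f in input_files)
    oldest_output = min(os.path.getmtime(f) for f in output_files)
    # Out of date if any input was modified after the oldest output.
    return newest_input > oldest_output

with tempfile.TemporaryDirectory() as tmp_dir:
    in_file = os.path.join(tmp_dir, "tuna.fish")
    out_file = os.path.join(tmp_dir, "fish.summary")
    open(in_file, "w").close()
    missing_output = needs_update([in_file], [out_file])

    open(out_file, "w").close()
    os.utime(in_file, (100, 100))   # input older than output
    os.utime(out_file, (200, 200))
    fresh = needs_update([in_file], [out_file])

    os.utime(in_file, (300, 300))   # input now newer than output
    stale = needs_update([in_file], [out_file])
```

Here missing_output and stale are both true, while fresh is false, so only the first and last situations would trigger a re-run in this simplified model.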

Example of add_inputs
regex(r".+\.(.+)$"), r"\1.summary" creates a separate summary file for each suffix. We also add date of birth data for each species:
from ruffus import *

animal_files = "tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"
# summarise by file suffix:
@collate(animal_files, regex(r".+\.(.+)$"),  add_inputs(r"\1.date_of_birth"), r'\1.summary')
def summarize(infiles, summary_file):
    pass
This results in the following equivalent function calls:
summarize([ ["shark.fish",  "fish.date_of_birth"   ],
            ["tuna.fish",   "fish.date_of_birth"   ] ], "fish.summary")
summarize([ ["cat.mammals", "mammals.date_of_birth"],
            ["dog.mammals", "mammals.date_of_birth"] ], "mammals.summary")
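The grouping behind these calls can be approximated with the standard library alone. The sketch below is not Ruffus code: collate_add_inputs is a hypothetical helper that assumes every pattern is expanded with re.sub, that non-matching inputs are skipped, and that jobs are keyed by the resulting output name.

```python
import re
from collections import defaultdict

def collate_add_inputs(file_names, pattern, extra_pattern, output_pattern):
    """Hypothetical stand-in for @collate(..., regex(...), add_inputs(...), output)."""
    regex = re.compile(pattern)
    jobs = defaultdict(list)
    for name in sorted(file_names):
        if not regex.search(name):
            continue  # assumption: inputs that do not match the filter are skipped
        output = regex.sub(output_pattern, name)  # e.g. "fish.summary"
        extra = regex.sub(extra_pattern, name)    # e.g. "fish.date_of_birth"
        # add_inputs keeps the original input and nests it with the extra dependency
        jobs[output].append([name, extra])
    return dict(jobs)

animal_files = ["tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"]
jobs = collate_add_inputs(animal_files,
                          r".+\.(.+)$",
                          r"\1.date_of_birth",
                          r"\1.summary")
for output, inputs in jobs.items():
    print(inputs, "->", output)
```

Each key of jobs corresponds to one call of summarize, with the nested list as its first argument. The order of the jobs is not significant.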
Example of inputs
Using inputs(...) will summarise only the dates of birth for each species group:
from ruffus import *

animal_files = "tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"
# summarise by file suffix:
@collate(animal_files, regex(r".+\.(.+)$"),  inputs(r"\1.date_of_birth"), r'\1.summary')
def summarize(infiles, summary_file):
    pass
This results in the following equivalent function calls:
summarize(["fish.date_of_birth"   ], "fish.summary")
summarize(["mammals.date_of_birth"], "mammals.summary")
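The replacement behaviour can be mimicked in the same spirit with plain Python. Again this is a sketch rather than Ruffus code: collate_replace_inputs is a hypothetical helper, and it assumes that identical substituted names within one group are collapsed to a single entry, as the calls above suggest.

```python
import re

def collate_replace_inputs(file_names, pattern, replacement_pattern, output_pattern):
    """Hypothetical stand-in for @collate(..., regex(...), inputs(...), output)."""
    regex = re.compile(pattern)
    jobs = {}
    for name in file_names:
        if not regex.search(name):
            continue  # assumption: inputs that do not match the filter are skipped
        output = regex.sub(output_pattern, name)
        replaced = regex.sub(replacement_pattern, name)
        group = jobs.setdefault(output, [])
        # The original input name is discarded; only the substituted name is kept,
        # and duplicates within a group are dropped (assumption).
        if replaced not in group:
            group.append(replaced)
    return jobs

animal_files = ["tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"]
replaced_jobs = collate_replace_inputs(animal_files,
                                       r".+\.(.+)$",
                                       r"\1.date_of_birth",
                                       r"\1.summary")
for output, inputs in replaced_jobs.items():
    print(inputs, "->", output)
```

Compared with add_inputs, the original animal files no longer appear in the job parameters at all.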
Parameters:

input = tasks_or_file_names

can be a:

Task / list of tasks.

File names are taken from the output of the specified task(s)

(Nested) list of file name strings (as in the example above).

File names containing *[]? will be expanded as a glob.

E.g.:"a.*" => "a.1", "a.2"
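Glob expansion of this kind can be reproduced with Python's glob module. The sketch below is illustrative only: expand_globs is a hypothetical helper, and the file names a.1, a.2 and b.txt are made up for the demonstration.

```python
import glob
import os
import tempfile

def expand_globs(file_names):
    """Hypothetical helper: expand names containing *, [, ] or ? as globs."""
    expanded = []
    for name in file_names:
        if any(ch in name for ch in "*[]?"):
            # Sorted so that the result is deterministic.
            expanded.extend(sorted(glob.glob(name)))
        else:
            expanded.append(name)  # plain names are passed through unchanged
    return expanded

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create two files that the pattern "a.*" should match.
    for suffix in ("1", "2"):
        open(os.path.join(tmp_dir, "a." + suffix), "w").close()
    matches = expand_globs([os.path.join(tmp_dir, "a.*"), "b.txt"])
    base_names = [os.path.basename(m) for m in matches]

print(base_names)
```

Names without glob characters, such as b.txt here, are kept as they are whether or not the file exists.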

filter = matching_regex

is a Python regular expression string, which must be wrapped in a regex indicator object. See the Python regular expression (re) documentation for details of regular expression syntax.

filter = matching_formatter

is a formatter indicator object, optionally containing a Python regular expression (re).

add_inputs = add_inputs(...) or replace_inputs = inputs(...)

Specifies the resulting input(s) to each job.

Positional parameters must be disambiguated by wrapping the values in inputs(...) or add_inputs(...).

Named parameters can be passed the values directly.

Takes:

Task / list of tasks.

File names are taken from the output of the specified task(s)

(Nested) list of file name strings.

Strings will be subject to substitution. File names containing *[]? will be expanded as a glob. E.g. "a.*" => "a.1", "a.2"

output = output

Specifies the resulting output file name(s).

extras = extras

Any extra parameters are passed verbatim to the task function

If you are using named parameters, these can be passed as a list, i.e. extras=[...]

See @collate for more straightforward ways to use collate.