@product( input, filter, [input2, filter2, ...], output, [extras,...] )

Purpose:

Generates the Cartesian product, i.e. all vs all comparisons, between multiple sets of input (e.g. A B C D, and X Y Z).

The effect is analogous to the Python itertools function of the same name, i.e. a nested for loop:

>>> from itertools import product
>>> # product('ABC', 'XYZ') --> AX AY AZ BX BY BZ CX CY CZ
>>> [ "".join(a) for a in product('ABC', 'XYZ')]
['AX', 'AY', 'AZ', 'BX', 'BY', 'BZ', 'CX', 'CY', 'CZ']

Only out-of-date tasks (determined by comparing input and output files) will be run.
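One common way to decide whether a job is out of date is to compare file modification times. The sketch below illustrates that idea only; it is not Ruffus's own implementation, and the helper name `is_out_of_date` is hypothetical.

```python
import os

def is_out_of_date(input_files, output_files):
    """Hypothetical helper: True if the job producing output_files should be re-run."""
    # With no outputs to check, or with any output missing, the job must run.
    if not output_files or not all(os.path.exists(o) for o in output_files):
        return True
    # Otherwise re-run only if some input is newer than the oldest output.
    newest_input = max((os.path.getmtime(i) for i in input_files), default=0.0)
    oldest_output = min(os.path.getmtime(o) for o in output_files)
    return newest_input > oldest_output
```

Under this assumption, a job whose outputs all exist and are newer than every input would be skipped.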

Output file names and strings in the extra parameters are generated from the input by string replacement via the formatter() filter. The input can be, for example, a list of file names or the output of upstream tasks. The replacement strings require an extra level of nesting to refer to parsed components:

  1. The first level refers to which set in each tuple of input.
  2. The second level refers to which input file in any particular set of input.

This will be clear in the following example:

Example:

Calculates the @product of the A, B files, the P, Q files and the X, Y files.

If input is three sets of file names:

    set1 = [ 'a.start',                         # 0
             'b.start']

    set2 = [ 'p.start',                         # 1
             'q.start']

    set3 = [ ['x.1_start', 'x.2_start'],        # 2
             ['y.1_start', 'y.2_start'] ]

The first job of:

@product( input  = set1, filter  = formatter(),
          input2 = set2, filter2 = formatter(),
          input3 = set3, filter3 = formatter(),
          ...)

will be:

# One from each set
['a.start']
# versus
['p.start']
# versus
['x.1_start', 'x.2_start']

First level of nesting (one list of files from each set):

['a.start']                 # [0]
['p.start']                 # [1]
['x.1_start', 'x.2_start']  # [2]

Second level of nesting (one file):

'a.start'                   # [0][0]
'p.start'                   # [1][0]
'x.1_start'                 # [2][0]

Parse filename without suffix:

'a'                         # {basename[0][0]}
'p'                         # {basename[1][0]}
'x'                         # {basename[2][0]}
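The two levels of indexing can be mimicked in plain Python, without Ruffus. This is only a sketch of the idea: `as_list` and `basename` are hypothetical helpers, and the real parsing is done by formatter().

```python
import os
from itertools import product

set1 = ['a.start', 'b.start']
set2 = ['p.start', 'q.start']
set3 = [['x.1_start', 'x.2_start'],
        ['y.1_start', 'y.2_start']]

def as_list(item):
    """Hypothetical helper: treat a single file name as a one-element list."""
    return [item] if isinstance(item, str) else list(item)

def basename(path):
    """Hypothetical helper: file name without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]

# Each job takes one item from each set (all vs all).
jobs = [tuple(as_list(item) for item in combo)
        for combo in product(set1, set2, set3)]

# First index selects the set, second index selects the file within it.
outputs = ["{}_vs_{}_vs_{}.product".format(*(basename(job[s][0]) for s in range(3)))
           for job in jobs]

print(jobs[0])       # (['a.start'], ['p.start'], ['x.1_start', 'x.2_start'])
print(outputs[0])    # a_vs_p_vs_x.product
print(len(outputs))  # 8
```

Here `job[2][0]` plays the role of `[2][0]` in the replacement strings: the first file of the item drawn from the third set.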

Python code:

from ruffus import *
from ruffus.combinatorics import *

#   Three sets of initial files
@originate([ 'a.start', 'b.start'])
def create_initial_files_ab(output_file):
    with open(output_file, "w") as oo: pass

@originate([ 'p.start', 'q.start'])
def create_initial_files_pq(output_file):
    with open(output_file, "w") as oo: pass

@originate([ ['x.1_start', 'x.2_start'],
             ['y.1_start', 'y.2_start'] ])
def create_initial_files_xy(output_files):
    for o in output_files:
        with open(o, "w") as oo: pass

#   @product
@product(   create_initial_files_ab,        # Input
            formatter("(.start)$"),         # match input file set # 1

            create_initial_files_pq,        # Input
            formatter("(.start)$"),         # match input file set # 2

            create_initial_files_xy,        # Input
            formatter("(.start)$"),         # match input file set # 3

            "{path[0][0]}/"                 # Output Replacement string
            "{basename[0][0]}_vs_"          #
            "{basename[1][0]}_vs_"          #
            "{basename[2][0]}.product",     #

            "{path[0][0]}",                 # Extra parameter: path for 1st set of files, 1st file name

            ["{basename[0][0]}",            # Extra parameter: basename for 1st set of files, 1st file name
             "{basename[1][0]}",            #                               2nd
             "{basename[2][0]}",            #                               3rd
             ])
def product_task(input_file, output_parameter, shared_path, basenames):
    # print() calls so that the example also runs under Python 3
    print("# basenames      = ", " ".join(basenames))
    print("input_parameter  = ", input_file)
    print("output_parameter = ", output_parameter, "\n")


#
#       Run
#
#pipeline_printout(verbose=6)
pipeline_run(verbose=0)

This results in:

>>> pipeline_run(verbose=0)

# basenames      =  a p x
input_parameter  =  ('a.start', 'p.start', 'x.start')
output_parameter =  /home/lg/temp/a_vs_p_vs_x.product

# basenames      =  a p y
input_parameter  =  ('a.start', 'p.start', 'y.start')
output_parameter =  /home/lg/temp/a_vs_p_vs_y.product

# basenames      =  a q x
input_parameter  =  ('a.start', 'q.start', 'x.start')
output_parameter =  /home/lg/temp/a_vs_q_vs_x.product

# basenames      =  a q y
input_parameter  =  ('a.start', 'q.start', 'y.start')
output_parameter =  /home/lg/temp/a_vs_q_vs_y.product

# basenames      =  b p x
input_parameter  =  ('b.start', 'p.start', 'x.start')
output_parameter =  /home/lg/temp/b_vs_p_vs_x.product

# basenames      =  b p y
input_parameter  =  ('b.start', 'p.start', 'y.start')
output_parameter =  /home/lg/temp/b_vs_p_vs_y.product

# basenames      =  b q x
input_parameter  =  ('b.start', 'q.start', 'x.start')
output_parameter =  /home/lg/temp/b_vs_q_vs_x.product

# basenames      =  b q y
input_parameter  =  ('b.start', 'q.start', 'y.start')
output_parameter =  /home/lg/temp/b_vs_q_vs_y.product

Parameters:

  • input = tasks_or_file_names

    can be a:

    1. Task / list of tasks.

      File names are taken from the output of the specified task(s)

    2. (Nested) list of file name strings.

      File names containing *[]? will be expanded as a glob.

      E.g. "a.*" => "a.1", "a.2"

Additional input and filter as needed:

  • input2 = tasks_or_file_names
  • filter2 = formatter(...)

  • output = output

    Specifies the resulting output file name(s) after string substitution

  • extras = extras

    Any extra parameters are passed to the task function; strings within them undergo the same replacement as output.

    If you are using named parameters, these can be passed as a list, i.e. extras=[...]

    Any extra parameters are consumed by the task function and not forwarded further down the pipeline.
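The glob expansion described for input above can be illustrated with the standard library glob module. This is a minimal sketch, assuming that only names containing *, [, ] or ? are expanded; `expand_globs` is a hypothetical helper, not part of Ruffus.

```python
import glob
import os
import tempfile

def expand_globs(file_names):
    """Hypothetical helper: expand names containing *, [, ] or ?; keep others as-is."""
    expanded = []
    for name in file_names:
        if any(ch in name for ch in "*[]?"):
            # Sort so that the order of matches is reproducible.
            expanded.extend(sorted(glob.glob(name)))
        else:
            expanded.append(name)
    return expanded

# Create a few empty files in a scratch directory and expand "a.*" against them.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.1", "a.2", "b.1"):
        open(os.path.join(tmp, name), "w").close()
    matches = expand_globs([os.path.join(tmp, "a.*"), "plain.txt"])
    found = [os.path.basename(m) for m in matches]

print(found)  # ['a.1', 'a.2', 'plain.txt']
```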