Appendix 6: @files_re: Deprecated syntax using regular expressions

Warning

  • This is deprecated syntax which is no longer supported and should NOT be used in new code.

Overview

@files_re combines the functionality of @transform, @collate and @merge in one overloaded decorator.

This is why its use is discouraged: the @files_re syntax is too overloaded and context-dependent to support so many different functions cleanly.

The following documentation is provided to help maintain pipelines that still use this historical Ruffus syntax.

Transforming input and output filenames

For example, the following code takes the output files of the previous pipeline task and, for each one, makes a new output file name with the .sums suffix in place of the .chunks suffix:

@transform(step_4_split_numbers_into_chunks, suffix(".chunks"), ".sums")
def step_5_calculate_sum_of_squares (input_file_name, output_file_name):
    #
    #   calculate sums and sums of squares for all values in the input_file_name
    #       writing to output_file_name
    ""

This can be written using @files_re equivalently:

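@files_re(step_4_split_numbers_into_chunks, r"\.chunks$", r".sums")
def step_5_calculate_sum_of_squares(input_file_name, output_file_name):
    #
    #   calculate sums and sums of squares for all values in the input_file_name
    #       writing to output_file_name
    ""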
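To make the equivalence concrete, the sketch below mimics both name mappings in plain Python using only the standard `re` module. It assumes that @files_re derives each output name by a regular-expression substitution on each matching input name; the helpers `suffix_replace` and `regex_replace` are hypothetical and are not part of the Ruffus API. The pattern in this sketch is written as r"\.chunks$" so that, like suffix(), it only matches a literal .chunks at the end of the name.

```python
import re


def suffix_replace(name, old_suffix, new_suffix):
    """Swap a trailing suffix, like suffix(); return None if name does not end with it."""
    if not name.endswith(old_suffix):
        return None
    return name[: -len(old_suffix)] + new_suffix


def regex_replace(name, pattern, replacement):
    """Substitute pattern in name, like a @files_re mapping (assumed); None if no match."""
    if re.search(pattern, name) is None:
        return None
    return re.sub(pattern, replacement, name)


for name in ["a.chunks", "b.chunks", "notes.txt"]:
    # Both mappings agree: *.chunks -> *.sums, and other files are skipped (None).
    print(name, suffix_replace(name, ".chunks", ".sums"),
          regex_replace(name, r"\.chunks$", ".sums"))

# An unescaped, unanchored pattern also matches "_chunks" inside a directory name.
print(regex_replace("my_chunks_dir/a.txt", r".chunks", ".sums"))  # my.sums_dir/a.txt
```

The last line shows why escaping the dot and anchoring the pattern matters when a regular expression is used in place of suffix().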

Collating many inputs into a single output

Similarly, the following code collects all the input files for the same species into a single output file, in a directory named after that species:

@collate('*.animals',                   # inputs = all *.animals files
         regex(r'mammals\.([^.]+)'),    # regular expression
         r'\1/animals.in_my_zoo',       # single output file per species
         r'\1')                         # species name
def capture_mammals(infiles, outfile, species):
    # summarise all animals of this species
    ""

This can be written equivalently with @files_re by wrapping the output pattern in the combine indicator:

@files_re('*.animals',                        # inputs = all *.animals files
          r'mammals\.([^.]+)',                # regular expression
          combine(r'\1/animals.in_my_zoo'),   # single output file per species
          r'\1')                              # species name
def capture_mammals(infiles, outfile, species):
    # summarise all animals of this species
    ""
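The effect of combine() can be pictured as a grouping step: every input whose substituted output name is the same ends up in a single job. The sketch below is a plain-Python approximation, not Ruffus's implementation. It assumes search semantics for the regular expression, takes an explicit list of file names instead of expanding the '*.animals' glob, and uses a hypothetical helper `collate_by_regex`.

```python
import re
from collections import defaultdict


def collate_by_regex(filenames, pattern, output_template, extra_template):
    """Group names by their substituted output name: one (inputs, output, extra) job per group."""
    regex = re.compile(pattern)
    groups = defaultdict(list)
    extras = {}
    for name in filenames:
        match = regex.search(name)  # assumed search semantics
        if match is None:
            continue  # names that do not match the pattern are ignored
        output = match.expand(output_template)
        groups[output].append(name)
        extras[output] = match.expand(extra_template)
    return [(sorted(names), output, extras[output])
            for output, names in sorted(groups.items())]


files = ["mammals.tiger.wild.animals", "mammals.tiger.zoo.animals",
         "mammals.lion.wild.animals", "reptiles.snake.animals"]
for job in collate_by_regex(files, r"mammals\.([^.]+)",
                            r"\1/animals.in_my_zoo", r"\1"):
    print(job)
```

Here the two tiger files share one job writing tiger/animals.in_my_zoo, the lion file gets its own job, and the reptile file is skipped because it does not match.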

Generating input and output parameters using regular expressions

The following code generates additional input prerequisite file names that correspond to the original input files.

We want each job of our analyse() function to receive a matching pair of xx.chunks and xx.red_indian files, where the *.chunks files are generated by the task function split_up_problem() and the *.red_indian files are generated by the task function make_red_indians():

@follows(make_red_indians)
@transform(split_up_problem,              # starting set of *inputs*
           regex(r"(.*)\.chunks"),        # regular expression
           inputs([r"\g<0>",              # xx.chunks
                   r"\1.red_indian"]),    # xx.red_indian
           r"\1.results")                 # xx.results
def analyse(input_filenames, output_file_name):
    "Do analysis here"

The equivalent code using @files_re looks very similar, except that the list of input patterns is passed directly rather than wrapped in inputs():

@follows(make_red_indians)
@files_re(split_up_problem,       # starting set of *inputs*
          r"(.*)\.chunks",        # regular expression
          [r"\g<0>",              # xx.chunks
           r"\1.red_indian"],     # xx.red_indian
          r"\1.results")          # xx.results
def analyse(input_filenames, output_file_name):
    "Do analysis here"
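In both versions, several substitution patterns are expanded against the same match, giving one job per matching input. A minimal sketch of that expansion using the standard `re.Match.expand` method, assuming search semantics and using a hypothetical helper `make_jobs` (not a Ruffus function):

```python
import re


def make_jobs(filenames, pattern, input_templates, output_template):
    """Build one (inputs, output) job per name matching pattern."""
    regex = re.compile(pattern)
    jobs = []
    for name in filenames:
        match = regex.search(name)  # assumed search semantics
        if match is None:
            continue  # non-matching names produce no job
        # \g<0> expands to the whole match, \1 to the first captured group
        job_inputs = [match.expand(t) for t in input_templates]
        jobs.append((job_inputs, match.expand(output_template)))
    return jobs


for job in make_jobs(["1.chunks", "2.chunks"], r"(.*)\.chunks",
                     [r"\g<0>", r"\1.red_indian"], r"\1.results"):
    print(job)  # first job: (['1.chunks', '1.red_indian'], '1.results')
```

The sketch only builds file names. In the pipelines above, it is @follows(make_red_indians) that makes sure the .red_indian files are produced before analyse() runs.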