Chapter 9: Preparing directories for output with @mkdir()

Overview

In Chapter 3, we saw that we could use @follows(mkdir()) to ensure that output directories exist:

#
#   create_new_files() @follows mkdir
#
@follows(mkdir("output/results/here"))
@originate(["output/results/here/a.start_file",
            "output/results/here/b.start_file"])
def create_new_files(output_file_pair):
    pass

This ensures that the decorated task follows (@follows) the making of the specified directory (mkdir()).
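As a rough mental model, the mkdir() dependency behaves like an idempotent directory creation that runs before the task body. The sketch below uses only the standard library; the helper name ensure_directory is hypothetical, and this is an assumption about the observable behaviour, not Ruffus's actual implementation.

```python
import os
import tempfile


def ensure_directory(path):
    """Create path and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


# Work inside a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as workdir:
    target = ensure_directory(os.path.join(workdir, "output", "results", "here"))
    ensure_directory(target)  # calling it a second time is harmless
    for name in ("a.start_file", "b.start_file"):
        # The files can be created because their directory now exists.
        with open(os.path.join(target, name), "w"):
            pass
    created = sorted(os.listdir(target))
```

Because the creation step is idempotent, re-running the pipeline when the directory is already present should cause no error.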

Sometimes, however, the Output is destined not for a single directory but for a group of destinations that depend on the parsed contents of the Input paths.

Creating directories after string substitution in a zoo...

You may remember this example from Chapter 8:

We want to feed the denizens of a zoo. The original file names are spread over several directories and we group their food supply by the clade of the animal in the following manner:

[Figure: zoo animals formatter() example]
#   Put different animals in different directories depending on their clade
@transform(create_initial_files,                                       # Input

           formatter(r".+/(?P<clade>\w+)\.(?P<tame>\w+)\.animals"),    # Only animals: ignore plants!

           "{subpath[0][1]}/{clade[0]}/{tame[0]}.{subdir[0][0]}.food", # Replacement

           "{subpath[0][1]}/{clade[0]}",                               # new_directory
           "{subdir[0][0]}",                                           # animal_name
           "{tame[0]}")                                                # tameness
def feed(input_file, output_file, new_directory, animal_name, tameness):
    print("%40s -> %90s" % (input_file, output_file))
    # this blows up
    # open(output_file, "w")

The example code from Chapter 8 is, however, incomplete. If we actually tried to create the specified files, we would realise that we had forgotten to create the destination directories (reptiles, mammals) first!
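The failure is easy to reproduce without Ruffus. The following minimal sketch (the helper try_to_write and the file name are hypothetical, chosen to resemble the zoo example) shows that opening a file for writing does not create its parent directory:

```python
import os
import tempfile


def try_to_write(path):
    """Return True if the file could be created, False if its directory is missing."""
    try:
        with open(path, "w"):
            pass
        return True
    except FileNotFoundError:  # raised as IOError under Python 2
        return False


with tempfile.TemporaryDirectory() as zoo:
    food = os.path.join(zoo, "mammals", "wild.tiger.food")
    before = try_to_write(food)         # the "mammals" directory does not exist yet
    os.makedirs(os.path.dirname(food))  # create the clade directory
    after = try_to_write(food)          # now the same call succeeds
```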

using formatter()

We could of course create the directories manually. However, apart from being tedious and error-prone, this would duplicate work: we have already gone to some lengths to parse out the directories for @transform. Why don’t we use the same logic to make the directories?

Can you see the parallels between the syntax for @mkdir and @transform?

# create directories for each clade
@mkdir(    create_initial_files,                                       # Input

           formatter(r".+/(?P<clade>\w+)\.(?P<tame>\w+)\.animals"),    # Only animals: ignore plants!
           "{subpath[0][1]}/{clade[0]}")                               # new_directory

#   Put animals of each clade in the same directory
@transform(create_initial_files,                                       # Input

           formatter(r".+/(?P<clade>\w+)\.(?P<tame>\w+)\.animals"),    # Only animals: ignore plants!

           "{subpath[0][1]}/{clade[0]}/{tame[0]}.{subdir[0][0]}.food", # Replacement

           "{subpath[0][1]}/{clade[0]}",                               # new_directory
           "{subdir[0][0]}",                                           # animal_name
           "{tame[0]}")                                                # tameness
def feed(input_file, output_file, new_directory, animal_name, tameness):
    print("%40s -> %90s" % (input_file, output_file))
    # this works now
    open(output_file, "w")

See the example code
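To see which directories the "{subpath[0][1]}/{clade[0]}" replacement should yield, the substitution can be imitated with the standard library alone. This is only a sketch: it assumes that subpath[0][1] means "the parent of the directory containing the first input file" and that non-matching inputs are skipped. The input paths and the function name clade_directory are hypothetical.

```python
import posixpath
import re

# Same pattern as the formatter() call above, with the literal dots escaped.
ANIMALS = re.compile(r".+/(?P<clade>\w+)\.(?P<tame>\w+)\.animals")


def clade_directory(input_path):
    """Return the directory implied by "{subpath[0][1]}/{clade[0]}", or None."""
    match = ANIMALS.match(input_path)
    if match is None:
        return None  # not an animal: no directory is needed
    # subpath[0][0] is the file's own directory; subpath[0][1] is one level up.
    parent = posixpath.dirname(posixpath.dirname(input_path))
    return posixpath.join(parent, match.group("clade"))


# Hypothetical input paths, modelled on the zoo example.
inputs = [
    "/zoo/tiger/mammals.wild.animals",
    "/zoo/lion/mammals.wild.animals",
    "/zoo/crocodile/reptiles.wild.animals",
    "/zoo/rose/flowering.handreared.plants",
]
directories = sorted({d for d in map(clade_directory, inputs) if d is not None})
```

Several inputs map to the same directory, so the set comprehension collapses duplicates; each clade directory needs to be created only once.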

using regex()

If you are particularly fond of using regular expressions to parse file paths, you could also use regex():

# create directories for each clade
@mkdir(    create_initial_files,                                       # Input

           regex(r"(.*?)/?(\w+)/(?P<clade>\w+)\.(?P<tame>\w+)\.animals"), # Only animals: ignore plants!
           r"\1/\g<clade>")                                               # new_directory

#   Put animals of each clade in the same directory
@transform(create_initial_files,                                       # Input

           formatter(r".+/(?P<clade>\w+)\.(?P<tame>\w+)\.animals"),    # Only animals: ignore plants!

           "{subpath[0][1]}/{clade[0]}/{tame[0]}.{subdir[0][0]}.food", # Replacement

           "{subpath[0][1]}/{clade[0]}",                               # new_directory
           "{subdir[0][0]}",                                           # animal_name
           "{tame[0]}")                                                # tameness
def feed(input_file, output_file, new_directory, animal_name, tameness):
    print("%40s -> %90s" % (input_file, output_file))
    # this works now
    open(output_file, "w")
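The regex() replacement string uses the ordinary back-reference syntax of Python's re module, so its effect can be checked in isolation. The sketch below assumes that the substitution behaves like expanding the template against a match anchored at the start of the path; the paths and the function name clade_directory are hypothetical, and the pattern is the one above with its literal dots escaped.

```python
import re

ANIMALS = re.compile(r"(.*?)/?(\w+)/(?P<clade>\w+)\.(?P<tame>\w+)\.animals")


def clade_directory(input_path):
    """Expand the template "\\1/\\g<clade>" for a matching path, else return None."""
    match = ANIMALS.match(input_path)
    if match is None:
        return None  # plants and other non-animals are ignored
    # \1 is everything above the animal's own directory;
    # \g<clade> is the named group captured from the file name.
    return match.expand(r"\1/\g<clade>")


tiger_dir = clade_directory("/zoo/tiger/mammals.wild.animals")
croc_dir = clade_directory("/zoo/crocodile/reptiles.wild.animals")
rose_dir = clade_directory("/zoo/rose/flowering.handreared.plants")
```

Here the first numbered group plays the role that subpath[0][1] plays in the formatter() version, at the cost of a denser pattern.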