Chapter 23: Esoteric: Writing custom functions to decide which jobs are up to date with @check_if_uptodate

@check_if_uptodate : Manual dependency checking

Tasks specified with most decorators have automatic dependency checking based on file modification times.

Sometimes you might want more control over whether to run jobs, especially if a task does not rely on or produce files (e.g. with @parallel).

You can write your own custom function to decide whether to run a job. It is called with the same parameters as your task function, and must return a tuple stating whether an update is required, and why (i.e. tuple(bool, str)).

This simple example, which creates the file "a.1" if it does not exist:

from ruffus import *

# Create "a.1" only if it does not already exist
@originate("a.1")
def create_if_necessary(output_file):
    with open(output_file, "w"):
        pass

pipeline_run([])

could be rewritten more laboriously as:

from ruffus import *
import os

# Called with the same parameters as the task; returns (needs_update, reason)
def check_file_exists(input_file, output_file):
    if os.path.exists(output_file):
        return False, "File already exists"
    return True, "%s is missing" % output_file

@parallel([[None, "a.1"]])
@check_if_uptodate(check_file_exists)
def create_if_necessary(input_file, output_file):
    with open(output_file, "w"):
        pass

pipeline_run([create_if_necessary])

Both produce the same output:

Task = create_if_necessary
    Job = [null, "a.1"] completed
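
The same approach can cover tasks that involve no files at all. The following is a minimal sketch, not taken from the Ruffus documentation: the names check_job_recorded, process_sample and completed_jobs are hypothetical, and the in-memory set is only a stand-in for whatever external record (a database table, for instance) your pipeline keeps. The checker is called directly here so that the returned tuples are visible without running a pipeline.

```python
# Hypothetical stand-in for external state, e.g. a database table of finished jobs.
completed_jobs = {"sample_A"}


def check_job_recorded(sample_name, threshold):
    """Same parameters as the task function; returns (needs_update, reason)."""
    if sample_name in completed_jobs:
        return False, "%s has already been processed" % sample_name
    return True, "%s has not been processed yet" % sample_name


def process_sample(sample_name, threshold):
    """The task body: do the work, then record that the job is complete."""
    completed_jobs.add(sample_name)


# With Ruffus, the task would be decorated along these lines (assumed usage,
# mirroring the file-based example):
#   @parallel([["sample_A", 0.5], ["sample_B", 0.5]])
#   @check_if_uptodate(check_job_recorded)
#   def process_sample(sample_name, threshold): ...

# Calling the checker directly shows the tuple returned for each job.
results = [check_job_recorded(name, 0.5) for name in ("sample_A", "sample_B")]
```

Only "sample_B" is reported as needing an update. Once process_sample has recorded it, the checker reports it as up to date as well.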

Note

The function specified by @check_if_uptodate can be called more than once for each job.
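
Because of this, it seems safest to keep the checking function free of side effects, so that repeated calls for the same job give the same answer. The sketch below illustrates the point with two hypothetical checkers (check_output_missing and check_first_call_only are names invented for this example). They are called directly rather than through Ruffus.

```python
import os
import tempfile


def check_output_missing(input_file, output_file):
    """Side-effect free: only inspects the file system."""
    if os.path.exists(output_file):
        return False, "File already exists"
    return True, "%s is missing" % output_file


calls_seen = set()


def check_first_call_only(input_file, output_file):
    """Anti-pattern: the answer depends on how often the checker was called."""
    if output_file in calls_seen:
        return False, "Already checked"
    calls_seen.add(output_file)
    return True, "First check of %s" % output_file


# A path in a fresh temporary directory, so the file is known not to exist.
target = os.path.join(tempfile.mkdtemp(), "a.1")

# Two calls to the side-effect-free checker agree with each other.
stable = [check_output_missing(None, target) for _ in range(2)]

# Two calls to the stateful checker disagree, so a job could be skipped wrongly.
unstable = [check_first_call_only(None, target) for _ in range(2)]
```

The first checker gives the same result however many times it is called. The second reports the job as up to date on its second call even though the output file was never created.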

See also the description of how Ruffus decides which tasks to run.