Dynamic tagging of processes

I started adding tags to my processes so I can easily see which sample a particular task is processing.
At first I put the tag directive in the process itself. That feels like the right place, but it is a nightmare if I ever want to change it.
Then I tried putting it in the config file. Now everything lives in a single file and is easier to maintain, but I have the feeling that the data is being handled in the wrong place.

Which is the preferred/standard way of setting the tags? In the process definitions or in the config file?

Also, when putting the tags in the config file, I’m repeating the same structure for each process:

process {
    withName: 'PROCESS1' {
        cpus   = 6
        memory = { 2.GB * task.attempt }
        time   = { 1.hour * (3 ** (task.attempt - 1)) }
        queue  = { params.partition }
        tag    = { meta.id }
    }
    ...

Could it be possible to define the closure once somewhere and assign it to the tags multiple times? I tried adding tag_closure = { meta.id } at the top level and it worked, but I get a warning in the editor. Where should I put that definition?

Conventionally, the tag is set in the process definition. Changing it later is only a matter of overriding it in the config (for example with a withName selector); you don’t need to edit the process itself.
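For illustration, a minimal sketch of the conventional placement. The process name comes from the question; the inputs, the output file name and the wc command are assumptions made up for the example.

```nextflow
process PROCESS1 {
    // The tag is resolved per task and shown next to the process name in the log.
    tag "${meta.id}"

    input:
    // Assumed input shape: a meta map with an `id` field plus one file.
    tuple val(meta), path(reads)

    output:
    tuple val(meta), path("${meta.id}.count.txt")

    script:
    """
    wc -l < ${reads} > ${meta.id}.count.txt
    """
}
```

A withName: 'PROCESS1' block in the config that sets tag would then take precedence over this default without touching the process file.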

Note that you can also use labels to decide which processes get a tag, and a process can have multiple labels. Ideally, declare those labels in the process definition and then target them in your config with a withLabel selector. This is slightly more verbose, though, so I would stick with putting the tag in the process definition and only override it in the config when it should differ from the default, as one might with nf-core modules.
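A rough sketch of the label-based variant. The label name uses_meta is hypothetical; each process that should be tagged would declare label 'uses_meta' in its definition.

```nextflow
// nextflow.config (sketch)
process {
    // Applies to every process carrying the hypothetical 'uses_meta' label.
    withLabel: 'uses_meta' {
        // A closure, so meta is looked up per task rather than when the config is parsed.
        tag = { meta.id }
    }
}
```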

Just wanted to add that withLabel and withName selectors are extremely powerful. You could add a label to the processes that have no meta map (or that you don’t want to tag) and use a selector that negates this label, e.g.

process {
    withLabel: '!non_meta' { tag = { meta.id } }
}

or

process {
    // in case you have more labels starting with the same name
    withLabel: '!non_meta.*' { tag = { meta.id } }
}

If all your processes use meta, you could just set process.tag = { meta.id }.
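A hedged sketch of that last option, combined with a per-process override. It assumes every process receives a meta map with an id field, and the override string is purely illustrative. As far as I know, settings without a selector have the lowest priority, so a tag declared in a process definition or in a withName/withLabel block still wins over the default.

```nextflow
// nextflow.config (sketch)
process {
    // Default for every process; the closure is evaluated per task,
    // once meta is available.
    tag = { meta.id }

    withName: 'PROCESS1' {
        // Illustrative override for a single process.
        tag = { "sample ${meta.id}" }
    }
}
```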

Thank you both for your answers.
For now I will follow the convention, which seems to be putting the tag in each process. But since all my processes use the meta map, setting process.tag = { meta.id } in the config file is quite tempting.
